
Fundamental Concepts and Data Representation in Statistics

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Variables and Data Types

Qualitative vs. Quantitative Variables

In statistics, variables are characteristics or properties that can take on different values. They are classified as either qualitative (categorical) or quantitative (numerical). Quantitative variables can be further divided into discrete and continuous types.

  • Qualitative Variables: Describe qualities or categories (e.g., color, type, major).

  • Quantitative Variables: Represent numerical values and can be measured or counted.

    • Discrete: Countable values (e.g., number of pets).

    • Continuous: Measurable values within a range (e.g., weight).

Example:

  • The number of pet dogs sold in a pet shop: Quantitative, Discrete

  • The math score of left-handed students: Quantitative, Discrete or Continuous (depending on the scoring system)

  • The major of students in a university: Qualitative

  • The weight of a statistics textbook: Quantitative, Continuous

Types of Data: Cross-sectional vs. Time Series

Classification of Data by Time Dimension

Data can be classified based on how and when it is collected:

  • Cross-sectional Data: Collected at a single point in time across several subjects or locations.

  • Time Series Data: Collected over several time periods for the same subject or location.

Example:

  • Number of crimes in 5 cities for the year 2015: Cross-sectional

  • Monthly sales for a car company in the UAE for 2013: Time series

Population vs. Sample

Understanding Populations and Samples

In statistics, a population is the entire group of interest, while a sample is a subset of the population used for analysis.

  • Population: All members of a defined group.

  • Sample: A selection from the population, used to make inferences about the whole.

Example:

  • The blood pressure of all students in a class: Population

  • The credit outstanding for 50 selected customers: Sample

  • The mark scored by all 30 students of a class: Population

  • The time spent in studying math by 15 students from a large university: Sample

Frequency Distributions and Graphical Representation

Constructing Frequency Distributions

A frequency distribution organizes data into classes or intervals, showing the number of observations in each class. This helps in understanding the distribution and patterns in the data.

  • Class Width: The difference between the upper and lower boundaries of a class.

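  • Class Boundaries: The values that separate adjacent classes without gaps; each boundary lies halfway between the upper limit of one class and the lower limit of the next (e.g., 2.5 between the classes 0 - 2 and 3 - 5).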

  • Class Midpoint (C.MP): The average of the upper and lower class boundaries.

Example Table:

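Distance | Frequency | Relative Frequency | % Frequency | Class Boundaries | Class Midpoint (C.MP)
0 - 2 | 5 | 0.19 | 19% | -0.5 to 2.5 | 1
3 - 5 | 11 | 0.42 | 42% | 2.5 to 5.5 | 4
6 - 8 | 5 | 0.19 | 19% | 5.5 to 8.5 | 7
9 - 11 | 3 | 0.12 | 12% | 8.5 to 11.5 | 10
12 - 14 | 2 | 0.08 | 8% | 11.5 to 14.5 | 13
Sum | 26 | 1.00 | 100% | |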
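The derived columns of the distance table can be recomputed from the class limits and frequencies alone. The Python sketch below assumes integer class limits with a gap of 1 between consecutive classes, so each boundary lies 0.5 beyond its class limit; the function name `build_table` is illustrative, not from the source. Under this half-gap rule the first lower boundary is -0.5, which keeps every class width at 3 and the first midpoint at 1.

```python
# Class limits and frequencies copied from the distance table.
classes = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]
freqs = [5, 11, 5, 3, 2]

def build_table(classes, freqs, gap=1):
    """Return (n, rows), one row per class with the derived columns.

    Assumption: integer class limits separated by `gap`, so each boundary
    sits half a gap below the lower limit or above the upper limit.
    """
    n = sum(freqs)
    rows = []
    for (lo, hi), f in zip(classes, freqs):
        rel = f / n                      # relative frequency f / n
        lower_b = lo - gap / 2           # lower class boundary
        upper_b = hi + gap / 2           # upper class boundary
        rows.append({
            "limits": (lo, hi),
            "freq": f,
            "rel_freq": round(rel, 2),   # two decimals, as in the table
            "pct_freq": round(rel * 100),
            "boundaries": (lower_b, upper_b),
            "width": upper_b - lower_b,
            "midpoint": (lower_b + upper_b) / 2,
        })
    return n, rows

n, rows = build_table(classes, freqs)
for r in rows:
    print(r)
print("n =", n)
```

The rounding precision mirrors the table, so the rounded relative frequencies happen to sum to 1.00 here; with other data, rounded values may not add up exactly to 1.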

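Frequency Polygon: A line graph that plots each class frequency above its class midpoint and joins the points with straight line segments.

Histogram: A bar graph of a frequency distribution in which adjacent bars touch; each bar spans its class boundaries and its height equals the class frequency.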
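To draw these two graphs from the distance table, a frequency polygon needs (midpoint, frequency) pairs and a histogram needs one bar per class. The sketch below assumes the common convention of closing the polygon on the x-axis with an empty class one class width beyond each end, and it substitutes a sideways text bar chart for a real histogram so that it runs without a plotting library; `polygon_points` and `text_histogram` are hypothetical helper names.

```python
midpoints = [1, 4, 7, 10, 13]          # class midpoints (C.MP) from the table
freqs = [5, 11, 5, 3, 2]               # class frequencies
labels = ["0 - 2", "3 - 5", "6 - 8", "9 - 11", "12 - 14"]
width = 3                              # class width

def polygon_points(midpoints, freqs, width):
    """Vertices (x, y) of a frequency polygon.

    Assumption: an empty class is added one width below the first midpoint and
    one width above the last, so the line starts and ends on the x-axis.
    """
    xs = [midpoints[0] - width] + list(midpoints) + [midpoints[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

def text_histogram(labels, freqs, mark="#"):
    """Sideways text stand-in for a histogram: one row per class, bar length = frequency."""
    return "\n".join(f"{label:>7} | {mark * f}" for label, f in zip(labels, freqs))

print(polygon_points(midpoints, freqs, width))
print(text_histogram(labels, freqs))
```

A true histogram draws vertical bars whose edges sit on the class boundaries; the text version only preserves the relative bar lengths.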

Stem-and-Leaf Plots

Visualizing Data with Stem-and-Leaf Plots

A stem-and-leaf plot is a method of displaying quantitative data to show its shape and distribution. Each data value is split into a "stem" (all but the final digit) and a "leaf" (the final digit).

  • Helps in identifying the distribution, central tendency, and spread of the data.

  • Retains the original data values.

Example: For the data: 11, 15, 11, 83, 68, 79, 69, 78, 77, 88, 68, 63, 88, 78, 84, 10, 84, 77, 64, 70, 88, 82, 10, 80, 11, 66

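  • Stems: 1, 2, 3, ..., 8 (the tens digits; stems 2 to 5 have no leaves but are still listed so the gap in the data stays visible)

  • Leaves: The units digits of the values with each stem, listed in increasing order (e.g., stem 8 has leaves 0 2 3 4 4 8 8 8)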
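The stem-and-leaf procedure can be sketched in Python for the data above. The sketch assumes non-negative integers, so the stem is `value // 10` and the leaf is `value % 10`; the helper names `stem_and_leaf` and `render` are hypothetical.

```python
from collections import defaultdict

data = [11, 15, 11, 83, 68, 79, 69, 78, 77, 88, 68, 63, 88,
        78, 84, 10, 84, 77, 64, 70, 88, 82, 10, 80, 11, 66]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted leaves (value % 10).

    Every stem from the smallest to the largest is kept, even if it has
    no leaves, so gaps in the data remain visible.
    """
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)
    lo, hi = min(leaves), max(leaves)
    return {stem: sorted(leaves.get(stem, [])) for stem in range(lo, hi + 1)}

def render(plot):
    """Format the plot as 'stem | leaf leaf ...' lines."""
    return "\n".join(
        f"{stem} | {' '.join(map(str, lv))}".rstrip() for stem, lv in plot.items()
    )

plot = stem_and_leaf(data)
print(render(plot))
```

The empty rows for stems 2 to 5 make the split between the low group (10 to 15) and the high group (63 to 88) easy to see, while every original value can still be read back from the plot.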

Pie Charts

Representing Categorical Data with Pie Charts

A pie chart is a circular graph divided into sectors, each representing a category's proportion of the total.

  • Useful for visualizing the relative frequencies of categories.

  • Each sector's angle is proportional to the category's frequency.

Example Table:

Grade | Number of Students | Relative Frequency | % Frequency
A | 8 | 0.15 | 15%
B | 17 | 0.31 | 31%
C | 21 | 0.39 | 39%
D | 8 | 0.15 | 15%
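Using angle = (frequency / total) × 360°, the sector angles for the grade table can be computed as below; the dictionary layout and the function name `pie_angles` are illustrative assumptions. Multiplying before dividing keeps whole-number results such as grade C's 140° exact in floating point.

```python
def pie_angles(counts):
    """Return each category's sector angle in degrees: count * 360 / total."""
    total = sum(counts.values())
    return {category: count * 360 / total for category, count in counts.items()}

grades = {"A": 8, "B": 17, "C": 21, "D": 8}   # number of students per grade
angles = pie_angles(grades)
for grade, angle in angles.items():
    print(f"{grade}: {angle:.1f} degrees")
```

With 54 students in total, the four angles add up to 360 degrees (up to floating-point rounding).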

Pareto Charts and Frequency Distributions for Categorical Data

Analyzing Categorical Data

A Pareto chart is a bar graph in which categories are arranged in decreasing order of frequency, often with a line showing the cumulative percentage; it is used to highlight the most significant factors in a dataset.

  • Helps identify the most common categories.

  • Often used in quality control and business analytics.

Example Table:

Color | Frequency
White | 6
Green | 3
Silver | 2
Grey | 4
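To turn the color table into a Pareto chart, the categories are sorted by decreasing frequency and a running (cumulative) percentage is attached to each bar. A minimal Python sketch, with `pareto_table` as a hypothetical helper name:

```python
colors = {"White": 6, "Green": 3, "Silver": 2, "Grey": 4}

def pareto_table(counts):
    """Sort categories by decreasing frequency and add cumulative percentages."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    rows, running = [], 0
    for category, freq in ordered:
        running += freq
        rows.append((category, freq, round(100 * running / total, 1)))
    return rows

for row in pareto_table(colors):
    print(row)
```

For these 15 observations the bar order is White, Grey, Green, Silver, and the cumulative line reaches 100% at the last bar.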

Dot Plots

Simple Visualization of Discrete Data

A dot plot is a simple graphical display of data using dots, where each dot represents one observation. It is especially useful for small datasets and for visualizing the distribution of discrete variables.

  • Each value is represented by a dot above its position on a number line.

  • Useful for identifying clusters, gaps, and outliers.

Example: Number of passengers in vehicles at a traffic light: 1, 2, 4, 5, 2, 1, 0, 1, 2, 3, 4, 2, 3, 4, 1, 1, 0, 1, 2, 2, 0, 0, 1, 2, 3, 4
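A dot plot can be approximated in plain text by stacking one mark per observation above each value on a number line. The sketch below uses `o` as the dot and assumes integer data; `text_dot_plot` is a hypothetical name.

```python
from collections import Counter

passengers = [1, 2, 4, 5, 2, 1, 0, 1, 2, 3, 4, 2, 3,
              4, 1, 1, 0, 1, 2, 2, 0, 0, 1, 2, 3, 4]

def text_dot_plot(values):
    """Return a text dot plot: top row first, number line at the bottom."""
    counts = Counter(values)                     # observations per value
    xs = range(min(values), max(values) + 1)     # every integer on the axis
    height = max(counts.values())
    lines = []
    for level in range(height, 0, -1):
        # Draw a dot at x if at least `level` observations equal x.
        lines.append(" ".join("o" if counts[x] >= level else " " for x in xs).rstrip())
    lines.append(" ".join(str(x) for x in xs))
    return "\n".join(lines)

print(text_dot_plot(passengers))
```

For the passenger data the tallest stacks are at 1 and 2 passengers (7 vehicles each), and the single vehicle with 5 passengers stands apart at the right end.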

Key Formulas and Concepts

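  • Relative Frequency: $\text{Re. Fr.} = \dfrac{f}{n}$, where $f$ is the frequency of a class (or category) and $n$ is the total number of observations.

  • Percentage Frequency: $\text{\% Fr.} = \dfrac{f}{n} \times 100\%$

  • Class Midpoint: $\text{C.MP} = \dfrac{\text{lower class boundary} + \text{upper class boundary}}{2}$

  • Angle for Pie Chart Sector: $\text{Angle} = \dfrac{f}{n} \times 360^\circ$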
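As a worked check of these formulas, using the 3 - 5 class of the distance table ($f = 11$, $n = 26$, class boundaries 2.5 and 5.5) and grade C of the grades table (21 of 54 students):

```latex
\begin{aligned}
\text{Re. Fr.} &= \frac{11}{26} \approx 0.42 \\
\text{\% Fr.} &= \frac{11}{26} \times 100\% \approx 42\% \\
\text{C.MP} &= \frac{2.5 + 5.5}{2} = 4 \\
\text{Angle (grade C)} &= \frac{21}{54} \times 360^\circ = 140^\circ
\end{aligned}
```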

Summary Table: Types of Data and Graphical Representation

Type of Data | Graphical Representation | Example
Quantitative (Discrete) | Dot plot, Stem-and-leaf, Histogram | Number of pets
Quantitative (Continuous) | Histogram, Frequency polygon | Weight, Distance
Qualitative (Categorical) | Pareto chart, Pie chart, Bar chart | Car color, Student major

Additional info: Some explanations and context have been expanded for clarity and completeness, as the original material was in question format.
