
(Lecture 2) Types of Data and Graphical Summaries in Statistics


Section 2.1 Different Types of Data

Individuals, Samples, and Variables

In statistics, a population is the full set of individuals (subjects) under study. A sample is the subset of the population for which data are actually collected, and variables are the characteristics recorded for each individual.

  • Individuals: The subjects or entities being studied (e.g., people in the US Census 2000).

  • Variables: Characteristics measured or observed for each individual (e.g., state of residence, zip code, family size, annual income).

| State | Zipcode | Family_Size | Annual_income |
| --- | --- | --- | --- |
| Florida | 32716 | 8 | 200 |
| Alabama | 35236 | 5 | 800 |
| Florida | 32116 | 6 | 13500 |
| Florida | 33679 | 5 | 21000 |
| Alabama | 36374 | 4 | 21000 |
| California | 94565 | 1 | 23000 |

Types of Variables

Variables can be classified as either categorical (qualitative) or quantitative (numeric).

  • Categorical Variable: Each observation belongs to one of a set of distinct categories. Examples: Gender (Male/Female), Religious Affiliation (Catholic, Jewish, etc.), Type of Residence (Apartment, Condo, etc.), Belief in Life After Death (Yes/No), Payment Method.

  • Quantitative Variable: Observations take numerical values representing different magnitudes. Examples: Age, Number of Siblings, Annual Income.

Example: Breakfast Cereals Dataset

| Name | Manufacturer | Target | Shelf | Calories | Sodium |
| --- | --- | --- | --- | --- | --- |
| 100% Bran | Nabisco | adult | top | 70 | 130 |
| 100% Natural Bran | Quaker Oats | adult | top | 120 | 15 |
| All-Bran | Kelloggs | adult | top | 70 | 260 |
| All-Bran Extra Fiber | Kelloggs | adult | top | 50 | 140 |
| Almond Delight | Ralston Purina | adult | top | 110 | 200 |
| Apple Cinnamon Cheerios | General Mills | child | bottom | 110 | 125 |
| Apple Jacks | Kelloggs | child | middle | 110 | 125 |

  • Categorical Variables: Manufacturer, Target, Shelf

  • Quantitative Variables: Calories, Sodium
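The classification above can be sketched in code. This is only a rough illustration: the function name `classify_variables` and the rule "all values are numbers means quantitative" are assumptions made for the example, not part of the lecture. The rule is a heuristic and fails for numeric codes such as a zip code, which is stored as a number but is still categorical.

```python
# Two rows copied from the cereal table above.
cereals = [
    {"Name": "100% Bran", "Manufacturer": "Nabisco", "Target": "adult",
     "Shelf": "top", "Calories": 70, "Sodium": 130},
    {"Name": "Apple Jacks", "Manufacturer": "Kelloggs", "Target": "child",
     "Shelf": "middle", "Calories": 110, "Sodium": 125},
]


def classify_variables(rows):
    """Label a column 'quantitative' if every value is a number, else 'categorical'.

    Hypothetical helper: a heuristic only, since numeric codes (zip codes,
    ID numbers) would be wrongly labelled quantitative.
    """
    kinds = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        kinds[column] = "quantitative" if is_numeric else "categorical"
    return kinds


kinds = classify_variables(cereals)
for column, kind in kinds.items():
    print(f"{column}: {kind}")
```

Under this rule, Name is also labelled categorical; it acts as an identifier for each cereal rather than as a variable to summarize.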

Main Features of Quantitative and Categorical Variables

  • Quantitative Variables: Key features are the center (typical value) and variability (spread) of the data. Example: Typical annual precipitation and its variation over years.

  • Categorical Variables: Key feature is the relative number of observations in each category. Example: Percentage of sunny days in a year.
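As a small illustration of these features, the sketch below summarizes one quantitative variable (Calories) and one categorical variable (Shelf) from the cereal table above. Using the mean for the center and the sample standard deviation for the spread is an assumption made here; the notes do not say which measures to use.

```python
import statistics
from collections import Counter

# Values copied from the cereal table above.
calories = [70, 120, 70, 50, 110, 110, 110]
shelf = ["top", "top", "top", "top", "top", "bottom", "middle"]

# Quantitative variable: center and variability.
center = statistics.mean(calories)    # one possible measure of center
spread = statistics.stdev(calories)   # sample standard deviation as spread

# Categorical variable: relative number of observations in each category.
shelf_counts = Counter(shelf)
shelf_proportions = {
    category: count / len(shelf) for category, count in shelf_counts.items()
}

print(f"Calories: center = {center:.2f}, spread = {spread:.2f}")
for category, proportion in shelf_proportions.items():
    print(f"Shelf {category}: {proportion:.1%}")
```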

Discrete and Continuous Quantitative Variables

  • Discrete Variable: Values obtained by counting; possible values are separate numbers (e.g., 0, 1, 2, 3, ...). Examples: Number of pets, number of children, number of foreign languages spoken, number of heads in three coin flips.

  • Continuous Variable: Values obtained by measuring; possible values form an interval. Examples: Height, weight, time to complete an assignment, travel time, distance traveled.
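To see why a count is discrete, the sketch below enumerates every outcome of three coin flips and collects the possible numbers of heads. The variable names are chosen for illustration only.

```python
from itertools import product

# All sequences of three flips, each flip being heads (H) or tails (T).
outcomes = list(product("HT", repeat=3))

# Number of heads in each sequence.
head_counts = [outcome.count("H") for outcome in outcomes]

# The distinct possible values are separate whole numbers, not an interval.
possible_counts = sorted(set(head_counts))
print(possible_counts)
```

The result contains only 0, 1, 2, and 3. A continuous variable such as travel time has no comparable finite list, because any value in an interval is possible.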

Distribution of a Variable

Definition and Description

The distribution of a variable describes how the observations fall (are distributed) across the range of possible values. Graphs and frequency tables are used to identify key features of a distribution.

Frequency Table

A frequency table lists possible values for a variable, along with the number of observations (frequency) and/or relative frequencies for each value.

| Value | Frequency | Relative Frequency | Percentage |
| --- | --- | --- | --- |
| 80.0 | 3 | 0.20 | 20.00% |
| 85.0 | 0 | 0.00 | 0.00% |
| 90.0 | 6 | 0.40 | 40.00% |
| 95.0 | 2 | 0.13 | 13.33% |
| 100.0 | 4 | 0.27 | 26.67% |
| Total | 15 | 1.00 | 100.00% |

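Proportion and percentage (relative frequencies) are calculated as:

  • Proportion: $\text{proportion} = \dfrac{\text{frequency of the value}}{\text{total number of observations}}$

  • Percentage: $\text{percentage} = \text{proportion} \times 100\%$

For example, the value 90.0 in the table above has proportion 6/15 = 0.40, or 40%.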
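The frequency table above can be reproduced with a short script. The raw observations are reconstructed from the table's frequencies, and the helper name `frequency_table` is an assumption made for this sketch.

```python
from collections import Counter

# 15 observations reconstructed from the frequencies in the table above.
observations = [80.0] * 3 + [90.0] * 6 + [95.0] * 2 + [100.0] * 4
# Possible values listed in the table, including 85.0, which never occurs.
possible_values = [80.0, 85.0, 90.0, 95.0, 100.0]


def frequency_table(data, values):
    """Return (value, frequency, proportion, percentage) for each listed value."""
    counts = Counter(data)
    total = len(data)
    rows = []
    for value in values:
        frequency = counts[value]          # a Counter gives 0 for unseen values
        proportion = frequency / total     # relative frequency
        rows.append((value, frequency, proportion, 100 * proportion))
    return rows


table = frequency_table(observations, possible_values)
for value, frequency, proportion, percentage in table:
    print(f"{value:>6.1f} {frequency:>3d} {proportion:>5.2f} {percentage:>6.2f}%")
```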

Section 2.2 Graphical Summaries of Data

Graphs for Categorical Variables

  • Pie Chart: A circle divided into slices, each representing a category. The size of each slice is proportional to the percentage of observations in that category.

  • Bar Graph: Displays a vertical (or horizontal) bar for each category. The height (or length) of each bar represents the frequency or percentage for that category.

  • Pareto Chart: A bar graph where categories are ordered by frequency, from tallest to shortest bar.
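The ordering used in a Pareto chart can be sketched as follows, using the Manufacturer column of the cereal table above. The text bars made of `#` characters are only a stand-in for a real plot.

```python
from collections import Counter

# Manufacturer column copied from the cereal table above.
manufacturers = [
    "Nabisco", "Quaker Oats", "Kelloggs", "Kelloggs",
    "Ralston Purina", "General Mills", "Kelloggs",
]

counts = Counter(manufacturers)

# Pareto order: categories sorted from highest to lowest frequency.
pareto_order = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for name, frequency in pareto_order:
    print(f"{name:<15} {'#' * frequency} ({frequency})")
```

An ordinary bar graph would draw the same bars in any convenient order, for example alphabetically; only the sorting step makes it a Pareto chart.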

Features of Bar Graphs

  • Bars can be vertical or horizontal.

  • Bars are of uniform width and spacing.

  • Lengths represent frequency or relative frequency.

  • Graph should be well annotated with title, labels, and scale.

Graphs for Quantitative Variables

  • Dot Plot: Shows a dot for each observation placed above its value on a number line.

  • Stem-and-Leaf Plot: Portrays individual observations by splitting each value into a 'stem' and a 'leaf'. Useful for small to medium datasets. Always include a key to interpret the plot.

  • Histogram: Uses bars to portray the data. The range of data is divided into intervals of equal width, and the number of observations in each interval is counted. Bars are drawn over each interval with height equal to frequency or percentage.
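A dot plot can be approximated in plain text, as in the sketch below. It uses the 15 observations from the frequency table earlier in these notes; the function name `dot_plot_rows` and the use of `*` as the dot are assumptions made for illustration.

```python
from collections import Counter

# Observations reconstructed from the earlier frequency table.
observations = [80] * 3 + [90] * 6 + [95] * 2 + [100] * 4


def dot_plot_rows(data):
    """Build text rows with one '*' per observation stacked above its value."""
    counts = Counter(data)
    values = sorted(counts)
    tallest = max(counts.values())
    rows = []
    # Work from the top of the tallest stack down to the number line.
    for level in range(tallest, 0, -1):
        marks = ["*" if counts[v] >= level else " " for v in values]
        rows.append("".join(f"{mark:^5}" for mark in marks))
    # Final row: the values along the number line.
    rows.append("".join(f"{v:^5}" for v in values))
    return rows


rows = dot_plot_rows(observations)
print("\n".join(rows))
```

Note that this text version spaces the distinct values evenly and omits values with no observations, whereas a true dot plot places each value at its position on the number line.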

Steps for Constructing a Histogram

  1. Divide the range of the data into intervals of equal width.

  2. Count the number of observations in each interval (frequency table).

  3. Label the horizontal axis with interval endpoints.

  4. Draw bars over each interval with height equal to frequency or percentage.

  5. Label and title the graph appropriately.
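Steps 1 and 2 above can be sketched in code using the Sodium values from the cereal table. The interval width of 100, the starting point of 0, and the convention that each interval includes its left endpoint but not its right are assumptions chosen for this example.

```python
# Sodium column copied from the cereal table earlier in these notes.
sodium = [130, 15, 260, 140, 200, 125, 125]


def histogram_counts(data, start, width):
    """Count integer observations in equal-width intervals [low, high)."""
    n_bins = (max(data) - start) // width + 1
    counts = [0] * n_bins
    for x in data:
        counts[(x - start) // width] += 1   # index of the interval containing x
    return [
        (start + i * width, start + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


bins = histogram_counts(sodium, start=0, width=100)
for low, high, count in bins:
    # Text bars stand in for the drawn bars of step 4.
    print(f"[{low:>3}, {high:>3}) {'#' * count} ({count})")
```

With these choices, the value 200 falls in the interval from 200 to 300, not in the one from 100 to 200. Whichever endpoint convention is used, it should be applied consistently.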

Stem-and-Leaf Plot Example

Stem | Leaves
1 | 9
2 | 3 3 5
3 | 4 5 7
4 | 0 2 5 8 9

Key: 1 | 9 = 19

Stem-and-leaf plots retain original data values and are similar to histograms in showing distribution shape.
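The stem-and-leaf plot above can be rebuilt from its data values with a short script. The sketch assumes two-digit whole numbers, so the tens digit is the stem and the ones digit is the leaf; the function name `stem_and_leaf` is chosen for illustration.

```python
from collections import defaultdict

# Data values read back from the stem-and-leaf plot above.
values = [19, 23, 23, 25, 34, 35, 37, 40, 42, 45, 48, 49]


def stem_and_leaf(data):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(values)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
print("Key: 1 | 9 = 19")
```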

Summary Table: Types of Variables

| Type | Description | Examples |
| --- | --- | --- |
| Categorical | Qualitative, in categories | Gender, Religion, Residence Type |
| Quantitative (Discrete) | Numeric, countable values | Number of pets, children, languages |
| Quantitative (Continuous) | Numeric, measurable values | Height, weight, time, distance |

Example Application: In a survey of students' favorite ice cream flavors, a pie chart or bar graph can be used to display the proportion of each flavor chosen.
