(Lecture 2) Types of Data and Graphical Summaries in Statistics

Section 2.1 Different Types of Data

Individuals, Samples, and Variables

In statistics, a population consists of individuals or subjects under study. A sample is a subset of the population, and variables are characteristics observed in a study.

Individuals: The subjects or entities being studied (e.g., the people counted in the 2000 US Census).
Variables: Characteristics measured or observed for each individual (e.g., state of residence, zip code, family size, annual income).

State	Zip Code	Family Size	Annual Income
Florida	32716	8	200
Alabama	35236	5	800
Florida	32116	6	13500
Florida	33679	5	21000
Alabama	36374	4	21000
California	94565	1	23000

Types of Variables

Variables can be classified as either categorical (qualitative) or quantitative (numeric).

Categorical Variable: Each observation belongs to one of a set of distinct categories. Examples: Gender (Male/Female), Religious Affiliation (Catholic, Jewish, etc.), Type of Residence (Apartment, Condo, etc.), Belief in Life After Death (Yes/No), Payment Method.
Quantitative Variable: Observations take numerical values representing different magnitudes. Examples: Age, Number of Siblings, Annual Income.

Example: Breakfast Cereals Dataset

Name	Manufacturer	Target	Shelf	Calories	Sodium
100% Bran	Nabisco	adult	top	70	130
100% Natural Bran	Quaker Oats	adult	top	120	15
All-Bran	Kelloggs	adult	top	70	260
All-Bran Extra Fiber	Kelloggs	adult	top	50	140
Almond Delight	Ralston Purina	adult	top	110	200
Apple Cinnamon Cheerios	General Mills	child	bottom	110	125
Apple Jacks	Kelloggs	child	middle	110	125

Categorical Variables: Manufacturer, Target, Shelf
Quantitative Variables: Calories, Sodium

Main Features of Quantitative and Categorical Variables

Quantitative Variables: Key features are the center (typical value) and variability (spread) of the data. Example: Typical annual precipitation and its variation over years.
Categorical Variables: Key feature is the relative number of observations in each category. Example: Percentage of sunny days in a year.
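As a rough illustration, the sketch below uses the seven cereals from the table above. For the quantitative variable Calories it reports a center (the mean) and a spread (the range); for the categorical variable Shelf it reports the proportion of cereals in each category. Using the mean and the range as the measures of center and spread is an assumption made for this sketch; other summaries are possible.

```python
from collections import Counter

# Data copied from the breakfast cereals table above (7 cereals).
calories = [70, 120, 70, 50, 110, 110, 110]
shelf = ["top", "top", "top", "top", "top", "bottom", "middle"]

# Quantitative variable: describe its center and its variability.
mean_calories = sum(calories) / len(calories)
range_calories = max(calories) - min(calories)

# Categorical variable: describe the relative number of observations per category.
shelf_counts = Counter(shelf)
shelf_proportions = {cat: n / len(shelf) for cat, n in shelf_counts.items()}

print(f"Calories: mean = {mean_calories:.1f}, range = {range_calories}")
for cat, p in shelf_proportions.items():
    print(f"Shelf = {cat}: {p:.2f}")
```

Here the mean is 640 / 7, about 91.4 calories, the range is 120 - 50 = 70, and 5 of the 7 cereals (about 0.71) sit on the top shelf.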

Discrete and Continuous Quantitative Variables

Discrete Variable: Values obtained by counting; possible values are separate numbers (e.g., 0, 1, 2, 3, ...). Examples: Number of pets, number of children, number of foreign languages spoken, number of heads in three coin flips.
Continuous Variable: Values obtained by measuring; possible values form an interval. Examples: Height, weight, time to complete an assignment, travel time, distance traveled.
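To see why a count is discrete, the sketch below lists all eight outcomes of three coin flips and records the number of heads in each. The only possible values turn out to be the separate whole numbers 0, 1, 2, and 3, with nothing in between.

```python
from itertools import product

# All 2**3 = 8 outcomes of flipping a coin three times, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# The variable "number of heads" is obtained by counting.
heads = [outcome.count("H") for outcome in outcomes]

possible_values = sorted(set(heads))
print(possible_values)  # [0, 1, 2, 3]
```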

Distribution of a Variable

Definition and Description

The distribution of a variable describes how the observations fall (are distributed) across the range of possible values. Graphs and frequency tables are used to identify key features of a distribution.

Frequency Table

A frequency table lists possible values for a variable, along with the number of observations (frequency) and/or relative frequencies for each value.

Value	Frequency	Relative Frequency	Percentage
80.0	3	0.20	20.00%
85.0	0	0.00	0.00%
90.0	6	0.40	40.00%
95.0	2	0.13	13.33%
100.0	4	0.27	26.67%
Total	15	1.00	100.00%
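The table above does not list the raw observations, so the sketch below rebuilds 15 values consistent with the frequencies shown (three 80s, six 90s, two 95s, four 100s). It then counts each possible value, including 85, which never occurs, and computes the relative frequency and percentage for each row. The variable names are illustrative only.

```python
from collections import Counter

# 15 observations consistent with the frequency table above
# (reconstructed from the frequencies; the original raw data are not given).
data = [80] * 3 + [90] * 6 + [95] * 2 + [100] * 4
possible_values = [80, 85, 90, 95, 100]

counts = Counter(data)  # Counter returns 0 for values never observed
n = len(data)

rows = []
for value in possible_values:
    freq = counts[value]
    proportion = freq / n  # relative frequency
    rows.append((value, freq, proportion, 100 * proportion))

print(f"{'Value':>6} {'Freq':>5} {'Rel. freq':>10} {'Percent':>8}")
for value, freq, proportion, percentage in rows:
    print(f"{value:>6} {freq:>5} {proportion:>10.2f} {percentage:>7.2f}%")
print(f"{'Total':>6} {n:>5} {1:>10.2f} {100:>7.2f}%")
```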

Proportion and percentage (relative frequencies) are calculated as:

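Proportion: $\text{proportion} = \dfrac{\text{frequency of the value or category}}{\text{total number of observations}}$
Percentage: $\text{percentage} = 100 \times \text{proportion}$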
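As a worked check against the frequency table above, take the value 90 (frequency 6) and the value 95 (frequency 2), out of 15 observations in total:

```latex
\text{proportion for } 90 = \frac{6}{15} = 0.40, \qquad \text{percentage} = 100 \times 0.40 = 40\%
\text{proportion for } 95 = \frac{2}{15} \approx 0.133, \qquad \text{percentage} \approx 13.33\%
```

Because every observation takes exactly one value, the proportions sum to 1 and the percentages sum to 100%, apart from rounding.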

Section 2.2 Graphical Summaries of Data

Graphs for Categorical Variables

Pie Chart: A circle divided into slices, each representing a category. The size of each slice is proportional to the percentage of observations in that category.
Bar Graph: Displays a vertical (or horizontal) bar for each category. The height (or length) of each bar represents the frequency or percentage for that category.
Pareto Chart: A bar graph where categories are ordered by frequency, from tallest to shortest bar.
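The sketch below applies these ideas to the Manufacturer column of the cereal table. For a pie chart, each slice's angle is that category's share of the full 360-degree circle; for a Pareto chart, the categories are simply sorted from most to least frequent. It only computes the numbers a plotting tool would need; it does not draw anything.

```python
from collections import Counter

# Manufacturer column of the breakfast cereals table (7 cereals).
manufacturers = ["Nabisco", "Quaker Oats", "Kelloggs", "Kelloggs",
                 "Ralston Purina", "General Mills", "Kelloggs"]

counts = Counter(manufacturers)
n = len(manufacturers)

# Pie chart: each slice's angle is proportional to its share of observations.
pie_angles = {name: 360 * freq / n for name, freq in counts.items()}

# Pareto chart: bars ordered from tallest to shortest.
# most_common() sorts by count; ties keep the order first encountered.
pareto_order = [name for name, _ in counts.most_common()]

for name in pareto_order:
    print(f"{name:<15} {counts[name]}  ({pie_angles[name]:.1f} degrees of the pie)")
```

Kelloggs comes first in the Pareto order with 3 of 7 cereals, which corresponds to a pie slice of about 154.3 degrees.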

Features of Bar Graphs

Bars can be vertical or horizontal.
Bars are of uniform width and spacing.
Lengths represent frequency or relative frequency.
The graph should be clearly annotated with a title, axis labels, and a scale.
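A minimal text-only sketch of a horizontal bar graph, using the Shelf column of the cereal table, shows these features: one row per category (uniform width and spacing), bar length proportional to frequency, and a title, labels, and a scale. The helper `text_bar_graph` is hypothetical, written only for this illustration.

```python
from collections import Counter

# Shelf column of the breakfast cereals table (7 cereals).
shelf = ["top", "top", "top", "top", "top", "bottom", "middle"]
counts = Counter(shelf)

def text_bar_graph(counts, title, unit="#", item="observation"):
    """Hypothetical helper: one row per category (uniform width), bar length
    proportional to frequency, plus a title, category labels, and a scale."""
    lines = [title]
    width = max(len(label) for label in counts)
    for label, freq in counts.items():
        lines.append(f"{label:>{width}} | {unit * freq} {freq}")
    lines.append(f"Scale: each '{unit}' = 1 {item}")
    return "\n".join(lines)

print(text_bar_graph(counts, "Number of cereals by shelf position", item="cereal"))
```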

Graphs for Quantitative Variables

Dot Plot: Shows a dot for each observation placed above its value on a number line.
Stem-and-Leaf Plot: Portrays individual observations by splitting each value into a 'stem' and a 'leaf'. Useful for small to medium datasets. Always include a key to interpret the plot.
Histogram: Uses bars to portray the data. The range of data is divided into intervals of equal width, and the number of observations in each interval is counted. Bars are drawn over each interval with height equal to frequency or percentage.
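Of the three displays listed above, the dot plot is the easiest to sketch in text. The version below turns the number line sideways: each row is a point on the line, and each `o` is one cereal from the Calories column. It assumes every value lies on a grid of spacing `step` starting at the minimum (values off the grid would not be drawn); `text_dot_plot` is a hypothetical helper.

```python
from collections import Counter

# Calories column of the breakfast cereals table (7 cereals).
calories = [70, 120, 70, 50, 110, 110, 110]

def text_dot_plot(values, step):
    """Hypothetical helper: one row per point on the number line (spaced
    `step` apart, starting at the minimum), one dot per observation there."""
    counts = Counter(values)
    rows = []
    for v in range(min(values), max(values) + 1, step):
        rows.append(f"{v:>4} | {'o' * counts[v]}")
    return "\n".join(rows)

print(text_dot_plot(calories, step=10))
```

Empty rows (such as 60 or 90) are kept on purpose, since gaps on the number line are part of what a dot plot shows.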

Steps for Constructing a Histogram

Divide the range of the data into intervals of equal width.
Count the number of observations in each interval (frequency table).
Label the horizontal axis with interval endpoints.
Draw bars over each interval with height equal to frequency or percentage.
Label and title the graph appropriately.
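The steps above can be sketched for the Calories column of the cereal table. The choices of a starting point of 40 and an interval width of 20, and the convention that each interval includes its left endpoint but not its right, are assumptions made for this example; `histogram_table` is a hypothetical helper that expects integer values no smaller than `start`.

```python
# Calories column of the breakfast cereals table (7 cereals).
calories = [70, 120, 70, 50, 110, 110, 110]

def histogram_table(values, start, width):
    """Steps 1-2: split the range into equal-width intervals [low, low + width)
    and count the observations in each (assumes integers >= start)."""
    n_bins = (max(values) - start) // width + 1
    freqs = [0] * n_bins
    for v in values:
        freqs[(v - start) // width] += 1
    return [(start + i * width, start + (i + 1) * width, f)
            for i, f in enumerate(freqs)]

# Steps 3-5: label each interval by its endpoints, draw a bar, add a title.
print("Histogram of cereal calories")
for low, high, freq in histogram_table(calories, start=40, width=20):
    print(f"{low:>3}-{high:<3} | {'#' * freq} {freq}")
```

The resulting frequencies are 1, 2, 0, 3, 1 for the intervals 40-60, 60-80, 80-100, 100-120, and 120-140; the empty 80-100 interval still gets a (zero-height) bar so that the horizontal axis stays evenly spaced.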

Stem-and-Leaf Plot Example

Stem	Leaves
1	9
2	3, 3, 5
3	4, 5, 7
4	0, 2, 5, 8, 9

Key: 1 | 9 = 19

Stem-and-leaf plots retain original data values and are similar to histograms in showing distribution shape.
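The plot above can be rebuilt from its 12 values by splitting each number into a stem (tens digit) and a leaf (units digit). This sketch assumes non-negative integers and lists every stem between the smallest and largest, so an empty stem would still appear as a gap; `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict

# The 12 values shown in the stem-and-leaf example above.
data = [19, 23, 23, 25, 34, 35, 37, 40, 42, 45, 48, 49]

def stem_and_leaf(values):
    """Hypothetical helper: map each stem (tens digit) to its sorted leaves
    (units digits); assumes non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lo, hi = min(leaves), max(leaves)
    return {stem: leaves.get(stem, []) for stem in range(lo, hi + 1)}

plot = stem_and_leaf(data)
for stem, leaf_list in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaf_list)}")
print("Key: 1 | 9 = 19")
```

Because every original value can be read back as stem × 10 + leaf, nothing is lost, which is the property the paragraph above describes.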

Summary Table: Types of Variables

Type	Description	Examples
Categorical	Qualitative, in categories	Gender, Religion, Residence Type
Quantitative (Discrete)	Numeric, countable values	Number of pets, children, languages
Quantitative (Continuous)	Numeric, measurable values	Height, weight, time, distance

Example Application: In a survey of students' favorite ice cream flavors, a pie chart or bar graph can be used to display the proportion of each flavor chosen.