(Lecture 2) Types of Data and Graphical Summaries in Statistics
Study Guide - Smart Notes
Tailored notes based on your materials, expanded with key definitions, examples, and context.
Section 2.1 Different Types of Data
Individuals, Samples, and Variables
In statistics, a population is the full set of individuals (subjects) of interest in a study. A sample is a subset of the population from which data are actually collected, and variables are the characteristics observed for each individual.
Individuals: The subjects or entities being studied (e.g., people in the US Census 2000).
Variables: Characteristics measured or observed for each individual (e.g., state of residence, zip code, family size, annual income).
| State | Zipcode | Family_Size | Annual_income |
|---|---|---|---|
| Florida | 32716 | 8 | 200 |
| Alabama | 35236 | 5 | 800 |
| Florida | 32116 | 6 | 13500 |
| Florida | 33679 | 5 | 21000 |
| Alabama | 36374 | 4 | 21000 |
| California | 94565 | 1 | 23000 |
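A minimal sketch of the individual/variable distinction, assuming each row of the census excerpt is stored as a Python dictionary. The names `census_rows`, `variables`, and `family_sizes` are illustrative, not from the source. Each dictionary is one individual and each key is one variable.

```python
# Each dict is one individual (one row of the census excerpt);
# each key is one variable observed on that individual.
# Zipcode is stored as text because it labels a location; arithmetic on it is meaningless.
census_rows = [
    {"State": "Florida", "Zipcode": "32716", "Family_Size": 8, "Annual_income": 200},
    {"State": "Alabama", "Zipcode": "35236", "Family_Size": 5, "Annual_income": 800},
    {"State": "Florida", "Zipcode": "32116", "Family_Size": 6, "Annual_income": 13500},
    {"State": "Florida", "Zipcode": "33679", "Family_Size": 5, "Annual_income": 21000},
    {"State": "Alabama", "Zipcode": "36374", "Family_Size": 4, "Annual_income": 21000},
    {"State": "California", "Zipcode": "94565", "Family_Size": 1, "Annual_income": 23000},
]

variables = list(census_rows[0])  # column names = variables
print(f"{len(census_rows)} individuals, {len(variables)} variables: {variables}")

# Reading down one column gives the values of a single variable across individuals.
family_sizes = [row["Family_Size"] for row in census_rows]
print("Family_Size values:", family_sizes)
```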
Types of Variables
Variables can be classified as either categorical (qualitative) or quantitative (numeric).
Categorical Variable: Each observation belongs to one of a set of distinct categories. Examples: Gender (Male/Female), Religious Affiliation (Catholic, Jewish, etc.), Type of Residence (Apartment, Condo, etc.), Belief in Life After Death (Yes/No), Payment Method.
Quantitative Variable: Observations take numerical values representing different magnitudes. Examples: Age, Number of Siblings, Annual Income.
Example: Breakfast Cereals Dataset
| Name | Manufacturer | Target | Shelf | Calories | Sodium |
|---|---|---|---|---|---|
| 100% Bran | Nabisco | adult | top | 70 | 130 |
| 100% Natural Bran | Quaker Oats | adult | top | 120 | 15 |
| All-Bran | Kelloggs | adult | top | 70 | 260 |
| All-Bran Extra Fiber | Kelloggs | adult | top | 50 | 140 |
| Almond Delight | Ralston Purina | adult | top | 110 | 200 |
| Apple Cinnamon Cheerios | General Mills | child | bottom | 110 | 125 |
| Apple Jacks | Kelloggs | child | middle | 110 | 125 |
Categorical Variables: Manufacturer, Target, Shelf
Quantitative Variables: Calories, Sodium
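The classification above can be approximated in code, as a rough sketch. The heuristic here is an assumption, not a rule from the notes: a column counts as quantitative if every value is a Python number, otherwise categorical, and the hypothetical `id_column` (Name) is skipped because it only labels each cereal. The heuristic looks at storage type, not meaning, so a numeric code such as a zip code stored as an integer would be wrongly called quantitative.

```python
# Cereal rows from the table above (individual = one cereal, labeled by Name).
cereals = [
    {"Name": "100% Bran", "Manufacturer": "Nabisco", "Target": "adult", "Shelf": "top", "Calories": 70, "Sodium": 130},
    {"Name": "100% Natural Bran", "Manufacturer": "Quaker Oats", "Target": "adult", "Shelf": "top", "Calories": 120, "Sodium": 15},
    {"Name": "All-Bran", "Manufacturer": "Kelloggs", "Target": "adult", "Shelf": "top", "Calories": 70, "Sodium": 260},
    {"Name": "All-Bran Extra Fiber", "Manufacturer": "Kelloggs", "Target": "adult", "Shelf": "top", "Calories": 50, "Sodium": 140},
    {"Name": "Almond Delight", "Manufacturer": "Ralston Purina", "Target": "adult", "Shelf": "top", "Calories": 110, "Sodium": 200},
    {"Name": "Apple Cinnamon Cheerios", "Manufacturer": "General Mills", "Target": "child", "Shelf": "bottom", "Calories": 110, "Sodium": 125},
    {"Name": "Apple Jacks", "Manufacturer": "Kelloggs", "Target": "child", "Shelf": "middle", "Calories": 110, "Sodium": 125},
]

def classify_variables(rows, id_column="Name"):
    """Heuristic (assumption): numeric columns -> quantitative, everything else -> categorical."""
    categorical, quantitative = [], []
    for var in rows[0]:
        if var == id_column:
            continue  # identifier, not a variable we summarize
        values = [row[var] for row in rows]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            quantitative.append(var)
        else:
            categorical.append(var)
    return categorical, quantitative

cat, quant = classify_variables(cereals)
print("Categorical:", cat)
print("Quantitative:", quant)
```

The final decision still rests on what the values mean, which is why the notes classify variables by definition rather than by how they happen to be stored.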
Main Features of Quantitative and Categorical Variables
Quantitative Variables: Key features are the center (typical value) and variability (spread) of the data. Example: Typical annual precipitation and its variation over years.
Categorical Variables: Key feature is the relative number of observations in each category. Example: Percentage of sunny days in a year.
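A small sketch contrasting the two kinds of summary on the cereal data. The notes only name "center" and "variability"; using the median for center and the range (max minus min) for spread is an assumption here, one simple choice among several measures.

```python
from collections import Counter
from statistics import median

# Quantitative variable: Calories from the cereal table.
calories = [70, 120, 70, 50, 110, 110, 110]
# Categorical variable: Target from the same table.
target = ["adult", "adult", "adult", "adult", "adult", "child", "child"]

# Quantitative: a typical value (median) and the spread (range = max - min).
center = median(calories)
spread = max(calories) - min(calories)
print(f"Calories: median = {center}, range = {spread}")

# Categorical: relative number of observations in each category.
counts = Counter(target)
n = len(target)
for category, count in counts.items():
    print(f"{category}: {count}/{n} = {100 * count / n:.1f}%")
```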
Discrete and Continuous Quantitative Variables
Discrete Variable: Values obtained by counting; possible values are separate numbers (e.g., 0, 1, 2, 3, ...). Examples: Number of pets, number of children, number of foreign languages spoken, number of heads in three coin flips.
Continuous Variable: Values obtained by measuring; possible values form an interval. Examples: Height, weight, time to complete an assignment, travel time, distance traveled.
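The "number of heads in three coin flips" example can be checked by listing every outcome. This sketch enumerates all eight outcomes and shows that the count of heads can only take the separate values 0, 1, 2, 3, which is what makes it discrete. A continuous variable such as travel time could not be listed this way, since any value in an interval is possible.

```python
from collections import Counter
from itertools import product

# All 2**3 = 8 possible outcomes of three coin flips, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# The discrete variable: number of heads in each outcome (obtained by counting).
heads = [outcome.count("H") for outcome in outcomes]

possible_values = sorted(set(heads))
print("Possible values:", possible_values)
print("Outcomes per value:", dict(sorted(Counter(heads).items())))
```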
Distribution of a Variable
Definition and Description
The distribution of a variable describes how the observations fall (are distributed) across the range of possible values. Graphs and frequency tables are used to identify key features of a distribution.
Frequency Table
A frequency table lists possible values for a variable, along with the number of observations (frequency) and/or relative frequencies for each value.
| Value | Frequency | Relative Frequency | Percentage |
|---|---|---|---|
| 80.0 | 3 | 0.20 | 20.00% |
| 85.0 | 0 | 0.00 | 0.00% |
| 90.0 | 6 | 0.40 | 40.00% |
| 95.0 | 2 | 0.13 | 13.33% |
| 100.0 | 4 | 0.27 | 26.67% |
| Total | 15 | 1.00 | 100.00% |
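Proportion and percentage (relative frequencies) are calculated as:
Proportion: $\text{proportion} = \dfrac{\text{frequency of the value}}{\text{total number of observations}}$
Percentage: $\text{percentage} = \text{proportion} \times 100\%$
For example, the value 90.0 in the table has proportion $6/15 = 0.40$, i.e., 40%.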
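A sketch of how the frequency table above could be built from raw data. The notes do not give the underlying observations, so the list below is reconstructed from the table's frequencies (it is the only data set consistent with them, up to order); the name `frequency_table` is hypothetical. Passing the list of possible values explicitly lets 85.0 appear with frequency 0.

```python
from collections import Counter

# Observations reconstructed from the table's frequencies (not given in the notes).
observations = [80.0] * 3 + [90.0] * 6 + [95.0] * 2 + [100.0] * 4
possible_values = [80.0, 85.0, 90.0, 95.0, 100.0]  # 85.0 is listed even though it never occurs

def frequency_table(data, values):
    """Return rows of (value, frequency, proportion, percentage)."""
    counts = Counter(data)
    n = len(data)
    rows = []
    for v in values:
        freq = counts[v]        # Counter returns 0 for values never observed
        proportion = freq / n   # frequency / total number of observations
        rows.append((v, freq, proportion, 100 * proportion))
    return rows

table = frequency_table(observations, possible_values)
for value, freq, prop, pct in table:
    print(f"{value:>6} | {freq:>2} | {prop:.2f} | {pct:6.2f}%")
total_freq = sum(row[1] for row in table)
total_prop = sum(row[2] for row in table)
print(f" Total | {total_freq:>2} | {total_prop:.2f} | {100 * total_prop:6.2f}%")
```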
Section 2.2 Graphical Summaries of Data
Graphs for Categorical Variables
Pie Chart: A circle divided into slices, each representing a category. The size of each slice is proportional to the percentage of observations in that category.
Bar Graph: Displays a vertical (or horizontal) bar for each category. The height (or length) of each bar represents the frequency or percentage for that category.
Pareto Chart: A bar graph where categories are ordered by frequency, from tallest to shortest bar.
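The arithmetic behind two of these charts, sketched on the Manufacturer column of the cereal table (drawing is left out so the example needs no plotting library). A pie slice's angle is its share of the full 360-degree circle; a Pareto chart uses the same bars as a bar graph, reordered from most to least frequent. `Counter.most_common()` keeps tied categories in the order they were first seen.

```python
from collections import Counter

# Manufacturer column of the cereal table.
manufacturers = ["Nabisco", "Quaker Oats", "Kelloggs", "Kelloggs",
                 "Ralston Purina", "General Mills", "Kelloggs"]
counts = Counter(manufacturers)
n = len(manufacturers)

# Pie chart: slice angle = proportion of observations x 360 degrees.
for name, count in counts.items():
    print(f"{name}: {100 * count / n:.1f}% -> {360 * count / n:.1f} degrees")

# Pareto chart: bar order from most to least frequent.
pareto_order = counts.most_common()
print("Pareto order:", pareto_order)
```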
Features of Bar Graphs
Bars can be vertical or horizontal.
Bars are of uniform width and spacing.
Lengths represent frequency or relative frequency.
The graph should be clearly annotated with a title, axis labels, and a scale.
Graphs for Quantitative Variables
Dot Plot: Shows a dot for each observation placed above its value on a number line.
Stem-and-Leaf Plot: Portrays individual observations by splitting each value into a 'stem' (all digits except the final one) and a 'leaf' (the final digit). Useful for small to medium datasets. Always include a key showing how to read the plot.
Histogram: Uses bars to portray the data. The range of data is divided into intervals of equal width, and the number of observations in each interval is counted. Bars are drawn over each interval with height equal to frequency or percentage.
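A dot plot can be sketched in plain text by turning it on its side: one row per point on the number line, one `*` per observation. This sketch uses the cereal calories and assumes integer data whose values fall on steps of `step` starting from the minimum; `text_dot_plot` is a hypothetical helper, not a standard function. Empty rows keep the gaps in the data visible.

```python
from collections import Counter

# Calories for the seven cereals in the Section 2.1 table.
calories = [70, 120, 70, 50, 110, 110, 110]

def text_dot_plot(data, step):
    """Sideways dot plot: assumes integer values lying on multiples of `step` from min(data)."""
    counts = Counter(data)
    lines = []
    for value in range(min(data), max(data) + 1, step):
        lines.append(f"{value:>4} | {'*' * counts[value]}")
    return lines

plot = text_dot_plot(calories, step=10)
for line in plot:
    print(line)
```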
Steps for Constructing a Histogram
Divide the range of the data into intervals of equal width.
Count the number of observations in each interval (frequency table).
Label the horizontal axis with interval endpoints.
Draw bars over each interval with height equal to frequency or percentage.
Label and title the graph appropriately.
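The first four steps can be sketched in a few lines, using the cereal calories. Assumptions: intervals include their left endpoint and exclude their right one, the starting point 40 and width 20 were chosen by hand, and the bars are drawn sideways with `#` instead of on a plotting library; `histogram_counts` is a hypothetical helper.

```python
# Calories for the seven cereals in the Section 2.1 table.
calories = [70, 120, 70, 50, 110, 110, 110]

def histogram_counts(data, start, width):
    """Steps 1-2: split [start, max] into equal-width intervals [a, a + width) and count.
    Assumes start <= min(data)."""
    n_bins = int((max(data) - start) // width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[int((x - start) // width)] += 1
    return counts

start, width = 40, 20
counts = histogram_counts(calories, start, width)

# Steps 3-4: label each interval by its endpoints and draw a bar whose length is the frequency.
for i, count in enumerate(counts):
    lo = start + i * width
    print(f"{lo:>3}-{lo + width:<3} | {'#' * count} ({count})")
```

Changing `width` changes the picture, which is why the choice of interval width matters when reading a histogram.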
Stem-and-Leaf Plot Example
| Stem | Leaves |
|---|---|
| 1 | 9 |
| 2 | 3, 3, 5 |
| 3 | 4, 5, 7 |
| 4 | 0, 2, 5, 8, 9 |
Key: 1 | 9 = 19
Unlike a histogram, a stem-and-leaf plot retains the original data values. Turned on its side, it shows the same distribution shape as a histogram with one bar per stem.
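The stem-and-leaf example can be rebuilt from its values, as a minimal sketch assuming nonnegative two-digit integers, so the stem is the tens digit and the leaf the ones digit. The values are read back from the plot itself using its key; `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict

# Values read from the stem-and-leaf example (key: 1 | 9 = 19).
data = [19, 23, 23, 25, 34, 35, 37, 40, 42, 45, 48, 49]

def stem_and_leaf(values):
    """Stem = all digits but the last (the tens digit here); leaf = the last digit."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem in range(min(plot), max(plot) + 1):  # keep empty stems so gaps stay visible
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot.get(stem, []))}")
print("Key: 1 | 9 = 19")
```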
Summary Table: Types of Variables
| Type | Description | Examples |
|---|---|---|
| Categorical | Qualitative, in categories | Gender, Religion, Residence Type |
| Quantitative (Discrete) | Numeric, countable values | Number of pets, children, languages |
| Quantitative (Continuous) | Numeric, measurable values | Height, weight, time, distance |
Example Application: In a survey of students' favorite ice cream flavors, a pie chart or bar graph can be used to display the proportion of each flavor chosen.