Foundations of Statistics: Data Types, Sampling, and Organizing Variables
Study Guide - Smart Notes
Data and Variables
Data and Subjects
Statistics is the science of collecting, analyzing, and interpreting data. Understanding the types of data and variables is fundamental to statistical analysis.
Data: Information gathered through experiments and studies.
Subjects: The entities or things that are measured in a study.
Types of Variables
Variables are characteristics or properties that can take on different values among subjects in a study. They are classified as either categorical or numerical (quantitative).
Categorical Variables: Variables whose data represent categories or groups.
Nominal Scale: Data measured on a scale where category values express no order or ranking. Example: Eye color, blood type, car brands.
Ordinal Scale: Data measured on a scale where ordering or ranking is implied. Example: Letter grades, education level, social class.
Numerical (Quantitative) Variables: Variables whose data represent a counted or measured quantity.
Discrete Numerical Values: Data that come from a counting process; only whole numbers are possible. Example: Number of phones sold at a store, number of pets in a house, number of books in a library.
Continuous Numerical Values: Data that come from a measuring process; can take any value within a range. Example: Height, weight, temperature.
Sample vs. Population
Definitions and Differences
In statistics, data can be collected from either a sample or a population. Understanding the distinction is crucial for proper data analysis.
Population: Contains all of the things that you want to study. Example: All USC students, all houses in a neighborhood.
Sample: Contains only a portion of a population of interest. Samples are analyzed to estimate characteristics of the entire population. Example: 100 students selected from all USC students, 10 trees selected from a whole forest.
Statistic: A number that describes a sample. Example: The average height of 100 students is 170 cm (statistic).
Parameter: A number that describes a population. Example: The average height of all USC students is 178 cm (parameter).
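The difference between a parameter and a statistic can be seen by computing the same summary twice, once on a whole population and once on a sample drawn from it. The sketch below is only an illustration: the population of 1,000 heights is simulated, and the mean of 175 cm and spread of 8 cm are made-up assumptions, not figures from these notes.

```python
import random
import statistics

# Hypothetical population: heights (cm) of 1,000 students, simulated for illustration.
rng = random.Random(42)  # fixed seed only so the run is repeatable
population = [rng.gauss(175, 8) for _ in range(1000)]

# Parameter: computed from every member of the population.
parameter_mean = statistics.mean(population)

# Statistic: computed from a sample of 100 drawn from that population.
sample = rng.sample(population, 100)
statistic_mean = statistics.mean(sample)

print(f"Population mean (parameter): {parameter_mean:.1f} cm")
print(f"Sample mean (statistic):     {statistic_mean:.1f} cm")
```

The two numbers are usually close but not identical. The statistic changes from sample to sample, while the parameter stays fixed.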
Biased Sample Designs
Convenience Sample: Items selected because they are easily available. This often leads to biased results. Example: A teacher surveys students in her class instead of the whole school.
Sampling Methods
Simple Random Sample
Every item in the population has the same probability of being selected, and every possible sample of size n is equally likely to be chosen.
Sample size (n): Number of items in the sample.
Population size (N): Total number of items in the population.
Example: A teacher puts all 30 students' names in a hat, mixes them, and pulls out 6 names. Each student has the same chance of being picked.
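The names-in-a-hat example can be sketched in Python with the standard library's `random.sample`, which draws without replacement. The roster of placeholder names is an assumption made for illustration.

```python
import random

# Hypothetical roster of 30 students (names are placeholders).
students = [f"Student {i}" for i in range(1, 31)]

N = len(students)  # population size
n = 6              # sample size

# random.sample draws n distinct items without replacement,
# so every student is equally likely to be included.
rng = random.Random(0)  # fixed seed only so the run is repeatable
srs = rng.sample(students, n)
print(srs)
```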
Systematic Random Sample
Involves picking every kth item from the population after a random start.
Example: If a school has 1000 students and you want a sample of 100, calculate k = 1000/100 = 10. Pick a random number between 1 and 10 (say, 4), then select students at positions 4, 14, 24, ... until you have 100 students.
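Note: Often easier to carry out than simple random sampling because only one random number is needed, but it can give biased results if the list has a repeating pattern that lines up with k.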
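A minimal sketch of the 1000-student example, assuming students are identified by their position (1 to 1000) on a list:

```python
import random

N = 1000    # population size
n = 100     # desired sample size
k = N // n  # sampling interval: 10

rng = random.Random(1)     # fixed seed only so the run is repeatable
start = rng.randint(1, k)  # random start between 1 and k, inclusive

# Positions are 1-based: start, start + k, start + 2k, ...
positions = list(range(start, N + 1, k))
print(positions[:3], "...", positions[-1], len(positions))
```

Whatever the random start is, the list contains exactly 100 positions spaced 10 apart.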
Stratified Random Sample
Divides the population into groups (strata) based on shared characteristics, then randomly samples from each group.
Example: Divide students into groups by grade level, then randomly pick students from each grade.
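The two steps of stratified sampling (split into strata, then sample within each) might look like this. The roster, the four grade levels, and the choice of 5 students per grade are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical roster: (student_id, grade) pairs, 50 students in each of grades 9-12.
roster = [(f"S{grade}-{i}", grade) for grade in (9, 10, 11, 12) for i in range(50)]

# Step 1: split the population into strata by grade.
strata = defaultdict(list)
for student_id, grade in roster:
    strata[grade].append(student_id)

# Step 2: draw a simple random sample from every stratum.
rng = random.Random(2)  # fixed seed only so the run is repeatable
per_stratum = 5         # assumed equal allocation; proportional allocation is also common
stratified_sample = {
    grade: rng.sample(members, per_stratum) for grade, members in strata.items()
}

for grade, picked in stratified_sample.items():
    print(grade, picked)
```

Every grade is guaranteed to appear in the sample, which a simple random sample of the whole school does not guarantee.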
Clustered Random Sample
Divides the population into groups (clusters), then randomly selects some entire clusters and includes all members of those clusters in the sample.
Example: A school has 10 classrooms (clusters). You randomly pick 3 classrooms and survey all students in those classrooms.
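The classroom example can be sketched as follows. The class size of 25 and the student labels are assumptions made for illustration. Note that the randomness applies to which classrooms are chosen, not to individual students.

```python
import random

# Hypothetical school: 10 classrooms (clusters) of 25 students each.
classrooms = {
    room: [f"Room{room}-Student{i}" for i in range(1, 26)] for room in range(1, 11)
}

# Randomly select 3 whole clusters.
rng = random.Random(3)  # fixed seed only so the run is repeatable
chosen_rooms = rng.sample(sorted(classrooms), 3)

# Include every student from each chosen classroom.
cluster_sample = [s for room in chosen_rooms for s in classrooms[room]]
print(chosen_rooms, len(cluster_sample))
```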
Organizing Categorical Variables
Summary Table (Frequency Table)
A frequency table shows each category or value and how many times it appears in the data.
Example: How many times each color appears on screen.
| Color | Frequency | Percentage |
|---|---|---|
| Red | 2 | 20% |
| Blue | 3 | 30% |
| Green | 5 | 50% |
| Total | 10 | 100% |
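A frequency table like the one above can be built from raw observations by counting each category and dividing by the total. In this sketch the list of ten raw observations is reconstructed from the table's counts and is an assumption about what the underlying data looked like.

```python
from collections import Counter

# Raw observations reconstructed from the table's counts (assumed).
colors = ["Red"] * 2 + ["Blue"] * 3 + ["Green"] * 5

counts = Counter(colors)
total = sum(counts.values())

# Map each color to (frequency, percentage of total).
table = {color: (freq, 100 * freq / total) for color, freq in counts.items()}

for color, (freq, pct) in table.items():
    print(f"{color:<6} {freq:>3} {pct:>5.0f}%")
print(f"{'Total':<6} {total:>3} {100:>5.0f}%")
```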
Contingency Table
A contingency table cross-classifies two or more categorical variables at the same time, so you can study patterns that might appear between them. Each cell can be reported as a frequency, a percentage of the overall total, a percentage of the row total, or a percentage of the column total.
Example: How many boys and girls prefer cats or dogs.
| | Cats | Dogs | Total |
|---|---|---|---|
| Boys | 5 | 9 | 14 |
| Girls | 4 | 2 | 6 |
| Total | 9 | 11 | 20 |
Conditional means: The mean or percentage computed within a specific group or condition. Example: Among the 14 boys in the table above, 9 (about 64%) prefer dogs.
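The three kinds of percentages for the cats-and-dogs contingency table can be worked out as below. The cell counts come from the table. The dictionary layout and the helper name `pct` are choices made for this sketch.

```python
# Cell counts from the contingency table: rows are groups, columns are preferences.
counts = {
    "Boys": {"Cats": 5, "Dogs": 9},
    "Girls": {"Cats": 4, "Dogs": 2},
}

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {p: sum(counts[g][p] for g in counts) for p in ("Cats", "Dogs")}

def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Each cell as a share of the overall total, of its row total, and of its column total.
overall_pct = {g: {p: pct(c, grand_total) for p, c in row.items()} for g, row in counts.items()}
row_pct = {g: {p: pct(c, row_totals[g]) for p, c in row.items()} for g, row in counts.items()}
col_pct = {g: {p: pct(c, col_totals[p]) for p, c in row.items()} for g, row in counts.items()}

print(overall_pct["Boys"]["Dogs"])  # share of all 20 children who are boys preferring dogs
print(row_pct["Boys"]["Dogs"])      # share of boys who prefer dogs
print(col_pct["Boys"]["Dogs"])      # share of dog-preferrers who are boys
```

Row and column percentages are conditional: they answer different questions (what boys prefer versus who prefers dogs), so the same cell gives different values depending on which total is used.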