(Lecture 1) Introduction to Statistics: Data, Populations, and Statistical Reasoning
Study Guide - Smart Notes
Section 1.1 Using Data to Answer Statistical Questions
What Is Statistics?
Statistics is a branch of applied mathematics focused on the collection, organization, analysis, and interpretation of data. It is used in fields such as market research, consumer goods, political campaigns, sports, weather forecasting, medicine, and biodiversity conservation.
Statistics involves methods for gathering and analyzing data to answer investigative questions.
Data are the information we gather through experiments, surveys, or observational studies.
Example: In a study evaluating a low-carbohydrate diet, data might include participants' weights before and after the study, daily calorie intake, carbohydrate intake, body mass index, and demographic information such as gender.
The Statistical Investigative Process
The process of statistical investigation is systematic and involves several key steps to ensure reliable conclusions.
Formulate a statistical question: Define the question to be answered using data.
Collect data: Gather relevant data through experiments, surveys, or observational studies.
Analyze data: Use statistical methods to summarize and explore the data.
Interpret and communicate results: Draw conclusions and present findings in a clear manner.
Examples of statistical questions:
Would you be willing to pay higher prices to protect the environment?
Can we predict the winner of an election using an exit poll?
How effective is a new drug for treating depression in primary care patients?
Did a TV advertisement increase the sales of a new coffee product?
Main Components of Statistics
The statistical methods used to answer a statistical question fall into three main components:
Design: Planning how to obtain data that will address the question of interest.
Description: Summarizing and analyzing the collected data (e.g., using averages, charts, or graphs).
Inference: Making decisions or predictions about a population based on data from a sample.
Example: In an exit poll of 3,889 voters, 53.1% said they voted for Jerry Brown. Summarizing this percentage is description; predicting the outcome for all 9.5 million voters is inference.
Probability
Probability provides a framework for quantifying the likelihood of various possible outcomes.
Definition: Probability is the measure of how likely an event is to occur.
Example: If Jerry Brown were actually going to lose the election, what is the chance that an exit poll of 3,889 voters would show 53.1% support for him?
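The question in this example can be made concrete with a short calculation. The sketch below rests on two assumptions that are not in the notes: the race is treated as a two-candidate contest in which Brown's true support is exactly 50% (the borderline case), and the 3,889 voters are treated as a simple random sample. Under those assumptions it computes the exact binomial probability that at least 53.1% of the sample would report voting for him. The function name `tail_probability` is hypothetical.

```python
from math import comb

def tail_probability(n: int, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, 0.5), computed with exact integers."""
    favorable = sum(comb(n, k) for k in range(k_min, n + 1))
    # At p = 0.5 each of the 2**n possible response patterns is equally likely.
    return favorable / 2**n

n = 3889                  # exit-poll sample size from the example
k_min = round(0.531 * n)  # number of sampled voters reporting a vote for Brown
p_tail = tail_probability(n, k_min)
print(f"P(at least {k_min} of {n} | true support 50%) = {p_tail:.2e}")
```

Under these assumptions the probability comes out well below 1 in 1,000, so a poll result of 53.1% would be very surprising if Brown did not actually have majority support.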
Section 1.2 Sample Versus Population
We Observe Samples but are Interested in Populations
In statistics, we often collect data from a subset of a larger group, but our interest is usually in the larger group as a whole.
Subjects: The entities measured in a study (e.g., individuals, schools, countries).
Population and Sample
Understanding the distinction between a population and a sample is fundamental in statistics.
Population: The total set of subjects in which we are interested.
Sample: A subset of the population for which we actually collect data.
Example: In a California gubernatorial election exit poll, the population is all 9.5 million voters, while the sample is the 3,889 voters who were interviewed.
Descriptive Statistics and Inferential Statistics
Statistics can be divided into two broad categories:
Descriptive Statistics: Methods for summarizing collected data using graphs, averages, and percentages.
Inferential Statistics: Methods for making decisions or predictions about a population based on sample data.
Example: If 48% of a sample of New York residents support a plastic bag ban, inferential statistics allow us to estimate the percentage of all residents who support the ban, often with a margin of error.
Sample Statistics and Population Parameters
It is important to distinguish between statistics and parameters:
Parameter: A numerical summary of a population (e.g., the true average graduation rate).
Statistic: A numerical summary of a sample (e.g., the average graduation rate in a sample of colleges).
Examples of parameters:
Average graduation rate of all colleges in a given year.
Percentage of all opioid overdose deaths involving prescription opioids.
Median household income for all households in a country.
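The parameter/statistic distinction can be illustrated with simulated data. The sketch below invents a population of 2,000 college graduation rates (the numbers are made up, not real data), computes the population mean as the parameter, then draws a random sample of 50 colleges and computes the sample mean as the statistic.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: graduation rates (in %) for 2,000 made-up colleges,
# clipped to the valid range 0 to 100.
population = [min(100.0, max(0.0, random.gauss(60, 15))) for _ in range(2000)]

# Parameter: a numerical summary of the whole population (usually unknown in practice).
parameter = statistics.mean(population)

# Statistic: the same summary computed from a random sample of 50 colleges.
sample = random.sample(population, 50)
statistic = statistics.mean(sample)

print(f"population mean (parameter): {parameter:.1f}")
print(f"sample mean (statistic):     {statistic:.1f}")
```

In a real study only the statistic is observed; the simulation makes the parameter visible so the two can be compared.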
Randomness and Variability
Random sampling is essential for making valid inferences about populations. Variability refers to how much observations differ from one another.
Within-sample variability: Differences among individuals within a single sample.
Between-sample variability: Differences in results from one sample to another.
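The two kinds of variability can be seen in a small simulation. This is a sketch under assumed conditions: a hypothetical population of 10,000 people in which 60% favor some policy, sampled 100 at a time. The helper name `sample_proportion` is illustrative.

```python
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1 = favors a policy, 0 = does not; 60% favor it.
population = [1] * 6000 + [0] * 4000

def sample_proportion(n: int) -> float:
    """Proportion in favor within one simple random sample of size n."""
    return sum(random.sample(population, n)) / n

# Within-sample variability: individuals inside one sample give different answers.
one_sample = random.sample(population, 100)
within_sd = statistics.pstdev(one_sample)

# Between-sample variability: the sample proportion changes from sample to sample.
proportions = [sample_proportion(100) for _ in range(500)]
between_sd = statistics.pstdev(proportions)

print(f"spread of answers within one sample: {within_sd:.3f}")
print(f"spread of proportions across samples: {between_sd:.3f}")
```

The spread of the sample proportions is much smaller than the spread of individual answers, which is why a summary from a random sample can be a stable estimate even though individuals differ widely.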
Estimation from Surveys with Random Sampling
Surveys often use random samples to estimate population parameters, such as the percentage of people favoring a policy.
Margin of Error: Indicates the expected variability from one random sample to the next. For a sample proportion, the margin of error is often approximated as:
$$\text{margin of error} \approx \frac{1}{\sqrt{n}} \times 100\%$$
where $n$ is the sample size.
Example: A margin of error of ±3% means the true population percentage is likely within 3 percentage points of the sample percentage.
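To see how the margin of error depends on sample size, the sketch below uses the common rule of thumb that the margin for a sample proportion is about 1/sqrt(n), expressed as a percentage. That rule is an assumption here: it is roughly a 95% margin for proportions near 50%, not an exact formula, and the function name `approx_margin_of_error` is hypothetical.

```python
from math import sqrt

def approx_margin_of_error(n: int) -> float:
    """Rule-of-thumb margin of error for a sample proportion, in percentage points."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 100 / sqrt(n)

for n in (100, 1000, 3889, 10000):
    print(f"n = {n:>5}: about ±{approx_margin_of_error(n):.1f} percentage points")
```

By this approximation, a margin of about ±3 percentage points corresponds to a random sample of roughly 1,000 people, and quadrupling the sample size only halves the margin.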
Statistical Significance
Statistical significance indicates that an observed difference between groups is unlikely to have occurred by chance alone.
Significance: The observed difference is larger than what would be expected from ordinary random sample-to-sample variability.
Interpretation: Statisticians use statistical tests to determine if results are significant.
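One way to check whether a difference exceeds ordinary chance variation is a permutation test. The notes do not name a specific test, so this is one possible choice rather than the method the lecture uses. The sketch below uses made-up scores for two groups, shuffles the group labels many times, and counts how often chance alone produces a difference in means at least as large as the observed one. The function name `permutation_p_value` is hypothetical.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical outcome scores for two groups (made-up numbers for illustration).
treatment = [12, 15, 14, 18, 16, 17, 19, 15, 16, 18]
control = [11, 13, 12, 14, 12, 15, 13, 12, 14, 13]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_shuffles=10_000):
    """Share of random relabelings whose mean difference is at least as extreme as observed."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_shuffles):
        random.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if abs(diff) >= observed:
            extreme += 1
    return extreme / n_shuffles

p_value = permutation_p_value(treatment, control)
print(f"observed difference in means: {mean(treatment) - mean(control):.1f}")
print(f"permutation p-value: {p_value:.4f}")
```

A small p-value means random relabeling rarely reproduces a difference this large, which is the sense in which the observed difference is called statistically significant.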
Interpretation of Statistics
Interpreting statistics requires both mathematical calculations and critical thinking about how data are collected and what the numbers mean.
Numerical results may be correct, but their interpretation can be flawed if the context or methodology is not considered.
Example: A 30% increase in ice cream sales after a new advertisement does not necessarily prove the advertisement was effective; other factors may have contributed.
Take-home Messages
Statistics is more than just numbers; it encompasses a range of techniques for analyzing and interpreting data.
Statistical claims should be questioned and critically evaluated, considering the data sources and methods used.
Proper use of statistics is essential for making informed decisions, while misuse can lead to deceptive conclusions.