Chapter 1: Data Collection in Statistics
Study Guide - Smart Notes
Chapter 1: Data Collection
Statistics: An Overview
Statistics is the science of collecting, analyzing, organizing, and summarizing information to draw conclusions or answer questions. It provides a framework for making informed decisions based on data.
Anecdotal claims can be refuted with statistical analysis.
Poorly collected data are not useful.
Be aware of lurking variables in a study.
Results in statistics are not always certain.
Important Statistical Terminology
Population: The entire group to be studied.
Sample: A subset of the population that is being studied.
Individual: A person or object that is a member of the population being studied.
Variable: The characteristics of the individuals within the population.
Data: The list of observed values for a variable.
Statistic: A numerical summary of a sample.
Parameter: A numerical summary of a population.
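The difference between a statistic and a parameter can be illustrated with a minimal sketch. The population below is made up for illustration (1,000 hypothetical ages), and the seed and sample size of 50 are arbitrary assumptions, not values from the notes.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 individuals (made-up values).
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: a numerical summary of the entire population.
parameter = mean(population)

# Statistic: a numerical summary of a sample drawn from that population.
sample = rng.sample(population, 50)
statistic = mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two numbers are usually close but not equal, which is why inference also has to measure how reliable the sample result is.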
Types of Statistics
Descriptive statistics: Organizing and summarizing data.
Inferential statistics: Extending the results from a sample to the population and measuring the reliability of those results.
The Process of Statistics
Identify the research objective.
Collect the data.
Describe the data.
Perform inference (draw conclusions).
Examples: Populations and Samples
Example 1: An automobile manufacturer studies people who have bought hybrid vehicles.
Individuals: People who bought hybrids.
Variables: Income, age, lifestyle, first-time buyers, etc.
Example 2: A statistics class surveys 50 colleges and universities about student fees.
Population: All colleges and universities in the country.
Sample: 50 colleges and universities in the country.
Types of Variables
Qualitative (Categorical) Variables: Place individuals into groups or categories.
Quantitative Variables: Provide numerical measures of individuals. The values can be added or subtracted and give meaningful results.
Types of Quantitative Variables
Discrete: The variable has either a finite or countable number of possible values.
Continuous: The variable has an infinite number of possible values that are not countable. It can be measured to any desired level of accuracy.
Levels of Measurement
Nominal Level
Data can be put into categories ("in name only").
Data cannot be ordered.
Examples: The color of vehicles on a dealer's lot.
Ordinal Level
Data can be ordered.
Comparisons are only relative.
Examples: Class rank (freshman, sophomore, junior, senior).
Interval Level
Data can be ordered.
Meaningful differences can be computed.
A value of zero does not mean the absence of the quantity.
Examples: Air temperature (in degrees Celsius or Fahrenheit), calendar dates.
Ratio Level
Data can be ordered.
Meaningful differences and ratios can be computed.
A value of zero means the absence of the quantity.
Examples: A person's income, Kelvin temperature.
Note: To determine the level of measurement of data, identify the variable type and decide which operations are meaningful for it: categorizing, ordering, computing differences, or computing ratios.
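A short worked example may help separate the interval and ratio levels. The two temperatures are hypothetical values chosen for illustration; the only fact assumed is the standard conversion K = °C + 273.15.

```python
# Two hypothetical air temperatures in degrees Celsius (interval level).
c_low, c_high = 10.0, 20.0

# Differences are meaningful at the interval level.
diff_c = c_high - c_low

# Ratios are not: 0 degrees Celsius is not the absence of temperature.
naive_ratio = c_high / c_low

# Kelvin is ratio level because 0 K is a true zero.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = k_high / k_low

print(diff_c)                  # 10.0
print(naive_ratio)             # 2.0, but 20 C is not "twice as hot" as 10 C
print(round(kelvin_ratio, 3))  # 1.035
```

The Celsius ratio of 2.0 is an artifact of where the scale places its zero; the Kelvin ratio is the meaningful comparison.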
Observational Studies vs. Designed Experiments
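Observational Study: The researcher observes and measures individuals (for example, through existing records or surveys) without attempting to influence the variables being studied.
Designed Experiment: The researcher assigns individuals to groups, intentionally changes the value of an explanatory variable, and records the response to answer a specific research question.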
Sampling Methods
Random Sampling: The process of using chance to select individuals from a population to be included in the sample.
Simple Random Sample (SRS)
A sample of size n from a population of size N where every possible sample of size n has an equally likely chance of occurring.
Every individual in the population is equally likely to be picked for the sample.
Example: SRS of 2 from 5 students (Bob, Patricia, Mike, Ian, Dana)
| Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 | Sample 7 | Sample 8 | Sample 9 | Sample 10 |
|---|---|---|---|---|---|---|---|---|---|
| B, P | B, M | B, I | B, D | P, M | P, I | P, D | M, I | M, D | I, D |
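The ten samples in the table can be reproduced by listing every pair of students. This is a minimal sketch using Python's standard `itertools.combinations`; the variable names are illustrative.

```python
from itertools import combinations

students = ["Bob", "Patricia", "Mike", "Ian", "Dana"]

# Every possible sample of size n = 2 from the population of size N = 5.
samples = list(combinations(students, 2))

for i, pair in enumerate(samples, start=1):
    initials = ", ".join(name[0] for name in pair)
    print(f"Sample {i}: {initials}")

# In a simple random sample, each of these samples is equally likely.
probability_each = 1 / len(samples)
print(len(samples), probability_each)  # 10 0.1
```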
Obtaining a Simple Random Sample (SRS)
Obtain a frame – a list of all the individuals in the population.
Number the individuals in the frame from 1 through N.
Use a random number generator or table to select a sample of size n.
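The three steps above can be sketched as follows. The function name `simple_random_sample` and the `seed` argument are hypothetical helpers for illustration; `random.Random.sample` is the standard-library call that selects without replacement.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return a simple random sample of size n from the frame (a list)."""
    rng = random.Random(seed)
    N = len(frame)
    # Number the individuals 1 through N, then pick n distinct numbers.
    chosen_numbers = rng.sample(range(1, N + 1), n)
    # Map each chosen number back to the individual it labels.
    return [frame[number - 1] for number in chosen_numbers]

frame = ["Bob", "Patricia", "Mike", "Ian", "Dana"]
print(simple_random_sample(frame, 2, seed=1))
```

Passing the same seed returns the same sample, which is useful when results need to be checked or reproduced.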
Simulating SRS with Discrete Uniform Distribution
Rows: Use a slightly larger number than the needed sample size (n), because repeated values may appear and must be discarded.
Columns: 1
Minimum: 1
Maximum: Number of individuals in the population (N)
Seeding: Use a fixed seed for reproducibility.
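The settings above describe a statistics-software dialog; the same idea can be sketched in Python under some assumptions. The function name, the `extra` parameter (how many additional rows to draw), and the default seed are all hypothetical. The sketch draws whole numbers uniformly from 1 to the population size, discards repeats, and keeps the first values until the sample is full.

```python
import random

def srs_from_uniform_draws(N, n, extra=5, seed=0):
    """Pick n distinct numbers from 1..N using discrete uniform draws."""
    if n > N:
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # fixed seed for reproducibility
    selected = []
    while len(selected) < n:
        # Draw a few more values than needed, each uniform on 1..N.
        draws = [rng.randint(1, N) for _ in range(n + extra)]
        for value in draws:
            # Skip repeats; stop adding once n distinct numbers are held.
            if value not in selected and len(selected) < n:
                selected.append(value)
    return selected

print(srs_from_uniform_draws(N=100, n=10))
```

Drawing extra rows matters because uniform draws are made with replacement, so the same number can come up more than once.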
Other Sampling Methods
Stratified Sampling: The population is divided into non-overlapping groups called strata. A random sample is drawn from each stratum, often in proportion to the actual percentages in the population.
Systematic Sampling: Every kth individual from the population is included in the sample. The first individual selected corresponds to a random number between 1 and k.
Cluster Sampling: The population is divided into pre-existing (non-overlapping) segments or clusters, often geographically. Clusters are selected randomly, and every member of the cluster is in the sample.
Convenience Sampling: Uses individuals that are easily obtained. This type of sample is most likely severely biased and not based on randomness.
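Systematic sampling is the most mechanical of these methods, so a short sketch may help. Here `k` stands for the step between selected individuals, and the function name and the frame of 600 numbered individuals are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select every k-th individual after a random start between 1 and k."""
    rng = random.Random(seed)
    start = rng.randint(1, k)  # random start, inclusive of both 1 and k
    # Take the individuals numbered start, start + k, start + 2k, ...
    return frame[start - 1::k]

# Hypothetical frame: 600 individuals numbered 1 through 600.
frame = list(range(1, 601))
print(systematic_sample(frame, k=60, seed=3))
```

With 600 individuals and a step of 60, every possible starting point yields a sample of 10.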
Sampling Method Examples
Systematic: Selecting every 60th student from a list after a random start.
Cluster: Selecting all residents from randomly chosen nursing homes.
Stratified: Selecting random samples from each of three groups (e.g., children, adolescents, adults).
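The stratified and cluster examples differ in where the randomness is applied, which a minimal sketch can make concrete. All group names, member labels, and function names below are made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a random sample of the same size from every stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

# Hypothetical strata: three age groups with 20 members each.
strata = {
    "children": [f"child{i}" for i in range(1, 21)],
    "adolescents": [f"teen{i}" for i in range(1, 21)],
    "adults": [f"adult{i}" for i in range(1, 21)],
}
# Hypothetical clusters: residents of three nursing homes.
clusters = {
    "Home A": ["a1", "a2", "a3"],
    "Home B": ["b1", "b2"],
    "Home C": ["c1", "c2", "c3", "c4"],
}

print(stratified_sample(strata, 3, seed=5))
print(cluster_sample(clusters, 2, seed=5))
```

In the stratified version every group contributes some individuals; in the cluster version some groups contribute everyone and the rest contribute no one.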
Bias in Sampling
Sampling Bias: Occurs when the sampling method favors one part of the population over another. It is common in convenience samples and also results from undercoverage, in which a segment of the population is underrepresented in the sample.
Nonresponse Bias: Happens when individuals selected do not respond, and their opinions differ from those who do respond. Low response rates can misrepresent the population.
Response Bias: Occurs when survey responses do not reflect the respondents' true feelings. It can result from interviewer error, the wording of questions, or the types of questions asked.
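Nonresponse bias can be illustrated with a small simulation. Everything in it is an assumption made for illustration: a population in which 60% are satisfied, and response rates of 20% for satisfied people and 60% for dissatisfied people.

```python
import random
from statistics import mean

rng = random.Random(11)  # fixed seed so the run is reproducible

# Hypothetical population: 1 = satisfied, 0 = dissatisfied (60% satisfied).
population = [1] * 6000 + [0] * 4000

# A random sample of 1,000 individuals is selected for the survey.
selected = rng.sample(population, 1000)

# Assumed response rates: dissatisfied people respond more often.
response_rate = {1: 0.2, 0: 0.6}
respondents = [x for x in selected if rng.random() < response_rate[x]]

print(f"true proportion satisfied:       {mean(population):.2f}")
print(f"proportion among respondents:    {mean(respondents):.2f}")
```

Even though the selection was random, the respondents understate satisfaction because the people who answered differ from those who did not.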
Types of Error
Nonsampling Error: Includes undercoverage, nonresponse bias, response bias, and data-entry error.
Sampling Error: Results from using a sample to estimate a population value. It is always present when using samples, because a sample gives incomplete information about the population.
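Sampling error can be seen by comparing sample means with the population mean. This is a rough sketch on a made-up population (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10); the sample sizes of 25 and 400 and the 200 repetitions are arbitrary choices.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the run is reproducible

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = mean(population)  # the population value being estimated

def sampling_error(n):
    """Difference between one sample mean and the population mean."""
    sample = rng.sample(population, n)
    return mean(sample) - mu

# Typical size of the error for small and large samples.
errors_small = [abs(sampling_error(25)) for _ in range(200)]
errors_large = [abs(sampling_error(400)) for _ in range(200)]

print(f"average error, n = 25:  {mean(errors_small):.3f}")
print(f"average error, n = 400: {mean(errors_large):.3f}")
```

The error shrinks as the sample grows but does not reach zero unless the whole population is measured.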