Foundations of Statistics: Data, Sampling, and Experimental Design

Study Guide - Smart Notes

Introduction to Statistics

Statistics is the science of planning studies and experiments, obtaining data, and organizing, summarizing, presenting, analyzing, and interpreting those data to draw conclusions. Understanding the foundational concepts is essential for effective data analysis and interpretation.

Key Definitions

  • Data: Collections of observations, such as measurements, genders, or survey responses.

  • Population: The complete collection of all measurements or data that are being considered. It is the group about which we want to draw conclusions.

  • Sample: A subcollection of members selected from a population.

  • Census: The collection of data from every member of the population.

  • Statistic: A numerical measurement describing some characteristic of a sample.

  • Parameter: A numerical measurement describing some characteristic of a population.
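
As a minimal sketch of the difference between a parameter and a statistic, the Python below builds a made-up population of ages, then compares the population mean (a parameter) with the mean of one random sample (a statistic). The population values, the sample size of 50, and all variable names are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: ages of 1,000 people (made-up values).
rng = random.Random(0)
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: a number describing the whole population.
parameter_mean = statistics.mean(population)

# Statistic: a number describing a sample drawn from that population.
sample = rng.sample(population, 50)
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

The two means will usually be close but not identical, which is why a statistic is treated as an estimate of the corresponding parameter.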

Types of Data

Data can be classified based on their nature and the way they are measured.

Quantitative vs. Qualitative Data

  • Quantitative (Numerical) Data: Numbers representing counts or measurements. Example: heights, weights, ages.

  • Qualitative (Categorical) Data: Names or labels that represent categories. Example: gender, eye color.

Discrete vs. Continuous Data

  • Discrete Data: Quantitative data values that are countable (finite or countably infinite). Example: number of students in a class.

  • Continuous Data: Quantitative data values that can take any value on a continuous scale, with no gaps between possible values, so the possible values cannot be counted. Example: lengths, weights, time.

Levels of Measurement

Data can be measured at different levels, each with specific properties:

  • Nominal: Categories only; cannot be arranged in order. Example: eye color.

  • Ordinal: Categories with a meaningful order, but differences between values are not meaningful. Example: course grades (A, B, C).

  • Interval: Data can be ordered, and differences are meaningful, but there is no natural zero starting point. Example: temperature in Celsius.

  • Ratio: Data can be ordered, differences are meaningful, and there is a natural zero. Ratios are meaningful. Example: heights, weights.
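
To see why ratios are not meaningful at the interval level, the short sketch below compares two Celsius temperatures with the same temperatures in kelvins, a scale that does have a natural zero. The temperatures 10 and 20 are arbitrary illustrative values, and the helper name `celsius_to_kelvin` is an assumption.

```python
import math

def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (a scale with a natural zero)."""
    return c + 273.15

c_low, c_high = 10.0, 20.0

# Differences agree on both scales, so differences are meaningful.
celsius_diff = c_high - c_low
kelvin_diff = celsius_to_kelvin(c_high) - celsius_to_kelvin(c_low)

# Ratios disagree, so "20 °C is twice as hot as 10 °C" is not a valid claim.
celsius_ratio = c_high / c_low
kelvin_ratio = celsius_to_kelvin(c_high) / celsius_to_kelvin(c_low)

print(math.isclose(celsius_diff, kelvin_diff))  # differences match
print(celsius_ratio, round(kelvin_ratio, 3))    # ratios do not match
```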

Sampling Methods

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

Types of Samples

  • Simple Random Sample: Every possible sample of a given size has the same chance of being selected.

  • Systematic Sample: Select every k-th member from a list after a random start.

  • Convenience Sample: Use data that are easy to obtain; may introduce bias.

  • Stratified Sample: Divide the population into subgroups (strata) and randomly sample from each stratum.

  • Cluster Sample: Divide the population into clusters, randomly select some clusters, and use all members from those clusters.

  • Voluntary Response Sample: Individuals choose to participate; often leads to bias.
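
The four probability-based methods in the list above can be sketched with Python's standard `random` module. This is an illustrative sketch, not a reference implementation: the function names, the population of the integers 1 to 100, and the way strata and clusters are formed are all assumptions.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely.
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    # Random start among the first k members, then every k-th member.
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of members.
    # Draw a random sample from every stratum.
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep all of their members.
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)
population = list(range(1, 101))  # hypothetical population of 100 IDs

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"first_half": population[:50], "second_half": population[50:]}
stratified = stratified_sample(strata, 5, rng)
clusters = {f"cluster_{i}": population[i * 20:(i + 1) * 20] for i in range(5)}
clustered = cluster_sample(clusters, 2, rng)

print(srs, systematic, stratified, clustered, sep="\n")
```

Convenience and voluntary response samples are left out because they involve no random selection, which is the source of their bias.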

Potential Issues in Sampling

  • Bias: Systematic error introduced by the sampling method.

  • Loaded Questions: Questions worded to elicit a specific response.

  • Nonresponse: When individuals selected for the sample do not respond.

  • Self-Selection: When individuals decide themselves whether to participate, often leading to bias.

Experimental Design and Observational Studies

Understanding the difference between experiments and observational studies is crucial for interpreting results.

Types of Studies

  • Experiment: A treatment is applied, and its effects on the individuals are observed. The individuals in an experiment are called experimental units, or subjects when they are people.

  • Observational Study: Observes and measures characteristics without influencing them.

Key Concepts in Experimental Design

  • Lurking Variable: A variable not included in the study that could affect the results.

  • Replication: Repetition of an experiment on more than one individual to ensure reliability.

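  • Blinding: Keeping participants unaware of who receives the treatment. In a single-blind study, subjects do not know whether they receive the treatment or a placebo. In a double-blind study, neither the subjects nor the researchers know.

  • Placebo Effect: An improvement that occurs because a subject believes they are being treated, not because of the treatment itself.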

  • Randomization: Assigning subjects to groups by chance to reduce bias.
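
Randomization can be sketched as a shuffle followed by a split. In the example below, the 20 subject labels, the group names, and the helper `randomize` are hypothetical; the point is only that chance, not the researcher, decides who lands in each group.

```python
import random

def randomize(subjects, rng):
    """Randomly split subjects into two groups of (nearly) equal size."""
    shuffled = subjects[:]      # copy so the original list is unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(7)
subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subjects
treatment_group, placebo_group = randomize(subjects, rng)

print("treatment:", treatment_group)
print("placebo:  ", placebo_group)
```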

Misleading Conclusions and Statistical Significance

It is important to distinguish between correlation and causation, and between statistical and practical significance.

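  • Correlation does not imply causation: An association between two variables does not, by itself, show that one causes the other; a lurking variable may be driving both.

  • Statistical Significance: The result would be very unlikely to occur by chance alone. A probability of 5% or less is a commonly used threshold.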

  • Practical Significance: The result is large enough to be meaningful in real life.
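
One way to make "unlikely to occur by chance" concrete is to compute the probability of a result under pure chance. The sketch below uses a hypothetical setting that is not from the notes: 100 yes/no trials in which chance alone gives a 50% success rate. The function name `prob_at_least` and the counts 60 and 52 are assumptions.

```python
from math import comb

def prob_at_least(n, k, p=0.5):
    """Probability of k or more successes in n independent trials,
    each with success probability p (exact binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical outcomes from 100 trials when chance alone gives 50%.
p_60 = prob_at_least(100, 60)  # roughly 3%: unlikely under chance alone
p_52 = prob_at_least(100, 52)  # roughly 38%: easily explained by chance

print(f"P(at least 60 of 100) = {p_60:.3f}")
print(f"P(at least 52 of 100) = {p_52:.3f}")
```

Under a 5% threshold, 60 successes would count as statistically significant and 52 would not. Whether either result matters in practice is a separate question about the size of the effect.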

Examples and Applications

  • Sample vs. Population: In a survey of 1046 adults, the sample is the 1046 adults surveyed; the population is all adults who use a public restroom.

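  • Statistical vs. Practical Significance: Suppose an IQ program raises scores by an average of 3 points, and there is a 25% chance of seeing such a gain even if the program has no effect. A 25% chance is too large to rule out chance, so the result is not statistically significant; a 3-point gain is also too small to be practically significant.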

  • Loaded Questions: "Should people have the right to carry guns to defend themselves and their families?" vs. "Should people have the right to carry guns that have the potential to hurt others?" The wording can influence responses.

Summary Table: Types of Data and Levels of Measurement

Type                        Description                                            Example
Quantitative (Discrete)     Countable numerical values                             Number of students in a class
Quantitative (Continuous)   Infinitely many possible values                        Height, weight, time
Qualitative (Categorical)   Names or labels                                        Gender, eye color
Nominal                     Categories only, no order                              Eye color
Ordinal                     Categories with order, differences not meaningful      Course grades
Interval                    Order and differences meaningful, no true zero         Temperature (Celsius)
Ratio                       Order, differences, and ratios meaningful, true zero   Height, weight

Key Formulas

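  • Percentage Calculation: $\text{percentage} = \frac{\text{part}}{\text{whole}} \times 100\%$

  • Sample Mean: $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ is the sum of the sample values and $n$ is the sample size.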
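
The two quantities named above can be sketched in a few lines of Python. The function names are assumptions, and the example numbers are hypothetical: 523 is an invented count out of the 1046 adults mentioned earlier, and the four scores are made up.

```python
def percentage(part, whole):
    """Return part as a percentage of whole."""
    return part / whole * 100

def sample_mean(values):
    """Return the sum of the sample values divided by the sample size."""
    return sum(values) / len(values)

# Hypothetical: 523 of 1046 surveyed adults answer "yes".
print(percentage(523, 1046))      # 50.0

# Hypothetical sample of four scores.
print(sample_mean([2, 4, 6, 8]))  # 5.0
```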

Conclusion

Understanding the basic concepts of data, sampling, and experimental design is fundamental to the study of statistics. Careful attention to definitions, types of data, and proper sampling methods ensures the validity and reliability of statistical conclusions.
