Skip to main content
Back

Introduction to Statistics: Data, Sampling, and Experimental Design

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Introduction to Statistics

Statistics is the science of collecting, analyzing, interpreting, and presenting data. It is essential for making informed decisions in various fields such as business, health, education, and social sciences. Understanding statistics enables individuals to critically evaluate information and make data-driven decisions.

Definitions of Statistics, Probability, and Key Terms

Statistics and Probability

  • Statistics involves methods for collecting, organizing, summarizing, and interpreting data.

  • Probability is the mathematical study of randomness and uncertainty, used to predict the likelihood of events.

Statistics is divided into two main branches:

  • Descriptive statistics: Methods for summarizing data using graphs and numerical measures (e.g., mean, median).

  • Inferential statistics: Methods for making predictions or inferences about a population based on sample data, often using probability theory.

Key Terms

  • Population: The entire group of individuals or items under study.

  • Sample: A subset of the population selected for analysis.

  • Parameter: A numerical characteristic of a population (e.g., population mean μ).

  • Statistic: A numerical characteristic calculated from a sample (e.g., sample mean \( \bar{x} \)).

  • Variable: A characteristic or measurement that can vary among individuals in a population (e.g., height, income).

  • Data: The observed values of a variable.

Types of Data

Qualitative (Categorical) Data

Qualitative data describe attributes or categories. Examples include eye color, type of car, or student classification.

  • Cannot be meaningfully averaged or ordered (unless ordinal).

Quantitative Data

Quantitative data are numerical and can be measured or counted.

  • Discrete data: Countable values (e.g., number of books, number of students).

  • Continuous data: Measurable values that can take any value within a range (e.g., weight, height, time).

Examples of Data Types

  • Number of books: Quantitative discrete

  • Weight of a backpack: Quantitative continuous

  • Color of a backpack: Qualitative

Describing Data with Tables and Graphs

Frequency Tables

Frequency tables organize data values and their counts. They can also include relative frequency (proportion of total) and cumulative relative frequency (running total of proportions).

Graphs for Qualitative Data

  • Pie charts: Show proportions of categories as slices of a circle.

  • Bar graphs: Use bars to represent the frequency or proportion of categories.

  • Pareto charts: Bar graphs with categories ordered from largest to smallest.

Pie chart showing classification of statistics students

Example: The pie chart above displays the classification of statistics students (first-year, sophomore, junior, senior), which is qualitative (categorical) data.

Graphs for Quantitative Data

  • Histograms: Show the distribution of quantitative data by grouping values into intervals (bins).

Histogram of number of credit hours completed per student

Example: The histogram above shows the number of credit hours completed per student, which is quantitative discrete data.

Comparing Groups with Graphs

  • Pie charts and bar graphs can be used to compare proportions between groups or institutions.

Pie charts comparing part-time and full-time students at two collegesBar graph comparing full-time and part-time students at two colleges

Example: The pie charts and bar graph above compare the proportions and counts of part-time and full-time students at De Anza College and Foothill College.

Special Cases in Graphs

  • Percentages may add to more than 100% if individuals can belong to multiple categories.

  • Missing categories (e.g., "Other/Unknown") should be included for accurate representation.

Bar graph showing percentages for multiple student categoriesBar graph of student ethnicity without Other/Unknown categoryBar graph of student ethnicity with Other/Unknown categoryPareto chart of student ethnicityPie charts of student ethnicity with and without sorting

Example: The bar graphs and pie charts above illustrate the importance of including all categories and sorting for clarity when displaying categorical data.

Sampling Methods

Random Sampling Techniques

  • Simple random sampling: Every member of the population has an equal chance of being selected.

  • Stratified sampling: Population is divided into strata (groups), and a random sample is taken from each stratum.

  • Cluster sampling: Population is divided into clusters, some clusters are randomly selected, and all members of selected clusters are included.

  • Systematic sampling: Every nth member is selected from a list after a random starting point.

  • Convenience sampling: Sample is taken from individuals who are easiest to reach (often biased).

Sampling Errors and Bias

  • Sampling error: Natural variation due to using a sample instead of the whole population.

  • Sampling bias: Systematic error due to non-random sampling or underrepresentation of certain groups.

  • Nonsampling error: Errors not related to the act of sampling, such as data entry mistakes or faulty measurement tools.

Levels of Measurement

Data can be classified into four levels of measurement, which determine the types of statistical analyses that are appropriate:

  • Nominal: Categories with no inherent order (e.g., colors, brands).

  • Ordinal: Categories with a meaningful order but no measurable difference between values (e.g., rankings).

  • Interval: Ordered values with measurable differences, but no true zero (e.g., temperature in Celsius).

  • Ratio: Ordered values with measurable differences and a true zero (e.g., height, weight, age).

Experimental Design and Ethics

Components of Experimental Design

  • Explanatory variable: The variable manipulated by the researcher (independent variable).

  • Response variable: The outcome measured (dependent variable).

  • Treatments: Different conditions applied to experimental units.

  • Control group: Receives a placebo or standard treatment for comparison.

  • Random assignment: Assigning subjects to treatments by chance to reduce bias.

  • Blinding: Keeping subjects and/or researchers unaware of treatment assignments to prevent bias (single-blind or double-blind).

  • Lurking variables: Uncontrolled variables that may affect the response variable.

Ethical Considerations

  • Studies must be designed to avoid harm, ensure informed consent, and report results honestly.

  • Professional guidelines and federal regulations provide standards for ethical research.

Summary Table: Types of Sampling Methods

Sampling Method

Description

Example

Simple Random

Every member has equal chance of selection

Randomly select 100 students from a list

Stratified

Divide population into groups, sample from each

Sample 25 students from each grade level

Cluster

Divide into clusters, randomly select clusters, include all in selected clusters

Randomly select 3 classes, survey all students in those classes

Systematic

Select every nth member after a random start

Survey every 10th person on a list

Convenience

Sample those easiest to reach

Survey students in the library

Summary Table: Levels of Measurement

Level

Order?

Equal Intervals?

True Zero?

Example

Nominal

No

No

No

Eye color

Ordinal

Yes

No

No

Class rank

Interval

Yes

Yes

No

Temperature (°C)

Ratio

Yes

Yes

Yes

Height

Summary Table: Frequency Table Example

Data Value

Frequency

Relative Frequency

Cumulative Relative Frequency

2

3

0.15

0.15

3

5

0.25

0.40

4

3

0.15

0.55

5

6

0.30

0.85

6

2

0.10

0.95

7

1

0.05

1.00

Key Formulas

  • Mean (Arithmetic Average):

  • Relative Frequency:

  • Cumulative Relative Frequency:

Conclusion

Understanding the foundational concepts of statistics—including types of data, sampling methods, levels of measurement, and experimental design—is essential for analyzing and interpreting data accurately. Proper use of tables, graphs, and statistical terminology ensures clear communication and valid conclusions in research and everyday decision-making.

Pearson Logo

Study Prep