Introduction to Statistics: Data, Sampling, and Experimental Design

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Introduction to Statistics

Statistics is the science of collecting, analyzing, interpreting, and presenting data. It is essential for making informed decisions in various fields such as business, health, education, and social sciences. Understanding statistics enables individuals to critically evaluate information and make data-driven decisions.

Definitions of Statistics, Probability, and Key Terms

Statistics and Probability

Statistics involves methods for collecting, organizing, summarizing, and interpreting data.
Probability is the mathematical study of randomness and uncertainty, used to predict the likelihood of events.

Statistics is divided into two main branches:

Descriptive statistics: Methods for summarizing data using graphs and numerical measures (e.g., mean, median).
Inferential statistics: Methods for making predictions or inferences about a population based on sample data, often using probability theory.

Key Terms

Population: The entire group of individuals or items under study.
Sample: A subset of the population selected for analysis.
Parameter: A numerical characteristic of a population (e.g., population mean μ).
Statistic: A numerical characteristic calculated from a sample (e.g., sample mean \( \bar{x} \)).
Variable: A characteristic or measurement that can vary among individuals in a population (e.g., height, income).
Data: The observed values of a variable.

Types of Data

Qualitative (Categorical) Data

Qualitative data describe attributes or categories. Examples include eye color, type of car, or student classification.

Cannot be meaningfully averaged or ordered (unless ordinal).

Quantitative Data

Quantitative data are numerical and can be measured or counted.

Discrete data: Countable values (e.g., number of books, number of students).
Continuous data: Measurable values that can take any value within a range (e.g., weight, height, time).

Examples of Data Types

Number of books: Quantitative discrete
Weight of a backpack: Quantitative continuous
Color of a backpack: Qualitative

Describing Data with Tables and Graphs

Frequency Tables

Frequency tables organize data values and their counts. They can also include relative frequency (proportion of total) and cumulative relative frequency (running total of proportions).

Graphs for Qualitative Data

Pie charts: Show proportions of categories as slices of a circle.
Bar graphs: Use bars to represent the frequency or proportion of categories.
Pareto charts: Bar graphs with categories ordered from largest to smallest.

Pie chart showing classification of statistics students

Example: The pie chart above displays the classification of statistics students (first-year, sophomore, junior, senior), which is qualitative (categorical) data.

Graphs for Quantitative Data

Histograms: Show the distribution of quantitative data by grouping values into intervals (bins).

Histogram of number of credit hours completed per student

Example: The histogram above shows the number of credit hours completed per student, which is quantitative discrete data.

Comparing Groups with Graphs

Pie charts and bar graphs can be used to compare proportions between groups or institutions.

Pie charts comparing part-time and full-time students at two colleges Bar graph comparing full-time and part-time students at two colleges

Example: The pie charts and bar graph above compare the proportions and counts of part-time and full-time students at De Anza College and Foothill College.

Special Cases in Graphs

Percentages may add to more than 100% if individuals can belong to multiple categories.
Missing categories (e.g., "Other/Unknown") should be included for accurate representation.

Bar graph showing percentages for multiple student categories Bar graph of student ethnicity without Other/Unknown category Pareto chart of student ethnicity Pie charts of student ethnicity with and without sorting

Example: The bar graphs and pie charts above illustrate the importance of including all categories and sorting for clarity when displaying categorical data.

Sampling Methods

Random Sampling Techniques

Simple random sampling: Every member of the population has an equal chance of being selected.
Stratified sampling: Population is divided into strata (groups), and a random sample is taken from each stratum.
Cluster sampling: Population is divided into clusters, some clusters are randomly selected, and all members of selected clusters are included.
Systematic sampling: Every nth member is selected from a list after a random starting point.
Convenience sampling: Sample is taken from individuals who are easiest to reach (often biased).

Sampling Errors and Bias

Sampling error: Natural variation due to using a sample instead of the whole population.
Sampling bias: Systematic error due to non-random sampling or underrepresentation of certain groups.
Nonsampling error: Errors not related to the act of sampling, such as data entry mistakes or faulty measurement tools.

Levels of Measurement

Data can be classified into four levels of measurement, which determine the types of statistical analyses that are appropriate:

Nominal: Categories with no inherent order (e.g., colors, brands).
Ordinal: Categories with a meaningful order but no measurable difference between values (e.g., rankings).
Interval: Ordered values with measurable differences, but no true zero (e.g., temperature in Celsius).
Ratio: Ordered values with measurable differences and a true zero (e.g., height, weight, age).

Experimental Design and Ethics

Components of Experimental Design

Explanatory variable: The variable manipulated by the researcher (independent variable).
Response variable: The outcome measured (dependent variable).
Treatments: Different conditions applied to experimental units.
Control group: Receives a placebo or standard treatment for comparison.
Random assignment: Assigning subjects to treatments by chance to reduce bias.
Blinding: Keeping subjects and/or researchers unaware of treatment assignments to prevent bias (single-blind or double-blind).
Lurking variables: Uncontrolled variables that may affect the response variable.

Ethical Considerations

Studies must be designed to avoid harm, ensure informed consent, and report results honestly.
Professional guidelines and federal regulations provide standards for ethical research.

Summary Table: Types of Sampling Methods

Sampling Method	Description	Example
Simple Random	Every member has equal chance of selection	Randomly select 100 students from a list
Stratified	Divide population into groups, sample from each	Sample 25 students from each grade level
Cluster	Divide into clusters, randomly select clusters, include all in selected clusters	Randomly select 3 classes, survey all students in those classes
Systematic	Select every nth member after a random start	Survey every 10th person on a list
Convenience	Sample those easiest to reach	Survey students in the library

Summary Table: Levels of Measurement

Level	Order?	Equal Intervals?	True Zero?	Example
Nominal	No	No	No	Eye color
Ordinal	Yes	No	No	Class rank
Interval	Yes	Yes	No	Temperature (°C)
Ratio	Yes	Yes	Yes	Height

Summary Table: Frequency Table Example

Data Value	Frequency	Relative Frequency	Cumulative Relative Frequency
2	3	0.15	0.15
3	5	0.25	0.40
4	3	0.15	0.55
5	6	0.30	0.85
6	2	0.10	0.95
7	1	0.05	1.00

Key Formulas

Mean (Arithmetic Average):
Relative Frequency:
Cumulative Relative Frequency:

Conclusion

Understanding the foundational concepts of statistics—including types of data, sampling methods, levels of measurement, and experimental design—is essential for analyzing and interpreting data accurately. Proper use of tables, graphs, and statistical terminology ensures clear communication and valid conclusions in research and everyday decision-making.