Introduction to Statistics: Data, Variables, and Sampling Methods
Study Guide - Smart Notes
What is Statistics?
Definition and Scope
Statistics is the science of using data collected from samples to understand and describe the larger groups those samples come from. It involves collecting, organizing, analyzing, and interpreting data to draw conclusions about populations.
Data: Collections of observations, such as measurements, genders, or survey responses.
Statistics (the field): The science of planning studies and experiments, obtaining data, and drawing conclusions based on the data.
Process Involved in a Statistical Study
Prepare, Analyze, Conclude
The statistical study process consists of three main steps: prepare, analyze, and conclude.
Prepare: Define the context, source, and sampling method. Ask: What is the goal? What data is needed?
Analyze: Organize and summarize the data, explore relationships, and apply statistical methods.
Conclude: Assess the significance of results and determine practical implications.
Population vs. Sample
Definitions and Differences
Understanding the distinction between a population and a sample is fundamental in statistics.
Population: The entire group of interest.
Sample: A subset of the population selected for measurement or analysis.
Parameter: A numerical measure describing a characteristic of a population.
Statistic: A numerical measure describing a characteristic of a sample.
Example: The average height of all adult female Canadians is a population parameter; the average height from a sample of adult female Canadians is a sample statistic.
| Term | Definition |
|---|---|
| Population | All members of a group of interest |
| Sample | Subset of the population |
| Parameter | Numerical measure of a population |
| Statistic | Numerical measure of a sample |
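The parameter/statistic distinction can be illustrated with a short Python sketch. The population below is simulated purely so the example is self-contained; the names `population`, `sample`, and the chosen mean and spread are assumptions, not real height data.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights (cm).
# The mean 162 and standard deviation 7 are illustrative assumptions.
rng = random.Random(42)
population = [rng.gauss(162.0, 7.0) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.fmean(population)

# Statistic: computed from a sample drawn from that population.
sample = rng.sample(population, k=100)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two printed values are typically close but not identical: the statistic varies from sample to sample, while the parameter is fixed for a given population.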
Data Basics and Variables
Data Matrices and Observation Units
Data is often organized in a data matrix, where columns represent variables (characteristics measured) and rows represent observation units (individual cases or subjects).
Variables: Characteristics such as gender, pulse, body mass index, etc.
Observation Unit: Each row in the data matrix; an individual case.
Sample Statistics: Calculations (e.g., mean, median) performed on the sample data.
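As a rough illustration of this layout, a data matrix can be sketched in Python as a list of rows, where each row is one observation unit and each key is one variable. The values below are made up for the example.

```python
import statistics

# Hypothetical data matrix: each dict is one row (observation unit),
# each key is one column (variable). Values are invented for illustration.
data_matrix = [
    {"gender": "F", "pulse": 72, "bmi": 22.4},
    {"gender": "M", "pulse": 65, "bmi": 25.1},
    {"gender": "F", "pulse": 80, "bmi": 27.8},
    {"gender": "M", "pulse": 70, "bmi": 23.9},
]

variables = list(data_matrix[0].keys())  # column names
n_units = len(data_matrix)               # number of observation units

# Extract one column, then compute sample statistics on it.
pulse = [row["pulse"] for row in data_matrix]
pulse_mean = statistics.mean(pulse)      # (72 + 65 + 80 + 70) / 4
pulse_median = statistics.median(pulse)  # average of the two middle values

print(variables, n_units, pulse_mean, pulse_median)
```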
Types of Variables
Quantitative vs. Qualitative Data
Variables can be classified as quantitative (numerical) or qualitative (categorical).
Quantitative Data: Numbers representing counts or measurements (e.g., height, weight).
Categorical Data: Names or labels representing categories (e.g., gender, ethnicity).
Subtypes of Data
Discrete Data: Result when the possible values are finite or countable, typically counts (e.g., number of students).
Continuous Data: Result from infinitely many possible values within a range (e.g., height, weight).
Nominal Data: Categorical data without a natural order (e.g., blood type).
Ordinal Data: Categorical data with a natural order (e.g., education level).
| Type | Subtype | Example |
|---|---|---|
| Quantitative | Discrete | Number of books |
| Quantitative | Continuous | Height |
| Categorical | Nominal | Blood type |
| Categorical | Ordinal | Education level |
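The practical difference between nominal and ordinal data shows up in which operations make sense. The sketch below uses invented responses; the `EDUCATION_RANK` mapping is an assumption introduced here to encode the natural order.

```python
from collections import Counter

# Hypothetical responses, invented for illustration.
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]             # nominal
education = ["Bachelor", "High school", "Master", "Bachelor"]  # ordinal

# Nominal data: counting categories is meaningful, ranking them is not.
blood_counts = Counter(blood_types)

# Ordinal data: an explicit rank (assumed here) captures the natural order,
# so the values can be sorted from lowest to highest level.
EDUCATION_RANK = {"High school": 0, "Bachelor": 1, "Master": 2}
education_sorted = sorted(education, key=EDUCATION_RANK.get)

print(blood_counts.most_common(1))
print(education_sorted)
```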
Sampling Methods
Types of Sampling
Sampling methods determine how samples are selected from populations.
Simple Random Sampling (SRS): Every possible sample of a given size has an equal chance of being selected, so every member has an equal chance of inclusion.
Stratified Sampling: Population divided into strata; samples drawn from each stratum.
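Cluster Sampling: Population divided into clusters; some clusters are randomly selected, and every member of the selected clusters is included.
Multistage Sampling: Samples are drawn in stages, for example by randomly selecting clusters and then randomly sampling members within each selected cluster.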
Convenience Sampling: Samples are chosen based on ease of access.
| Sampling Method | Description |
|---|---|
| Simple Random | Equal chance for all cases |
| Stratified | Sample from each subgroup |
| Cluster | Sample entire clusters |
| Multistage | Sample in multiple stages |
| Convenience | Easy to get samples |
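Three of these methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the function names, the `students` population, and the `year` and `classroom` fields are all hypothetical.

```python
import random
from collections import defaultdict

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw a simple random sample within each stratum."""
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit in the chosen clusters."""
    clusters = defaultdict(list)
    for unit in population:
        clusters[cluster_of(unit)].append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for key in chosen for unit in clusters[key]]

# Hypothetical population: 60 students, 3 year levels, 6 classrooms of 10.
rng = random.Random(0)
students = [{"id": i, "year": i % 3 + 1, "classroom": i % 6} for i in range(60)]

srs = simple_random_sample(students, 12, rng)
strat = stratified_sample(students, lambda s: s["year"], 4, rng)
clust = cluster_sample(students, lambda s: s["classroom"], 2, rng)

print(len(srs), len(strat), len(clust))
```

All three calls draw from the same population, but the stratified sample guarantees four students from every year level, while the cluster sample contains every student from just two classrooms.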
Other Sampling Terms
Available Data: Data collected in the past for other purposes.
Sample Survey: Data collected from a sample.
Census: Data collected from all cases in a population.
Observational Studies and Experiments
Observational Studies
In an observational study, researchers observe and measure characteristics without influencing the subjects.
Example: Surveys are common observational studies.
Experiments
In an experiment, a treatment is applied to subjects, and the effect is measured.
Treatment Group: Receives the treatment.
Control Group: Does not receive the treatment.
Factor (Explanatory Variable): Variable whose effect is studied.
Response Variable: The outcome that is measured and compared across groups to assess the effect of the factor.
Example: Vaccine trials where one group receives the vaccine and another receives a placebo.
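One common way to form the two groups is random assignment. The notes above do not say how groups are formed, so treat the following as an assumed approach; the function `assign_groups` and the subject IDs are hypothetical.

```python
import random

def assign_groups(subjects, rng):
    """Shuffle subjects and split them in half; with an odd count, control gets the extra one."""
    shuffled = subjects[:]  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

rng = random.Random(1)
subjects = [f"subject_{i:02d}" for i in range(20)]  # hypothetical IDs
groups = assign_groups(subjects, rng)

print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who lands in each group, differences in the response variable are less likely to be explained by pre-existing differences between the groups.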
Blind Tests
Single-blind: Subjects do not know which group they are in.
Double-blind: Neither subjects nor researchers know group assignments.
Key Formulas
Population and Sample Statistics
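Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$, where $N$ is the number of members in the population.
Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, where $n$ is the number of observations in the sample.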
Summary Table: Types of Data
| Type | Subtype | Example |
|---|---|---|
| Quantitative | Discrete | Number of students |
| Quantitative | Continuous | Height, weight |
| Categorical | Nominal | Blood type |
| Categorical | Ordinal | Education level |
Summary Table: Sampling Methods
| Method | Description |
|---|---|
| Simple Random | Equal chance for all cases |
| Stratified | Sample from each subgroup |
| Cluster | Sample entire clusters |
| Multistage | Sample in multiple stages |
| Convenience | Easy to get samples |