Statistics Fundamentals: Concepts, Data, and Summarization
Study Guide - Smart Notes
Statistics: Introduction and Key Concepts
Definitions and Scope
Statistics is the science of collecting, organizing, analyzing, and interpreting data. It provides tools for understanding and making decisions based on information from populations and samples.
Data: Information that is collected for analysis.
Population: The complete set of all individuals or items to be studied.
Sample: A subset of the population, selected for analysis.
Statistic: A numerical summary about a sample.
Parameter: A numerical summary about a population.
Variable: A characteristic of individuals in the population being studied.
Types of Data
Qualitative vs. Quantitative Data
Data can be classified based on its nature and measurement.
Qualitative (Categorical): Non-numerical data used to identify categories (e.g., gender, type of car).
Quantitative (Numerical): Data that can be measured or counted (e.g., number of cars in a parking lot, test scores, height, weight).
Quantitative Data Subtypes
Discrete: Countable values with gaps between them (e.g., number of students).
Continuous: Measured values that can take any value within an interval, so there are infinitely many possibilities (e.g., height, weight).
Levels of Measurement
Measurement Scales
Levels of measurement determine the type of statistical analysis that can be performed.
Nominal (Qualitative): Categories without a natural order (e.g., eye color).
Ordinal (Qualitative): Categories with a natural order (e.g., ranking, satisfaction levels).
Interval (Quantitative): Ordered values with meaningful differences, but no true zero (e.g., temperature in Celsius).
Ratio (Quantitative): Like interval, but with a true zero; ratios are meaningful (e.g., height, age).
Methods of Data Collection
Observational vs. Experimental Studies
Observational Study: No treatment is applied; data is observed and recorded.
Designed Experiment: Treatment is applied to study its effect.
Types of Observational Studies
Retrospective: Uses past data.
Prospective: Collects data going forward.
Cross-sectional: Data collected at one point in time.
Sampling Methods and Bias
Random Sampling and Bias
Random Sample: Every member of the population has an equal chance of being selected.
Bias: Systematic error in choosing a sample; should be avoided.
Sampling Techniques
Simple Random: Equal chance for all members.
Systematic: Every k-th member is chosen.
Stratified: Population divided into groups (strata) of similar members, then a random sample is drawn from each group.
Cluster: Population divided into groups (clusters); some groups are randomly selected, and all members in those groups are sampled.
Convenience: Sample is taken from easily available members (not recommended).
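The four probability-based techniques above can be sketched in a few lines of Python. This is a minimal illustration, not a survey tool: the student ID numbers, the grade groupings, and the function names are all hypothetical, and the same four groups are reused as both strata and clusters purely to keep the example short.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members so that every member has an equal chance of selection."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample of n_per_stratum members from every stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose n_clusters whole groups and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)          # fixed seed only so the sketch is repeatable
students = list(range(1, 101))   # hypothetical ID numbers 1..100
grades = {                       # hypothetical groups of 25 students each
    "grade 9": students[0:25],
    "grade 10": students[25:50],
    "grade 11": students[50:75],
    "grade 12": students[75:100],
}

srs = simple_random_sample(students, 10, rng)
systematic = systematic_sample(students, 10, rng)
stratified = stratified_sample(grades, 3, rng)
clustered = cluster_sample(grades, 2, rng)

print("Simple random:", sorted(srs))
print("Systematic:   ", systematic)
print("Stratified:   ", sorted(stratified))
print("Cluster size: ", len(clustered))
```

Note the structural difference: the stratified sample contains a few members from every group, while the cluster sample contains every member from a few groups.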
Types of Bias in Sampling
Non-Response: Lack of response from selected individuals.
Response: Responses may be affected by misinterpretation, lying, or pressure.
Basics of Designed Experiments
Key Terms
Treatment: The condition applied to experimental units.
Placebo: An inactive treatment used for comparison.
Placebo Effect: Psychological or physical improvement due to belief in treatment.
Single-Blind: Subjects do not know which group they are in.
Double-Blind: Neither subjects nor researchers know group assignments.
Control Group: Group that receives either no treatment or a placebo, used as a baseline for comparison.
Randomization: Random assignment to groups.
Block: Grouping subjects by a variable in a designed experiment.
Process of Statistics
Identify the problem.
Collect data.
Organize and summarize the data (e.g., tables, charts).
Perform inference.
Interpret results and make decisions.
Organization of Data
Frequency Distribution Tables
Tables with categories and the number in each category.
Relative frequency distribution: Shows proportion or percentage for each category.
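Equation for Relative Frequency: $\text{Relative frequency} = \frac{\text{frequency of the category}}{\text{sum of all frequencies}}$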
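As a small illustration, relative frequency is each category's count divided by the total count. The sketch below assumes a made-up list of car types; the data and the helper name `relative_frequencies` are hypothetical.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: frequency / total} for a list of categorical values."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Hypothetical survey responses (illustrative data, not from the notes)
car_types = ["sedan", "SUV", "sedan", "truck", "SUV", "sedan", "sedan", "truck"]
rel_freq = relative_frequencies(car_types)

for category, proportion in rel_freq.items():
    print(f"{category}: {proportion:.3f} ({proportion:.1%})")
```

The relative frequencies always add up to 1 (or 100%), which is a quick check on any relative frequency table.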
Illustrations
Bar Graph (qualitative data)
Pareto Chart (bar graph with bars ordered from highest to lowest frequency, highest on the left)
Pie Chart
Variables in Experiments
Explanatory Variable: Independent variable; explains or affects the response.
Response Variable: Dependent variable; affected by the explanatory variable.
Confounding Variable: Related to both explanatory and response variables; can affect results.
Lurking Variable: Not measured but affects results.
Graphical Representation of Data
Histograms
Graph showing frequency distribution of quantitative data.
Steps: Find range, choose number of classes, calculate class width, create frequency table, construct histogram.
Cumulative Frequency Table
Shows cumulative totals for each class.
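The histogram steps and the cumulative frequency table can be sketched together. This example assumes integer data and a class-width convention (round up to the next whole number above range divided by number of classes) that may differ from the one used in a given course; the test scores are hypothetical.

```python
import math

def frequency_table(data, num_classes):
    """Return rows of (lower, upper, frequency, cumulative frequency)."""
    low, high = min(data), max(data)
    data_range = high - low                      # step 1: find the range
    # Assumption: integer data, so the width is bumped to the next whole
    # number above range / classes; this guarantees the maximum is covered.
    width = math.floor(data_range / num_classes) + 1
    rows, cumulative = [], 0
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width                    # exclusive upper class limit
        frequency = sum(1 for x in data if lower <= x < upper)
        cumulative += frequency                  # running total of frequencies
        rows.append((lower, upper, frequency, cumulative))
    return rows

# Hypothetical test scores (illustrative data)
scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 91, 94, 97]
table = frequency_table(scores, 5)

# A text histogram: one star per observation in the class
for lower, upper, frequency, cumulative in table:
    print(f"{lower:>3} to <{upper:<3} | {'*' * frequency:<5} | cumulative {cumulative}")
```

The last cumulative total should equal the number of observations, which is a useful check when building the table by hand.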
Other Graphs
Stem and Leaf Plot: Good for small data sets with small ranges.
Dot Plot: Good for small data sets; useful for identifying outliers.
Frequency Polygon: Uses class midpoints and lines.
Ogive: Uses cumulative frequency.
Time Series Plot: Shows trends over time.
Misleading Graphs: Inconsistent scales, omission of context, starting axis above zero, misuse of pictographs.
Numerical Summaries of Data
Measures of Central Tendency
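Mean (Arithmetic Average): $\bar{x} = \frac{\sum x_i}{n}$ for a sample of size $n$; $\mu = \frac{\sum x_i}{N}$ for a population of size $N$.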
Median: Middle value when data is ordered.
Mode: Value with greatest frequency.
Round Off Rule: Round the final answer to one more decimal place than the original data values.
Resistance
The mean is not resistant to outliers.
The median is resistant to outliers.
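A quick way to see resistance is to add one extreme value and compare how far the mean and the median move. The salaries below (in thousands) are hypothetical, and Python's standard `statistics` module is used for the calculations.

```python
import statistics

# Hypothetical salaries in thousands of dollars (illustrative data)
salaries = [40, 42, 45, 45, 48, 50]
with_outlier = salaries + [400]   # add one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mode_before = statistics.mode(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# Data have no decimal places, so results are rounded to one decimal place
print(f"Without outlier: mean = {mean_before:.1f}, median = {median_before:.1f}, mode = {mode_before}")
print(f"With outlier:    mean = {mean_after:.1f}, median = {median_after:.1f}")
```

Here the mean more than doubles (45.0 to about 95.7) while the median stays at 45, which is why the median is often preferred for skewed data.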
Measures of Dispersion (Spread)
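Range: $\text{Range} = \text{maximum} - \text{minimum}$
Standard Deviation: Square root of variance. For a sample, $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$; for a population, $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$.
Mid Range: $\text{Midrange} = \frac{\text{maximum} + \text{minimum}}{2}$ (the midpoint of the extremes, so it locates a center rather than measuring spread).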
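The sketch below computes the range, the midrange, and the standard deviation for a small hypothetical data set. The sample standard deviation is written out by hand (dividing by n - 1) so the steps are visible, then compared with the standard library's `statistics.stdev`; the function name `sample_standard_deviation` is illustrative.

```python
import math
import statistics

def sample_standard_deviation(values):
    """Square root of the sample variance (sum of squared deviations / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# Hypothetical data set (illustrative data)
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)           # maximum - minimum
midrange = (max(data) + min(data)) / 2       # midpoint of the extremes
sample_sd = sample_standard_deviation(data)  # divides by n - 1
population_sd = statistics.pstdev(data)      # divides by N

print(f"Range = {data_range}, midrange = {midrange}")
print(f"Sample SD = {sample_sd:.2f}, population SD = {population_sd:.2f}")
```

The sample standard deviation is slightly larger than the population version because it divides by n - 1 instead of N.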
Empirical Rule (Normal Distributions)
68% of data within 1 standard deviation of mean
95% within 2 standard deviations
99.7% within 3 standard deviations
Only use for normal distributions. Never assume normality without evidence.
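As a rough sketch, the Empirical Rule turns a mean and standard deviation into three intervals. The example assumes a hypothetical normally distributed variable with mean 100 and standard deviation 15, and uses `statistics.NormalDist` (Python 3.8+) only to show that the 68/95/99.7 figures are rounded values for a normal model.

```python
from statistics import NormalDist

def empirical_rule_intervals(mean, sd):
    """Intervals expected to hold about 68%, 95%, and 99.7% of normal data."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

# Hypothetical normally distributed scores: mean 100, standard deviation 15
intervals = empirical_rule_intervals(100, 15)

# Compare the rule of thumb with the proportion under a normal curve
model = NormalDist(mu=100, sigma=15)
exact = {
    label: model.cdf(upper) - model.cdf(lower)
    for label, (lower, upper) in intervals.items()
}

for label, (lower, upper) in intervals.items():
    print(f"About {label} of values fall between {lower} and {upper} "
          f"(normal model: {exact[label]:.4f})")
```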
Measures of Position and Outliers
Percentiles and Quartiles
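Percentile Formula (one common form): $\text{Percentile of } x = \frac{\text{number of values less than } x}{\text{total number of values}} \times 100$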
Quartiles divide data into four equal parts.
First quartile (Q1): 25th percentile
Second quartile (Q2): 50th percentile (median)
Third quartile (Q3): 75th percentile
Identifying Outliers
Lower fence: $Q_1 - 1.5 \times \text{IQR}$
Upper fence: $Q_3 + 1.5 \times \text{IQR}$
Outliers are values outside these fences.
Interquartile range: $\text{IQR} = Q_3 - Q_1$, the spread of the middle 50% of the data.
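The fence method can be sketched as follows: find the quartiles, compute the interquartile range (Q3 minus Q1), then flag anything more than 1.5 interquartile ranges below Q1 or above Q3. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, and the data set is hypothetical.

```python
import statistics

# Hypothetical data set with one suspiciously large value (illustrative data)
data = [3, 5, 7, 8, 9, 11, 12, 13, 15, 17, 40]

# Assumption: default "exclusive" quartile method; other methods can
# give slightly different quartiles for the same data.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number_summary = (min(data), q1, q2, q3, max(data))

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"Fences: {lower_fence} to {upper_fence}")
print(f"Outliers: {outliers}")
print(f"Five-number summary: {five_number_summary}")
```

For this data set the only value beyond the fences is 40, so it is the one point that would be plotted separately on a boxplot.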
Boxplot (Five-Number Summary)
Minimum, Q1, Q2 (median), Q3, Maximum
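Symmetry is judged from the whisker lengths and the position of the median within the box: roughly equal whiskers and a centered median suggest a symmetric distribution.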
Z-Scores
Standardized value: $z = \frac{x - \mu}{\sigma}$ for a population, or $z = \frac{x - \bar{x}}{s}$ for a sample.
Indicates how many standard deviations a value is from the mean.
Positive Z: above mean; Negative Z: below mean; Z = 0: at mean.
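A z-score is the distance from the mean measured in standard deviations. The short sketch below assumes a hypothetical exam with mean 70 and standard deviation 10; the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10
exam_mean, exam_sd = 70, 10

z_high = z_score(85, exam_mean, exam_sd)   # above the mean
z_low = z_score(60, exam_mean, exam_sd)    # below the mean
z_mid = z_score(70, exam_mean, exam_sd)    # exactly at the mean

print(f"Score 85 -> z = {z_high:+.2f}")
print(f"Score 60 -> z = {z_low:+.2f}")
print(f"Score 70 -> z = {z_mid:+.2f}")
```

Because z-scores are unit-free, they allow values from different scales (for example, two exams with different means and spreads) to be compared directly.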
Probability Fundamentals
Definition and Rules
Probability: Study of uncertainty; likelihood of an event occurring.
Probability values must be between 0 and 1, inclusive: $0 \le P(E) \le 1$ for any event $E$.
Addition Rule
If A and B are two events:
$P(A \text{ or } B) = P(A) + P(B)$ (if disjoint)
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ (general rule)
Disjoint (mutually exclusive): No overlap between events.
Not disjoint: Use general formula.
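The addition rule can be checked by counting outcomes. The sketch below assumes a single roll of a fair six-sided die (an illustrative choice, not from the notes) and uses exact fractions so the two sides of each rule can be compared directly.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favorable outcomes / total equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
greater_than_4 = {5, 6}
one_or_three = {1, 3}

# Not disjoint: "even" and "greater than 4" share the outcome 6,
# so the overlap must be subtracted once.
p_general = prob(even) + prob(greater_than_4) - prob(even & greater_than_4)

# Disjoint: "even" and "1 or 3" share no outcomes, so probabilities just add.
p_disjoint = prob(even) + prob(one_or_three)

print(f"P(even or >4)      = {p_general}")   # matches counting {2, 4, 5, 6}
print(f"P(even or 1 or 3)  = {p_disjoint}")  # matches counting {1, 2, 3, 4, 6}
```

Adding the two probabilities without subtracting the overlap in the first case would count the outcome 6 twice.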
Summary Table: Levels of Measurement
| Level | Type | Order | True Zero | Examples |
|---|---|---|---|---|
| Nominal | Qualitative | No | No | Eye color, gender |
| Ordinal | Qualitative | Yes | No | Ranking, satisfaction |
| Interval | Quantitative | Yes | No | Temperature (Celsius) |
| Ratio | Quantitative | Yes | Yes | Height, age |
Summary Table: Measures of Central Tendency
| Measure | Definition | Formula | Resistance to Outliers |
|---|---|---|---|
| Mean | Arithmetic average | $\bar{x} = \frac{\sum x_i}{n}$ | No |
| Median | Middle value | Middle of ordered data | Yes |
| Mode | Most frequent value | Value with highest frequency | Yes |
Summary Table: Sampling Methods
| Method | Description | Example |
|---|---|---|
| Simple Random | Equal chance for all members | Lottery draw |
| Systematic | Every k-th member chosen | Every 10th person |
| Stratified | Divide into groups, sample from each | Sample from each grade level |
| Cluster | Divide into groups, sample all from some groups | Sample all students from selected classes |
| Convenience | Sample from easily available members | Survey at a mall |