Statistics Fundamentals: Concepts, Data, and Summarization

Study Guide - Smart Notes

Statistics: Introduction and Key Concepts

Definitions and Scope

Statistics is the science of collecting, organizing, analyzing, and interpreting data. It provides tools for understanding and making decisions based on information from populations and samples.

  • Data: Information that is collected for analysis.

  • Population: The complete set of all individuals or items to be studied.

  • Sample: A subset of the population, selected for analysis.

  • Statistic: A numerical summary about a sample.

  • Parameter: A numerical summary about a population.

  • Variable: A characteristic of individuals in the population being studied.

Types of Data

Qualitative vs. Quantitative Data

Data can be classified based on its nature and measurement.

  • Qualitative (Categorical): Non-numerical data used to identify categories (e.g., gender, type of car).

  • Quantitative (Numerical): Data that can be measured or counted (e.g., number of cars in a parking lot, test scores, height, weight).

Quantitative Data Subtypes

  • Discrete: Countable values with gaps between them (e.g., number of students).

  • Continuous: Measured values that can take any value within an interval, so there are infinitely many possible values (e.g., height, weight).

Levels of Measurement

Measurement Scales

Levels of measurement determine the type of statistical analysis that can be performed.

  • Nominal (Qualitative): Categories without a natural order (e.g., eye color).

  • Ordinal (Qualitative): Categories with a natural order (e.g., ranking, satisfaction levels).

  • Interval (Quantitative): Ordered values with meaningful differences, but no true zero (e.g., temperature in Celsius).

  • Ratio (Quantitative): Like interval, but with a true zero; ratios are meaningful (e.g., height, age).

Methods of Data Collection

Observational vs. Experimental Studies

  • Observational Study: No treatment is applied; data is observed and recorded.

  • Designed Experiment: Treatment is applied to study its effect.

Types of Observational Studies

  • Retrospective: Uses past data.

  • Prospective: Collects data going forward.

  • Cross-sectional: Data collected at one point in time.

Sampling Methods and Bias

Random Sampling and Bias

  • Random Sample: Every member of the population has an equal chance of being selected.

  • Bias: Systematic error in choosing a sample; should be avoided.

Sampling Techniques

  • Simple Random: Equal chance for all members.

  • Systematic: Every k-th member is chosen.

  • Stratified: Population divided into groups, then sampled from each group.

  • Cluster: Population divided into groups; some groups are randomly selected, and all members of those groups are sampled.

  • Convenience: Sample is taken from easily available members (not recommended).
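The four probability-based techniques above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function names, the `seed` parameter, and the example population, strata, and clusters are all hypothetical.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n members so that every member has the same chance of selection."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th member, beginning at index `start` (ideally chosen at random)."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample from EVERY group (stratum)."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole groups (clusters) and keep ALL members of the chosen groups."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical data: 100 numbered students, two grade levels, three classes.
population = list(range(1, 101))
strata = {"grade_9": list(range(1, 51)), "grade_10": list(range(51, 101))}
clusters = {"class_a": [1, 2, 3], "class_b": [4, 5, 6], "class_c": [7, 8, 9]}

print(systematic_sample(population, k=10))
print(stratified_sample(strata, n_per_stratum=3))
print(cluster_sample(clusters, n_clusters=2))
```

The difference between the last two functions is the one to remember: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.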

Types of Bias in Sampling

  • Non-Response: Lack of response from selected individuals.

  • Response: Responses may be affected by misinterpretation, lying, or pressure.

Basics of Designed Experiments

Key Terms

  • Treatment: The condition applied to experimental units.

  • Placebo: An inactive treatment used for comparison.

  • Placebo Effect: Psychological or physical improvement due to belief in treatment.

  • Single-Blind: Subjects do not know which group they are in.

  • Double-Blind: Neither subjects nor researchers know group assignments.

  • Control Group: Group that receives either no treatment or a placebo; it serves as the baseline for comparison.

  • Randomization: Random assignment to groups.

  • Block: Grouping subjects by a variable in a designed experiment.
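Randomization can be illustrated with a short Python sketch that shuffles subjects and splits them into two groups. The function name `randomly_assign`, the `seed` parameter, and the subject labels are hypothetical, and an even split is only one possible design.

```python
import random

def randomly_assign(subjects, seed=0):
    """Shuffle the subjects and split them into treatment and control groups."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the split is repeatable
    shuffled = list(subjects)  # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject labels.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomly_assign(subjects)
print(len(groups["treatment"]), len(groups["control"]))
```

Because chance alone decides who lands in each group, characteristics the researcher did not measure tend to balance out between the groups.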

Process of Statistics

  1. Identify the problem.

  2. Collect data.

  3. Describe the data with illustrations (e.g., tables, charts).

  4. Perform inference.

  5. Interpret results and make decisions.

Organization of Data

Frequency Distribution Tables

  • Tables with categories and the number in each category.

  • Relative frequency distribution: Shows proportion or percentage for each category.

Equation for Relative Frequency: $\text{relative frequency} = \frac{\text{frequency}}{\text{sum of all frequencies}}$
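Relative frequency is a category's frequency divided by the total of all frequencies. A minimal Python sketch of that calculation follows; the function name and the eye-color data are made up for illustration.

```python
from collections import Counter

def relative_frequencies(values):
    """Return each category's share of the total number of observations."""
    counts = Counter(values)      # frequency of each category
    total = sum(counts.values())  # sum of all frequencies
    return {category: count / total for category, count in counts.items()}

# Hypothetical eye-color observations for 10 people.
colors = ["blue", "brown", "brown", "green", "brown",
          "blue", "brown", "green", "brown", "blue"]
rel_freq = relative_frequencies(colors)
print(rel_freq)
```

The relative frequencies always add up to 1 (or 100% when expressed as percentages), which is a quick check on the table.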

Illustrations

  • Bar Graph (qualitative data)

  • Pareto Chart (bar graph with bars ordered by decreasing frequency; highest frequency on left)

  • Pie Chart

Variables in Experiments

  • Explanatory Variable: Independent variable; explains or affects the response.

  • Response Variable: Dependent variable; affected by the explanatory variable.

  • Confounding Variable: Related to both explanatory and response variables; can affect results.

  • Lurking Variable: Not measured but affects results.

Graphical Representation of Data

Histograms

  • Graph showing frequency distribution of quantitative data.

  • Steps: Find range, choose number of classes, calculate class width, create frequency table, construct histogram.
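The histogram steps above can be sketched in Python as follows. This is one possible convention, assuming whole-number data: the class width is the range divided by the number of classes, moved up to the next whole number so the maximum value falls inside the last class. The function name and the scores are hypothetical.

```python
def frequency_table(data, num_classes):
    """Build (lower limit, upper boundary, frequency) rows for whole-number data."""
    low, high = min(data), max(data)
    data_range = high - low                      # step 1: find the range
    class_width = data_range // num_classes + 1  # step 3: next whole number above range / classes
    table = []
    for i in range(num_classes):                 # step 4: count the values in each class
        lower = low + i * class_width
        upper = lower + class_width              # the upper boundary itself is excluded
        count = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, count))
    return table

# Hypothetical test scores.
scores = [52, 55, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 91, 95]
table = frequency_table(scores, num_classes=5)

# Step 5: a text histogram, one star per observation.
for lower, upper, count in table:
    print(f"{lower}-{upper - 1}: {'*' * count}")
```

Here the range is 43 and the class width is 9, giving the classes 52-60, 61-69, 70-78, 79-87, and 88-96.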

Cumulative Frequency Table

  • Shows cumulative totals for each class.
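A cumulative frequency is a running total of the class frequencies. A minimal Python sketch, using hypothetical class frequencies:

```python
from itertools import accumulate

# Hypothetical frequencies for five consecutive classes.
class_frequencies = [2, 3, 5, 3, 3]

# Running total: each entry counts all observations up to and including that class.
cumulative = list(accumulate(class_frequencies))

# Dividing by the grand total gives cumulative relative frequencies.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

print(cumulative)
print(cumulative_relative)
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.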

Other Graphs

  • Stem and Leaf Plot: Good for small data sets with small ranges.

  • Dot Plot: Good for small data sets; useful for identifying outliers.

  • Frequency Polygon: Uses class midpoints and lines.

  • Ogive: Uses cumulative frequency.

  • Time Series Plot: Shows trends over time.

  • Misleading Graphs: Inconsistent scales, omission of context, starting axis above zero, misuse of pictographs.

Numerical Summaries of Data

Measures of Central Tendency

  • Mean (Arithmetic Average): $\bar{x} = \frac{\sum x}{n}$, the sum of the data values divided by the number of values.

  • Median: Middle value when data is ordered.

  • Mode: Value with greatest frequency.

Round Off Rule: Round the final answer to one more decimal place than the original data values have.

Resistance

  • The mean is not resistant to outliers.

  • The median is resistant to outliers.
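Resistance is easiest to see by adding one extreme value to a small data set. The sketch below uses Python's standard `statistics` module; the salary figures are invented for illustration.

```python
import statistics

# Hypothetical salaries, in thousands of dollars.
salaries = [40, 42, 45, 45, 48]
with_outlier = salaries + [400]  # add one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mode_before = statistics.mode(salaries)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
print(f"mode:   {mode_before}")
```

The single outlier pulls the mean from 44 to about 103.3, while the median stays at 45.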

Measures of Dispersion (Spread)

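  • Range: $\text{max} - \text{min}$

  • Standard Deviation: Square root of variance; for a sample, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$.

  • Mid Range: $\frac{\text{max} + \text{min}}{2}$ (a measure of center computed from the same two extremes as the range)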
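These summaries can be computed with Python's standard `statistics` module. In this sketch the data set is made up; `variance` and `stdev` divide by n - 1 (sample formulas), while `pstdev` divides by n (population formula).

```python
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)            # largest value minus smallest value
midrange = (max(data) + min(data)) / 2        # halfway between the two extremes

sample_variance = statistics.variance(data)   # sum of squared deviations / (n - 1)
sample_sd = statistics.stdev(data)            # square root of the sample variance
population_sd = statistics.pstdev(data)       # uses n instead of n - 1

print(data_range, midrange)
print(round(sample_variance, 3), round(sample_sd, 3), population_sd)
```

For this data the squared deviations from the mean sum to 32, so the sample variance is 32/7 and the population standard deviation is exactly 2.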

Empirical Rule (Normal Distributions)

  • 68% of data within 1 standard deviation of mean

  • 95% within 2 standard deviations

  • 99.7% within 3 standard deviations

Only use for normal distributions. Never assume normality without evidence.
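The Empirical Rule translates directly into intervals around the mean. The sketch below builds those intervals for a hypothetical distribution (mean 100, standard deviation 15) and then uses `statistics.NormalDist` to show where the rounded percentages come from.

```python
from statistics import NormalDist

def empirical_rule_intervals(mean, sd):
    """Return the intervals expected to hold about 68%, 95%, and 99.7% of the data."""
    return {pct: (mean - k * sd, mean + k * sd)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))}

# Hypothetical normal distribution: mean 100, standard deviation 15.
intervals = empirical_rule_intervals(mean=100, sd=15)
print(intervals)

# Exact normal-curve areas within k standard deviations of the mean.
std_normal = NormalDist()  # mean 0, standard deviation 1
exact = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
print({k: round(p, 4) for k, p in exact.items()})
```

The exact areas are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.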

Measures of Position and Outliers

Percentiles and Quartiles

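  • Percentile Formula: percentile of a value $x$ = $\frac{\text{number of values less than } x}{\text{total number of values}} \times 100$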

  • Quartiles divide data into four equal parts.

  • First quartile (Q1): 25th percentile

  • Second quartile (Q2): 50th percentile (median)

  • Third quartile (Q3): 75th percentile

Identifying Outliers

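  • Lower fence: $Q_1 - 1.5 \times \text{IQR}$

  • Upper fence: $Q_3 + 1.5 \times \text{IQR}$

  • Outliers are values outside these fences.

The interquartile range, $\text{IQR} = Q_3 - Q_1$, is the spread of the middle 50% of the data.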
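The fence method places the lower fence 1.5 IQRs below Q1 and the upper fence 1.5 IQRs above Q3. A minimal Python sketch follows, with made-up data. One assumption to note: `statistics.quantiles` uses its default "exclusive" method here, and textbooks and calculators may compute quartiles slightly differently, so fence values can vary a little between tools.

```python
import statistics

def find_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # Q1, Q2 (median), Q3
    iqr = q3 - q1                                # spread of the middle 50%
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical data with one unusually large value.
data = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]
lower_fence, upper_fence, outliers = find_outliers(data)
print(lower_fence, upper_fence, outliers)
```

With this quartile method the fences are -3.0 and 15.0, so 30 is the only value flagged as an outlier.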

Boxplot (Five-Number Summary)

  • Minimum, Q1, Q2 (median), Q3, Maximum

  • Symmetry is judged from the whisker lengths and the position of the median within the box

Z-Scores

  • Standardized value: $z = \frac{x - \mu}{\sigma}$ for a population, or $z = \frac{x - \bar{x}}{s}$ for a sample

  • Indicates how many standard deviations a value is from the mean.

  • Positive Z: above mean; Negative Z: below mean; Z = 0: at mean.
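A z-score is the distance of a value from the mean, measured in standard deviations. The sketch below uses a hypothetical exam with mean 70 and standard deviation 10; the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical exam: mean 70, standard deviation 10.
z_above = z_score(85, mean=70, sd=10)  # above the mean, so positive
z_below = z_score(60, mean=70, sd=10)  # below the mean, so negative
z_at = z_score(70, mean=70, sd=10)     # exactly at the mean, so zero

print(z_above, z_below, z_at)
```

Because z-scores are unit-free, they allow values from different data sets (for example, scores on two different exams) to be compared on the same scale.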

Probability Fundamentals

Definition and Rules

  • Probability: Study of uncertainty; likelihood of an event occurring.

  • Probability values must be between 0 and 1, inclusive.

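  • In symbols: $0 \le P(A) \le 1$ for any event $A$ (for example, $P(A) = 0.25$ is a valid probability; $P(A) = 1.2$ is not).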

Addition Rule

  • If A and B are two events:

    • $P(A \text{ or } B) = P(A) + P(B)$ (if disjoint)

    • $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$ (general rule)

  • Disjoint (mutually exclusive): No overlap between events.

  • Not disjoint: Use general formula.
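The addition rule can be checked on a small sample space. This sketch assumes one roll of a fair six-sided die, so every outcome is equally likely; the event names are illustrative, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
greater_than_4 = {5, 6}
odd = {1, 3, 5}

# Not disjoint (6 is in both events): subtract the overlap so it is not counted twice.
general = prob(even) + prob(greater_than_4) - prob(even & greater_than_4)
direct = prob(even | greater_than_4)  # count the outcomes in "A or B" directly

# Disjoint (no outcome is both even and odd): the probabilities simply add.
disjoint_sum = prob(even) + prob(odd)

print(general, direct, disjoint_sum)
```

Both routes give 2/3 for "even or greater than 4", and the disjoint events "even" and "odd" add to 1.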

Summary Table: Levels of Measurement

Level    | Type         | Order | True Zero | Examples
---------|--------------|-------|-----------|----------------------
Nominal  | Qualitative  | No    | No        | Eye color, gender
Ordinal  | Qualitative  | Yes   | No        | Ranking, satisfaction
Interval | Quantitative | Yes   | No        | Temperature (Celsius)
Ratio    | Quantitative | Yes   | Yes       | Height, age

Summary Table: Measures of Central Tendency

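Measure | Definition          | Formula                        | Resistance to Outliers
--------|---------------------|--------------------------------|-----------------------
Mean    | Arithmetic average  | $\bar{x} = \frac{\sum x}{n}$   | No
Median  | Middle value        | Middle of ordered data         | Yes
Mode    | Most frequent value | Value with highest frequency   | Yes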

Summary Table: Sampling Methods

Method        | Description                                     | Example
--------------|-------------------------------------------------|------------------------------------------
Simple Random | Equal chance for all members                    | Lottery draw
Systematic    | Every k-th member chosen                        | Every 10th person
Stratified    | Divide into groups, sample from each            | Sample from each grade level
Cluster       | Divide into groups, sample all from some groups | Sample all students from selected classes
Convenience   | Sample from easily available members            | Survey at a mall
