Fundamental Concepts and Data Analysis in Introductory Statistics

Study Guide - Smart Notes

Ch 1 - Introduction to Statistical Studies

Types of Studies

Statistical studies are designed to collect, analyze, and interpret data. Understanding the type of study is crucial for determining the validity of conclusions.

Observational Study: Researchers observe subjects without manipulating variables. Example: Recording customer shopping habits.
Experiment: Researchers apply treatments and observe effects. Example: Testing the effect of a new drug on patients.

Example: A study records the number of items purchased and time spent in a store by customers. This is an observational study because no variables are manipulated.

Populations and Samples

In statistics, data are collected from a population or a sample:

Population: The entire group of interest (e.g., all store customers).
Sample: A subset of the population (e.g., 192 customers surveyed).

Parameter and Statistic

Parameter: A value that describes the population (e.g., the average age of all store customers). Parameters are usually denoted by Greek letters.
Statistic: A value computed from the sample (e.g., the average age of the 192 customers surveyed).
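
The distinction can be illustrated with a short Python sketch. The population of customer ages below is made up for illustration (the notes do not give actual ages); only the sample size of 192 comes from the example above.

```python
import random
import statistics

# Hypothetical population: ages of 5,000 store customers (made-up data).
random.seed(1)
population_ages = [random.randint(18, 80) for _ in range(5000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population_ages)

# Statistic: computed from a simple random sample of 192 customers.
sample_ages = random.sample(population_ages, 192)
x_bar = statistics.mean(sample_ages)

print(f"parameter (population mean) = {mu:.2f}")
print(f"statistic (sample mean)     = {x_bar:.2f}")
```

The two values are usually close but not equal, which is why a statistic is treated as an estimate of the parameter.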

Variables: Types and Measurement Scales

Variables are characteristics measured in a study. They can be classified as:

Quantitative Variables: Numeric values (e.g., number of items purchased).
Qualitative (Categorical) Variables: Non-numeric categories (e.g., type of pet).

Quantitative variables can be further classified as:

Discrete: Countable values (e.g., number of pets).
Continuous: Any value within a range (e.g., height, weight).

Ch 2 - Graphical Representation of Data

Types of Tables

Frequency Table, Relative Frequency Table, Cumulative Frequency Table, Cumulative Relative Frequency Table
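
As a rough sketch of how the four tables relate, the Python snippet below builds each one from the same small data set. The data (number of items purchased by 10 customers) are made up for illustration.

```python
from collections import Counter
from itertools import accumulate

# Made-up data: number of items purchased by 10 customers.
items = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]

counts = Counter(items)
values = sorted(counts)                      # distinct values in order
freq = [counts[v] for v in values]           # frequency
n = sum(freq)
rel_freq = [f / n for f in freq]             # relative frequency
cum_freq = list(accumulate(freq))            # cumulative frequency
cum_rel_freq = [c / n for c in cum_freq]     # cumulative relative frequency

print("value  freq  rel   cum  cum_rel")
for v, f, r, c, cr in zip(values, freq, rel_freq, cum_freq, cum_rel_freq):
    print(f"{v:>5}  {f:>4}  {r:.2f}  {c:>3}  {cr:.2f}")
```

The last cumulative frequency always equals the number of observations, and the last cumulative relative frequency always equals 1.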

Types of Graphs

Choosing the appropriate graph depends on the variable type:

Bar Graph, Pie Chart: For qualitative variables.
Histogram, Stem-and-Leaf Plot, Dot Plot: For quantitative variables.

Example: To display the number of items purchased (quantitative), use a histogram or stem-and-leaf plot.

Data may be grouped in classes or bins.

Each class has a lower limit, an upper limit, and a class width (the difference between the lower limits of consecutive classes).
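
A minimal sketch of grouping whole-number data into classes follows. The data and the choice of 4 classes are made up, and rounding the class width up to a whole number is one common textbook convention, assumed here rather than taken from the notes.

```python
import math

# Made-up whole-number data.
data = [3, 7, 12, 15, 18, 21, 24, 29, 33, 38]
num_classes = 4          # assumed number of classes

lowest = min(data)
# Assumed convention: (max - min) / number of classes, rounded up.
class_width = math.ceil((max(data) - lowest) / num_classes)

classes = []
for i in range(num_classes):
    lower = lowest + i * class_width
    upper = lower + class_width - 1          # upper limit for whole-number data
    count = sum(lower <= x <= upper for x in data)
    classes.append((lower, upper, count))

for lower, upper, count in classes:
    print(f"{lower:>2} - {upper:>2}: {count}")
```

Here the class width is 9, and consecutive lower limits (3, 12, 21, 30) differ by exactly that width.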

Experimental Design

Matched Pairs and Blinding

Experimental design ensures valid and unbiased results.

Matched Pairs Design: Subjects are paired based on similarities, and each receives different treatments.
Blinding: Subjects do not know which treatment they receive, reducing bias.

Example: Subjects smell two brands of pest shampoo and rank them. If the order is randomized and subjects are unaware of the brands, the study is blinded.

Ch 3 - Summarizing Data: Descriptive Statistics

Measures of Center

Mean: The arithmetic average. (µ: population mean, x-bar: sample mean)
Median: The middle value when data are ordered.
Mode: The most frequently occurring value.

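Resistance: A measure is resistant if extreme values (outliers) have little effect on it. The median is resistant; the mean is not. Use the mean as the measure of center when the distribution is symmetric, and the median when the data are skewed.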
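
The effect of a single outlier on the mean and the median can be seen in a small Python sketch. The income values (in thousands of dollars) are made up for illustration.

```python
import statistics

# Made-up incomes in thousands of dollars, roughly symmetric.
incomes = [32, 35, 38, 40, 41, 43, 45]

# Same data with one extreme value added (skewed right).
with_outlier = incomes + [400]

for label, values in [("original", incomes), ("with outlier", with_outlier)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    print(f"{label:>12}: mean = {mean:.2f}, median = {median}")
```

The mean jumps from about 39.14 to 84.25, while the median moves only from 40 to 40.5.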

Measures of Spread

Range: Difference between the highest and lowest values.
Standard Deviation: Measures average distance from the mean. (σ: population std dev., s: sample std dev)
Variance (σ² for a population, s² for a sample): The square of the standard deviation.
Interquartile Range (IQR): Difference between the 75th and 25th percentiles.
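
These measures can be computed with Python's standard `statistics` module, sketched below on made-up data. Quartile conventions differ between textbooks and software, so the IQR from `statistics.quantiles` (default "exclusive" method) may not match a hand calculation or StatCrunch exactly.

```python
import statistics

# Made-up data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)

# Population versions (divide by N).
sigma = statistics.pstdev(data)
sigma_sq = statistics.pvariance(data)

# Sample versions (divide by n - 1).
s = statistics.stdev(data)
s_sq = statistics.variance(data)

# Quartiles using the module's default "exclusive" method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(f"range = {data_range}")
print(f"population: sigma = {sigma:.3f}, sigma^2 = {sigma_sq:.3f}")
print(f"sample:     s     = {s:.3f}, s^2     = {s_sq:.3f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Note that the sample standard deviation is slightly larger than the population version because it divides by n - 1 instead of N.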

Shape of Distributions

Symmetric: Data are evenly distributed around the center.
Skewed Right: Tail extends to the right (higher values).
Skewed Left: Tail extends to the left (lower values).
Bell-Shaped: Symmetric with a peak in the center like a bell.

Empirical Rule (68-95-99.7 Rule)

For bell-shaped (normal) distributions:

About 68% of data fall within 1 standard deviation of the mean.
About 95% within 2 standard deviations (between z = -2 and z = 2).
About 99.7% within 3 standard deviations.

Example: If the mean monthly utility bill is $124 and the standard deviation is $12, then approximately 68% of bills are between $112 and $136.
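
The same arithmetic extends to 2 and 3 standard deviations. The sketch below uses the mean ($124) and standard deviation ($12) from the example and assumes the bills are approximately bell-shaped.

```python
mean_bill = 124   # mean monthly utility bill, in dollars
sd_bill = 12      # standard deviation, in dollars

# Approximate percentages from the Empirical Rule.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}

intervals = {}
for k, pct in EMPIRICAL_RULE.items():
    low = mean_bill - k * sd_bill
    high = mean_bill + k * sd_bill
    intervals[k] = (low, high)
    print(f"About {pct}% of bills are between ${low} and ${high}")
```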

z-score: The number of standard deviations a data point lies from the mean. z = (data value - mean) / standard deviation.
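
As a quick illustration, the function below computes z-scores for two bill amounts using the utility bill example (mean $124, standard deviation $12). The bill amounts $148 and $106 are made up for this sketch.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

mean_bill = 124
sd_bill = 12

z_high = z_score(148, mean_bill, sd_bill)   # bill above the mean
z_low = z_score(106, mean_bill, sd_bill)    # bill below the mean

print(f"z for $148: {z_high}")
print(f"z for $106: {z_low}")
```

A positive z-score means the value is above the mean, and a negative z-score means it is below the mean.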

Describing Data Sets

Statistics vs. Parameters

Statistic: A numerical summary of a sample.
Parameter: A numerical summary of a population.

Example: The mean of 36,300 miles from a sample is a statistic.

Calculating Summary Statistics

Median: Middle value of ordered data.
Mode: Most frequent value.
Range: Highest minus lowest value.
Standard Deviation (σ or s): (use StatCrunch!)
Variance (σ² or s²): (use StatCrunch!)
Quartiles: Values that divide ordered data into four equal parts, with 25% of the data in each part.
Percentiles: The percentile of a data point is the percentage of data values to the LEFT of (below) that point.
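
A minimal sketch of the percentile idea follows. It assumes "to the left" means strictly below the data point; some textbooks count values at or below instead. The function name `percentile_rank` and the scores are made up for illustration.

```python
def percentile_rank(values, x):
    """Percentage of values strictly below x (assumed convention)."""
    below = sum(v < x for v in values)
    return 100 * below / len(values)

# Made-up exam scores.
scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96]

rank_85 = percentile_rank(scores, 85)
print(f"A score of 85 is at the {rank_85:.0f}th percentile")
```

Seven of the ten scores are below 85, so 85 sits at the 70th percentile under this convention.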

Correlation and Causation

Scatterplots and Correlation

Scatterplots visually display the relationship between two quantitative variables.

Positive Association: As one variable increases, so does the other.
Negative Association: As one variable increases, the other decreases.
No Association: No discernible pattern.

Correlation Coefficient (r)

The linear correlation coefficient measures the strength and direction of a linear relationship:

r ranges from -1 (perfect negative) to +1 (perfect positive).
Values near 0 indicate weak or no linear relationship.
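
In this course the coefficient is normally read from software, but a short sketch of the standard Pearson formula shows where the number comes from. The function name `pearson_r` and the data (minutes in the store and items purchased) are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired data: minutes in store and items purchased.
minutes = [5, 10, 15, 20, 25]
items = [1, 3, 4, 6, 7]

r = pearson_r(minutes, items)
print(f"r = {r:.4f}")   # close to +1: strong positive linear relationship

# Points that fall exactly on a decreasing line give r = -1.
r_negative = pearson_r([1, 2, 3], [6, 4, 2])
print(f"r = {r_negative:.1f}")
```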

Example: If the critical value is 0.514, the linear relationship is significant at the chosen significance level when |r| exceeds 0.514.

Causation

Correlation does not imply causation. Even if two variables are correlated, it does not mean one causes the other.

Summary Table: Types of Variables and Graphs

Variable Type	Example	Appropriate Graph
Qualitative (Nominal)	Pet type	Bar graph, pie chart
Quantitative (Discrete)	Number of items	Histogram, stem-and-leaf plot, dot plot
Quantitative (Continuous)	Height, rainfall	Histogram, boxplot

Additional info:

When data are skewed, the median is a better measure of center than the mean because it is resistant to outliers.
Empirical Rule applies only to approximately bell-shaped distributions.
Blinding and randomization are essential for reducing bias in experiments.