Step-by-Step Guidance for Statistics Review: Dr. Burner's Baldness Study
Study Guide - Smart Notes
Q1. Create a frequency chart based on the above data. The first group should be 4.2 – 4.6.
Background
Topic: Frequency Distributions
This question tests your ability to organize raw data into a frequency distribution table, which is a foundational skill in descriptive statistics.
Key Terms and Formulas
Frequency: The number of data points that fall within a specified interval (class).
Class Interval: A range of values used to group data (e.g., 4.2–4.6).
Lower Class Limit: The smallest value in a class interval.
Upper Class Limit: The largest value in a class interval.
Step-by-Step Guidance
List all the data points in order from smallest to largest to help with grouping.
Determine the number of classes (intervals) you want. A common rule is to use between 5 and 8 classes for 30 data points.
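Calculate the class width: $\text{class width} = \frac{\text{maximum} - \text{minimum}}{\text{number of classes}}$, rounded up to a convenient value. Here the required first class, 4.2–4.6, already fixes the width at 0.5, since the next class starts at 4.7.
Starting with 4.2–4.6 as the first interval, create each subsequent interval by adding the class width to both limits of the previous class (4.7–5.1, 5.2–5.6, and so on).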
Count how many data points fall into each interval and record these frequencies in a table.
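As a rough sketch of the grouping steps above, the following Python counts how many values fall in each class. The `data` list is hypothetical (it is not the study's 30 measurements), and the helper name `frequency_table` and its assumption of one-decimal data are illustrative choices, not part of the assignment.

```python
# Hypothetical forehead lengths for illustration; NOT the study's data.
data = [4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 5.6, 5.9, 6.1, 4.9]

def frequency_table(values, first_lower=4.2, width=0.5):
    """Return (lower, upper, frequency) rows.

    Assumes one-decimal data with every value >= first_lower.
    """
    lo10, w10 = round(first_lower * 10), round(width * 10)  # work in tenths
    ks = [(round(v * 10) - lo10) // w10 for v in values]    # class index per value
    rows = []
    for k in range(max(ks) + 1):
        lower = (lo10 + k * w10) / 10
        upper = (lo10 + (k + 1) * w10 - 1) / 10
        rows.append((lower, upper, ks.count(k)))
    return rows

table = frequency_table(data)
for lower, upper, freq in table:
    print(f"{lower:.1f}-{upper:.1f}: {freq}")
```

Working in whole tenths avoids floating-point surprises at class boundaries such as 4.6 versus 4.7.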
Q2. Create a histogram for the above frequency chart.
Background
Topic: Data Visualization – Histograms
This question assesses your ability to represent frequency data graphically using a histogram, which helps visualize the distribution of the data.
Key Terms and Formulas
Histogram: A bar graph representing the frequency of data within each class interval.
X-axis: Represents the class intervals.
Y-axis: Represents the frequency for each interval.
Step-by-Step Guidance
Use the frequency chart you created in Q1 as the basis for your histogram.
Draw the x-axis and label it with your class intervals (e.g., 4.2–4.6, 4.7–5.1, etc.).
Draw the y-axis and label it with frequencies (the counts from your table).
For each class interval, draw a bar whose height corresponds to the frequency for that interval.
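Before drawing the histogram by hand, a quick text version can help check the bar heights. This is only a sketch: the frequencies below are hypothetical placeholders, `text_histogram` is an illustrative helper name, and the bars run sideways rather than upward.

```python
# Hypothetical frequencies; replace with the counts from your own Q1 table.
freq_table = [("4.2-4.6", 2), ("4.7-5.1", 4), ("5.2-5.6", 2), ("5.7-6.1", 2)]

def text_histogram(table):
    """Return one line per class, with one '#' per observation."""
    return [f"{label} | {'#' * count} ({count})" for label, count in table]

lines = text_histogram(freq_table)
print("\n".join(lines))
```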
Q3. What is the shape of the data?
Background
Topic: Distribution Shapes
This question asks you to describe the overall pattern of the data, such as whether it is symmetric, skewed left, or skewed right.
Key Terms
Symmetric: Data is evenly distributed around the center.
Skewed Right (Positive Skew): Tail is longer on the right.
Skewed Left (Negative Skew): Tail is longer on the left.
Step-by-Step Guidance
Look at your histogram from Q2 and observe the distribution of the bars.
Determine if the data is roughly symmetric or if one tail is longer than the other.
Consider the mean and median (from later questions) to help confirm the shape.
Q4. What is the average length of the forehead?
Background
Topic: Measures of Central Tendency – Mean
This question tests your ability to calculate the mean (average) of a data set.
Key Formula
$\bar{x} = \frac{\sum x_i}{n}$
where $x_i$ are the data values and $n$ is the number of data points.
Step-by-Step Guidance
Add up all the forehead length measurements: $\sum x_i$.
Count the total number of data points ($n$), which is 30 in this case.
Divide the sum by the number of data points: $\bar{x} = \frac{\sum x_i}{n}$.
Q5. What is the median of the forehead?
Background
Topic: Measures of Central Tendency – Median
This question tests your ability to find the median, or middle value, of a data set.
Key Terms and Steps
Median: The middle value when data is ordered from least to greatest.
Step-by-Step Guidance
Order all 30 data points from smallest to largest.
Since there are 30 data points (an even number), the median is the average of the 15th and 16th values in the ordered list.
Identify the 15th and 16th values and calculate their average.
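The even-count rule above can be sketched in Python as follows. The `data` list is hypothetical and has only 10 values; with the study's 30 values the same code would average the 15th and 16th ordered values.

```python
def median(values):
    """Middle value of the ordered data; averages the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # For even n, positions mid and mid+1 (1-based) are the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical measurements for illustration; NOT the study's data.
data = [4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 5.6, 5.9, 6.1, 4.9]
result = median(data)
print(round(result, 2))
```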
Q6. Based on your mean and median, what is the shape of the data?
Background
Topic: Comparing Mean and Median to Assess Skewness
This question asks you to use the relationship between the mean and median to infer the shape of the distribution.
Key Concepts
If mean > median, the data is likely skewed right.
If mean < median, the data is likely skewed left.
If mean ≈ median, the data is likely symmetric.
Step-by-Step Guidance
Compare the mean and median values you calculated in Q4 and Q5.
Use the above rules to determine the likely shape of the data.
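A minimal sketch of this comparison, using Python's standard `statistics` module. The `tolerance` used to decide when the mean and median are "approximately equal" is an arbitrary illustrative threshold, not a statistical standard, and the sample lists are made up.

```python
from statistics import mean, median

def likely_shape(values, tolerance=0.05):
    """Guess the shape from mean vs. median; tolerance is an illustrative cutoff."""
    m, med = mean(values), median(values)
    if abs(m - med) <= tolerance:
        return "roughly symmetric"
    return "skewed right" if m > med else "skewed left"

# Hypothetical data sets for illustration only.
print(likely_shape([1, 2, 2, 3, 3, 3, 4, 20]))  # one large value pulls the mean up
print(likely_shape([1, 2, 3, 4, 5]))
```

This is a rule of thumb; confirm the result against the histogram from Q2.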
Q7. Determine the five-number summary for the data.
Background
Topic: Five-Number Summary
This question tests your ability to find the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum of a data set.
Key Terms
Minimum: Smallest value
Q1: 25th percentile
Median (Q2): 50th percentile
Q3: 75th percentile
Maximum: Largest value
Step-by-Step Guidance
Order the data from smallest to largest.
Identify the minimum and maximum values.
Find the median (Q2) as in Q5.
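Find Q1, the median of the lower half of the ordered data. With n = 30 the median falls between the 15th and 16th values, so the lower half is the first 15 values and Q1 is the 8th value. (A middle data value is left out of the halves only when n is odd.)
Find Q3, the median of the upper half of the ordered data. With n = 30 the upper half is the last 15 values, so Q3 is the 23rd value.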
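The median-of-halves method described in these steps can be sketched as below. The `data` list is hypothetical, and `five_number_summary` is an illustrative helper; note that calculators and software packages sometimes use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]           # values below the median position
    upper = ordered[half + n % 2:]   # skips the middle value only when n is odd
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical measurements for illustration; NOT the study's data.
data = [4.2, 4.5, 4.8, 5.0, 5.1, 5.3, 5.6, 5.9, 6.1, 4.9]
summary = five_number_summary(data)
print(summary)
```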
Q8. Are there any outliers for the data? (Show the work)
Background
Topic: Identifying Outliers Using the IQR Method
This question tests your ability to use the interquartile range (IQR) to determine if any data points are outliers.
Key Formulas
$\text{IQR} = Q_3 - Q_1$
Lower Fence: $Q_1 - 1.5 \times \text{IQR}$
Upper Fence: $Q_3 + 1.5 \times \text{IQR}$
Step-by-Step Guidance
Calculate the IQR using your Q1 and Q3 from Q7.
Compute the lower and upper fences using the formulas above.
Check if any data points fall below the lower fence or above the upper fence.
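A short sketch of the 1.5 × IQR rule, assuming Q1 and Q3 have already been found. The quartiles and data values below are hypothetical, chosen so that one value lands outside the fences.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data and quartiles for illustration; NOT the study's values.
data = [4.2, 4.5, 4.8, 4.9, 5.0, 5.1, 5.3, 5.6, 5.9, 7.5]
low, high = iqr_fences(4.8, 5.6)
outliers = find_outliers(data, 4.8, 5.6)
print(round(low, 2), round(high, 2), outliers)
```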
Q9. Create a box plot for the data.
Background
Topic: Box Plots (Box-and-Whisker Plots)
This question tests your ability to visually summarize the five-number summary and identify potential outliers.
Key Terms
Box Plot: A graphical representation of the five-number summary.
Step-by-Step Guidance
Draw a number line that includes the range of your data.
Mark the minimum, Q1, median, Q3, and maximum on the number line.
Draw a box from Q1 to Q3, with a line at the median.
Draw whiskers from the box to the minimum and maximum (unless there are outliers, in which case whiskers stop at the last non-outlier).
Q10. The percentage of men between the ages of 34 and 45 who start having hair loss
Background
Topic: Normal Distribution and Z-scores
This question tests your ability to use the normal distribution to find probabilities (percentages) for a given range.
Key Formulas
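$z = \frac{x - \mu}{\sigma}$
where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.
Step-by-Step Guidance
Calculate the z-score for 34: $z_{34} = \frac{34 - \mu}{\sigma}$.
Calculate the z-score for 45: $z_{45} = \frac{45 - \mu}{\sigma}$.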
Use the standard normal table to find the probabilities corresponding to each z-score.
Subtract the lower probability from the higher to find the percentage between 34 and 45.
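To check a table lookup, the same calculation can be sketched with Python's standard `statistics.NormalDist`. The mean and standard deviation below (`MU`, `SIGMA`) are hypothetical placeholders, since the study's values are not restated in this guide; substitute the ones given in your problem.

```python
from statistics import NormalDist

# Hypothetical parameters for illustration; use the values from your problem.
MU, SIGMA = 40.0, 5.0

def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

def percent_between(low, high, mu=MU, sigma=SIGMA):
    """Percentage of a normal distribution lying between low and high."""
    standard = NormalDist()  # mean 0, standard deviation 1
    p = standard.cdf(z_score(high, mu, sigma)) - standard.cdf(z_score(low, mu, sigma))
    return 100 * p

pct = percent_between(34, 45)
print(z_score(34), z_score(45), round(pct, 2))
```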
Q11. The percentage of men greater than 37 years of age who have started having hair loss
Background
Topic: Normal Distribution – Right Tail Probability
This question tests your ability to find the probability that a value is greater than a certain point using the normal distribution.
Key Formulas
$z = \frac{x - \mu}{\sigma}$
$P(X > 37) = 1 - P(Z < z)$
Step-by-Step Guidance
Calculate the z-score for 37.
Use the standard normal table to find the probability to the left of this z-score.
Subtract this probability from 1 to get the probability to the right (greater than 37).
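The right-tail steps can be sketched as follows. The mean of 40 and standard deviation of 5 passed in the example call are hypothetical placeholders, not the study's values.

```python
from statistics import NormalDist

def percent_above(x, mu, sigma):
    """Percentage of a normal distribution lying above x."""
    z = (x - mu) / sigma
    left = NormalDist().cdf(z)   # area to the left, as a z-table would give
    return 100 * (1 - left)      # complement gives the right tail

# Hypothetical mean and standard deviation for illustration only.
pct_above_37 = percent_above(37, mu=40.0, sigma=5.0)
print(round(pct_above_37, 2))
```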
Q12. What is the percentage of men less than 40 years of age who have hair loss?
Background
Topic: Normal Distribution – Left Tail Probability
This question tests your ability to find the probability that a value is less than a certain point using the normal distribution.
Key Formulas
$z = \frac{x - \mu}{\sigma}$
$P(X < 40) = P(Z < z)$
Step-by-Step Guidance
Calculate the z-score for 40.
Use the standard normal table to find the probability to the left of this z-score.
Convert this probability to a percentage.
Q13. At what age is a man at the 90th percentile for hair loss?
Background
Topic: Finding a Value from a Percentile in a Normal Distribution
This question tests your ability to use the normal distribution to find the value (age) corresponding to a given percentile.
Key Formulas
$x = \mu + z\sigma$, where $z$ is the z-score that corresponds to the 90th percentile (from a z-table).
Step-by-Step Guidance
Look up the z-score for the 90th percentile (approximately 1.28).
Plug the z-score, mean, and standard deviation into the formula: $x = \mu + z\sigma$.
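As a cross-check on the table lookup, `statistics.NormalDist.inv_cdf` returns the z-score for a given percentile. The mean and standard deviation in the example call are hypothetical placeholders, not the study's values.

```python
from statistics import NormalDist

def value_at_percentile(p, mu, sigma):
    """Value x such that a fraction p of the normal distribution lies below it."""
    z = NormalDist().inv_cdf(p)   # z-score for the percentile
    return mu + z * sigma

z90 = NormalDist().inv_cdf(0.90)
# Hypothetical mean and standard deviation for illustration only.
age_90 = value_at_percentile(0.90, mu=40.0, sigma=5.0)
print(round(z90, 2), round(age_90, 2))
```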
Q14. How closely are these two variables correlated? Describe the relationship.
Background
Topic: Correlation and Scatterplots
This question tests your ability to assess the strength and direction of the relationship between two quantitative variables.
Key Terms
Correlation Coefficient (r): Measures the strength and direction of a linear relationship.
Scatterplot: A graph of paired data points.
Step-by-Step Guidance
Plot the data pairs (duration, hair growth) on a scatterplot.
Observe the pattern: does it look linear, positive, negative, or no relationship?
Calculate the correlation coefficient (r) using the formula or a calculator.
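If you want to verify a calculator result, the correlation coefficient can be computed from its definition as sketched below. The paired values are hypothetical, not the study's measurements.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical (duration, hair growth) pairs for illustration only.
duration = [1, 2, 3, 4, 5]
growth = [2, 4, 5, 4, 5]
r = correlation(duration, growth)
print(round(r, 4))
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate a weak one.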
Q15. What is the equation of the line of best fit and describe both the slope and y-intercept in the context of the problem?
Background
Topic: Linear Regression
This question tests your ability to find the least-squares regression line and interpret its components in context.
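Key Formulas
$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ (equivalently, $b = r \frac{s_y}{s_x}$)
$a = \bar{y} - b\bar{x}$
$\hat{y} = a + bx$
where $x$ is the duration, $y$ is the hair growth, $b$ is the slope, and $a$ is the y-intercept.
Step-by-Step Guidance
Calculate the means $\bar{x}$ and $\bar{y}$ for duration and hair growth.
Compute the slope $b$ using the formula above.
Calculate the y-intercept: $a = \bar{y} - b\bar{x}$.
Write the equation in the form $\hat{y} = a + bx$.
Interpret the slope (predicted change in hair growth per additional minute) and y-intercept (predicted hair growth at 0 minutes) in context.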
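A minimal sketch of the least-squares calculation, using hypothetical paired data rather than the study's measurements. The helper name `least_squares_line` is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return intercept, slope

# Hypothetical (duration, hair growth) pairs for illustration only.
duration = [1, 2, 3, 4, 5]
growth = [2, 4, 5, 4, 5]
intercept, slope = least_squares_line(duration, growth)
print(f"predicted growth = {intercept:.2f} + {slope:.2f} * duration")
```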
Q16. Using the line of best fit, extrapolate the length of hair growth for 8 mins of medicine on the scalp.
Background
Topic: Using Regression for Prediction
This question tests your ability to use the regression equation to predict a value outside the observed data range (extrapolation).
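Key Formula
$\hat{y} = a + bx$ evaluated at $x = 8$, where $a$ is the y-intercept and $b$ is the slope of the line of best fit.
Step-by-Step Guidance
Take the equation of the line of best fit from Q15.
Plug in $x = 8$ for the duration.
Solve for $\hat{y}$ to get the predicted hair growth.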
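The prediction step, plus a check for whether the input lies outside the observed range, might look like this. The coefficients `A` and `B` and the observed durations are hypothetical placeholders; use your own line from Q15.

```python
# Hypothetical fitted line and observed durations; use your own from Q15.
A, B = 2.2, 0.6                       # intercept and slope
OBSERVED_DURATIONS = [1, 2, 3, 4, 5]  # minutes present in the (made-up) data

def predict(x, a=A, b=B):
    """Predicted hair growth for duration x on the line a + b*x."""
    return a + b * x

def is_extrapolation(x, observed=OBSERVED_DURATIONS):
    """True when x lies outside the range of observed durations."""
    return x < min(observed) or x > max(observed)

print(round(predict(8), 2), is_extrapolation(8))
```

Predictions outside the observed range assume the linear trend continues, which the data cannot confirm.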
Q17. Calculate the residual at duration of 4 mins.
Background
Topic: Residuals in Regression
This question tests your understanding of how to calculate and interpret residuals (the difference between observed and predicted values).
Key Formula
$\text{residual} = y - \hat{y}$ (observed value minus predicted value)
Step-by-Step Guidance
Find the observed hair growth for 4 minutes from the data.
Use the regression equation to calculate the predicted hair growth for 4 minutes.
Subtract the predicted value from the observed value to get the residual.
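A small sketch of the residual calculation. The observed value and the line's coefficients below are hypothetical, not taken from the study.

```python
def residual(observed_y, x, intercept, slope):
    """Observed value minus the value predicted by the line at x."""
    predicted = intercept + slope * x
    return observed_y - predicted

# Hypothetical observation (4 minutes, growth of 4) and hypothetical line.
res = residual(observed_y=4, x=4, intercept=2.2, slope=0.6)
print(round(res, 2))
```

A negative residual means the line overestimates the observed value; a positive residual means it underestimates it.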
Q18. What is the coefficient of determination and what does it say about this line?
Background
Topic: Coefficient of Determination ($r^2$)
This question tests your ability to interpret $r^2$, which measures the proportion of variance in the dependent variable explained by the regression line.
Key Formula
$r^2 = (r)^2$, where $r$ is the correlation coefficient.
Step-by-Step Guidance
Square the correlation coefficient ($r$) you found in Q14 to get $r^2$.
Interpret $r^2$ as the percentage of variation in hair growth explained by the duration of the drug.
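To see why the squared correlation is read as "variation explained", the sketch below computes it a second way, as 1 minus the ratio of leftover (residual) variation to total variation. For a least-squares line with one predictor, the two agree. The paired data are hypothetical.

```python
def r_squared(xs, ys):
    """Proportion of variation in ys explained by the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                               # total
    return 1 - sse / sst

# Hypothetical (duration, hair growth) pairs for illustration only.
r2 = r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 2))
```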
Q19. Determine a marginal distribution for age groups.
Background
Topic: Marginal Distributions in Two-Way Tables
This question tests your ability to summarize the totals for each age group, regardless of hair growth outcome.
Key Terms
Marginal Distribution: The totals for each category in a two-way table.
Step-by-Step Guidance
Add the "Yes" and "No" counts for each age group to get the total for each group.
Express these totals as counts or as percentages of the overall total.
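The row totals and their percentages can be sketched as below. The two-way table is hypothetical: the study's counts are not reproduced in this guide, and the age groups other than 40-44 are made up for illustration.

```python
# Hypothetical two-way table (age group -> hair growth counts); NOT the study's data.
two_way = {
    "35-39": {"Yes": 30, "No": 10},
    "40-44": {"Yes": 25, "No": 25},
    "45-49": {"Yes": 20, "No": 90},
}

def marginal_distribution(table):
    """Return {group: (total count, percent of grand total)}."""
    totals = {group: sum(counts.values()) for group, counts in table.items()}
    grand_total = sum(totals.values())
    return {group: (t, 100 * t / grand_total) for group, t in totals.items()}

marginal = marginal_distribution(two_way)
for group, (count, pct) in marginal.items():
    print(f"{group}: {count} ({pct:.1f}%)")
```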
Q20. Create a conditional distribution of hair growth among the different age groups.
Background
Topic: Conditional Distributions in Two-Way Tables
This question tests your ability to find the distribution of hair growth outcomes within each age group.
Key Terms
Conditional Distribution: The distribution of one variable for each value of another variable.
Step-by-Step Guidance
For each age group, calculate the percentage of "Yes" and "No" responses out of the total for that group.
Repeat for all age groups to complete the conditional distribution.
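A sketch of the within-group percentages, again on a hypothetical two-way table rather than the study's counts. Each group's percentages should sum to 100.

```python
# Hypothetical two-way table (age group -> hair growth counts); NOT the study's data.
two_way = {
    "35-39": {"Yes": 30, "No": 10},
    "40-44": {"Yes": 25, "No": 25},
    "45-49": {"Yes": 20, "No": 90},
}

def conditional_distribution(table):
    """Return, for each group, the percent of each outcome within that group."""
    result = {}
    for group, counts in table.items():
        group_total = sum(counts.values())
        result[group] = {k: 100 * v / group_total for k, v in counts.items()}
    return result

conditional = conditional_distribution(two_way)
for group, pcts in conditional.items():
    print(group, {k: round(v, 1) for k, v in pcts.items()})
```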
Q21. Create a bar graph on the marginal data.
Background
Topic: Bar Graphs for Categorical Data
This question tests your ability to visually represent marginal distributions using a bar graph.
Key Terms
Bar Graph: A chart with rectangular bars representing the frequency or percentage of categories.
Step-by-Step Guidance
Use the marginal totals from Q19 for each age group.
Draw bars for each age group with heights corresponding to their totals.
Q22. Create a segmented bar graph on the conditional data.
Background
Topic: Segmented (Stacked) Bar Graphs
This question tests your ability to represent conditional distributions visually.
Key Terms
Segmented Bar Graph: A bar graph where each bar is divided into segments representing subcategories (e.g., "Yes" and "No").
Step-by-Step Guidance
For each age group, draw a bar whose total height represents 100% of that group.
Divide each bar into segments proportional to the percentage of "Yes" and "No" responses (from Q20).
Q23. What is the percentage of 40-44-year-olds who also do not regrow their hair?
Background
Topic: Conditional Probability
This question tests your ability to calculate the percentage of a subgroup within a category.
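Key Formula
$\text{Percentage} = \frac{\text{number of "No" responses in the 40-44 group}}{\text{total number in the 40-44 group}} \times 100\%$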
Step-by-Step Guidance
Find the number of "No" responses in the 40-44 age group.
Find the total number of participants in the 40-44 age group.
Divide the "No" count by the total and multiply by 100% to get the percentage.
Q24. Suppose that Dr. Burner initially decided to go out and invite only the people who were in his immediate family. What type of bias might this be, and why might it be a problem?
Background
Topic: Sampling Bias
This question tests your understanding of different types of bias in sampling and why they can affect the validity of a study.
Key Terms
Sampling Bias: When the sample is not representative of the population.
Convenience Sample: Choosing individuals who are easiest to reach.
Step-by-Step Guidance
Identify the type of bias (e.g., selection bias from a convenience sample).
Explain why sampling only from immediate family may not represent the broader population of men ages 35–55.
Discuss how this could affect the study's results.
Q25. Dr. Burner ultimately decided to look up all the people in the hair clubs around the United States. How might Dr. Burner create a process to get an appropriate sample of 2000 people?
Background
Topic: Sampling Methods
This question tests your understanding of how to select a representative sample from a larger population.
Key Terms
Random Sampling: Every individual has an equal chance of being selected.
Stratified Sampling: Dividing the population into subgroups and sampling from each.
Step-by-Step Guidance
List all members of the hair clubs as your sampling frame.
Decide on a sampling method (e.g., simple random sample, stratified sample).
Use a random number generator or other unbiased method to select 2000 participants.
Ensure the sample is representative of the population in terms of age, region, etc., if necessary.
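One way to carry out the selection step is sketched below with Python's standard `random` module. Everything about the frame here is hypothetical: the member IDs, the frame size, and the two regions used as strata are placeholders for whatever the real membership lists contain.

```python
import random

def simple_random_sample(frame, k, seed=None):
    """Draw k distinct members, each subset of size k equally likely."""
    return random.Random(seed).sample(frame, k)

def stratified_sample(frame_by_stratum, k, seed=None):
    """Proportional allocation; rounding may shift the total slightly from k."""
    rng = random.Random(seed)
    total = sum(len(members) for members in frame_by_stratum.values())
    sample = []
    for members in frame_by_stratum.values():
        share = round(k * len(members) / total)  # this stratum's portion of k
        sample.extend(rng.sample(members, share))
    return sample

# Hypothetical sampling frame of 10,000 member IDs split into two regions.
frame = [f"member_{i}" for i in range(10_000)]
strata = {"East": frame[:6_000], "West": frame[6_000:]}

srs = simple_random_sample(frame, 2000, seed=1)
strat = stratified_sample(strata, 2000, seed=1)
print(len(srs), len(strat))
```

Fixing a seed makes the draw reproducible for auditing; omit it for a fresh random draw.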