Step-by-Step Guidance for Statistics Review: Dr. Burner's Baldness Study

Study Guide - Smart Notes

Q1. Create a frequency chart based on the above data. The first group should be 4.2 – 4.6.

Background

Topic: Frequency Distributions

This question tests your ability to organize raw data into a frequency distribution table, which is a foundational skill in descriptive statistics.

Key Terms and Formulas

  • Frequency: The number of data points that fall within a specified interval (class).

  • Class Interval: A range of values used to group data (e.g., 4.2–4.6).

  • Lower Class Limit: The smallest value in a class interval.

  • Upper Class Limit: The largest value in a class interval.

Step-by-Step Guidance

  1. List all the data points in order from smallest to largest to help with grouping.

  2. Determine the number of classes (intervals) you want. A common rule is to use between 5 and 8 classes for 30 data points.

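  3. Calculate the class width: $\text{class width} = \dfrac{\text{maximum} - \text{minimum}}{\text{number of classes}}$, rounded up to a convenient value. Here the required first interval, 4.2–4.6, already fixes the width at 0.5, the distance from one lower class limit (4.2) to the next (4.7).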

  4. Starting with 4.2–4.6 as the first interval, create subsequent intervals by adding the class width to the lower limit of each previous class.

  5. Count how many data points fall into each interval and record these frequencies in a table.
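
As an illustration only, the grouping and counting steps could be sketched in Python as follows. The ten data values are hypothetical placeholders, not Dr. Burner's 30 measurements, and the helper name `frequency_table` is made up for this sketch. Values are converted to tenths so that class boundaries are compared as whole numbers.

```python
# Hypothetical forehead lengths (NOT the study's data), one decimal place each.
data = [4.2, 4.5, 4.8, 5.0, 5.3, 5.3, 5.9, 6.1, 4.7, 5.6]

FIRST_LOWER_TENTHS = 42  # first class starts at 4.2
WIDTH_TENTHS = 5         # class width 0.5 (4.2 -> 4.7)


def frequency_table(values, n_classes):
    """Count how many values fall in each class, starting at 4.2-4.6."""
    counts = [0] * n_classes
    for v in values:
        tenths = round(v * 10)  # work in whole tenths to avoid float boundary issues
        idx = (tenths - FIRST_LOWER_TENTHS) // WIDTH_TENTHS
        if 0 <= idx < n_classes:
            counts[idx] += 1
    rows = []
    for i, count in enumerate(counts):
        lower = FIRST_LOWER_TENTHS + i * WIDTH_TENTHS
        upper = lower + WIDTH_TENTHS - 1  # upper class limit, e.g. 4.6
        rows.append((f"{lower / 10:.1f}-{upper / 10:.1f}", count))
    return rows


table = frequency_table(data, n_classes=4)
for label, count in table:
    print(label, count)
```

With the real data you would choose enough classes to cover the largest measurement.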

Try solving on your own before revealing the answer!

Q2. Create a histogram for the above frequency chart.

Background

Topic: Data Visualization – Histograms

This question assesses your ability to represent frequency data graphically using a histogram, which helps visualize the distribution of the data.

Key Terms and Formulas

  • Histogram: A graph of adjacent bars (no gaps between them), one per class interval, where each bar's height shows the frequency of data in that interval.

  • X-axis: Represents the class intervals.

  • Y-axis: Represents the frequency for each interval.

Step-by-Step Guidance

  1. Use the frequency chart you created in Q1 as the basis for your histogram.

  2. Draw the x-axis and label it with your class intervals (e.g., 4.2–4.6, 4.7–5.1, etc.).

  3. Draw the y-axis and label it with frequencies (the counts from your table).

  4. For each class interval, draw a bar whose height corresponds to the frequency for that interval.
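
Before drawing by hand, a quick text version of the histogram can help check the bar heights. The sketch below assumes a hypothetical frequency table (not the study's counts); `text_histogram` is an illustrative helper, and each `#` stands for one data point.

```python
# Hypothetical frequency table: (class interval, frequency). Not the study's counts.
freq = [("4.2-4.6", 2), ("4.7-5.1", 3), ("5.2-5.6", 3), ("5.7-6.1", 2)]


def text_histogram(rows):
    """Return one line per class; the bar length equals the frequency."""
    return [f"{label} | {'#' * count} ({count})" for label, count in rows]


lines = text_histogram(freq)
for line in lines:
    print(line)
```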

Try solving on your own before revealing the answer!

Q3. What is the shape of the data?

Background

Topic: Distribution Shapes

This question asks you to describe the overall pattern of the data, such as whether it is symmetric, skewed left, or skewed right.

Key Terms

  • Symmetric: Data is evenly distributed around the center.

  • Skewed Right (Positive Skew): Tail is longer on the right.

  • Skewed Left (Negative Skew): Tail is longer on the left.

Step-by-Step Guidance

  1. Look at your histogram from Q2 and observe the distribution of the bars.

  2. Determine if the data is roughly symmetric or if one tail is longer than the other.

  3. Consider the mean and median (from later questions) to help confirm the shape.

Try solving on your own before revealing the answer!

Q4. What is the average length of the forehead?

Background

Topic: Measures of Central Tendency – Mean

This question tests your ability to calculate the mean (average) of a data set.

Key Formula

  • $\bar{x} = \dfrac{\sum x_i}{n}$

  • Where $x_i$ are the data values and $n$ is the number of data points.

Step-by-Step Guidance

  1. Add up all the forehead length measurements: $\sum x_i$.

  2. Count the total number of data points ($n$), which is 30 in this case.

  3. Divide the sum by the number of data points: $\bar{x} = \dfrac{\sum x_i}{n}$.
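
A minimal sketch of the same three steps in Python, assuming a hypothetical list of ten measurements (the study itself has 30, which are not reproduced here):

```python
from statistics import mean

# Hypothetical forehead lengths; NOT the study's data.
data = [4.2, 4.5, 4.8, 5.0, 5.3, 5.3, 5.9, 6.1, 4.7, 5.6]

total = sum(data)  # step 1: add up the measurements
n = len(data)      # step 2: count the data points
x_bar = total / n  # step 3: divide the sum by n

print(round(x_bar, 2))       # mean of the hypothetical values
print(round(mean(data), 2))  # same result from the standard library
```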

Try solving on your own before revealing the answer!

Q5. What is the median of the forehead?

Background

Topic: Measures of Central Tendency – Median

This question tests your ability to find the median, or middle value, of a data set.

Key Terms and Steps

  • Median: The middle value when data is ordered from least to greatest.

Step-by-Step Guidance

  1. Order all 30 data points from smallest to largest.

  2. Since there are 30 data points (an even number), the median is the average of the 15th and 16th values in the ordered list.

  3. Identify the 15th and 16th values and calculate their average.
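
The even-count rule can be sketched as below. The list is a hypothetical stand-in with 10 values, so the middle pair is the 5th and 6th values; with the study's 30 values the same code would pick the 15th and 16th.

```python
from statistics import median

# Hypothetical forehead lengths; NOT the study's data.
data = [4.2, 4.5, 4.8, 5.0, 5.3, 5.3, 5.9, 6.1, 4.7, 5.6]

ordered = sorted(data)
n = len(ordered)

if n % 2 == 0:
    # Even n: average the two middle values (positions n/2 and n/2 + 1, counting from 1).
    med = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd n: the single middle value.
    med = ordered[n // 2]

print(round(med, 2))
```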

Try solving on your own before revealing the answer!

Q6. Based on your mean and median, what is the shape of the data?

Background

Topic: Comparing Mean and Median to Assess Skewness

This question asks you to use the relationship between the mean and median to infer the shape of the distribution.

Key Concepts

  • If mean > median, the data is likely skewed right.

  • If mean < median, the data is likely skewed left.

  • If mean ≈ median, the data is likely symmetric.

Step-by-Step Guidance

  1. Compare the mean and median values you calculated in Q4 and Q5.

  2. Use the above rules to determine the likely shape of the data.

Try solving on your own before revealing the answer!

Q7. Determine the five-number summary for the data.

Background

Topic: Five-Number Summary

This question tests your ability to find the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum of a data set.

Key Terms

  • Minimum: Smallest value

  • Q1: 25th percentile

  • Median (Q2): 50th percentile

  • Q3: 75th percentile

  • Maximum: Largest value

Step-by-Step Guidance

  1. Order the data from smallest to largest.

  2. Identify the minimum and maximum values.

  3. Find the median (Q2) as in Q5.

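  4. Find Q1, the median of the lower half of the ordered data. With n = 30 the lower half is the 15 smallest values, so Q1 is the 8th value. (Under this common textbook convention, the median itself is left out of both halves only when n is odd.)

  5. Find Q3, the median of the upper half. With n = 30 the upper half is the 15 largest values, so Q3 is the 23rd value in the ordered list.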
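
One way to sketch these steps in Python is shown below, using the "median of each half" convention described in many introductory textbooks. Software packages often use other quartile conventions and may give slightly different Q1 and Q3, so treat this as an illustration. The data and the function name `five_number_summary` are hypothetical.

```python
from statistics import median

# Hypothetical forehead lengths; NOT the study's data.
data = [4.2, 4.5, 4.8, 5.0, 5.3, 5.3, 5.9, 6.1, 4.7, 5.6]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    # For odd n, skip the middle value so it belongs to neither half.
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return (s[0], median(lower), median(s), median(upper), s[-1])


summary = five_number_summary(data)
print(summary)
```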

Try solving on your own before revealing the answer!

Q8. Are there any outliers for the data? (Show the work)

Background

Topic: Identifying Outliers Using the IQR Method

This question tests your ability to use the interquartile range (IQR) to determine if any data points are outliers.

Key Formulas

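  • Interquartile Range: $\text{IQR} = Q_3 - Q_1$

  • Lower Fence: $Q_1 - 1.5 \times \text{IQR}$

  • Upper Fence: $Q_3 + 1.5 \times \text{IQR}$

Step-by-Step Guidance

  1. Calculate the IQR by subtracting the first quartile from the third quartile (both found in Question 7).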

  2. Compute the lower and upper fences using the formulas above.

  3. Check if any data points fall below the lower fence or above the upper fence.
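
The fence check could look like the sketch below. The data and quartiles are hypothetical (chosen so that one value is flagged), and `iqr_fences` is an illustrative helper name.

```python
# Hypothetical data and quartiles; NOT the study's values.
data = [4.2, 4.5, 4.7, 4.8, 5.0, 5.3, 5.3, 5.6, 5.9, 7.4]
q1, q3 = 4.7, 5.6  # assumed first and third quartiles for this made-up list


def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower_fence, upper_fence = iqr_fences(q1, q3)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(round(lower_fence, 2), round(upper_fence, 2), outliers)
```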

Try solving on your own before revealing the answer!

Q9. Create a box plot for the data.

Background

Topic: Box Plots (Box-and-Whisker Plots)

This question tests your ability to visually summarize the five-number summary and identify potential outliers.

Key Terms

  • Box Plot: A graphical representation of the five-number summary.

Step-by-Step Guidance

  1. Draw a number line that includes the range of your data.

  2. Mark the minimum, Q1, median, Q3, and maximum on the number line.

  3. Draw a box from Q1 to Q3, with a line at the median.

  4. Draw whiskers from the box to the minimum and maximum (unless there are outliers, in which case whiskers stop at the last non-outlier).

Try solving on your own before revealing the answer!

Q10. The percentage of men between the ages of 34 and 45 who start having hair loss

Background

Topic: Normal Distribution and Z-scores

This question tests your ability to use the normal distribution to find probabilities (percentages) for a given range.

Key Formulas

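  • $z = \dfrac{x - \mu}{\sigma}$

  • Where $x$ is the value, $\mu$ is the mean, and $\sigma$ is the standard deviation.

Step-by-Step Guidance

  1. Calculate the z-score for 34: $z_1 = \dfrac{34 - \mu}{\sigma}$

  2. Calculate the z-score for 45: $z_2 = \dfrac{45 - \mu}{\sigma}$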

  3. Use the standard normal table to find the probabilities corresponding to each z-score.

  4. Subtract the lower probability from the higher to find the percentage between 34 and 45.
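
To check a table lookup, the calculation can be sketched with Python's standard library. The mean and standard deviation below are hypothetical placeholders (this guide does not restate the study's values), so substitute the ones given in the problem.

```python
from statistics import NormalDist

# ASSUMED parameters for illustration only; use the mean and SD from the problem.
mu, sigma = 40.0, 5.0


def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma


z_low = z_score(34, mu, sigma)
z_high = z_score(45, mu, sigma)

standard_normal = NormalDist()  # mean 0, standard deviation 1
p_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"z-scores: {z_low:.2f}, {z_high:.2f}")
print(f"percentage between 34 and 45: {100 * p_between:.2f}%")
```

The same pattern covers the one-sided questions: `standard_normal.cdf(z)` is the area to the left, and `1 - standard_normal.cdf(z)` is the area to the right.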

Try solving on your own before revealing the answer!

Q11. The percentage of men greater than 37 years of age who have started having hair loss

Background

Topic: Normal Distribution – Right Tail Probability

This question tests your ability to find the probability that a value is greater than a certain point using the normal distribution.

Key Formulas

  • $z = \dfrac{x - \mu}{\sigma}$

  • $P(X > x) = 1 - P(Z < z)$

Step-by-Step Guidance

  1. Calculate the z-score for 37.

  2. Use the standard normal table to find the probability to the left of this z-score.

  3. Subtract this probability from 1 to get the probability to the right (greater than 37).

Try solving on your own before revealing the answer!

Q12. What is the percentage of men less than 40 years of age who have hair loss?

Background

Topic: Normal Distribution – Left Tail Probability

This question tests your ability to find the probability that a value is less than a certain point using the normal distribution.

Key Formulas

  • $z = \dfrac{x - \mu}{\sigma}$

  • $P(X < x) = P(Z < z)$

Step-by-Step Guidance

  1. Calculate the z-score for 40.

  2. Use the standard normal table to find the probability to the left of this z-score.

  3. Convert this probability to a percentage.

Try solving on your own before revealing the answer!

Q13. At what age is a man at the 90th percentile for hair loss?

Background

Topic: Finding a Value from a Percentile in a Normal Distribution

This question tests your ability to use the normal distribution to find the value (age) corresponding to a given percentile.

Key Formulas

  • Find the z-score that corresponds to the 90th percentile (use a z-table).

  • $x = \mu + z\sigma$

Step-by-Step Guidance

  1. Look up the z-score for the 90th percentile (approximately 1.28).

  2. Plug the z-score, mean, and standard deviation into the formula: $x = \mu + z\sigma$.
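
A short sketch of the reverse lookup, again with a hypothetical mean and standard deviation standing in for the values given in the problem:

```python
from statistics import NormalDist

# ASSUMED parameters for illustration only; use the mean and SD from the problem.
mu, sigma = 40.0, 5.0

# z-score with 90% of the standard normal distribution to its left.
z_90 = NormalDist().inv_cdf(0.90)

# Convert the z-score back to an age: x = mu + z * sigma.
age_90 = mu + z_90 * sigma

print(f"z for the 90th percentile: {z_90:.2f}")
print(f"age at the 90th percentile (hypothetical parameters): {age_90:.1f}")
```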

Try solving on your own before revealing the answer!

Q14. How closely are these two variables correlated? Describe the relationship.

Background

Topic: Correlation and Scatterplots

This question tests your ability to assess the strength and direction of the relationship between two quantitative variables.

Key Terms

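  • Correlation Coefficient (r): Measures the strength and direction of a linear relationship. It ranges from −1 to 1; values near ±1 indicate a strong linear relationship and values near 0 a weak one.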

  • Scatterplot: A graph of paired data points.

Step-by-Step Guidance

  1. Plot the data pairs (duration, hair growth) on a scatterplot.

  2. Observe the pattern: does it look linear, positive, negative, or no relationship?

  3. Calculate the correlation coefficient (r) using the formula or a calculator.
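
If you want to check a calculator result, the correlation coefficient can be computed from its definition, $r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. The sketch below uses five hypothetical (duration, hair growth) pairs, not the study's data.

```python
from math import sqrt

# Hypothetical paired data; NOT the study's measurements.
duration = [1, 2, 3, 4, 5]  # minutes of medicine on the scalp
growth = [2, 4, 5, 4, 5]    # hair growth (units as in the problem)


def correlation(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = correlation(duration, growth)
print(round(r, 4))
```

For these made-up pairs r is positive and fairly strong, which would be described as a moderately strong positive linear relationship.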

Try solving on your own before revealing the answer!

Q15. What is the equation of the line of best fit and describe both the slope and y-intercept in the context of the problem?

Background

Topic: Linear Regression

This question tests your ability to find the least-squares regression line and interpret its components in context.

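Key Formulas

  • Slope: $b = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$, equivalently $b = r \cdot \dfrac{s_y}{s_x}$

  • Y-intercept: $a = \bar{y} - b\bar{x}$

  • Regression line: $\hat{y} = a + bx$

Step-by-Step Guidance

  1. Calculate the means $\bar{x}$ and $\bar{y}$ for duration and hair growth.

  2. Compute the slope $b$ using the formula above.

  3. Calculate the y-intercept $a = \bar{y} - b\bar{x}$.

  4. Write the equation in the form $\hat{y} = a + bx$.

  5. Interpret the slope (predicted change in hair growth for each additional minute of duration) and the y-intercept (predicted hair growth at 0 minutes) in context.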
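
The least-squares calculation can be sketched as follows. The five data pairs are hypothetical, and the symbols `a` (y-intercept) and `b` (slope) are this sketch's own naming for a line of the form predicted growth = a + b × duration.

```python
# Hypothetical paired data; NOT the study's measurements.
duration = [1, 2, 3, 4, 5]  # minutes
growth = [2, 4, 5, 4, 5]    # hair growth


def least_squares(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares(duration, growth)
print(f"y_hat = {a:.2f} + {b:.2f}x")
```

For this made-up data the slope says predicted growth rises by about 0.6 units per extra minute, and the intercept is the predicted growth at 0 minutes.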

Try solving on your own before revealing the answer!

Q16. Using the line of best fit, extrapolate the length of hair growth for 8 mins of medicine on the scalp.

Background

Topic: Using Regression for Prediction

This question tests your ability to use the regression equation to predict a value outside the observed data range (extrapolation).

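Key Formula

  • $\hat{y} = a + b(8)$, where $a$ is the y-intercept and $b$ is the slope of the line of best fit.

Step-by-Step Guidance

  1. Take the equation of the line of best fit from Q15.

  2. Plug in $x = 8$ for the duration.

  3. Solve for $\hat{y}$ to get the predicted hair growth.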

Try solving on your own before revealing the answer!

Q17. Calculate the residual at duration of 4 mins.

Background

Topic: Residuals in Regression

This question tests your understanding of how to calculate and interpret residuals (the difference between observed and predicted values).

Key Formula

  • $\text{Residual} = y - \hat{y}$ (observed value minus predicted value)

Step-by-Step Guidance

  1. Find the observed hair growth for 4 minutes from the data.

  2. Use the regression equation to calculate the predicted hair growth for 4 minutes.

  3. Subtract the predicted value from the observed value to get the residual.
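
A small sketch of the residual calculation, assuming a hypothetical regression line and a hypothetical observed value at 4 minutes (replace both with your own results):

```python
# ASSUMED regression line y_hat = a + b*x and observation; NOT the study's values.
a, b = 2.2, 0.6         # hypothetical intercept and slope
observed_at_4 = 4.0     # hypothetical observed hair growth at 4 minutes


def predict(x, a, b):
    """Predicted value from the line y_hat = a + b*x."""
    return a + b * x


predicted_at_4 = predict(4, a, b)
residual = observed_at_4 - predicted_at_4  # observed minus predicted

print(round(predicted_at_4, 2), round(residual, 2))
```

A negative residual means the observed point lies below the line, so the line overpredicts at that duration.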

Try solving on your own before revealing the answer!

Q18. What is the coefficient of determination and what does it say about this line?

Background

Topic: Coefficient of Determination ($r^2$)

This question tests your ability to interpret $r^2$, which measures the proportion of variance in the dependent variable explained by the regression line.

Key Formula

  • $r^2 = (r)^2$

Step-by-Step Guidance

  1. Square the correlation coefficient ($r$) you found in Q14 to get $r^2$.

  2. Interpret $r^2$ as the percentage of variation in hair growth explained by the duration of the drug.
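
To see why squaring r gives the "percentage of variation explained", the sketch below computes the same quantity a second way, as 1 − SSE/SST (one minus the unexplained variation over the total variation). The data are hypothetical, not the study's.

```python
# Hypothetical paired data; NOT the study's measurements.
duration = [1, 2, 3, 4, 5]
growth = [2, 4, 5, 4, 5]

n = len(duration)
x_bar = sum(duration) / n
y_bar = sum(growth) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(duration, growth))
sxx = sum((x - x_bar) ** 2 for x in duration)
syy = sum((y - y_bar) ** 2 for y in growth)  # total variation (SST)

# Route 1: square the correlation coefficient.
r_squared = sxy ** 2 / (sxx * syy)

# Route 2: fit the line, then compare leftover variation (SSE) with SST.
b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(duration, growth))
explained_fraction = 1 - sse / syy

print(round(r_squared, 3), round(explained_fraction, 3))
```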

Try solving on your own before revealing the answer!

Q19. Determine a marginal distribution for age groups.

Background

Topic: Marginal Distributions in Two-Way Tables

This question tests your ability to summarize the totals for each age group, regardless of hair growth outcome.

Key Terms

  • Marginal Distribution: The totals for each category in a two-way table.

Step-by-Step Guidance

  1. Add the "Yes" and "No" counts for each age group to get the total for each group.

  2. Express these totals as counts or as percentages of the overall total.
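
As a sketch, the marginal distribution could be computed like this. The two-way table below is invented for illustration: both the age-group labels and the counts are assumptions, not the study's table.

```python
# Hypothetical two-way table of counts (age group -> hair growth outcome).
table = {
    "35-39": {"Yes": 40, "No": 20},
    "40-44": {"Yes": 30, "No": 30},
    "45-49": {"Yes": 20, "No": 30},
    "50-55": {"Yes": 10, "No": 20},
}

# Step 1: total for each age group, ignoring the outcome.
marginal_counts = {group: sum(counts.values()) for group, counts in table.items()}

# Step 2: express each total as a percentage of the overall total.
grand_total = sum(marginal_counts.values())
marginal_pct = {group: 100 * c / grand_total for group, c in marginal_counts.items()}

for group in table:
    print(group, marginal_counts[group], f"{marginal_pct[group]:.1f}%")
```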

Try solving on your own before revealing the answer!

Q20. Create a conditional distribution of hair growth among the different age groups.

Background

Topic: Conditional Distributions in Two-Way Tables

This question tests your ability to find the distribution of hair growth outcomes within each age group.

Key Terms

  • Conditional Distribution: The distribution of one variable for each value of another variable.

Step-by-Step Guidance

  1. For each age group, calculate the percentage of "Yes" and "No" responses out of the total for that group.

  2. Repeat for all age groups to complete the conditional distribution.
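
The within-group percentages could be sketched as below. The table is hypothetical (labels and counts are assumptions), and `conditional_distribution` is an illustrative helper name.

```python
# Hypothetical two-way table of counts; NOT the study's table.
table = {
    "35-39": {"Yes": 40, "No": 20},
    "40-44": {"Yes": 30, "No": 30},
    "45-49": {"Yes": 20, "No": 30},
    "50-55": {"Yes": 10, "No": 20},
}


def conditional_distribution(table):
    """For each age group, give each outcome as a percentage of that group's total."""
    result = {}
    for group, counts in table.items():
        group_total = sum(counts.values())
        result[group] = {
            outcome: 100 * count / group_total for outcome, count in counts.items()
        }
    return result


cond = conditional_distribution(table)
for group, pcts in cond.items():
    print(group, {outcome: round(p, 1) for outcome, p in pcts.items()})
```

Each row sums to 100%, and a single entry such as `cond["40-44"]["No"]` answers a question of the form "what percentage of 40-44-year-olds did not regrow hair".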

Try solving on your own before revealing the answer!

Q21. Create a bar graph on the marginal data.

Background

Topic: Bar Graphs for Categorical Data

This question tests your ability to visually represent marginal distributions using a bar graph.

Key Terms

  • Bar Graph: A chart with rectangular bars representing the frequency or percentage of categories.

Step-by-Step Guidance

  1. Use the marginal totals from Q19 for each age group.

  2. Draw bars for each age group with heights corresponding to their totals.

Try solving on your own before revealing the answer!

Q22. Create a segmented bar graph on the conditional data.

Background

Topic: Segmented (Stacked) Bar Graphs

This question tests your ability to represent conditional distributions visually.

Key Terms

  • Segmented Bar Graph: A bar graph where each bar is divided into segments representing subcategories (e.g., "Yes" and "No").

Step-by-Step Guidance

  1. For each age group, draw a bar whose total height represents 100% of that group.

  2. Divide each bar into segments proportional to the percentage of "Yes" and "No" responses (from Q20).

Try solving on your own before revealing the answer!

Q23. What is the percentage of 40-44-year-olds who also do not regrow their hair?

Background

Topic: Conditional Probability

This question tests your ability to calculate the percentage of a subgroup within a category.

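Key Formula

  • $\text{Percentage} = \dfrac{\text{number of "No" responses in the 40-44 group}}{\text{total in the 40-44 group}} \times 100\%$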

Step-by-Step Guidance

  1. Find the number of "No" responses in the 40-44 age group.

  2. Find the total number of participants in the 40-44 age group.

  3. Divide the "No" count by the total and multiply by 100% to get the percentage.

Try solving on your own before revealing the answer!

Q24. Suppose that Dr. Burner initially decided to go out and invite only the people who were in his immediate family. What type of bias might this be, and why might it be a problem?

Background

Topic: Sampling Bias

This question tests your understanding of different types of bias in sampling and why they can affect the validity of a study.

Key Terms

  • Sampling Bias: When the sample is not representative of the population.

  • Convenience Sample: Choosing individuals who are easiest to reach.

Step-by-Step Guidance

  1. Identify the type of bias (e.g., a convenience sample, which is a form of selection bias).

  2. Explain why sampling only from immediate family may not represent the broader population of men ages 35–55.

  3. Discuss how this could affect the study's results.

Try solving on your own before revealing the answer!

Q25. Dr. Burner ultimately decided to look up all the people in the hair clubs around the United States. How might Dr. Burner create a process to get an appropriate sample of 2000 people?

Background

Topic: Sampling Methods

This question tests your understanding of how to select a representative sample from a larger population.

Key Terms

  • Random Sampling: Every individual has an equal chance of being selected.

  • Stratified Sampling: Dividing the population into subgroups and sampling from each.

Step-by-Step Guidance

  1. List all members of the hair clubs as your sampling frame.

  2. Decide on a sampling method (e.g., simple random sample, stratified sample).

  3. Use a random number generator or other unbiased method to select 2000 participants.

  4. Ensure the sample is representative of the population in terms of age, region, etc., if necessary.
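
One possible process is sketched below for both a simple random sample and a proportionally stratified sample. The sampling frame is entirely hypothetical: region names, member IDs, and club sizes are invented, and the seed is fixed only so the illustration is repeatable.

```python
import random

# Hypothetical sampling frame: member IDs grouped by region (all invented).
frame = {
    "Northeast": [f"NE-{i}" for i in range(2000)],
    "South": [f"S-{i}" for i in range(4000)],
    "Midwest": [f"MW-{i}" for i in range(1500)],
    "West": [f"W-{i}" for i in range(2500)],
}


def simple_random_sample(frame, k, rng):
    """Pool every member and draw k of them without replacement."""
    everyone = [m for members in frame.values() for m in members]
    return rng.sample(everyone, k)


def stratified_sample(frame, k, rng):
    """Draw from each region in proportion to its share of the frame.

    Rounding can make the total differ slightly from k for other frame sizes.
    """
    total = sum(len(members) for members in frame.values())
    chosen = []
    for members in frame.values():
        share = round(k * len(members) / total)
        chosen.extend(rng.sample(members, share))
    return chosen


rng = random.Random(2024)  # fixed seed for a repeatable illustration
srs = simple_random_sample(frame, 2000, rng)
strat = stratified_sample(frame, 2000, rng)

print(len(srs), len(strat))
```

Stratifying guarantees each region appears in proportion to its size, while the simple random sample only does so on average.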

Try solving on your own before revealing the answer!
