STA 2023 Test 1 Preview – Step-by-Step Statistics Study Guide
Q1. A parameter is a numerical summary that describes a characteristic of a sample, while a statistic describes a characteristic of a population. (True/False)
Background
Topic: Parameters vs. Statistics
This question tests your understanding of the difference between a parameter (population) and a statistic (sample).
Key Terms:
Parameter: A value that describes a characteristic of a population.
Statistic: A value that describes a characteristic of a sample.
Step-by-Step Guidance
Recall that a population includes all members of a defined group, while a sample is a subset of the population.
Understand that a parameter summarizes a population, and a statistic summarizes a sample.
Compare the statement in the question to these definitions to determine if it is accurate.
Try solving on your own before revealing the answer!
Q2. In a frequency distribution, the sum of all relative frequencies must always equal exactly 1 (or 100%). (True/False)
Background
Topic: Frequency Distributions
This question checks your understanding of how relative frequencies are calculated and their properties.
Key Terms:
Relative Frequency: The proportion of the total number of data values that fall within a class.
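Formula: $\text{Relative frequency} = \dfrac{\text{class frequency}}{\text{total number of data values}} = \dfrac{f}{n}$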
Step-by-Step Guidance
Recall that relative frequencies are proportions, so each is between 0 and 1.
Think about what happens when you add up the relative frequencies for all classes in a complete distribution.
Consider whether the sum should be exactly 1 (or 100%) if all data are accounted for.
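The reasoning in these steps can be checked with a short Python sketch. The class frequencies below are hypothetical and chosen only for illustration, and `relative_frequencies` is a helper name introduced here. Exact fractions are used so that rounding does not blur the result.

```python
from fractions import Fraction

def relative_frequencies(frequencies):
    """Return each class frequency divided by the total, as exact fractions."""
    total = sum(frequencies)
    return [Fraction(f, total) for f in frequencies]

# Hypothetical class frequencies, chosen only for illustration.
example_frequencies = [4, 7, 9]
rel = relative_frequencies(example_frequencies)
print(rel)
print(sum(rel))
```

When relative frequencies are rounded to a few decimal places by hand, their sum can come out as something like 0.999 or 1.001. That is a rounding effect, not a property of the distribution.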
Q3. Ratio-level data have all the properties of interval-level data plus a meaningful true zero point, so ratios between values are meaningful. (True/False)
Background
Topic: Levels of Measurement
This question tests your knowledge of the characteristics of ratio and interval data.
Key Terms:
Interval Level: Data with meaningful differences between values, but no true zero.
Ratio Level: Data with all interval properties plus a true zero, allowing for meaningful ratios.
Step-by-Step Guidance
Recall the definitions of interval and ratio levels of measurement.
Think about what a "true zero" means and why it allows for ratios to be meaningful.
Compare the statement to these definitions to determine its accuracy.
Q4. In a right-skewed (positively skewed) distribution, the mean is typically less than the median. (True/False)
Background
Topic: Measures of Center and Skewness
This question tests your understanding of how the mean and median relate in skewed distributions.
Key Terms:
Right-skewed (positively skewed): The tail on the right side of the distribution is longer or fatter than the left.
Mean vs. Median: In skewed distributions, the mean is pulled in the direction of the skew.
Step-by-Step Guidance
Recall how the mean and median are affected by skewness.
For a right-skewed distribution, consider whether the mean is greater than or less than the median.
Compare this to the statement in the question.
Q5. According to the Empirical Rule (for a bell-shaped distribution), approximately 95% of data values fall within two standard deviations of the mean. (True/False)
Background
Topic: Empirical Rule (68-95-99.7 Rule)
This question tests your knowledge of the Empirical Rule for normal distributions.
Key Terms and Formula:
Empirical Rule: For a normal (bell-shaped) distribution:
About 68% within 1 standard deviation
About 95% within 2 standard deviations
About 99.7% within 3 standard deviations
Step-by-Step Guidance
Recall the percentages associated with 1, 2, and 3 standard deviations in the Empirical Rule.
Compare the statement to the rule to determine if it is correct.
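As a rough illustration of how the rule turns a mean and standard deviation into intervals, here is a minimal Python sketch. The function name `empirical_rule_intervals` and the mean and standard deviation used are assumptions made for this example and do not come from any question in this guide.

```python
def empirical_rule_intervals(mean, sd):
    """Return the intervals mean ± k·sd for k = 1, 2, 3, keyed by approximate coverage."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {coverage[k]: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

# Hypothetical mean and standard deviation, not taken from any question here.
intervals = empirical_rule_intervals(mean=100, sd=15)
for label, (low, high) in intervals.items():
    print(f"about {label} of values between {low} and {high}")
```

The percentages are approximations that apply only when the distribution is roughly bell-shaped.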
Q6. Cluster sampling is a probability-sampling method in which the population is divided into non-overlapping groups called clusters, and a random selection of entire clusters is chosen for the sample. (True/False)
Background
Topic: Sampling Methods
This question tests your understanding of cluster sampling and how it differs from other sampling methods.
Key Terms:
Cluster Sampling: The population is divided into clusters, some clusters are randomly selected, and all members of selected clusters are included in the sample.
Step-by-Step Guidance
Recall the definition and process of cluster sampling.
Compare the statement to the definition to determine if it is accurate.
Q7. A z-score of −2.5 indicates that a data value lies 2.5 standard deviations below the mean, and such a value would typically be considered unusual. (True/False)
Background
Topic: z-Scores and Unusual Values
This question tests your understanding of z-scores and what is considered an unusual value in statistics.
Key Terms and Formula:
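z-score: $z = \dfrac{x - \mu}{\sigma}$ (with sample data, use $\bar{x}$ and $s$ in place of $\mu$ and $\sigma$)
Values with $|z| > 2$ are often considered unusual.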
Step-by-Step Guidance
Recall what a z-score represents in terms of standard deviations from the mean.
Consider the typical cutoff for an "unusual" value (usually $|z| > 2$).
Compare the statement to these facts.
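A small Python sketch of the z-score calculation may help make these steps concrete. The helper names `z_score` and `is_unusual`, the sample numbers, and the cutoff of 2 are assumptions for illustration. Some courses state the cutoff slightly differently, so check the convention your instructor uses.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z, cutoff=2.0):
    """Apply the common |z| > 2 convention; the cutoff is an assumption, not a law."""
    return abs(z) > cutoff

# Hypothetical values: mean 50, standard deviation 4, data value 40.
z = z_score(40, mean=50, sd=4)
print(z, is_unusual(z))
```

A negative result means the value is below the mean, and the size of the result counts the standard deviations between the value and the mean.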
Q8. Which of the following is an example of ordinal-level data?
Background
Topic: Levels of Measurement
This question tests your ability to distinguish between nominal, ordinal, interval, and ratio data.
Key Terms:
Ordinal Data: Data that can be ordered or ranked, but differences between values are not meaningful.
Step-by-Step Guidance
Review each answer choice and determine if the data can be ordered or ranked.
Check if the differences between values are meaningful or not.
Identify which option best fits the definition of ordinal data.
Q9. A researcher randomly selects 4 of the 20 school districts in a county and then surveys every teacher in each of those 4 districts. What sampling method is being used?
Background
Topic: Sampling Methods
This question tests your understanding of different probability sampling methods.
Key Terms:
Cluster Sampling: Randomly select entire groups (clusters) and include all members of those groups.
Stratified Sampling: Divide the population into strata and randomly sample from each stratum.
Systematic Sampling: Select every k-th member from a list.
Simple Random Sampling: Every possible sample of the same size has an equal chance of being selected, so every member also has an equal chance of being chosen.
Step-by-Step Guidance
Identify how the sample is being selected (entire groups or individuals).
Compare the process described to the definitions of each sampling method.
Determine which method matches the scenario.
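To see how selecting whole groups differs from selecting individuals inside every group, the two procedures can be sketched side by side in Python. The population below is made up (20 districts with 3 teachers each), and the function names are hypothetical. It is a sketch for comparing the definitions, not a survey tool.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

def stratified_sample(strata, n_per_stratum, seed=None):
    """Randomly pick some members from every stratum."""
    rng = random.Random(seed)
    return [m for name in sorted(strata)
            for m in rng.sample(strata[name], n_per_stratum)]

# Hypothetical population: 20 districts with 3 teachers each.
districts = {
    f"district_{d:02d}": [f"teacher_{d:02d}_{t}" for t in range(1, 4)]
    for d in range(1, 21)
}
chosen, surveyed = cluster_sample(districts, n_clusters=4, seed=1)
stratified = stratified_sample(districts, n_per_stratum=1, seed=1)
print(chosen, len(surveyed))
print(len(stratified))
```

The cluster version touches only a few groups but includes everyone in them. The stratified version touches every group but includes only some members of each.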
Q10. The following data set represents the number of text messages sent by 10 students in one day: 3, 5, 7, 8, 10, 12, 14, 15, 18, 28. Which measure of center best describes the typical value, given the presence of the potential outlier 28?
Background
Topic: Measures of Center and Outliers
This question tests your understanding of how outliers affect measures of center (mean, median, mode, midrange).
Key Terms:
Mean: Sensitive to outliers.
Median: Resistant to outliers.
Mode: Most frequent value.
Midrange: Average of the minimum and maximum values.
Step-by-Step Guidance
Identify the potential outlier in the data set.
Recall which measure of center is most affected by outliers.
Determine which measure would best represent the "typical" value in this case.
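One way to explore these steps is to compute each measure with and without the largest value and see which ones move the most. The sketch below uses the data from the question with Python's standard `statistics` module. The helper name `centers` is introduced here for illustration.

```python
import statistics

texts = [3, 5, 7, 8, 10, 12, 14, 15, 18, 28]  # data from Q10

def centers(values):
    """Return the mean, median, and midrange of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "midrange": (min(values) + max(values)) / 2,
    }

with_outlier = centers(texts)
without_outlier = centers(texts[:-1])  # drop 28 to see how much each measure moves
print(with_outlier)
print(without_outlier)
```

A measure that changes little when one extreme value is removed is called resistant.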
Q11. For a data set with mean and standard deviation , a data value of has a z-score of:
Background
Topic: z-Scores
This question tests your ability to calculate a z-score for a given data value.
Key Formula: $z = \dfrac{x - \mu}{\sigma}$
Step-by-Step Guidance
Identify the values: the data value $x$, the mean $\mu$, and the standard deviation $\sigma$.
Substitute these values into the z-score formula.
Simplify the numerator and denominator before dividing.
Q12. A frequency distribution of exam scores is shown below. What is the relative frequency for the 70–79 class?
| Score | Frequency |
|---|---|
| 50–59 | 3 |
| 60–69 | 8 |
| 70–79 | 12 |
| 80–89 | 5 |
| 90–99 | 2 |
Background
Topic: Relative Frequency
This question tests your ability to calculate the relative frequency for a class in a frequency distribution.
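Key Formula: $\text{Relative frequency} = \dfrac{\text{class frequency}}{\text{total frequency}} = \dfrac{f}{n}$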
Step-by-Step Guidance
Add up all the frequencies to find the total number of observations.
Identify the frequency for the 70–79 class.
Divide the frequency for the 70–79 class by the total frequency.
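The three steps above translate directly into a few lines of Python. This is a minimal sketch using the table from the question. The dictionary layout and the function name `relative_frequency` are choices made for this example, and the class labels use plain hyphens.

```python
# Frequency table from Q12 (class label -> frequency)
exam_scores = {"50-59": 3, "60-69": 8, "70-79": 12, "80-89": 5, "90-99": 2}

def relative_frequency(table, class_label):
    """Class frequency divided by the total of all class frequencies."""
    total = sum(table.values())
    return table[class_label] / total

print(relative_frequency(exam_scores, "70-79"))
```

Multiplying the result by 100 expresses it as a percentage.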
Q13. The ages (in years) of seven employees are: 22, 25, 30, 35, 42, 50, 60. Which statement is correct?
Background
Topic: Measures of Center
This question tests your understanding of mean, median, and mode, and how they relate in skewed data.
Key Terms:
Mean: Arithmetic average.
Median: Middle value when data are ordered.
Mode: Most frequent value.
Step-by-Step Guidance
Order the data (already done).
Find the median (middle value).
Calculate the mean by adding all values and dividing by 7.
Compare the mean and median to see which is greater.
Q14. According to Chebyshev’s Theorem, at least what percentage of data values must lie within 3 standard deviations of the mean for any distribution?
Background
Topic: Chebyshev’s Theorem
This question tests your knowledge of Chebyshev’s Theorem, which applies to all distributions (not just normal).
Key Formula:
For any $k > 1$, at least $1 - \dfrac{1}{k^2}$ of the data values lie within $k$ standard deviations of the mean.
Step-by-Step Guidance
Set $k = 3$ for 3 standard deviations.
Plug into the formula: $1 - \dfrac{1}{3^2}$.
Calculate the result as a percentage.
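The same substitution can be written as a short Python function, which makes it easy to try other values of k. The name `chebyshev_lower_bound` is introduced here for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3):
    print(k, f"{chebyshev_lower_bound(k):.1%}")
```

The bound is a guaranteed minimum for any distribution, so for bell-shaped data the actual percentage is usually higher than Chebyshev's Theorem promises.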
Q15. Levels of Measurement and Data Types. For each item, state (i) whether the data are qualitative or quantitative, and (ii) the level of measurement (nominal, ordinal, interval, or ratio).
(a) Number of students absent each day
(b) Jersey numbers of football players
(c) Letter grades (A, B, C, D, F)
(d) Body temperature (°F)
(e) Annual household income ($)
Background
Topic: Data Types and Levels of Measurement
This question tests your ability to classify data as qualitative or quantitative and to identify the correct level of measurement.
Key Terms:
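Qualitative: Data that describe categories or labels. A label can look like a number without measuring or counting anything.
Quantitative: Numeric data that represent counts or measurements.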
Nominal: Categories with no order.
Ordinal: Categories with a meaningful order.
Interval: Numeric, no true zero, differences are meaningful.
Ratio: Numeric, true zero, ratios are meaningful.
Step-by-Step Guidance
For each variable, decide if it is qualitative or quantitative.
Determine the level of measurement based on the definitions above.
Fill in the table for each variable.
Q16. Five-Number Summary and Interquartile Range. The following data represent the weekly study hours of 12 college students (already sorted): 2, 4, 5, 7, 8, 10, 12, 14, 15, 18, 20, 25
Background
Topic: Five-Number Summary and Outliers
This question tests your ability to find the five-number summary, calculate the IQR, and identify outliers using the 1.5 × IQR rule.
Key Terms and Formulas:
Five-number summary: Min, Q1, Median (Q2), Q3, Max
IQR: $\text{IQR} = Q_3 - Q_1$
Outlier rule: Lower fence = $Q_1 - 1.5 \times \text{IQR}$, Upper fence = $Q_3 + 1.5 \times \text{IQR}$
Step-by-Step Guidance
Identify the minimum and maximum values.
Find the median (Q2) by locating the middle value(s).
Find Q1 (median of the lower half) and Q3 (median of the upper half).
Calculate the IQR using $\text{IQR} = Q_3 - Q_1$.
Compute the lower and upper fences using the 1.5 × IQR rule.
Check for any values outside these fences to identify outliers.
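The procedure above can be sketched in Python as follows. The quartiles are computed with the median-of-halves method described in the steps. Statistical software often uses other quartile conventions and may report slightly different values. The function names are hypothetical helpers introduced for this example.

```python
import statistics

hours = [2, 4, 5, 7, 8, 10, 12, 14, 15, 18, 20, 25]  # data from Q16, already sorted

def five_number_summary(sorted_values):
    """Min, Q1, median, Q3, max using the median-of-halves method for quartiles."""
    n = len(sorted_values)
    lower = sorted_values[: n // 2]          # values below the median position
    upper = sorted_values[(n + 1) // 2 :]    # values above the median position
    return (
        sorted_values[0],
        statistics.median(lower),
        statistics.median(sorted_values),
        statistics.median(upper),
        sorted_values[-1],
    )

def fences(q1, q3):
    """Lower and upper outlier fences from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

minimum, q1, median, q3, maximum = five_number_summary(hours)
low_fence, high_fence = fences(q1, q3)
outliers = [x for x in hours if x < low_fence or x > high_fence]
print(minimum, q1, median, q3, maximum)
print(q3 - q1, low_fence, high_fence, outliers)
```

For a data set with an odd number of values, the slicing leaves the middle value out of both halves, which matches the usual textbook convention.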
Q17. Frequency, Relative Frequency, and Cumulative Frequency. The table below shows the number of hours per week that 30 PBSC students spend on social media.
| Hours | Frequency | Relative Frequency | Cumulative Frequency |
|---|---|---|---|
| 0–4 | 6 | | |
| 5–9 | 10 | | |
| 10–14 | 8 | | |
| 15–19 | 4 | | |
| 20–24 | 2 | | |
| Total | 30 | | |
Background
Topic: Frequency Distributions
This question tests your ability to calculate relative and cumulative frequencies, and interpret percentages from a frequency table.
Key Formulas:
Relative Frequency: $\dfrac{\text{class frequency}}{\text{total frequency}} = \dfrac{f}{n}$
Cumulative Frequency: Sum of frequencies up to and including the current class.
Step-by-Step Guidance
For each class, divide the frequency by the total to get the relative frequency.
For cumulative frequency, add each class's frequency to the sum of all previous frequencies.
To find the percentage of students spending fewer than 10 hours, add the frequencies for the relevant classes and divide by the total.
To find the percentage spending 15 or more hours, add the frequencies for those classes and divide by the total.
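Filling in the two empty columns is mechanical enough to sketch in Python. The example below uses the frequencies from the table and `itertools.accumulate` for the running totals. The variable names are choices made here, and the class labels use plain hyphens.

```python
from itertools import accumulate

classes = ["0-4", "5-9", "10-14", "15-19", "20-24"]  # hours classes from Q17
frequencies = [6, 10, 8, 4, 2]

total = sum(frequencies)
relative = [f / total for f in frequencies]
cumulative = list(accumulate(frequencies))

for label, f, r, c in zip(classes, frequencies, relative, cumulative):
    print(f"{label:>6} {f:>3} {r:>6.3f} {c:>3}")

# Percentages for "fewer than 10 hours" (first two classes) and "15 or more" (last two).
pct_under_10 = 100 * sum(frequencies[:2]) / total
pct_15_plus = 100 * sum(frequencies[3:]) / total
print(pct_under_10, pct_15_plus)
```

A quick sanity check is that the last cumulative frequency equals the total and the relative frequencies add up to 1.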
Q18. Descriptive Statistics — Ungrouped Data. Twenty students reported the number of hours they slept last night: 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 11
Background
Topic: Descriptive Statistics
This question tests your ability to compute mean, median, mode, range, standard deviation, quartiles, IQR, and identify outliers for ungrouped data.
Key Formulas:
Mean: $\bar{x} = \dfrac{\sum x}{n}$
Median: Middle value (or average of two middle values)
Mode: Most frequent value
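Range: $\text{Range} = \text{max} - \text{min}$
Sample Standard Deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
IQR: $\text{IQR} = Q_3 - Q_1$
Outlier rule: Lower fence = $Q_1 - 1.5 \times \text{IQR}$, Upper fence = $Q_3 + 1.5 \times \text{IQR}$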
Step-by-Step Guidance
Order the data (already done).
Calculate the mean by summing all values and dividing by 20.
Find the median (average of the 10th and 11th values).
Identify the mode (most frequent value).
Compute the range (max - min).
Calculate the sample standard deviation using the formula above.
Find Q1 and Q3 (medians of lower and upper halves), then compute the IQR.
Apply the 1.5 × IQR rule to check for outliers.
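These steps can be checked against Python's standard `statistics` module. This is a sketch under two assumptions: quartiles use the median-of-halves method from the steps above, and the standard deviation is the sample version that divides by n − 1.

```python
import statistics

sleep = [4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 11]  # data from Q18

n = len(sleep)
mean = statistics.mean(sleep)
median = statistics.median(sleep)
modes = statistics.multimode(sleep)      # returns every value tied for most frequent
data_range = max(sleep) - min(sleep)
s = statistics.stdev(sleep)              # sample standard deviation (divides by n - 1)

# Quartiles by the median-of-halves method (assumed textbook convention).
q1 = statistics.median(sleep[: n // 2])
q3 = statistics.median(sleep[(n + 1) // 2 :])
iqr = q3 - q1
outliers = [x for x in sleep if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(mean, median, modes, data_range, round(s, 2))
print(q1, q3, iqr, outliers)
```

`statistics.multimode` is used instead of `statistics.mode` because a data set can have more than one mode.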
Q19. Descriptive Statistics — Grouped Data. The frequency distribution below summarizes the body temperatures (°F) of 100 healthy adults.
| Temperature (°F) | Frequency |
|---|---|
| 96.5–97.0 | 4 |
| 97.0–97.5 | 15 |
| 97.5–98.0 | 28 |
| 98.0–98.5 | 34 |
| 98.5–99.0 | 14 |
| 99.0–99.5 | 5 |
Background
Topic: Descriptive Statistics for Grouped Data
This question tests your ability to estimate the mean and standard deviation from grouped data, and apply the Empirical Rule.
Key Formulas:
Class width: Difference between lower limits of consecutive classes.
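Midpoint: $\dfrac{\text{lower class limit} + \text{upper class limit}}{2}$
Estimated mean: $\bar{x} \approx \dfrac{\sum (f \cdot x)}{n}$, where $f$ is the class frequency, $x$ is the class midpoint, and $n = \sum f$ is the total frequency.
Estimated standard deviation: $s \approx \sqrt{\dfrac{\sum f \, (x - \bar{x})^2}{n - 1}}$
Empirical Rule: About 95% of values lie within $\bar{x} \pm 2s$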
Step-by-Step Guidance
Find the class width by subtracting the lower limit of the first class from the lower limit of the second class.
Calculate the midpoint for each class.
Multiply each midpoint by its class frequency, sum these products, and divide by the total frequency to estimate the mean.
Use the estimated mean and midpoints to calculate the estimated standard deviation.
Apply the Empirical Rule to estimate the range containing approximately 95% of values.
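The grouped-data estimates can be sketched in Python by treating every value in a class as if it sat at the class midpoint. The sketch assumes the sample form of the standard deviation (dividing by n − 1). If your course uses a different formula, adjust the marked line. The function name `grouped_mean_sd` is hypothetical.

```python
import math

# (lower limit, upper limit, frequency) for each class in the Q19 table
classes = [
    (96.5, 97.0, 4), (97.0, 97.5, 15), (97.5, 98.0, 28),
    (98.0, 98.5, 34), (98.5, 99.0, 14), (99.0, 99.5, 5),
]

def grouped_mean_sd(rows):
    """Estimate the mean and sample SD by treating every value as its class midpoint."""
    n = sum(f for _, _, f in rows)
    mids = [((lo + hi) / 2, f) for lo, hi, f in rows]
    mean = sum(f * x for x, f in mids) / n
    variance = sum(f * (x - mean) ** 2 for x, f in mids) / (n - 1)  # n - 1: sample SD assumed
    return n, mean, math.sqrt(variance)

n, mean, sd = grouped_mean_sd(classes)
print(n, round(mean, 3), round(sd, 3))
print(round(mean - 2 * sd, 2), round(mean + 2 * sd, 2))  # Empirical Rule, about 95%
```

Because the original values are replaced by midpoints, the results are estimates and will generally differ a little from statistics computed on the raw data.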
Q20. z-Scores, Percentiles, and Empirical Rule. A statistics class has a mean test score of and a standard deviation of . The distribution of scores is approximately bell-shaped.
Background
Topic: z-Scores, Percentiles, and Empirical Rule
This question tests your ability to calculate z-scores, apply the Empirical Rule, and interpret percentiles.
Key Formulas:
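z-score: $z = \dfrac{x - \mu}{\sigma}$
Empirical Rule: 68% within 1 SD, 95% within 2 SD, 99.7% within 3 SD
Percentile rank: $\dfrac{\text{number of scores below } x}{\text{total number of scores}} \times 100$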
Step-by-Step Guidance
For part (a), substitute the student's score, mean, and standard deviation into the z-score formula.
Interpret the z-score to determine if the value is unusual (typically, $|z| > 2$ is unusual).
For part (b), use the Empirical Rule to fill in the intervals for 68%, 95%, and 99.7% of scores.
For part (c), count the number of scores below 82 in the sorted list, divide by the total, and multiply by 100 to get the percentile rank.
For part (d), consider the meaning of percentile rank versus percentage correct and explain the difference.
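The counting procedure for part (c) can be sketched in Python. The list of scores below is hypothetical and stands in for the sorted list that accompanies the question, so the number it produces is not the answer to part (c). The function name `percentile_rank` is introduced here.

```python
def percentile_rank(values, x):
    """Percent of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical scores for illustration only; the list from the question is not reproduced here.
scores = [55, 61, 64, 68, 70, 73, 75, 78, 82, 85, 88, 91]
print(percentile_rank(scores, 82))
```

A percentile rank compares one score with the other scores in the group. The percentage correct compares a score with the maximum possible score.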