Step-by-Step Guidance for STAT 101 Assignments (Statistics Study Guide)
Q1. Identify the WHO and the WHAT (categorical or quantitative) in the restaurant chain scenario.
Background
Topic: Data Collection and Types of Variables
This question tests your understanding of how to identify the subjects (WHO) and the variables (WHAT) in a dataset, as well as distinguishing between categorical and quantitative variables.
Key Terms:
WHO: The subjects or cases from which data are collected.
WHAT: The variables measured or recorded for each subject.
Categorical Variable: A variable that places an individual into one of several groups or categories.
Quantitative Variable: A variable that takes numerical values for which arithmetic operations make sense.
Step-by-Step Guidance
Identify the subjects (WHO) in the scenario. Ask yourself: about whom or what are the data being collected?
List each variable (WHAT) being measured or recorded for each subject.
For each variable, determine if it is categorical (describes a quality or group) or quantitative (measured numerically).
Organize your findings in a clear table or list, separating categorical and quantitative variables.
Q2. For each variable in the movie theatre scenario, state which type of graph you would use to display the data and why.
Background
Topic: Data Visualization
This question tests your ability to match variable types to appropriate graphical displays and to justify your choices.
Key Terms:
Bar Chart: Used for categorical variables.
Histogram: Used for quantitative variables.
Boxplot: Used for quantitative variables to show spread and outliers.
Pie Chart: Used for categorical variables to show proportions.
Step-by-Step Guidance
For each variable, determine if it is categorical or quantitative.
Recall which types of graphs are best suited for each variable type.
Match each variable to a graph type and briefly explain your reasoning for each choice.
Q3. Construct a histogram of the retail store spending data using bins of width 5, starting with the bin [10,15].
Background
Topic: Histograms and Frequency Distributions
This question tests your ability to organize quantitative data into bins and visually represent the distribution using a histogram.
Key Terms and Steps:
Histogram: A graphical display of data using bars of different heights to show the frequency of data in each bin.
Bin: An interval of values for grouping data.
Step-by-Step Guidance
List all the data points provided.
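Determine the range of the data and set up the bins, starting at 10 with a width of 5: [10,15), [15,20), and so on. Each bin includes its lower endpoint but not its upper one, so a boundary value such as 15 is counted only in [15,20).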
Count how many data points fall into each bin.
Draw the histogram, labeling the bins on the x-axis and the frequencies on the y-axis.
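The counting step can be sketched in Python as follows. This is only an illustration: the spending values are made up (they are not the assignment's data), `bin_counts` is a hypothetical helper name, and the sketch assumes every value is at least 10.

```python
# Hypothetical spending amounts in dollars (NOT the assignment's data).
spending = [12, 14, 15, 18, 21, 22, 22, 27, 29, 33]

def bin_counts(values, start=10, width=5):
    """Count values in half-open bins [start, start+width), [start+width, ...).

    Assumes every value is >= start. Returns (low, high, count) triples,
    including bins whose count is zero.
    """
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1   # index of the bin holding v
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# A text version of the histogram: one '#' per data point in the bin.
for low, high, count in bin_counts(spending):
    print(f"[{low},{high}): {'#' * count} ({count})")
```

Note that the value 15 lands in [15,20), not [10,15), which matches the half-open bin convention.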
Q4. Construct a relative frequency table and a pie graph for product category revenue data.
Background
Topic: Relative Frequency and Pie Charts
This question tests your ability to calculate relative frequencies and represent categorical data as proportions of a whole using a pie chart.
Key Terms and Formulas:
Relative Frequency: \( \text{relative frequency} = \dfrac{\text{category revenue}}{\text{grand total}} \)
Pie Chart: A circular chart divided into sectors representing proportions.
Step-by-Step Guidance
Sum the revenue for all categories to find the grand total.
For each category, divide its revenue by the grand total to get the relative frequency (as a decimal or percentage).
List the relative frequencies in a table.
Use the relative frequencies to determine the size of each sector in the pie chart (percentage of the circle).
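As a rough sketch of these steps, the Python below computes relative frequencies and the matching pie-sector angles. The category names and revenue figures are invented for illustration and are not the assignment's data.

```python
# Hypothetical revenue by product category (NOT the assignment's data).
revenue = {"Electronics": 50_000, "Clothing": 30_000, "Groceries": 20_000}

total = sum(revenue.values())                      # grand total
rel_freq = {cat: amount / total for cat, amount in revenue.items()}

for cat, rf in rel_freq.items():
    # Each sector's angle is its share of the full 360-degree circle.
    print(f"{cat}: {rf:.1%} of revenue, {rf * 360:.0f} degrees")
```

A quick check worth doing by hand as well: the relative frequencies should add to 1 (or 100%), apart from rounding.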
Q5. Provide the five number summary and construct a boxplot for the monthly sales data.
Background
Topic: Descriptive Statistics – Five Number Summary and Boxplots
This question tests your ability to summarize a dataset using the minimum, first quartile (Q1), median, third quartile (Q3), and maximum, and to represent this visually with a boxplot.
Key Terms and Steps:
Five Number Summary: Minimum, Q1, Median, Q3, Maximum
Boxplot: A graphical representation of the five number summary.
Step-by-Step Guidance
Order the data from smallest to largest.
Identify the minimum and maximum values.
Find the median (middle value).
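Determine Q1 (median of the lower half) and Q3 (median of the upper half).
Draw the boxplot: a box from Q1 to Q3 with a line at the median, and whiskers extending to the minimum and maximum (or to the most extreme non-outlier values if outliers are plotted separately).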
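The median-of-halves method described above can be sketched in Python. The sales figures are hypothetical, and `five_number_summary` is an illustrative helper name. Software packages often use other quartile conventions, so their Q1 and Q3 may differ slightly from this hand method.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + n % 2:]    # values above it (skips the median if n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical monthly sales (NOT the assignment's data).
sales = [12, 15, 11, 20, 18, 25, 14, 16, 22]
print(five_number_summary(sales))
```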
Q6. Construct a scatter plot for production hours and product defects, and describe the direction, form, and strength.
Background
Topic: Scatter Plots and Association
This question tests your ability to visualize the relationship between two quantitative variables and describe their association.
Key Terms:
Scatter Plot: A graph showing the relationship between two quantitative variables.
Direction: Positive or negative association.
Form: Linear or nonlinear pattern.
Strength: How closely the points follow a clear form.
Step-by-Step Guidance
Plot each pair of (production hours, product defects) as a point on the graph.
Observe the overall pattern: does it look linear or curved?
Determine if the association is positive (both increase), negative (one increases, the other decreases), or neither.
Assess how tightly the points cluster around a line or curve (strength).
Q7. Calculate the mean, median, range, and standard deviation for the annual income data. Determine the skewness of the distribution.
Background
Topic: Measures of Central Tendency, Spread, and Shape
This question tests your ability to compute basic descriptive statistics and interpret the shape of a distribution.
Key Terms and Formulas:
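Mean: \( \bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i \)
Median: The middle value when data are ordered.
Range: \( \text{range} = x_{\max} - x_{\min} \)
Standard Deviation: \( s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \)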
Step-by-Step Guidance
Order the data from smallest to largest.
Calculate the mean by summing all values and dividing by the number of data points.
Find the median (middle value or average of two middle values).
Compute the range by subtracting the smallest value from the largest.
Calculate the standard deviation using the formula above.
Compare the mean and median to assess skewness: if mean > median, likely right-skewed; if mean < median, likely left-skewed; if mean ≈ median, likely symmetric.
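To check hand calculations, the same quantities can be computed with Python's standard `statistics` module. The income values below are invented for illustration; `stdev` is the sample standard deviation (dividing by n-1).

```python
from statistics import mean, median, stdev

# Hypothetical annual incomes in $1,000s (NOT the assignment's data).
incomes = [30, 35, 40, 45, 100]

avg = mean(incomes)
mid = median(incomes)
spread = max(incomes) - min(incomes)   # range = largest - smallest
sd = stdev(incomes)                    # sample standard deviation (n - 1)

# Rough shape check from the guide: compare mean and median.
if avg > mid:
    shape = "likely right-skewed"
elif avg < mid:
    shape = "likely left-skewed"
else:
    shape = "likely symmetric"

print(avg, mid, spread, round(sd, 2), shape)
```

Here the single large income pulls the mean above the median, which is the usual signature of a right-skewed distribution.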
Q8. Compute the standard deviation for each store and determine which has more consistent sales.
Background
Topic: Standard Deviation and Consistency
This question tests your ability to calculate and compare standard deviations to assess variability in data.
Key Formula:
Standard Deviation: \( s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \)
Step-by-Step Guidance
For each store, calculate the mean of the sales data.
Subtract the mean from each data point to find the deviations.
Square each deviation and sum them.
Divide the sum by (n-1), where n is the number of data points.
Take the square root to find the standard deviation for each store.
Compare the standard deviations: the store with the smaller standard deviation has more consistent sales.
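The steps above translate almost line for line into Python. This is a minimal sketch with made-up sales figures for two hypothetical stores; `sample_sd` is an illustrative helper name.

```python
from math import sqrt

def sample_sd(data):
    """Sample standard deviation, following the guide's steps one at a time."""
    n = len(data)
    m = sum(data) / n                              # step 1: mean
    squared_devs = [(x - m) ** 2 for x in data]    # steps 2-3: deviations, squared
    return sqrt(sum(squared_devs) / (n - 1))       # steps 4-5: divide by n-1, root

# Hypothetical sales for two stores (NOT the assignment's data).
store_a = [100, 102, 98, 101, 99]
store_b = [80, 120, 95, 110, 95]

sd_a, sd_b = sample_sd(store_a), sample_sd(store_b)
more_consistent = "Store A" if sd_a < sd_b else "Store B"
print(round(sd_a, 2), round(sd_b, 2), more_consistent)
```

Both hypothetical stores have the same mean (100), so the standard deviation alone separates them: the smaller value indicates sales that stay closer to the mean.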
Q9. Interpret the regression slope and intercept in the context of the music shop's advertising and revenue data.
Background
Topic: Linear Regression Interpretation
This question tests your ability to interpret the meaning of the slope and intercept in a regression equation within the context of the problem.
Key Terms:
Slope: The change in the response variable for a one-unit increase in the explanatory variable.
Intercept: The predicted value of the response variable when the explanatory variable is zero.
Step-by-Step Guidance
Identify the explanatory (x) and response (y) variables in the regression equation.
Interpret the slope: For each additional $10,000 spent on advertising, how much does the model predict revenue will change?
Interpret the intercept: What does the model predict for revenue if advertising spending is zero?
Relate both interpretations back to the context of the music shop's business.
Q10. Calculate the error for January and February and identify if each is an overestimate or underestimate.
Background
Topic: Regression Residuals
This question tests your ability to calculate residuals (errors) and interpret their meaning in the context of predictions.
Key Formula:
Residual: \( e = y - \hat{y} \) (actual value minus predicted value)
Step-by-Step Guidance
For each month, use the regression equation to calculate the predicted revenue based on advertising spending.
Subtract the predicted value from the actual revenue to find the error (residual).
If the error is positive, the model underestimated the actual value; if negative, it overestimated.
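The residual calculation can be sketched as below. The regression line (intercept 20, slope 5) and the monthly figures are assumptions made up for this example; substitute the equation and data from the assignment.

```python
# Hypothetical fitted line (NOT the assignment's equation):
# revenue (in $1,000s) = 20 + 5 * advertising (in $10,000s).
INTERCEPT, SLOPE = 20.0, 5.0

def residual(ad_units, actual_revenue):
    """Residual = actual - predicted."""
    predicted = INTERCEPT + SLOPE * ad_units
    return actual_revenue - predicted

def describe(error):
    """Positive residual: model predicted too low. Negative: too high."""
    if error > 0:
        return "underestimate"
    if error < 0:
        return "overestimate"
    return "exact"

# Hypothetical (advertising, actual revenue) pairs for each month.
months = {"January": (3.0, 38.0), "February": (4.0, 37.0)}
for month, (ads, actual) in months.items():
    e = residual(ads, actual)
    print(f"{month}: residual = {e:+.1f} ({describe(e)})")
```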
Q11. Predict the monthly revenue for $42,000 in advertising and discuss the reasonableness of predicting for $70,000.
Background
Topic: Regression Prediction and Extrapolation
This question tests your ability to use a regression equation for prediction and to evaluate the appropriateness of extrapolating beyond the data range.
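Key Formula:
Regression Equation: \( \hat{y} = b_0 + b_1 x \), where \( x \) is advertising spending in units of $10,000, \( b_0 \) is the intercept, and \( b_1 \) is the slope.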
Step-by-Step Guidance
Convert $42,000 to the units used in the regression equation (i.e., $10,000 units).
Plug the value into the regression equation to calculate the predicted revenue.
Consider whether $70,000 is within the range of observed advertising spending. Discuss why predictions far outside the data range may not be reliable.
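A small sketch of the unit conversion and the extrapolation check follows. The coefficients and the observed spending range ($10,000 to $50,000) are assumptions for illustration only; use the values from the assignment.

```python
# Hypothetical regression line and observed data range (NOT from the assignment).
INTERCEPT, SLOPE = 20.0, 5.0                  # revenue in $1,000s per $10,000 of ads
OBSERVED_MIN, OBSERVED_MAX = 10_000, 50_000   # advertising dollars seen in the data

def predict_revenue(ad_dollars):
    """Return (predicted revenue, whether the prediction is an extrapolation)."""
    x = ad_dollars / 10_000                   # convert dollars to $10,000 units
    prediction = INTERCEPT + SLOPE * x
    extrapolated = not (OBSERVED_MIN <= ad_dollars <= OBSERVED_MAX)
    return prediction, extrapolated

for dollars in (42_000, 70_000):
    pred, extrapolated = predict_revenue(dollars)
    note = "outside the observed range, treat with caution" if extrapolated else "within the observed range"
    print(f"${dollars:,}: predicted revenue {pred:.1f} ({note})")
```

The equation will happily return a number for any input; the range check is what flags a prediction as an extrapolation, where the linear pattern may no longer hold.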
Q12. Interpret the coefficient of determination and the correlation coefficient.
Background
Topic: Regression – \( R^2 \) and Correlation
This question tests your understanding of what \( R^2 \) and the correlation coefficient \( r \) tell you about the strength and direction of a linear relationship.
Key Terms and Formulas:
Coefficient of Determination (\( R^2 \)): Proportion of variance in the response variable explained by the explanatory variable.
Correlation Coefficient (\( r \)): \( r = \pm\sqrt{R^2} \) (sign matches the slope).
Step-by-Step Guidance
Interpret \( R^2 \) as the percentage of variation in revenue explained by advertising spending.
Calculate \( r \) from \( R^2 \) and interpret its meaning (strength and direction of the relationship).
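Recovering the correlation coefficient from the coefficient of determination takes one line once the sign of the slope is known. In this sketch the value 0.64 and the slopes are hypothetical, and `correlation_from_r_squared` is an illustrative helper name.

```python
from math import copysign, sqrt

def correlation_from_r_squared(r_squared, slope):
    """r is the square root of R^2, carrying the same sign as the slope."""
    return copysign(sqrt(r_squared), slope)

# Hypothetical values (NOT from the assignment).
r_positive = correlation_from_r_squared(0.64, slope=5.0)
r_negative = correlation_from_r_squared(0.64, slope=-5.0)
print(round(r_positive, 2), round(r_negative, 2))
```

With these made-up numbers, 64% of the variation in the response is explained by the explanatory variable, and the magnitude of the correlation is 0.8 in either direction.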
Q13. Probability questions based on the flexible work arrangement table (multiple sub-questions).
Background
Topic: Basic Probability and Contingency Tables
These questions test your ability to calculate probabilities from a two-way table, including marginal, joint, conditional, and compound probabilities, as well as concepts of independence.
Key Terms and Formulas:
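Probability: \( P(A) = \dfrac{\text{number of outcomes in } A}{\text{total number of outcomes}} \)
Conditional Probability: \( P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} \)
Independence: \( P(A \mid B) = P(A) \) if A and B are independent.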
Step-by-Step Guidance
For each question, identify the relevant counts from the table (e.g., total remote, total not Finance, etc.).
Set up the probability as a fraction using the appropriate counts and the total number of employees.
For compound events (e.g., 'and', 'or', 'at least one'), use the addition or multiplication rules as appropriate.
For conditional probability, use the formula above and identify the correct numerator and denominator.
For independence, compare \( P(A \mid B) \) to \( P(A) \); the events are independent if the two are equal.
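The sketch below works through marginal, joint, compound, and conditional probabilities for a small two-way table. The department and arrangement labels echo the scenario, but every count is invented and the table layout is an assumption; `prob` is a hypothetical helper.

```python
from math import isclose

# Hypothetical two-way table of employee counts (NOT the assignment's table).
counts = {
    ("Finance", "Remote"): 20, ("Finance", "On-site"): 30,
    ("Marketing", "Remote"): 30, ("Marketing", "On-site"): 20,
}
total = sum(counts.values())

def prob(condition):
    """P(event) = (count of employees matching the condition) / total."""
    return sum(c for key, c in counts.items() if condition(*key)) / total

p_remote = prob(lambda dept, arr: arr == "Remote")                    # marginal
p_finance = prob(lambda dept, arr: dept == "Finance")                 # marginal
p_both = prob(lambda dept, arr: dept == "Finance" and arr == "Remote")  # joint
p_either = p_finance + p_remote - p_both                              # addition rule
p_remote_given_finance = p_both / p_finance                           # conditional

# Independence check: is P(Remote | Finance) equal to P(Remote)?
independent = isclose(p_remote_given_finance, p_remote)
print(p_remote, p_both, p_either, p_remote_given_finance, independent)
```

In this made-up table, 40% of Finance employees are remote compared with 50% overall, so department and work arrangement are not independent.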
Q14. Binomial probability distribution for number of purchases during a flash promotion (multiple sub-questions).
Background
Topic: Binomial Distribution
These questions test your understanding of the binomial setting, probability calculations, and properties of the binomial distribution.
Key Terms and Formulas:
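Binomial Random Variable: \( X \) = number of successes in \( n \) independent trials, each with probability \( p \) of success.
Binomial Probability: \( P(X = k) = \binom{n}{k} p^{k} (1-p)^{n-k} \)
Expected Value: \( E(X) = np \)
Variance: \( \mathrm{Var}(X) = npq \), where \( q = 1 - p \)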
Step-by-Step Guidance
Define the random variable in the context of the problem.
Explain why the binomial model is appropriate (fixed \( n \), independent trials, same \( p \) on every trial).
For each possible value of \( k \) (from 0 to 7), use the binomial formula to calculate \( P(X = k) \).
Sum all probabilities to check they add to 1 (allowing for rounding).
Calculate \( E(X) \) and \( \mathrm{Var}(X) \) using both the probability distribution and the formulas \( E(X) = np \) and \( \mathrm{Var}(X) = npq \).
Compare the calculated values to the formula results and note any observations.
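The whole procedure can be checked with a short Python sketch. The number of trials, n = 7, follows from the values 0 to 7 mentioned above; the success probability p = 0.3 is an assumption chosen for illustration, so replace it with the value given in the assignment.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 7, 0.3   # n from the guide (k runs 0..7); p is hypothetical

pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the probability distribution.
mean_from_table = sum(k * pk for k, pk in enumerate(pmf))
var_from_table = sum((k - mean_from_table) ** 2 * pk for k, pk in enumerate(pmf))

print(f"sum of probabilities: {sum(pmf):.6f}")
print(f"mean: table {mean_from_table:.4f} vs formula {n * p:.4f}")
print(f"variance: table {var_from_table:.4f} vs formula {n * p * (1 - p):.4f}")
```

The table-based and formula-based results should agree up to rounding, which is the observation the last step asks you to note.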
Q15. Probability and confidence interval questions for proportions (multiple sub-questions).
Background
Topic: Sampling Distributions, Normal Approximation, and Confidence Intervals for Proportions
These questions test your ability to use the normal approximation to the binomial, calculate probabilities for sample proportions, construct and interpret confidence intervals, and determine sample sizes for a desired margin of error.
Key Terms and Formulas:
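Sample Proportion: \( \hat{p} = \dfrac{x}{n} \), where \( x \) is the number of successes in a sample of size \( n \)
Standard Error: \( SE(\hat{p}) = \sqrt{\dfrac{p(1-p)}{n}} \) (use \( \hat{p} \) in place of \( p \) when \( p \) is unknown)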
Normal Approximation: Use when \( np \ge 10 \) and \( n(1-p) \ge 10 \)
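Confidence Interval: \( \hat{p} \pm z^{*} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \)
Margin of Error: \( ME = z^{*} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \)
Sample Size for Proportion: \( n = \dfrac{(z^{*})^{2}\, \hat{p}(1-\hat{p})}{ME^{2}} \)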
Step-by-Step Guidance
For probability questions, check if the normal approximation is appropriate and calculate the mean and standard error.
Convert the problem to a standard normal probability (z-score) and use the normal table to find the probability.
For confidence intervals, calculate the sample proportion and standard error, and use the appropriate \( z^{*} \) value for the confidence level.
For margin of error and sample size, rearrange the margin of error formula to solve for \( n \).
For comparing two proportions, use the formula for the confidence interval for the difference in proportions.
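The one-proportion calculations can be sketched with Python's standard library, using `statistics.NormalDist` in place of a normal table. All inputs (p = 0.5, n = 100, 120 successes out of 200, a 3% margin of error) are hypothetical, and the function names are illustrative. Using 0.5 as the planning value when no estimate is available is a common conservative choice, not something stated in the assignment.

```python
from math import ceil, sqrt
from statistics import NormalDist

def prob_sample_proportion_above(threshold, p, n):
    """P(p-hat > threshold) using the normal approximation with SE = sqrt(p(1-p)/n)."""
    se = sqrt(p * (1 - p) / n)
    return 1 - NormalDist(mu=p, sigma=se).cdf(threshold)

def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for a proportion: p-hat +/- z* * SE."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n giving at most the desired margin of error (rounded up)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# Hypothetical inputs (NOT from the assignment).
print(round(prob_sample_proportion_above(0.55, p=0.5, n=100), 4))
low, high = proportion_ci(120, 200)
print(round(low, 4), round(high, 4))
print(sample_size(0.03))
```

Sample sizes are always rounded up, since rounding down would give a margin of error slightly larger than requested.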