Ch. 10 - Correlation and Regression
Triola - Elementary Statistics 14th Edition
ISBN: 9780137366446
Chapter 10, Problem 10.5.12

Finding the Best Model
In Exercises 5–16, construct a scatterplot and identify the mathematical model that best fits the given data. Assume that the model is to be used only for the scope of the given data, and consider only linear, quadratic, logarithmic, exponential, and power models.
Detecting Fraud Leading digits of check amounts are often analyzed for the purpose of detecting fraud. The accompanying table lists frequencies of leading digits from checks written by the author (an honest guy).
Table showing frequencies of leading digits 1 to 9 in check amounts, with digit 1 occurring most frequently at 83 times.

Verified step by step guidance
Step 1: Construct a scatterplot by plotting the leading digits (1 through 9) on the x-axis and their corresponding frequencies on the y-axis. This visual representation will help identify the pattern or trend in the data.
Step 2: Observe the shape of the scatterplot to narrow down the candidate models. If the points fall roughly along a straight line, a linear model may fit; if they follow part of a parabola (U-shaped or inverted U), a quadratic model may fit; if they decrease rapidly at first and then level off, an exponential (decay), logarithmic, or power model may be appropriate.
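Step 3: Consider the context of the data. Benford's Law predicts that a leading digit d occurs with proportion log10(1 + 1/d), so lower digits occur much more often than higher ones (about 30.1% of values start with 1, but only about 4.6% start with 9). Expect the frequencies to decrease as the digit increases, but let the scatterplot and the fitted models, not Benford's Law alone, determine the best model.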
Step 4: Fit each candidate model (linear, quadratic, logarithmic, exponential, and power) to the data with regression and compare their goodness-of-fit measures, such as R-squared values.
Step 5: Select the model with the best fit (highest R-squared and reasonable residuals) for the given data, keeping in mind the model should only be used within the scope of the provided data.
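The comparison in Steps 4 and 5 can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the textbook's or any calculator's procedure: the function names are hypothetical, the exponential and power models are fit by regressing ln y on x or on ln x (which requires positive values), and R-squared is computed on the original y scale as 1 - SSE/SST. Some calculators and software report R-squared from the transformed regression instead, so values may differ slightly. Because only one of the nine frequencies (83 for digit 1) is given here, the usage lines use Benford-expected proportions as stand-in data; substitute the table's frequencies to work the exercise.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def solve3(m, v):
    """Solve the 3x3 system m * c = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]  # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    c = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        c[r] = (a[r][3] - sum(a[r][k] * c[k] for k in range(r + 1, 3))) / a[r][r]
    return c


def quadratic_fit(xs, ys):
    """Least-squares y = a*x^2 + b*x + c via the normal equations; returns (a, b, c)."""
    s = [sum(x ** p for x in xs) for p in range(5)]  # s[p] = sum of x^p
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]  # sum of y*x^p
    m = [[s[i + j] for j in range(3)] for i in range(3)]  # unknowns ordered (c, b, a)
    c, b, a = solve3(m, t)
    return a, b, c


def r_squared(ys, preds):
    """R^2 = 1 - SSE/SST, computed on the original y scale."""
    my = sum(ys) / len(ys)
    sst = sum((y - my) ** 2 for y in ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1 - sse / sst


def fit_all(xs, ys):
    """Fit the five candidate models; return {name: (predict_function, R^2)}."""
    a, b = linear_fit(xs, ys)
    qa, qb, qc = quadratic_fit(xs, ys)
    la, lb = linear_fit([math.log(x) for x in xs], ys)  # y on ln x
    ea, eb = linear_fit(xs, [math.log(y) for y in ys])  # ln y on x
    pa, pb = linear_fit([math.log(x) for x in xs], [math.log(y) for y in ys])  # ln y on ln x
    models = {
        "linear": lambda x: a + b * x,
        "quadratic": lambda x: qa * x ** 2 + qb * x + qc,
        "logarithmic": lambda x: la + lb * math.log(x),
        "exponential": lambda x: math.exp(ea) * math.exp(eb) ** x,  # y = a * b^x
        "power": lambda x: math.exp(pa) * x ** pb,  # y = a * x^b
    }
    return {name: (f, r_squared(ys, [f(x) for x in xs])) for name, f in models.items()}


# Stand-in data: Benford-expected proportions, NOT the check frequencies from the exercise.
digits = list(range(1, 10))
benford = [math.log10(1 + 1 / d) for d in digits]
ranked = sorted(fit_all(digits, benford).items(), key=lambda kv: kv[1][1], reverse=True)
for name, (_, r2) in ranked:
    print(f"{name:12s} R^2 = {r2:.4f}")
```

Listing the models from highest to lowest R-squared mirrors Step 5; the residuals and the scatterplot should still be checked before choosing.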


Key Concepts

Here are the essential concepts you must grasp in order to answer the question correctly.

Scatterplot Construction

A scatterplot is a graphical representation of data points plotted on a coordinate plane, showing the relationship between two variables. It helps visualize patterns, trends, or correlations, which is essential for selecting an appropriate mathematical model.
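As a dependency-free illustration of plotting leading digits against frequencies, the hypothetical helper below draws a rough character-grid scatterplot; in practice a plotting library such as matplotlib would normally be used instead. The stand-in data are Benford-expected proportions, not the frequencies from the exercise's table.

```python
import math


def text_scatterplot(xs, ys, width=40, height=12):
    """Return a crude character-grid scatterplot of (x, y) points as a string.

    The top row corresponds to the largest y value; each point is drawn as '*'.
    """
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - xmin) / (xmax - xmin) * (width - 1)) if xmax > xmin else 0
        row = round((ymax - y) / (ymax - ymin) * (height - 1)) if ymax > ymin else 0
        grid[row][col] = "*"
    return "\n".join("|" + "".join(r) for r in grid) + "\n+" + "-" * width


digits = list(range(1, 10))
benford = [math.log10(1 + 1 / d) for d in digits]  # stand-in data, not the check table
print(text_scatterplot(digits, benford))
```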

Model Selection and Types

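Choosing the best-fitting model involves comparing different mathematical functions to see which best describes the pattern in the data. Written in their usual generic forms, the candidates are linear (y = a + bx), quadratic (y = ax^2 + bx + c), logarithmic (y = a + b ln x), exponential (y = ab^x), and power (y = ax^b). Each produces a characteristic shape: a straight line, a parabola, a curve that changes quickly and then flattens (logarithmic), a curve that changes by a constant factor for each unit increase in x (exponential), and a curve whose shape depends on the exponent b (power).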
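The logarithmic, exponential, and power models can be fit with ordinary least-squares regression after a change of variables. The transformations below are the standard ones, offered as a sketch rather than quoted from the textbook; the logarithms require x > 0 for the logarithmic and power models and y > 0 for the exponential and power models.

```latex
\begin{aligned}
\text{Logarithmic: } & y = a + b\ln x && \text{regress } y \text{ on } \ln x\\
\text{Exponential: } & y = ab^{x} \;\Rightarrow\; \ln y = \ln a + (\ln b)\,x && \text{regress } \ln y \text{ on } x\\
\text{Power: } & y = ax^{b} \;\Rightarrow\; \ln y = \ln a + b\ln x && \text{regress } \ln y \text{ on } \ln x
\end{aligned}
```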

Frequency Distribution and Fraud Detection

A frequency distribution shows how often each leading digit appears in the data. Comparing these frequencies with the distribution expected for naturally occurring data, such as the one described by Benford's Law, can reveal anomalies that may indicate fraud, so understanding the expected distribution is essential.
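A minimal sketch of that comparison, assuming the nine leading-digit frequencies are available as a list in digit order: it converts the counts to proportions and lines them up with the Benford proportions log10(1 + 1/d). The function names are hypothetical, and the example counts are invented for illustration only; substitute the frequencies from the table.

```python
import math


def benford_proportions():
    """Benford's Law proportions for leading digits 1-9: log10(1 + 1/d)."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def compare_with_benford(counts):
    """Pair each digit's observed proportion with its Benford proportion.

    counts: nine frequencies for leading digits 1..9, in order.
    Returns a list of (digit, observed_proportion, benford_proportion).
    """
    if len(counts) != 9:
        raise ValueError("expected nine counts, one per leading digit 1-9")
    n = sum(counts)
    return [(d, c / n, p) for d, c, p in zip(range(1, 10), counts, benford_proportions())]


# Hypothetical counts for illustration only; substitute the nine frequencies from the table.
example_counts = [30, 18, 12, 10, 8, 7, 6, 5, 4]
for digit, obs, exp in compare_with_benford(example_counts):
    print(f"digit {digit}: observed {obs:.3f}, Benford {exp:.3f}, difference {obs - exp:+.3f}")
```

Large differences for several digits would be a reason to look more closely, not proof of fraud by themselves.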
Related Practice
Textbook Question

Notation What is the difference between the regression equation ŷ = b₀ + b₁x and the regression equation y = β₀ + β₁x?
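For reference, the standard least-squares formulas below show where b₀ and b₁ come from: they are sample statistics computed from the paired sample data, and they serve as estimates of the population parameters β₀ and β₁.

```latex
b_1 = r\,\frac{s_y}{s_x} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```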

Textbook Question

Interpreting the Coefficient of Determination

In Exercises 5–8, use the value of the linear correlation coefficient r to find the coefficient of determination and the percentage of the total variation that can be explained by the linear relationship between the two variables.

Times of Taxi Rides and Fares r = 0.953 (x = time in minutes, y = fare in dollars)
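The arithmetic for this exercise is short, because the coefficient of determination is the square of the linear correlation coefficient:

```latex
r^2 = (0.953)^2 = 0.908209 \approx 0.908
```

So about 90.8% of the variation in fares can be explained by the linear relationship between ride time and fare.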

Textbook Question

Dummy Variable Refer to Data Set 18 “Bear Measurements” in Appendix B and use the sex, age, and weight of the bears. For sex, let 0 represent female and let 1 represent male. Letting the response variable represent weight, use the variable of age and the dummy variable of sex to find the multiple regression equation. Use the equation to find the predicted weight of a bear with the characteristics given below. Does sex appear to have much of an effect on the weight of a bear?


Female bear that is 20 years of age

Male bear that is 20 years of age
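A minimal sketch of the procedure, assuming the response (weight) is regressed on age and the 0/1 sex variable by least squares. Data Set 18 is not reproduced here, so the values below are synthetic, generated from an assumed equation purely to show the mechanics; the function names are hypothetical.

```python
def solve3(m, v):
    """Solve the 3x3 system m * c = v by Gaussian elimination with partial pivoting."""
    a = [row[:] + [val] for row, val in zip(m, v)]  # augmented matrix
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for k in range(col, 4):
                a[r][k] -= f * a[col][k]
    c = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        c[r] = (a[r][3] - sum(a[r][k] * c[k] for k in range(r + 1, 3))) / a[r][r]
    return c


def fit_two_predictors(x1, x2, ys):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations (X^T X) b = X^T y."""
    rows = [(1.0, p, q) for p, q in zip(x1, x2)]  # design matrix rows: intercept, x1, x2
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve3(xtx, xty)


# Synthetic data (NOT Data Set 18): weights generated from weight = 50 + 8*age + 40*sex.
ages = [2, 5, 8, 3, 10, 6]
sex = [0, 0, 0, 1, 1, 1]  # 0 = female, 1 = male, as in the exercise
weights = [50 + 8 * a + 40 * s for a, s in zip(ages, sex)]

b0, b1, b2 = fit_two_predictors(ages, sex, weights)
female_20 = b0 + b1 * 20 + b2 * 0
male_20 = b0 + b1 * 20 + b2 * 1
print(f"y^ = {b0:.2f} + {b1:.2f}(age) + {b2:.2f}(sex)")
print(f"female, age 20: {female_20:.1f}; male, age 20: {male_20:.1f}")
```

Because sex is coded 0/1, the predicted difference between a male and a female of the same age is exactly the coefficient of the dummy variable, so its size relative to typical weights indicates how much sex matters in the fitted model.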

Textbook Question

Finding the Best Model

In Exercises 5–16, construct a scatterplot and identify the mathematical model that best fits the given data. Assume that the model is to be used only for the scope of the given data, and consider only linear, quadratic, logarithmic, exponential, and power models.

Sunspot Numbers Listed below in order by row are annual sunspot numbers beginning with 1980. Is the best model a good model? Carefully examine the scatterplot and identify the pattern of the points. Which of the models fits that pattern?

Table of annual sunspot numbers beginning with 1980.

Textbook Question

Interpreting a Graph The accompanying graph plots the numbers of points scored in each Super Bowl from the first Super Bowl in 1967 (coded as year 1) to the last Super Bowl at the time of this writing. The graph of the quadratic equation that best fits the data is also shown in red. What feature of the graph justifies the value of R^2 = 0.205 for the quadratic model?
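The coefficient of determination for a least-squares model compares the scatter of the points about the fitted curve with their scatter about the mean:

```latex
R^2 = 1 - \frac{\sum (y - \hat{y})^2}{\sum (y - \bar{y})^2}
```

A value such as R^2 = 0.205 means the model accounts for only about 20.5% of the variation in y, which corresponds to points lying far from the fitted curve.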

Textbook Question

Sum of Squares Criterion In addition to the value of R^2, another measurement used to assess the quality of a model is the sum of squares of the residuals. Recall from Section 10-2 that a residual is y - ŷ (the difference between an observed y value and the value predicted from the model). Better models have smaller sums of squares. Refer to the U.S. population data in Table 10-7.

a. Find the sum of squares of the residuals resulting from the linear model.
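Part (a) can be sketched as follows, assuming the linear model is the least-squares line. Table 10-7 is not reproduced here, so the data are hypothetical placeholders, and the function names are illustrative rather than from any library.

```python
def linear_fit(xs, ys):
    """Least-squares line y^ = b0 + b1*x; returns (b0, b1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b1 * mx, b1


def sum_of_squared_residuals(xs, ys, predict):
    """Sum over all points of (observed y - predicted y)^2."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only; substitute the values from Table 10-7.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]
b0, b1 = linear_fit(xs, ys)
sse_linear = sum_of_squared_residuals(xs, ys, lambda x: b0 + b1 * x)
print(f"linear model: y^ = {b0:.3f} + {b1:.3f}x, SSE = {sse_linear:.4f}")
```

Passing the model as a function lets the same helper score any of the other candidate models, so their sums of squares can be compared directly.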
