
Stats: Data and Models, 6th edition
- Richard D. De Veaux
- Paul F. Velleman
- David E. Bock
Title overview
For courses in Introductory Statistics.
Encourage statistical thinking with innovative tools and technology
Stats: Data and Models helps students think critically about data while maintaining core concepts and unparalleled readability. Technology and simulations demonstrate variability at critical points throughout, guiding students to understand more complicated statistical concepts they’ll encounter later in the course. Exposure to large data sets and multivariate thinking prepares them to become critical data consumers.
The 6th Edition is aligned with the new 2026 GAISE adopted by the American Statistical Association. It increases use of the authors' signature tools for teaching about randomness, sampling distribution models, and inference. Each chapter now includes current discussions of ethical issues and concludes with student projects suitable for collaborative work; discussion of models for data is expanded; and much more.
Hallmark features of this title
- Where Are We Going? chapter openers give context for the work students are about to begin within the broader course.
- Reality Checks ask students to reconsider their answers before interpreting their results.
- Notation Alerts appear whenever special notation is introduced.
- The Tech Support section gives instructions for applying the topics covered by the chapter within each of the supported statistics packages.
- Focused examples accompany each important concept, usually applying the concept with real, up-to-the-minute data.
- Just Checking questions throughout the chapter involve minimal calculation, and encourage students to pause and think about what they've just read.
New and updated features of this title
- Increased discussion of ethics reflects the GAISE recommendation to integrate ethics discussions into the course.
- New Ethics Matters features in each chapter reflect current events and issues familiar to students.
- New Student Projects at the end of each chapter form the basis for more extensive investigations by students working on their own; they can also support team efforts.
- Revised Random Matters boxes, rewritten to provide step-by-step guidance, lead students through bootstrap calculations and through comparisons of bootstrap results with classical inference.
Key features
Features of MyLab Statistics for the 6th Edition
- New videos provide conceptual support, worked exercises, and tech help with stats software.
- New Student Projects at the end of each chapter are pre-built in MediaShare for group or individual project work, in the classroom or at home.
- Expanded comprehensive auto-graded exercise options: Exercises have been carefully reviewed, vetted, and improved using aggregated student usage and performance data over time.
- Expanded special applications demonstrate properties of randomness, illustrate the concept of a sampling distribution, and offer bootstrap methods for inference.
- Additional and revised exercises with immediate feedback offer helpful insights when students enter incorrect answers, and are updated for new data and clarity. They regenerate algorithmically to give students unlimited practice; most include learning aids such as guided solutions and sample problems.
- StatCrunch®: integrated directly into MyLab Statistics, this powerful web-based statistical software allows users to perform complex analyses, share data sets, and generate compelling reports of their data. The vibrant online community offers tens of thousands of shared data sets for students to analyze.
- Interactive applets simulate statistical analysis, allowing students to work with data at scale; instructors can use these to augment topics like bootstrapping. Connected MyLab homework problems offer the option to assign assessment using these simulations.
Table of contents
I. EXPLORING AND UNDERSTANDING DATA
- 1. Stats Starts Here
- 1.1 What Is Statistics?
- 1.2 Data
- 1.3 Variables
- 1.4 Models
- 2. Displaying and Describing Data
- 2.1 Summarizing and Displaying a Categorical Variable
- 2.2 Displaying a Quantitative Variable
- 2.3 Shape
- 2.4 Center
- 2.5 Spread
- 3. Relationships Between Categorical Variables–Contingency Tables
- 3.1 Contingency Tables
- 3.2 Conditional Distributions
- 3.3 Displaying Contingency Tables
- 3.4 Three Categorical Variables
- 4. Understanding and Comparing Distributions
- 4.1 Displays for Comparing Groups
- 4.2 Outliers
- 4.3 Re-Expressing Data: A First Look
- 5. The Standard Deviation as a Ruler and the Normal Model
- 5.1 Using the Standard Deviation to Standardize Values
- 5.2 Shifting and Scaling
- 5.3 Normal Models
- 5.4 Working with Normal Percentiles
- 5.5 Normal Probability Plots
- Review of Part I: Exploring and Understanding Data
II. EXPLORING RELATIONSHIPS BETWEEN VARIABLES
- 6. Scatterplots, Association, and Correlation
- 6.1 Scatterplots
- 6.2 Correlation
- 6.3 Warning: Correlation ≠ Causation
- 6.4 Straightening Scatterplots
- 7. Linear Regression
- 7.1 Least Squares: The Line of “Best Fit”
- 7.2 The Linear Model
- 7.3 Finding the Least Squares Line
- 7.4 Regression to the Mean
- 7.5 Examining the Residuals
- 7.6 R²: The Variation Accounted for by the Model
- 7.7 Regression Assumptions and Conditions
- 8. Regression Wisdom
- 8.1 Examining Residuals
- 8.2 Extrapolation: Reaching Beyond the Data
- 8.3 Outliers, Leverage, and Influence
- 8.4 Lurking Variables and Causation
- 8.5 Working with Summary Values
- 8.6 Straightening Scatterplots: The Three Goals
- 8.7 Finding a Good Re-Expression
- 9. Multiple Regression
- 9.1 What Is Multiple Regression?
- 9.2 Interpreting Multiple Regression Coefficients
- 9.3 The Multiple Regression Model: Assumptions and Conditions
- 9.4 Partial Regression Plots
- 9.5 Indicator Variables
- Review of Part II: Exploring Relationships Between Variables
III. GATHERING DATA
- 10. Sample Surveys
- 10.1 The Three Big Ideas of Sampling
- 10.2 Populations and Parameters
- 10.3 Simple Random Samples
- 10.4 Other Sampling Designs
- 10.5 From the Population to the Sample: You Can't Always Get What You Want
- 10.6 The Valid Survey
- 10.7 Common Sampling Mistakes, or How to Sample Badly
- 11. Experiments and Observational Studies
- 11.1 Observational Studies
- 11.2 Randomized, Comparative Experiments
- 11.3 The Four Principles of Experimental Design
- 11.4 Control Groups
- 11.5 Blocking
- 11.6 Confounding
- Review of Part III: Gathering Data
IV. RANDOMNESS AND PROBABILITY
- 12. From Randomness to Probability
- 12.1 Random Phenomena
- 12.2 Modeling Probability
- 12.3 Formal Probability
- 13. Probability Rules!
- 13.1 The General Addition Rule
- 13.2 Conditional Probability and the General Multiplication Rule
- 13.3 Independence
- 13.4 Picturing Probability: Tables, Venn Diagrams, and Trees
- 13.5 Reversing the Conditioning and Bayes' Rule
- 14. Random Variables
- 14.1 Center: The Expected Value
- 14.2 Spread: The Standard Deviation
- 14.3 Shifting and Combining Random Variables
- 14.4 Continuous Random Variables
- 15. Probability Models
- 15.1 Bernoulli Trials
- 15.2 The Geometric Model
- 15.3 The Binomial Model
- 15.4 Approximating the Binomial with a Normal Model
- 15.5 The Continuity Correction
- 15.6 The Poisson Model
- 15.7 Other Continuous Random Variables: The Uniform and the Exponential
- Review of Part IV: Randomness and Probability
V. INFERENCE FOR ONE PARAMETER
- 16. Sampling Distribution Models and Confidence Intervals for Proportions
- 16.1 The Sampling Distribution Model for a Proportion
- 16.2 When Does the Normal Model Work? Assumptions and Conditions
- 16.3 A Confidence Interval for a Proportion
- 16.4 Interpreting Confidence Intervals: What Does 95% Confidence Really Mean?
- 16.5 Margin of Error: Certainty vs. Precision
- 16.6 Choosing the Sample Size
- 17. Confidence Intervals for Means
- 17.1 The Central Limit Theorem
- 17.2 A Confidence Interval for the Mean
- 17.3 Interpreting Confidence Intervals
- 17.4 Picking Our Interval up by Our Bootstraps
- 17.5 Thoughts About Confidence Intervals
- 18. Testing Hypotheses
- 18.1 Hypotheses
- 18.2 P-Values
- 18.3 The Reasoning of Hypothesis Testing
- 18.4 A Hypothesis Test for the Mean
- 18.5 Intervals and Tests
- 18.6 P-Values and Decisions: What to Tell About a Hypothesis Test
- 19. More About Tests and Intervals
- 19.1 Interpreting P-Values
- 19.2 Alpha Levels and Critical Values
- 19.3 Practical vs. Statistical Significance
- 19.4 Errors
- Review of Part V: Inference for One Parameter
VI. INFERENCE FOR RELATIONSHIPS
- 20. Comparing Groups
- 20.1 A Confidence Interval for the Difference Between Two Proportions
- 20.2 Assumptions and Conditions for Comparing Proportions
- 20.3 The Two-Sample z-Test: Testing for the Difference Between Proportions
- 20.4 A Confidence Interval for the Difference Between Two Means
- 20.5 The Two-Sample t-Test: Testing for the Difference Between Two Means
- 20.6 Randomization Tests and Confidence Intervals for Two Means
- 20.7 Pooling
- 20.8 The Standard Deviation of a Difference
- 21. Paired Samples and Blocks
- 21.1 Paired Data
- 21.2 The Paired t-Test
- 21.3 Confidence Intervals for Matched Pairs
- 21.4 Blocking
- 22. Comparing Counts
- 22.1 Goodness-of-Fit Tests
- 22.2 Chi-Square Test of Homogeneity
- 22.3 Examining the Residuals
- 22.4 Chi-Square Test of Independence
- 23. Inferences for Regression
- 23.1 The Regression Model
- 23.2 Assumptions and Conditions
- 23.3 Regression Inference and Intuition
- 23.4 The Regression Table
- 23.5 Multiple Regression Inference
- 23.6 Confidence and Prediction Intervals
- 23.7 Logistic Regression
- 23.8 More About Regression
- Review of Part VI: Inference for Relationships
VII. INFERENCE WHEN VARIABLES ARE RELATED
- 24. Multiple Regression Wisdom
- 24.1 Multiple Regression Inference
- 24.2 Comparing Multiple Regression Models
- 24.3 Indicators
- 24.4 Diagnosing Regression Models: Looking at the Cases
- 24.5 Building Multiple Regression Models
- 25. Analysis of Variance
- 25.1 Testing Whether the Means of Several Groups Are Equal
- 25.2 The ANOVA Table
- 25.3 Assumptions and Conditions
- 25.4 Comparing Means
- 25.5 ANOVA on Observational Data
- 26. Multifactor Analysis of Variance
- 26.1 A Two Factor ANOVA Model
- 26.2 Assumptions and Conditions
- 26.3 Interactions
- 27. Statistics and Data Science
- 27.1 Introduction to Data Mining
- 27.2 The Data Science Workflow
- 27.3 Statistical and Machine Learning Algorithms: A Sample
- 27.4 Models Built from Combining Other Models
- 27.5 Comparing Models
- 27.6 Summary
- Review of Part VII: Inference When Variables Are Related
- Parts I - V Cumulative Review Exercises
Appendices
- Answers
- Credits
- Indexes
- Tables and Selected Formulas
Author bios
About our authors
Richard D. De Veaux is an internationally known educator and consultant. He has taught at the Wharton School and the Princeton University School of Engineering, where he won a Lifetime Award for Dedication and Excellence in Teaching. He is the C. Carlisle and M. Tippit Professor and Chair of the Statistics Department at Williams College, where he has taught since 1994. Dick has won both the Wilcoxon and Shewell awards from the American Society for Quality. He is a fellow of the American Statistical Association (ASA) and an elected member of the International Statistical Institute (ISI). In 2008, he was named Statistician of the Year by the Boston Chapter of the ASA and was the 2018–2021 Vice-President of the ASA. Dick is also well known in industry, where for more than 30 years he has consulted for such Fortune 500 companies as American Express, Hewlett-Packard, Alcoa, DuPont, Pillsbury, General Electric, and Chemical Bank. Because he consulted with Mickey Hart on his book Planet Drum, he has also sometimes been called the “Official Statistician for the Grateful Dead.” His real-world experiences and anecdotes illustrate many of this book’s chapters.
Dick holds degrees from Princeton University in Civil Engineering (B.S.E.) and Mathematics (A.B.) and from Stanford University in Dance Education (M.A.) and Statistics (Ph.D.), where he studied dance with Inga Weiss and Statistics with Persi Diaconis. His research focuses on the analysis of large data sets and data mining in science and industry. In his spare time, he is an avid cyclist and swimmer. He is also the founder of the “Diminished Faculty,” an a cappella Doo-Wop quartet at Williams College, and sings bass in the college concert choir and with the Choeur Vittoria of Paris. Dick is the father of 4 children.
Paul F. Velleman has an international reputation for innovative Statistics education. He is the author and designer of the multimedia Statistics program ActivStats, for which he was awarded the EDUCOM Medal for innovative uses of computers in teaching statistics and the ICTCM Award for Innovation in Using Technology in College Mathematics. He also developed the award-winning statistics program Data Desk; the Internet site Data and Story Library (DASL) (DASL.datadescription.com), which provides data sets for teaching Statistics and is one source for the data sets used in this text; and the tools referenced in the text for simulation and bootstrapping. Paul’s understanding of using and teaching with technology informs much of this book’s approach.
Paul taught Statistics at Cornell University, where he was awarded the MacIntyre Award for Exemplary Teaching. He is Emeritus Professor of Statistical Science at Cornell and lives in Maine with his wife, Sue Michlovitz. He holds an A.B. from Dartmouth College in Mathematics and Social Science, and M.S. and Ph.D. degrees in Statistics from Princeton University, where he studied with John Tukey. His research often deals with statistical graphics and data analysis methods. Paul co-authored (with David Hoaglin) ABCs of Exploratory Data Analysis. Paul is a Fellow of the American Statistical Association and of the American Association for the Advancement of Science. Paul is the father of 2 boys. In his spare time he sings with the a cappella group VoXX and studies tai chi.
David E. Bock taught mathematics at Ithaca High School for 35 years. He has taught Statistics at Ithaca High School, Tompkins-Cortland Community College, Ithaca College, and Cornell University. Dave has won numerous teaching awards, including the MAA’s Edyth May Sliffe Award for Distinguished High School Mathematics Teaching (twice) and Cornell University’s Outstanding Educator Award (three times), and has been a finalist for New York State Teacher of the Year.
Dave holds degrees from the University at Albany in Mathematics (B.A.) and Statistics/Education (M.S.). Dave has been a reader and table leader for the AP Statistics exam, serves as a Statistics consultant to the College Board, and leads workshops and institutes for AP Statistics teachers. He has served as K–12 Education and Outreach Coordinator and a senior lecturer for the Mathematics Department at Cornell University. His understanding of how students learn informs much of this book’s approach.
Dave and his wife relax by biking or hiking, spending much of their free time in Canada, the Rockies, or the Blue Ridge Mountains. They have a son, a daughter, and 4 grandchildren.