
Statistics Midterm 1 Study Guide: Data, Variables, and Data Organization

Study Guide - Smart Notes

Tailored notes based on your materials, expanded with key definitions, examples, and context.

Foundations of Statistics

Understanding Data

Statistics is the science of collecting, analyzing, and interpreting data. Data are values or measurements collected in context, and understanding their meaning requires knowing the units, source, and method of collection.

  • Definition of Data: Data are numbers or labels collected to represent information about objects, people, or events.

  • Contextual Information: To interpret data, you need to know the units of measurement, the population or sample, and how the data were collected.

  • Example: The numbers 1.73, 1.83, 1.58, 1.80, 1.65 could represent heights (in meters) of college students if the context is provided.

Types of Variables

Variables are characteristics or properties that can take on different values among subjects in a study. They are classified as either categorical or numerical.

  • Categorical Variable: Represents categories or groups (e.g., gender, area of interest).

  • Numerical Variable: Represents quantities or measurements (e.g., age, GPA, average speed).

  • Example: In a survey, 'gender' is categorical, while 'GPA' is numerical.

Identifying Variables in Tables

Tables often summarize data by listing variables and their values for each subject or object. It is important to distinguish between categorical and numerical variables in such tables.

  • Example Table:

Model           Series Number   Weight (lbs)   Road Bike   All Terrain   Class A
--------------  -------------   ------------   ---------   -----------   -------
Standard        AT30            28             Yes         No            No
Road Runner     R840            20             Yes         No            No
All Terrain     C300            32             No          Yes           No
Class A above   D90             14             No          No            Yes

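  • Variables: Model (categorical), Series Number (categorical; it is an identifying label, not a quantity, even though it contains digits), Weight (numerical), and Road Bike, All Terrain, and Class A (categorical Yes/No variables).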
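The classification matters because it decides which summaries make sense. The sketch below is a minimal illustration, assuming the bike table is stored as one Python dict per row. The key names (`weight_lbs`, `road_bike`, and so on) and the helpers `mean_weight` and `count_yes` are hypothetical choices, not from the notes. The variable types are assigned by hand, because a program cannot tell from the values alone whether a column is categorical.

```python
# The bike table from the example, one dict per row (one row per bike).
# Key names are illustrative, not taken from the source table.
bikes = [
    {"model": "Standard", "series": "AT30", "weight_lbs": 28,
     "road_bike": "Yes", "all_terrain": "No", "class_a": "No"},
    {"model": "Road Runner", "series": "R840", "weight_lbs": 20,
     "road_bike": "Yes", "all_terrain": "No", "class_a": "No"},
    {"model": "All Terrain", "series": "C300", "weight_lbs": 32,
     "road_bike": "No", "all_terrain": "Yes", "class_a": "No"},
    {"model": "Class A above", "series": "D90", "weight_lbs": 14,
     "road_bike": "No", "all_terrain": "No", "class_a": "Yes"},
]

# Variable types assigned by hand, following the classification above.
variable_types = {
    "model": "categorical",
    "series": "categorical",      # a label, even though it contains digits
    "weight_lbs": "numerical",
    "road_bike": "categorical",
    "all_terrain": "categorical",
    "class_a": "categorical",
}

def mean_weight(rows):
    """Averaging is meaningful for a numerical variable such as weight."""
    return sum(r["weight_lbs"] for r in rows) / len(rows)

def count_yes(rows, column):
    """Counting categories is the natural summary for a categorical variable."""
    return sum(1 for r in rows if r[column] == "Yes")

numerical = [name for name, kind in variable_types.items() if kind == "numerical"]
print(numerical)                      # ['weight_lbs']
print(mean_weight(bikes))             # 23.5
print(count_yes(bikes, "road_bike"))  # 2
```

Averaging the Series Number column would be meaningless even if its digits were extracted, which is why it is labeled categorical.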

Organizing Data

Stacked vs. Unstacked Data Formats

Data can be organized in different formats depending on the study design and analysis needs.

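  • Stacked Format: Each row represents one observation (one subject), and each variable has its own column. When data are grouped, the group label (e.g., gender) is stored in one column and the measurement (e.g., age) in another.

  • Unstacked Format: The measurements for each group are placed in separate columns (e.g., one column of ages for females and another for males). A row no longer describes a single subject, and the columns can have different lengths.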

  • Example:

Gender   Age
------   ---
Female   45
Male     39

  • This is stacked data: each row represents one person, with the group label (Gender) in one column and the measurement (Age) in another.
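Converting between the two formats is a simple regrouping. The following is a minimal sketch in plain Python, assuming each stacked row is a dict. The helper names `unstack` and `stack` are hypothetical. The data are the two rows from the example.

```python
# Stacked format: one row per person, group label in its own column.
stacked = [
    {"gender": "Female", "age": 45},
    {"gender": "Male", "age": 39},
]

def unstack(rows, group_col, value_col):
    """Return {group: [values]}: one 'column' of measurements per group."""
    columns = {}
    for row in rows:
        columns.setdefault(row[group_col], []).append(row[value_col])
    return columns

def stack(columns, group_col, value_col):
    """Inverse of unstack: turn each (group, value) pair back into a row."""
    return [
        {group_col: group, value_col: value}
        for group, values in columns.items()
        for value in values
    ]

unstacked = unstack(stacked, "gender", "age")
print(unstacked)                                     # {'Female': [45], 'Male': [39]}
print(stack(unstacked, "gender", "age") == stacked)  # True
```

Stacking back restores the rows grouped by category. With more people, the original row order would only be preserved if the rows were already grouped.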

Coding Categorical Variables

Categorical variables are often coded numerically for analysis. For example, 'Yes' may be coded as 1 and 'No' as 0.

  • Numerically Coded Data: Categorical responses are recorded as numbers (e.g., 1 for 'Yes', 0 for 'No'). The variable is still categorical: the numbers are labels, not measured quantities.

  • Example: In a survey, 'Do you have children?' may be coded as 1 (Yes) and 0 (No).
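A minimal sketch of the 1/0 coding in Python follows. The responses list is hypothetical and invented only for illustration. The `CODES` mapping follows the scheme in the notes. With 0/1 coding, the mean of the codes equals the proportion of "Yes" answers, which is one practical reason this coding is popular.

```python
# Hypothetical answers to "Do you have children?" (made up for illustration).
responses = ["Yes", "No", "No", "Yes", "No"]

# Coding scheme from the notes: 1 = Yes, 0 = No.
CODES = {"Yes": 1, "No": 0}

coded = [CODES[answer] for answer in responses]
print(coded)  # [1, 0, 0, 1, 0]

# The codes are labels, so the variable remains categorical.
# With 0/1 coding, the mean of the codes is the proportion of "Yes" answers.
proportion_yes = sum(coded) / len(coded)
print(proportion_yes)  # 0.4

# Decoding recovers the original categories.
DECODE = {code: label for label, code in CODES.items()}
print([DECODE[c] for c in coded] == responses)  # True
```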

Populations and Samples

Defining Population and Sample

In statistics, the population is the entire group of interest, while a sample is a subset of the population used for analysis.

  • Population: All individuals or objects of interest (e.g., all students at a school).

  • Sample: The group actually studied (e.g., students who responded to a survey).

  • Example: If a poll asks students about satisfaction with school offerings, the population is all students, and the sample is those who participated in the poll.

Describing and Comparing Data

Frequencies, Proportions, and Percentages

Data can be summarized using frequencies (counts), proportions, and percentages to describe and compare groups.

  • Frequency: The number of times a value or category occurs.

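  • Proportion: The fraction of the total represented by a category: $\text{proportion} = \dfrac{\text{frequency of the category}}{\text{total number of observations}}$

  • Percentage: The proportion multiplied by 100: $\text{percentage} = \text{proportion} \times 100\%$

  • Example: If 350 out of 900 adults prefer a movie on DVD, the proportion is $350/900 \approx 0.389$ and the percentage is $0.389 \times 100\% = 38.9\%$.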
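The three summaries can be computed directly from raw responses. The following is a minimal sketch using Python's `collections.Counter` to obtain frequencies. The DVD example is rebuilt as 900 individual answers, and the label "Not DVD" for the other 550 answers is an illustrative assumption. The helper name `summarize` is hypothetical.

```python
from collections import Counter

def summarize(labels):
    """Map each category to (frequency, proportion, percentage)."""
    freqs = Counter(labels)  # frequency: how often each category occurs
    total = len(labels)
    return {
        category: (freq, freq / total, freq / total * 100)
        for category, freq in freqs.items()
    }

# The example as raw responses: 350 prefer DVD, the remaining 550 do not.
labels = ["DVD"] * 350 + ["Not DVD"] * 550

for category, (freq, prop, pct) in summarize(labels).items():
    print(f"{category}: frequency={freq}, proportion={prop:.3f}, percentage={pct:.1f}%")
# DVD: frequency=350, proportion=0.389, percentage=38.9%
# Not DVD: frequency=550, proportion=0.611, percentage=61.1%
```

The proportions across all categories add up to 1 (and the percentages to 100%), which is a quick check on any frequency table.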

Using Tables to Answer Questions

Tables can be used to answer questions about relationships between variables, such as whether one variable is associated with another.

  • Example: To determine if student commute distance is associated with living situation, use the columns for 'Commute Distance' and 'Living Situation' in the table.
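One way to check such an association is to compare a numerical variable across the categories of a categorical one. The following is a minimal sketch, assuming commute distance is recorded in miles and living situation has the categories "On campus" and "Off campus". The records are hypothetical values invented for illustration, and the helper `mean_by_group` is likewise hypothetical. A clear difference between the group averages suggests the two variables are associated in these data, though it does not by itself explain why.

```python
from statistics import mean

# Hypothetical student records (values invented for illustration only).
students = [
    {"living_situation": "On campus", "commute_miles": 0.5},
    {"living_situation": "On campus", "commute_miles": 1.0},
    {"living_situation": "Off campus", "commute_miles": 6.0},
    {"living_situation": "Off campus", "commute_miles": 12.0},
    {"living_situation": "Off campus", "commute_miles": 3.0},
]

def mean_by_group(rows, group_col, value_col):
    """Average a numerical column separately within each category of group_col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(row[value_col])
    return {group: mean(values) for group, values in groups.items()}

print(mean_by_group(students, "living_situation", "commute_miles"))
# {'On campus': 0.75, 'Off campus': 7.0}
```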

Summary Table: Variable Classification

Variable           Type
----------------   -----------
Gender             Categorical
GPA                Numerical
Age                Numerical
Area of Interest   Categorical
School Year        Categorical

Key Formulas

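  • Proportion: $\text{proportion} = \dfrac{\text{frequency of the category}}{\text{total number of observations}}$

  • Percentage: $\text{percentage} = \text{proportion} \times 100\%$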

Additional info:

  • Some questions and tables were inferred to be about variable classification and data organization based on context and standard introductory statistics curriculum.

  • Examples and explanations were expanded for clarity and completeness.
