Statistics Midterm 1 Study Guide: Data, Variables, and Data Organization
Foundations of Statistics
Understanding Data
Statistics is the science of collecting, analyzing, and interpreting data. Data are values or measurements collected in context, and understanding their meaning requires knowing the units, source, and method of collection.
Definition of Data: Data are numbers or labels collected to represent information about objects, people, or events.
Contextual Information: To interpret data, you need to know the units of measurement, the population or sample, and how the data were collected.
Example: The numbers 1.73, 1.83, 1.58, 1.80, 1.65 could represent heights (in meters) of college students if the context is provided.
Types of Variables
Variables are characteristics or properties that can take on different values among subjects in a study. They are classified as either categorical or numerical.
Categorical Variable: Represents categories or groups (e.g., gender, area of interest).
Numerical Variable: Represents quantities or measurements (e.g., age, GPA, average speed).
Example: In a survey, 'gender' is categorical, while 'GPA' is numerical.
Identifying Variables in Tables
Tables often summarize data by listing variables and their values for each subject or object. It is important to distinguish between categorical and numerical variables in such tables.
Example Table:
| Model | Series Number | Weight (lbs) | Road Bike | All Terrain | Class A |
|---|---|---|---|---|---|
| Standard | AT30 | 28 | Yes | No | No |
| Road Runner | R840 | 20 | Yes | No | No |
| All Terrain | C300 | 32 | No | Yes | No |
| Class A above | D90 | 14 | No | No | Yes |
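Variables: Model (categorical), Series Number (categorical), Weight (numerical), Road Bike/All Terrain/Class A (categorical). Series Number is categorical even though it contains digits, because it is a label rather than a measured quantity.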
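The classification above can be checked mechanically. The following is a minimal sketch, assuming the table rows are stored as Python dictionaries; the column keys (`model`, `series_number`, and so on) and the `classify` helper are hypothetical names chosen for illustration. The rule it uses (a column is numerical only if every value is a number) is a rough heuristic, not a general definition.

```python
# Rows copied from the example table; the key names are illustrative.
bikes = [
    {"model": "Standard", "series_number": "AT30", "weight_lbs": 28,
     "road_bike": "Yes", "all_terrain": "No", "class_a": "No"},
    {"model": "Road Runner", "series_number": "R840", "weight_lbs": 20,
     "road_bike": "Yes", "all_terrain": "No", "class_a": "No"},
    {"model": "All Terrain", "series_number": "C300", "weight_lbs": 32,
     "road_bike": "No", "all_terrain": "Yes", "class_a": "No"},
    {"model": "Class A above", "series_number": "D90", "weight_lbs": 14,
     "road_bike": "No", "all_terrain": "No", "class_a": "Yes"},
]


def classify(values):
    """Return 'numerical' if every value is a number, else 'categorical'."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "numerical" if is_number else "categorical"


# Classify each column by looking at all of its values.
types = {col: classify([row[col] for row in bikes]) for col in bikes[0]}

for col, kind in types.items():
    print(f"{col}: {kind}")
```

This heuristic would misclassify a categorical variable that has been coded with numbers (such as 1 for 'Yes' and 0 for 'No'), so the context of the variable still has to be checked by hand.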
Organizing Data
Stacked vs. Unstacked Data Formats
Data can be organized in different formats depending on the study design and analysis needs.
Stacked Format: Each row represents one observation (one individual), and each column is a variable. The grouping variable (e.g., gender) appears as its own column.
Unstacked Format: Data for different groups are separated into different columns, one column per group. A row does not correspond to a single individual.
Example:
| Gender | Age |
|---|---|
| Female | 45 |
| Male | 39 |
This is stacked data because each row represents one person.
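To make the two layouts concrete, here is a minimal sketch that converts between them. The first two rows come from the example table; the other two rows are hypothetical and are added only so that each group has more than one value. The function names `unstack` and `stack` are illustrative, not taken from any particular library.

```python
from collections import defaultdict

# Stacked: one (group, value) row per person.
stacked = [
    ("Female", 45),  # from the example table
    ("Male", 39),    # from the example table
    ("Female", 52),  # hypothetical extra row
    ("Male", 28),    # hypothetical extra row
]


def unstack(rows):
    """Collect the values of each group into its own column (list)."""
    columns = defaultdict(list)
    for group, value in rows:
        columns[group].append(value)
    return dict(columns)


def stack(columns):
    """Turn one-column-per-group data back into one row per observation."""
    return [(group, value) for group, values in columns.items() for value in values]


unstacked = unstack(stacked)
print(unstacked)  # {'Female': [45, 52], 'Male': [39, 28]}
```

Stacking the unstacked data recovers the same rows, although possibly in a different order, since rows are regrouped by category.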
Coding Categorical Variables
Categorical variables are often coded numerically for analysis. For example, 'Yes' may be coded as 1 and 'No' as 0.
Numerical Coding: Categorical responses are recorded as numbers (e.g., 1 for 'Yes', 0 for 'No'). The variable is still categorical; the numbers are labels, not quantities.
Example: In a survey, 'Do you have children?' may be coded as 1 (Yes) and 0 (No).
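The coding scheme can be sketched in a few lines. The list of responses below is hypothetical and stands in for answers to 'Do you have children?'; only the 1/0 coding itself comes from the text.

```python
# Coding scheme from the text: 'Yes' -> 1, 'No' -> 0.
CODES = {"Yes": 1, "No": 0}

# Hypothetical survey answers, for illustration only.
responses = ["Yes", "No", "No", "Yes", "Yes"]

# Replace each label with its numeric code.
coded = [CODES[answer] for answer in responses]
print(coded)  # [1, 0, 0, 1, 1]

# With 0/1 coding, the sum of the codes counts the 'Yes' answers.
count_yes = sum(coded)
print(count_yes)  # 3
```

One practical reason for 0/1 coding is visible here: the sum of the codes is the frequency of 'Yes', and the sum divided by the number of responses is the proportion of 'Yes'.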
Populations and Samples
Defining Population and Sample
In statistics, the population is the entire group of interest, while a sample is a subset of the population used for analysis.
Population: All individuals or objects of interest (e.g., all students at a school).
Sample: The group actually studied (e.g., students who responded to a survey).
Example: If a poll asks students about satisfaction with school offerings, the population is all students, and the sample is those who participated in the poll.
Describing and Comparing Data
Frequencies, Proportions, and Percentages
Data can be summarized using frequencies (counts), proportions, and percentages to describe and compare groups.
Frequency: The number of times a value or category occurs.
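Proportion: The fraction of the total represented by a category: $\text{proportion} = \frac{\text{frequency}}{\text{total}}$
Percentage: The proportion multiplied by 100: $\text{percentage} = \text{proportion} \times 100\%$
Example: If 350 out of 900 adults prefer a movie on DVD, the proportion is $\frac{350}{900} \approx 0.389$ and the percentage is $0.389 \times 100\% = 38.9\%$.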
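The DVD example (350 of 900 adults) can be worked through in code. This is a minimal sketch; the function names `proportion` and `percentage` are illustrative.

```python
def proportion(frequency, total):
    """Fraction of the total that falls in a category."""
    if total <= 0:
        raise ValueError("total must be positive")
    return frequency / total


def percentage(frequency, total):
    """The proportion expressed out of 100."""
    return 100 * proportion(frequency, total)


p = proportion(350, 900)
pct = percentage(350, 900)
print(round(p, 3), round(pct, 1))  # 0.389 38.9
```

Proportions and percentages, unlike raw frequencies, allow fair comparisons between groups of different sizes.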
Using Tables to Answer Questions
Tables can be used to answer questions about relationships between variables, such as whether one variable is associated with another.
Example: To determine if student commute distance is associated with living situation, use the columns for 'Commute Distance' and 'Living Situation' in the table.
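One way to examine such a question is to count how often each combination of the two variables occurs and then compare proportions within each group. The sketch below uses hypothetical student records and hypothetical category labels ('On campus', 'Short', and so on); none of these values come from the source table.

```python
from collections import Counter

# Hypothetical records: (living situation, commute distance category).
students = [
    ("On campus", "Short"),
    ("On campus", "Short"),
    ("Off campus", "Long"),
    ("Off campus", "Short"),
    ("Off campus", "Long"),
]

# Frequency of each (living situation, commute distance) combination.
counts = Counter(students)


def row_proportions(counts, row):
    """Proportion of each commute category within one living situation."""
    row_total = sum(n for (r, _), n in counts.items() if r == row)
    return {c: n / row_total for (r, c), n in counts.items() if r == row}


on_campus = row_proportions(counts, "On campus")
off_campus = row_proportions(counts, "Off campus")
print(on_campus)
print(off_campus)
```

If the proportions differ noticeably between living situations, that suggests the two variables may be associated in this sample. It does not by itself establish an association in the population.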
Summary Table: Variable Classification
| Variable | Type |
|---|---|
| Gender | Categorical |
| GPA | Numerical |
| Age | Numerical |
| Area of Interest | Categorical |
| School Year | Categorical |
Key Formulas
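Proportion: $\text{proportion} = \frac{\text{frequency}}{\text{total}}$
Percentage: $\text{percentage} = \text{proportion} \times 100\%$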