Chapter 2: Data Collection – Foundations for Economic Data Analysis
Study Guide - Smart Notes
Data Collection in Economics
Understanding how data is collected, classified, and measured is fundamental for economic analysis. This chapter introduces key terminology, types of data, levels of measurement, and sampling methods, all of which are essential for interpreting and conducting research in macroeconomics and related fields.
Variables and Data
Basic Terminology
Observation: A single member of a collection of items under study (e.g., a person, firm, or region).
Variable: A characteristic or attribute of the subject or individual (e.g., income, age, or invoice amount).
Data Set: The collection of all values of all variables for all observations chosen for study.
Data sets may include one or more variables, and the type and number of variables determine the analytical techniques that can be used.
Types of Data Sets
Univariate: One variable (e.g., income).
Bivariate: Two variables (e.g., income and age).
Multivariate: More than two variables (e.g., income, age, gender).
| Data Set | Variables | Example | Typical Tasks |
|---|---|---|---|
| Univariate | One | Income | Histograms, basic statistics |
| Bivariate | Two | Income, Age | Scatter plots, correlation |
| Multivariate | More than two | Income, Age, Gender | Regression modeling |
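As a rough illustration of the three data set types, the sketch below stores each observation as a dictionary and labels the data set by how many variables it records. The values and the function name `classify_data_set` are hypothetical, chosen only for this example.

```python
# Hypothetical data set: each dict is one observation, each key is one variable.
data_set = [
    {"income": 52000, "age": 34, "gender": "F"},
    {"income": 61000, "age": 45, "gender": "M"},
    {"income": 47000, "age": 29, "gender": "F"},
]

def classify_data_set(observations):
    """Label a data set by the number of variables it records."""
    n_variables = len(observations[0])
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"

# Keep only some variables to see how the label changes.
income_only = [{"income": row["income"]} for row in data_set]
income_age = [{"income": row["income"], "age": row["age"]} for row in data_set]

print(classify_data_set(income_only))  # univariate
print(classify_data_set(income_age))   # bivariate
print(classify_data_set(data_set))     # multivariate
```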
Categorical and Numerical Data
Types of Data
Categorical (Qualitative): Data that can be grouped by categories or labels (e.g., vehicle type, pay type).
Numerical (Quantitative): Data that can be measured and expressed numerically.
Discrete: Countable values (e.g., number of eggs in a carton).
Continuous: Any value within a range (e.g., waiting time, customer satisfaction percentage).
| Type | Example |
|---|---|
| Categorical (Verbal Label) | Vehicle type (car, truck, SUV) |
| Categorical (Coded) | Vehicle type (1, 2, 3) |
| Numerical (Discrete) | Broken eggs in a carton (1, 2, 3...) |
| Numerical (Continuous) | Patient waiting time (14.27 minutes) |
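Coding a categorical variable with numbers does not turn it into numerical data. The sketch below, which uses made-up values and an assumed coding scheme (`VEHICLE_LABELS`), shows that counts are meaningful for coded categories, while the average of the codes is an artifact of the numbering.

```python
from collections import Counter
from statistics import mean

# Hypothetical coding scheme for a categorical variable (an assumption).
VEHICLE_LABELS = {1: "car", 2: "truck", 3: "SUV"}

coded_vehicle_type = [1, 3, 3, 2, 1, 3]        # categorical, coded
broken_eggs = [0, 1, 0, 2, 0, 1]               # numerical, discrete
waiting_minutes = [14.27, 9.80, 21.05, 12.40]  # numerical, continuous

# For coded categories, counts are meaningful...
counts = Counter(VEHICLE_LABELS[code] for code in coded_vehicle_type)
print(counts.most_common())

# ...but the average of the codes depends on the arbitrary numbering.
print(mean(coded_vehicle_type))

# For numerical data, arithmetic is meaningful.
print(mean(broken_eggs))
print(round(mean(waiting_minutes), 2))
```

Renumbering the categories (say, SUV = 1 and car = 3) would change the average of the codes but not the counts, which is why only the counts carry information.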
Time Series and Cross-Sectional Data
Time Series Data
Each observation represents a different, equally spaced point in time (e.g., years, months, days). The periodicity is the time between observations.
Periodicity can be annual, quarterly, monthly, weekly, daily, or hourly.
Examples in Macroeconomics:
National Income
Economic Indicators
Monetary Data
Examples in Microeconomics:
Sales
Market Share
Inventory Turnover
Cross-Sectional Data
Each observation represents a different individual unit at the same point in time (e.g., people, firms, regions).
Used to analyze variation among observations or relationships between variables at a single time point.
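The difference between the two structures can be sketched with two small dictionaries. All figures below are invented for illustration, and `periodicity` is a hypothetical helper that checks whether the observations are equally spaced.

```python
# Hypothetical figures, for illustration only.
# Time series: one unit (a single economy), observed at equally spaced dates.
gdp_by_year = {2019: 100.0, 2020: 97.5, 2021: 103.1, 2022: 105.4}

# Cross-section: many units, all observed at the same date.
income_2022_by_region = {"North": 41.2, "South": 36.8, "East": 39.5, "West": 44.0}

def periodicity(time_points):
    """Return the common gap between consecutive time points, or None if gaps differ."""
    ordered = sorted(time_points)
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return gaps.pop() if len(gaps) == 1 else None

# Time series questions are about change over time.
years = sorted(gdp_by_year)
growth = {
    year: (gdp_by_year[year] / gdp_by_year[prev] - 1) * 100
    for prev, year in zip(years, years[1:])
}

# Cross-sectional questions are about variation across units.
spread = max(income_2022_by_region.values()) - min(income_2022_by_region.values())

print(periodicity(gdp_by_year))  # gap in years between observations
print({year: round(g, 2) for year, g in growth.items()})
print(round(spread, 1))
```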
Levels of Measurement
Understanding the level of measurement is crucial for selecting appropriate statistical methods.
Nominal: Categories only; no order (e.g., eye color, brand names). Only counting and mode are meaningful.
Ordinal: Categories with a meaningful order, but the intervals between categories are not necessarily equal (e.g., survey responses: never, sometimes, often).
Interval: Ordered categories with equal intervals, but no true zero (e.g., temperature in Celsius). Means and standard deviations are meaningful, but ratios are not.
Ratio: All properties of interval data, plus a meaningful zero (e.g., income, weight). Ratios are meaningful (e.g., $20 is twice as much as $10).
| Level | Characteristics | Example |
|---|---|---|
| Nominal | Categories only | Eye color |
| Ordinal | Rank has meaning, but intervals are not equal | Survey frequency: never, sometimes, often |
| Interval | Equal intervals, no true zero | Temperature (Celsius) |
| Ratio | Equal intervals, true zero | Income, weight |
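The distinction between interval and ratio data can be checked with a little arithmetic. The sketch below compares the ratio of two Celsius temperatures with the ratio of the same temperatures in Kelvin, a scale with a true zero. The `MEANINGFUL_STATISTICS` table is a conventional summary added for illustration, not a rule taken from these notes.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a temperature scale with a true zero."""
    return celsius + 273.15

# Interval scale: differences are meaningful, ratios are not.
warm_c, cool_c = 20.0, 10.0
difference_c = warm_c - cool_c   # 10 degrees warmer: meaningful
naive_ratio = warm_c / cool_c    # 2.0, but 20 C is not "twice as hot" as 10 C

# On a scale with a true zero, the ratio of the same two temperatures is close to 1.
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale: zero means "none", so ratios are meaningful.
income_ratio = 20 / 10           # $20 is twice as much as $10

# Summary of which statistics are usually considered meaningful at each level.
MEANINGFUL_STATISTICS = {
    "nominal": ["count", "mode"],
    "ordinal": ["count", "mode", "median"],
    "interval": ["count", "mode", "median", "mean", "standard deviation"],
    "ratio": ["count", "mode", "median", "mean", "standard deviation", "ratio"],
}

print(difference_c, naive_ratio, round(kelvin_ratio, 3), income_ratio)
```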
Likert Scales
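Commonly treated as a special case of interval data and widely used in surveys to measure attitudes or opinions. Strictly, the responses are ordered categories, so the interval interpretation assumes that respondents see the scale points as equally spaced. A Likert scale typically uses 5 or 7 points, with a neutral midpoint when the number of points is odd.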
Example: "Strongly Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree, Strongly Disagree"
Likert data are coded numerically for analysis.
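A minimal sketch of numeric coding for a 5-point item follows. The mapping `LIKERT_CODES` (5 for the strongest agreement down to 1) and the responses are assumptions made for this example; other codings are equally possible.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical coding scheme: 5 = most agreement, 1 = least (an assumption).
LIKERT_CODES = {
    "Strongly Agree": 5,
    "Somewhat Agree": 4,
    "Neither Agree nor Disagree": 3,
    "Somewhat Disagree": 2,
    "Strongly Disagree": 1,
}

# Hypothetical responses to a single survey item.
responses = [
    "Somewhat Agree", "Strongly Agree", "Neither Agree nor Disagree",
    "Somewhat Agree", "Somewhat Disagree", "Somewhat Agree", "Strongly Agree",
]

coded = [LIKERT_CODES[answer] for answer in responses]

print(Counter(responses).most_common(1))  # most frequent answer
print(median(coded))                      # needs only the ordering of the codes
print(round(mean(coded), 2))              # assumes equal spacing between scale points
```

The mode and median rely only on the order of the categories, whereas the mean is interpretable only if the scale points are taken to be equally spaced.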
Sampling Concepts
Population vs. Sample
Population: The entire group of interest (e.g., all U.S. gasoline stations).
Sample: A subset of the population selected for analysis.
Census: An examination of all items in the population.
Sampling is often necessary due to constraints such as time, cost, and accessibility.
Parameters and Statistics
Parameter: A numerical measure describing a characteristic of a population (usually denoted by Greek letters, e.g., μ for the population mean).
Statistic: A numerical measure describing a characteristic of a sample (usually denoted by Roman letters, e.g., x̄ for the sample mean).
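The relationship between a parameter and a statistic can be sketched with a simulated population. The prices below are randomly generated stand-ins, not real data, and the names `mu` and `x_bar` simply mirror the usual notation for the population and sample means.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: prices at 1,000 gasoline stations (simulated values).
population = [round(rng.uniform(3.00, 4.00), 2) for _ in range(1000)]

# Parameter: computed from every item in the population (a census).
mu = mean(population)

# Statistic: computed from a sample, used to estimate the parameter.
sample = rng.sample(population, 50)
x_bar = mean(sample)

print(f"population mean (parameter) = {mu:.3f}")
print(f"sample mean (statistic)     = {x_bar:.3f}")
print(f"estimation error            = {x_bar - mu:+.3f}")
```

In practice the parameter is usually unknown, which is why the statistic is needed. The simulation can show both only because the whole population was generated.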
Target Population and Sampling Frame
Target Population: The group the researcher wants to study.
Sampling Frame: The actual list or group from which the sample is drawn. The frame should match the target population as closely as possible.
Sampling Methods
Random Sampling Methods
Simple Random Sample: Every item has an equal chance of being selected.
Systematic Sample: Select every k-th item from a list, starting at a random point.
Stratified Sample: Divide the population into homogeneous subgroups (strata) and sample randomly within each stratum.
Cluster Sample: Divide the population into clusters (often geographically), randomly select clusters, and sample all or some items within selected clusters.
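The four random sampling methods above can be sketched with Python's standard `random` module. The population of 100 firms is hypothetical, and for brevity the same `region` field serves as both the stratum and the cluster label, which a real design would rarely do. The cluster version keeps every item in the selected clusters.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 firms, each tagged with a region.
REGIONS = ["North", "South", "East", "West"]
population = [{"id": i, "region": REGIONS[i % 4]} for i in range(100)]

def simple_random_sample(items, n):
    """Every item has the same chance of selection."""
    return rng.sample(items, n)

def systematic_sample(items, k):
    """Take every k-th item, starting from a random position among the first k."""
    start = rng.randrange(k)
    return items[start::k]

def stratified_sample(items, key, n_per_stratum):
    """Group items into strata by `key`, then sample randomly within each stratum."""
    strata = {}
    for item in items:
        strata.setdefault(item[key], []).append(item)
    chosen = []
    for members in strata.values():
        chosen.extend(rng.sample(members, n_per_stratum))
    return chosen

def cluster_sample(items, key, n_clusters):
    """Randomly pick whole clusters and keep every item inside them."""
    cluster_names = sorted({item[key] for item in items})
    picked = set(rng.sample(cluster_names, n_clusters))
    return [item for item in items if item[key] in picked]

srs = simple_random_sample(population, 10)
systematic = systematic_sample(population, k=10)
stratified = stratified_sample(population, "region", n_per_stratum=3)
clustered = cluster_sample(population, "region", n_clusters=2)

print(len(srs), len(systematic), len(stratified), len(clustered))
```

Stratified sampling guarantees that every region appears in the sample, while cluster sampling deliberately leaves some regions out in exchange for lower collection cost.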
Sampling With and Without Replacement
With Replacement: Selected items are returned to the population and can be chosen again.
Without Replacement: Selected items are not returned and cannot be chosen again.
The distinction matters when the sample size n is a significant fraction of the population size N (a common rule of thumb is n/N > 0.05); for smaller fractions, the two methods give nearly the same results.
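In Python's standard library, `random.choices` draws with replacement and `random.sample` draws without replacement. The sketch below uses a small hypothetical population. The 5% threshold on the sampling fraction n/N is a common rule of thumb and is treated here as an assumption.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

population = list(range(1, 21))  # hypothetical population of N = 20 items
n = 8

# With replacement: the same item can be drawn more than once.
with_replacement = rng.choices(population, k=n)

# Without replacement: each item can appear at most once.
without_replacement = rng.sample(population, n)

# Sampling fraction n/N. The 5% threshold is a common rule of thumb and is
# treated here as an assumption.
sampling_fraction = n / len(population)
large_fraction = sampling_fraction > 0.05

print(with_replacement)
print(without_replacement)
print(sampling_fraction, large_fraction)
```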
Non-Random Sampling Methods
Judgment Sample: Items are selected based on the sampler's expertise.
Convenience Sample: Items are selected based on ease of access.
Focus Groups: Panels formed for open-ended discussion and idea gathering.
Non-random methods are subject to bias and may not be representative of the population.
Sources of Error or Bias
Nonresponse Bias: Respondents differ from non-respondents.
Selection Bias: Self-selected respondents are atypical.
Response Error: Respondents provide false or inaccurate information.
Coverage Error: The sampling frame does not match the target population.
Measurement Error: Responses are distorted by poorly worded questions or unclear instructions.
Interviewer Error: Responses influenced by the interviewer.
Sampling Error: Random and unavoidable variation due to sampling.
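Sampling error and coverage error behave differently, and a small simulation can make the contrast concrete. In the sketch below, the population of household incomes is simulated, and the flawed frame (one that misses households with income below 40) is a hypothetical example of a frame that does not match the target population.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical target population: incomes (in thousands) of 10,000 households.
target_population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = mean(target_population)

# A flawed sampling frame that misses households with income below 40.
flawed_frame = [income for income in target_population if income >= 40]

def average_of_sample_means(frame, n, repetitions):
    """Draw many samples of size n and average their sample means."""
    return mean(mean(rng.sample(frame, n)) for _ in range(repetitions))

# Sampling error: individual sample means vary, but they centre on the true mean.
from_full_frame = average_of_sample_means(target_population, n=100, repetitions=200)

# Coverage error: samples from the flawed frame are off in the same direction.
from_flawed_frame = average_of_sample_means(flawed_frame, n=100, repetitions=200)

print(round(true_mean, 2), round(from_full_frame, 2), round(from_flawed_frame, 2))
```

Averaging over many samples washes out sampling error, but it does nothing for the coverage error, because every sample from the flawed frame overstates income.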
Data Sources
Reliable data sources are essential for economic analysis. Examples include:
U.S. Bureau of Labor Statistics
Economic Report of the President
Federal Reserve System
U.S. Census Bureau
World Bank
Online databases and government websites
Survey Research and Questionnaire Design
Steps in Survey Research
State the research goals.
Develop a budget (time, money, staff).
Create a research design (target population, frame, sample size).
Choose a survey type and method of administration.
Design the data collection instrument (questionnaire).
Pretest and revise the instrument.
Administer the survey and follow up as needed.
Code and analyze the data.
Types of Surveys
Mail Surveys: Useful for reaching a wide audience, but may have low response rates.
Telephone Surveys: Allow for clarification, but may be limited by nonresponse and screening.
Web Surveys: Cost-effective and fast, but may suffer from self-selection bias.
Personal Interviews: Allow for in-depth responses, but are time-consuming and costly.
Direct Observation: Useful for behavioral data, but may be limited in scope.
Questionnaire Design Guidelines
Use clear, concise wording and instructions.
Ensure questions are unbiased and cover all possibilities.
Use appropriate response scales (e.g., Likert, check boxes, open-ended).
Pretest the questionnaire and revise as needed.
Keep the survey as short as possible to encourage completion.
Survey Validity and Reliability
Validity: The survey measures what it is intended to measure.
Reliability: The survey produces consistent results over time.
Data Quality
Responses are often coded numerically for analysis.
Missing values should be handled carefully (e.g., using special codes).
Check for inconsistent or out-of-range responses.
Document all data-coding decisions.
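The checks listed above can be sketched as a small cleaning step. The conventions used here (valid answers coded 1 to 5, missing answers coded 9) are assumptions made for illustration, and `clean_responses` is a hypothetical helper.

```python
# Hypothetical conventions for one survey item (assumptions, not a standard):
# valid answers are coded 1-5 and a missing answer is coded 9.
MISSING_CODE = 9
VALID_CODES = {1, 2, 3, 4, 5}

raw_responses = [4, 5, 9, 3, 7, 2, 9, 4]

def clean_responses(values):
    """Split coded responses into valid, missing, and out-of-range groups."""
    valid, missing, out_of_range = [], 0, []
    for position, value in enumerate(values):
        if value == MISSING_CODE:
            missing += 1
        elif value in VALID_CODES:
            valid.append(value)
        else:
            out_of_range.append((position, value))
    return valid, missing, out_of_range

valid, missing, out_of_range = clean_responses(raw_responses)

print(valid)         # responses that can be analyzed
print(missing)       # count of missing answers
print(out_of_range)  # (position, value) pairs to check against the source forms
```

Separating the missing code before analysis matters: averaging the raw list would treat each 9 as a real answer and inflate the result.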
Applications and Practice Problems
Classify variables as categorical, discrete numerical, or continuous numerical.
Distinguish between cross-sectional and time series data.
Identify the level of measurement for various variables (nominal, ordinal, interval, ratio).
Decide when to use a sample versus a census.
Recognize appropriate sampling methods and potential sources of error or bias.
Example: National income data collected annually form a time series and are measured at the ratio level, so comparisons such as growth rates and ratios between years are meaningful.
Additional info: These concepts are foundational for macroeconomic research, as accurate data collection and analysis underpin the study of economic indicators, policy evaluation, and forecasting.