Defining and Collecting Data: Business Statistics Chapter 1 Study Notes

Defining and Collecting Data

Introduction

This chapter introduces foundational concepts in business statistics, focusing on how to define variables, understand measurement scales, collect data, and address issues in data preparation and survey errors. Mastery of these topics is essential for effective statistical analysis and interpretation in business contexts.

Classifying Variables

Types of Variables

Variables are characteristics or properties that can take different values. They are classified as follows:

  • Categorical (Qualitative) Variables: Represent categories or groups, such as "yes/no" or "blue/brown/green".

  • Numerical (Quantitative) Variables: Represent measured or counted quantities.

    • Discrete Variables: Arise from a counting process (e.g., number of children).

    • Continuous Variables: Arise from a measuring process (e.g., weight, voltage).

Example: Eye color is a categorical variable; number of defects per hour is a discrete numerical variable; weight is a continuous numerical variable.
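The classification above can be sketched in code. The rule below is only a heuristic and an assumption made for illustration: text values are treated as categorical, whole-number counts as discrete, and decimal measurements as continuous. It will misclassify numeric codes such as ZIP codes, which are categorical despite being stored as numbers. The function name `classify_variable` and the sample data are hypothetical.

```python
# Heuristic sketch: infer a variable's type from the Python type of its values.
# This rule is an illustrative assumption, not a general-purpose classifier.

def classify_variable(values):
    """Return 'categorical', 'discrete', or 'continuous' for a list of values."""
    if all(isinstance(v, (str, bool)) for v in values):
        return "categorical"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    raise ValueError("mixed or unsupported value types")

# Hypothetical data matching the examples in the notes
eye_color = ["blue", "brown", "green", "brown"]
defects_per_hour = [0, 2, 1, 3]
weight_kg = [61.2, 75.0, 82.4, 68.9]

for name, data in [("eye_color", eye_color),
                   ("defects_per_hour", defects_per_hour),
                   ("weight_kg", weight_kg)]:
    print(name, "->", classify_variable(data))
```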

Measurement Scales

Types of Measurement Scales

Measurement scales determine how variables are categorized and interpreted:

  • Nominal Scale: Classifies data into distinct categories without implied ranking (e.g., cellular provider: AT&T, Sprint, Verizon).

  • Ordinal Scale: Classifies data into ordered categories (e.g., ratings: good, better, best).

  • Interval Scale: Ordered scale with meaningful differences between measurements, but no true zero point (e.g., temperature in Celsius).

  • Ratio Scale: Ordered scale with meaningful differences and a true zero point (e.g., weight, salary).

Example: Age measured in years is a ratio scale; standardized exam scores are interval scales.
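One way to see the practical difference between interval and ratio scales is to check whether a ratio of two values survives a change of units. The sketch below uses made-up values: the ratio of two Celsius temperatures changes when both are converted to Fahrenheit, while the ratio of two weights is the same in kilograms and pounds. The conversion helpers are written here for illustration.

```python
# Sketch: ratios are meaningful on a ratio scale but not on an interval scale.

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def kg_to_lb(kg):
    return kg * 2.20462  # approximate conversion factor

# Interval scale: temperature in Celsius has no true zero point
c_low, c_high = 10.0, 20.0
ratio_c = c_high / c_low                                                  # 2.0
ratio_f = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)    # 68 / 50 = 1.36

# Ratio scale: weight has a true zero point
kg_low, kg_high = 10.0, 20.0
ratio_kg = kg_high / kg_low                                               # 2.0
ratio_lb = kg_to_lb(kg_high) / kg_to_lb(kg_low)                           # still 2.0

print("Temperature ratio in C vs F:", ratio_c, ratio_f)
print("Weight ratio in kg vs lb:", ratio_kg, ratio_lb)
```

Because the temperature ratio depends on the unit chosen, a statement such as "20 degrees is twice as hot as 10 degrees" has no fixed meaning, whereas "20 kg is twice as heavy as 10 kg" holds in any unit.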

[Figure: table of numerical variables and their measurement levels]

Data Collection

Population vs. Sample

Data can be collected from:

  • Population: All items or individuals of interest.

  • Sample: A subset of the population, used when collecting data from the entire population is impractical.

Example: A population of 40 individuals; a sample of 4 selected from that population.

Parameter vs. Statistic

  • Parameter: A value that summarizes a variable for the entire population (e.g., the population mean).

  • Statistic: A value that summarizes a variable for a sample (e.g., the sample mean).
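The distinction can be illustrated with a small sketch that mirrors the earlier example of a population of 40 and a sample of 4. The ages are randomly generated and entirely hypothetical; the fixed seed is only there to make the run reproducible.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: made-up ages for 40 individuals
population = [random.randint(20, 65) for _ in range(40)]

# Sample of 4 individuals drawn without replacement
sample = random.sample(population, 4)

mu = mean(population)   # parameter: describes the whole population
x_bar = mean(sample)    # statistic: describes only the sample

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

The two values will usually differ, because the statistic is computed from only part of the population.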

Sources of Data

  • Ongoing Business Activities: E.g., banks analyzing transaction data for fraud detection.

  • Distributed Data: E.g., financial data from investment services.

  • Survey Data: E.g., political polls, product satisfaction surveys.

  • Designed Experiments: E.g., consumer testing of products.

  • Observational Studies: E.g., measuring customer service times.

Primary vs. Secondary Data Sources

  • Primary: Data collected and analyzed by the same person or organization.

  • Secondary: Data analyzed by someone other than the collector (e.g., census data).

Sampling Methods

Sampling Frame

The sampling frame is a list of items that make up the population. Inaccurate frames can lead to biased results.

Types of Samples

  • Nonprobability Samples: Items are chosen without knowing their probabilities of selection.

    • Convenience Sampling: Based on ease of access.

    • Judgment Sampling: Based on expert opinion.

  • Probability Samples: Items are chosen based on known probabilities.

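    • Simple Random Sample: Every item in the frame has an equal chance of selection, and every possible sample of a given size is equally likely.

    • Systematic Sample: Divide the frame into groups of k items (k = N/n), pick a random start within the first k items, then select every kth item after it.

    • Stratified Sample: Divide the population into strata that share a common characteristic, then draw a simple random sample from each stratum in proportion to its size.

    • Cluster Sample: Divide the population into clusters, randomly select clusters, and then study all items in the selected clusters or sample within them.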
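The four probability sampling methods can be sketched with Python's standard `random` module. This is a minimal illustration under stated assumptions: the frame of 40 numbered items, the two strata, and the four clusters are all hypothetical, and the function names are chosen here for clarity. The cluster version keeps every item in each chosen cluster; sampling within the chosen clusters is the other variant.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n items without replacement; every item is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start within the first k items, then every kth item (k = N // n)."""
    k = len(frame) // n
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size.
    Rounding can make the total differ slightly from n for uneven strata."""
    total = sum(len(items) for items in strata.values())
    chosen = []
    for items in strata.values():
        share = round(n * len(items) / total)
        chosen.extend(rng.sample(items, share))
    return chosen

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every item in them."""
    picked = rng.sample(list(clusters), n_clusters)
    return [item for name in picked for item in clusters[name]]

rng = random.Random(42)                      # seeded for reproducibility
frame = list(range(1, 41))                   # hypothetical frame of 40 items
strata = {"A": frame[:30], "B": frame[30:]}  # strata of size 30 and 10
clusters = {c: frame[c * 10:(c + 1) * 10] for c in range(4)}

srs = simple_random_sample(frame, 4, rng)
systematic = systematic_sample(frame, 4, rng)
stratified = stratified_sample(strata, 4, rng)
cluster = cluster_sample(clusters, 1, rng)

print("Simple random:", srs)
print("Systematic:   ", systematic)
print("Stratified:   ", stratified)
print("Cluster:      ", cluster)
```

With N = 40 and n = 4, the systematic sample uses k = 10, so its items are always 10 positions apart. The stratified sample takes 3 items from stratum A and 1 from stratum B, matching their 30:10 sizes.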

Comparing Sampling Methods

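  • Simple Random & Systematic: Simple to use, but may not represent the population's underlying characteristics well.

  • Stratified: Ensures representation across subgroups.

  • Cluster: More cost-effective, but less efficient; a larger sample is needed to match the precision of the other methods.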

Data Preparation and Cleaning

Data Cleaning

Data cleaning corrects irregularities such as invalid values, coding errors, and integration errors. It is a crucial preprocessing step before analysis.

  • Invalid Variable Values: Non-numeric entries for numerical variables, values outside defined ranges.

  • Coding Errors: Inconsistent or incorrect categorical values, extraneous characters.

  • Data Integration Errors: Redundant columns, duplicated rows, differing units.

Data cleaning can be semi-automated using software tools, but manual review is often necessary. Always preserve the original data.
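A semi-automated check might look like the sketch below. The records, the valid age range of 0 to 120, and the allowed gender codes are all hypothetical assumptions. The function only reports problems and leaves the original rows untouched, in keeping with the advice to preserve the original data.

```python
# Sketch: flag common data-cleaning issues without modifying the raw data.
# All records and validity rules below are made up for illustration.

raw_rows = [
    {"id": 1, "age": "34", "gender": "F"},
    {"id": 2, "age": "abc", "gender": "M"},        # non-numeric value
    {"id": 3, "age": "210", "gender": "M"},        # outside the defined range
    {"id": 4, "age": "45", "gender": " female "},  # coding error
    {"id": 1, "age": "34", "gender": "F"},         # duplicated row
]

VALID_GENDERS = {"F", "M"}   # assumed coding scheme
AGE_RANGE = (0, 120)         # assumed valid range

def find_issues(rows):
    """Return a list of (row_index, description) pairs for suspect rows."""
    issues = []
    seen = set()
    for index, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            issues.append((index, "duplicated row"))
            continue
        seen.add(key)
        try:
            age = float(row["age"])
        except ValueError:
            issues.append((index, "non-numeric age"))
        else:
            if not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
                issues.append((index, "age out of range"))
        if row["gender"] not in VALID_GENDERS:
            issues.append((index, "unrecognized gender code"))
    return issues

issues = find_issues(raw_rows)
for index, description in issues:
    print(f"row {index}: {description}")
```

Flagged rows would still need manual review to decide whether to correct, recode, or exclude them.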

Stacked vs. Unstacked Data

  • Unstacked Data: Each group has its own column holding the values of the variable of interest.

  • Stacked Data: A single column holds all values of the variable of interest, and additional columns identify the group each value belongs to.
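The two layouts hold the same information and can be converted into each other. The sketch below uses plain Python structures and hypothetical sales figures for two stores; the function names `stack` and `unstack` are chosen here for illustration.

```python
# Sketch: convert between unstacked and stacked layouts.
# The store names and sales values are made up.

unstacked = {
    "store_A": [120, 135, 128],
    "store_B": [98, 110],
}

def stack(unstacked_data, group_name="group", value_name="value"):
    """One row per value, with a column identifying its group."""
    return [
        {group_name: group, value_name: value}
        for group, values in unstacked_data.items()
        for value in values
    ]

def unstack(stacked_rows, group_name="group", value_name="value"):
    """One list of values per group."""
    result = {}
    for row in stacked_rows:
        result.setdefault(row[group_name], []).append(row[value_name])
    return result

stacked = stack(unstacked, "store", "sales")
for row in stacked:
    print(row)
```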

Recoding Variables

Recoding involves redefining categories or converting numerical variables to categorical. New categories must be mutually exclusive and collectively exhaustive.
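As a sketch of recoding a numerical variable into categories, the function below maps age to one of four groups. The cut points are hypothetical. Using half-open intervals means each valid age falls into exactly one category (mutually exclusive) and no valid age is left without one (collectively exhaustive).

```python
# Sketch: recode a numerical age into categories with assumed cut points.

def recode_age(age):
    """Map a non-negative age to exactly one age group."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return "under 18"
    if age < 35:
        return "18 to 34"
    if age < 65:
        return "35 to 64"
    return "65 and over"

ages = [17, 18, 34.5, 64.9, 65, 90]   # made-up values, including boundaries
for age in ages:
    print(age, "->", recode_age(age))
```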

Survey Errors and Ethical Issues

Types of Survey Errors

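  • Coverage Error: Some groups are excluded from the sampling frame and have no chance of being selected, which leads to selection bias.

  • Nonresponse Error: People who do not respond differ from those who do, which leads to nonresponse bias.

  • Sampling Error: Chance variation from sample to sample; it is present in every sample.

  • Measurement Error: Inaccurate responses caused by weak question wording, interviewer effects, or respondent mistakes.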
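Sampling error in particular is easy to demonstrate by simulation. The sketch below draws five samples of 25 from the same hypothetical population of 1,000 randomly generated values and compares each sample mean with the population mean. The population, sample size, and seed are all assumptions made for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)  # seeded for reproducibility

# Hypothetical population of 1,000 values
population = [rng.gauss(50, 10) for _ in range(1000)]
mu = mean(population)

# Five independent samples of size 25 from the same population
sample_means = [mean(rng.sample(population, 25)) for _ in range(5)]
errors = [m - mu for m in sample_means]

print(f"Population mean: {mu:.2f}")
for m, e in zip(sample_means, errors):
    print(f"sample mean {m:.2f}  (difference from population mean {e:+.2f})")
```

Each sample gives a slightly different mean even though nothing about the population or the method changed; that chance variation is the sampling error.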

Ethical Issues

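  • Coverage and nonresponse errors become ethical issues when survey designers purposely exclude particular groups or individuals to bias the results.

  • Sampling error becomes an ethical issue when findings are purposely presented without the sample size and margin of error.

  • Measurement error becomes an ethical issue when leading questions or interviewer behavior are used to steer responses.

  • It also becomes an ethical issue when respondents willfully provide false information.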

Summary Table: Types of Variables and Measurement Scales

| Variable Type | Examples | Measurement Scale |
| --- | --- | --- |
| Categorical | Marital Status, Political Party, Eye Color | Nominal, Ordinal |
| Numerical (Discrete) | Number of Children, Defects per hour | Ratio |
| Numerical (Continuous) | Weight, Voltage | Ratio |
| Categorical (ordered categories) | Ratings: Good, Better, Best; Low, Med, High | Ordinal |

Key Formulas

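Sample Mean

The sample mean is the sum of the n sample values divided by the sample size n:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Population Mean

The population mean is the sum of the N population values divided by the population size N:

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Sampling Interval (Systematic Sampling)

For systematic sampling, the frame is divided into n groups of k items each, with k rounded to the nearest integer:

$$k = \frac{N}{n}$$

Where N is the population size and n is the sample size.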

Chapter Summary

This chapter covered the definition and classification of variables, measurement scales, data collection methods, sampling techniques, data preparation, survey errors, and ethical issues. Understanding these concepts is fundamental for conducting reliable and valid statistical analyses in business.