Fundamentals of Data Collection and Types in Statistics

Introduction to Statistics

Statistics is the science of collecting, analyzing, interpreting, and drawing conclusions from data. It provides methods for summarizing and presenting information to support decision-making and scientific inquiry.

Objectives of Statistics

  • Objective: Use sample data as a basis for drawing conclusions about the whole population.

  • Data: Collections of observations or information (such as measurements, genders, survey responses, etc.).

Where Does Data Come From?

Understanding the source of data is fundamental in statistics. Data can be collected from entire populations or from samples.

Population, Census, and Sample

  • Population: The complete collection of all individuals (scores, people, measurements, etc.) to be studied. The collection is complete in the sense that it includes all individuals to be studied.

  • Census: The collection of data from every member of the population. The population size symbol is N.

  • Parameter: A numerical characteristic of a population, such as its mean or standard deviation. Parameters are typically written with Greek or capital letters.

  • Sample: A sub-collection (or subset) of members selected from a population. Because a census is often impractical, sampling is usually the most practical way to collect information about a population. The sample size symbol is n. A good sample:

    • Needs to be random

    • Needs to be representative of the population

  • Statistic: A numerical characteristic of a sample, such as its mean or standard deviation. Statistics are typically written with lowercase Latin letters.
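
The distinction between a parameter and a statistic can be illustrated with a short Python sketch. The population below is made-up data (random ages), and the sizes N = 1,000 and n = 50 are arbitrary choices for illustration only.

```python
import random
import statistics

# Hypothetical population: ages of N = 1,000 individuals (made-up data).
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = [rng.randint(18, 65) for _ in range(1000)]

N = len(population)                # population size
mu = statistics.mean(population)   # parameter: computed from every member

# Draw a random sample of n = 50 members without replacement.
sample = rng.sample(population, 50)
n = len(sample)                    # sample size
x_bar = statistics.mean(sample)    # statistic: computed from the sample only

print(f"N = {N}, population mean (parameter) = {mu:.2f}")
print(f"n = {n}, sample mean (statistic) = {x_bar:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean. Using the statistic to estimate the parameter is the objective described above.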

Common Symbols in Statistics

The following table summarizes common symbols used for characteristics of samples and populations:

Characteristic | Statistic (Sample) | Parameter (Population)
Average (mean) | $\bar{x}$ | $\mu$
Standard Deviation | $s$ | $\sigma$
Variance | $s^2$ | $\sigma^2$
Proportion | $\hat{p}$ | $p$
Correlation | $r$ | $\rho$

Types of Data

Data in statistics can be classified into different types based on their nature and measurement.

Quantitative (Numerical) Data

  • Consists of numbers representing counts or measurements.

  • Can be broken down further into two types:

    • Discrete Data: Results when the possible values are finite or countable, such as the number of students in a class.

    • Continuous Data: Results when there are infinitely many possible values on a continuous scale that covers a range without gaps, interruptions, or jumps. These are measured rather than counted, such as height or weight.

Qualitative (Categorical) Data

  • Names or labels that are not numbers representing counts or measurements.

Levels of Measurement

Levels of measurement describe how data can be categorized, ordered, and quantified.

  • Nominal: Data consists of names, labels, or categories only. Cannot be ordered.

  • Ordinal: Data can be arranged in order, but differences between data values cannot be determined or are meaningless.

  • Interval: Data can be ordered, and meaningful differences can be found, but there is no natural zero starting point.

  • Ratio: Like the interval level, but with a natural zero starting point, where zero means that none of the quantity is present. Both differences and ratios are meaningful.
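
The difference between the interval and ratio levels can be checked with temperatures, a standard illustration that is not taken from the notes above. Celsius has an arbitrary zero (interval level), while Kelvin has a natural zero (ratio level). The values 10 and 20 are arbitrary.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius temperature to Kelvin."""
    return c + 273.15

a_c, b_c = 10.0, 20.0  # two temperatures in degrees Celsius (arbitrary values)
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences are meaningful at the interval level:
# the gap is 10 degrees on either scale.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios are only meaningful at the ratio level:
# 20 / 10 = 2 in Celsius, but 20 degrees C is not "twice as hot" as 10 degrees C.
ratio_c = b_c / a_c
ratio_k = b_k / a_k  # about 1.035, the physically meaningful ratio

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The difference is the same on both scales, while the ratio changes when the zero point moves. That is why ratios are not meaningful for interval-level data.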

Sampling Methods

Sampling is the process of selecting a subset of individuals from a population to estimate characteristics of the whole population.

Observation vs. Experiment

  • Observation: We observe and measure specific characteristics but do not attempt to modify the subjects. Sample surveys are used to estimate population parameters, so the sample needs to be as representative of the population as possible. Observational studies can be conducted in three different ways:

    • Retrospective: Go back through records.

    • Cross-sectional: Observed at one point in time.

    • Prospective (longitudinal or cohort): Go forward in time and observe groups sharing common factors.

  • Experiment: We apply some treatment and then proceed to observe its effects on the subjects (experimental units). Experiments try to assess the effects of treatments, and experimental units are not always drawn randomly from a population. An experiment is the only way we can statistically identify causality (cause and effect relationship).

Survey and Sampling Procedures

  • Simple Random Sample: Selecting n subjects in such a way that every possible sample of the same size has the same chance of being chosen. This implies the opportunity to replicate the study.

  • Systematic: Select a starting point and select every kth element in the population.

    • Example: If a company has 10,000 employees and plans to survey them, it can select every 10th employee from the company roster to obtain a sample of size 1,000.

  • Convenience: We simply use results that are readily available because it is convenient.

    • Example: To estimate the average age of community college students, a researcher might sample only students from PHSC because they are easy to reach.

  • Stratified: Divide the population into at least two different subpopulations (or strata) that share the same characteristic (e.g., gender, ethnicity, age, etc.), then draw a random sample from each stratum.

    • Example: If the college wants to award scholarships to two men and two women, two men are randomly selected from the group of men and two women are randomly selected from the group of women. This is an example of stratified sampling.

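  • Cluster: Divide the population into sections (clusters), randomly select a few of those clusters, and then include every member of the selected clusters. Unlike stratified sampling, which draws a sample from every stratum, cluster sampling uses all members of only the selected clusters.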
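
The four probability-based procedures above can be sketched in Python. This is a minimal illustration, not a survey-grade implementation. The roster, its field names ("stratum", "dept"), and the helper functions are all hypothetical and exist only for this example.

```python
import random
from collections import defaultdict

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical roster: 10,000 employees, 2 strata, 20 departments (made-up data).
roster = [
    {"id": i, "stratum": "men" if i % 2 == 0 else "women", "dept": i % 20}
    for i in range(10_000)
]

def group_by(units, key):
    """Group units into lists keyed by the value of units[key]."""
    groups = defaultdict(list)
    for u in units:
        groups[u[key]].append(u)
    return groups

def simple_random_sample(units, n):
    """Every possible sample of size n is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, k):
    """Pick a random starting point, then take every kth unit."""
    start = rng.randrange(k)
    return units[start::k]

def stratified_sample(units, key, n_per_stratum):
    """Draw a random sample from each stratum."""
    sample = []
    for members in group_by(units, key).values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(units, key, n_clusters):
    """Randomly choose whole clusters and keep all of their members."""
    groups = group_by(units, key)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [u for c in chosen for u in groups[c]]

srs = simple_random_sample(roster, 100)
systematic = systematic_sample(roster, 10)       # every 10th employee
stratified = stratified_sample(roster, "stratum", 2)
cluster = cluster_sample(roster, "dept", 3)      # 3 of the 20 departments

print(len(srs), len(systematic), len(stratified), len(cluster))
# prints: 100 1000 4 1500
```

Convenience sampling is omitted because it involves no random selection. Note that the stratified sample contains members of every stratum, while the cluster sample contains members of only the three selected departments.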

Summary Table: Sampling Methods

Sampling Method | Description | Example
Simple Random | Every member has equal chance of selection | Randomly select 100 students from a list
Systematic | Select every kth member after a random start | Select every 10th employee from a roster
Convenience | Use readily available subjects | Survey students in a nearby classroom
Stratified | Divide into strata, sample from each | Sample men and women separately
Cluster | Divide into clusters, sample all from selected clusters | Sample all students from randomly selected classes

Key Terms and Formulas

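  • Mean (Sample): $\bar{x} = \frac{\sum x}{n}$

  • Mean (Population): $\mu = \frac{\sum x}{N}$

  • Standard Deviation (Sample): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

  • Standard Deviation (Population): $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

  • Variance (Sample): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

  • Variance (Population): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$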
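
The sample and population versions of the variance and standard deviation differ only in the divisor: n - 1 for a sample and N for a population. The sketch below uses Python's standard `statistics` module on a small made-up data set to show the effect of that choice.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

mean = statistics.mean(data)  # same formula for sample and population: sum / count

# Treating the data as an entire population (divide by N).
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treating the data as a sample from a larger population (divide by n - 1).
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print(f"mean = {mean}")
print(f"population variance = {pop_var}, population SD = {pop_sd}")
print(f"sample variance = {samp_var:.3f}, sample SD = {samp_sd:.3f}")
```

Here the squared deviations from the mean of 5 sum to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, about 4.571. The sample versions are always slightly larger for the same data because of the smaller divisor.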

Conclusion

Understanding the sources and types of data, levels of measurement, and sampling methods is essential for designing statistical studies and interpreting results. These foundational concepts form the basis for more advanced statistical analysis and inference.
