
Preliminaries in Applied Statistics: Data Types, Notation, and Descriptive Statistics

Study Guide - Smart Notes


Preliminaries in Statistics

Introduction to Data Types

Understanding the types of data is fundamental in statistics, as it determines the appropriate methods for analysis and interpretation.

  • Numerical Data: Quantitative values that can be measured or counted.

    • Continuous: Can take any value within a range (e.g., income).

    • Discrete: Can only take specific, separate values (e.g., family size).

  • Categorical Data: Qualitative values that describe categories or groups.

    • Nominal: Categories without any order (e.g., pet type).

    • Ordinal: Categories with a meaningful order (e.g., satisfaction level).

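Example dataset (Household Income is continuous, Family Size is discrete, Pet is nominal, Satisfaction is ordinal):

Household Income | Family Size | Pet  | Satisfaction
4200.50          | 3           | Dog  | Neutral
6100.00          | 5           | Cat  | Very Satisfied
3900.75          | 2           | Fish | Dissatisfied
5000.00          | 4           | Bird | Satisfied
7200.25          | 6           | Dog  | Very Satisfied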
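The example table can be modeled in Python to show what each data type permits. This is a minimal sketch. The `Household` class, the `satisfaction_rank` helper, and the `SATISFACTION_ORDER` list are hypothetical names. The low-to-high order of the four satisfaction labels is an assumption, and the truncated "Very Satisf" is read as "Very Satisfied". Ordinal labels can be ranked, while nominal labels such as pet type can only be counted.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed low-to-high order of the satisfaction labels in the example table.
SATISFACTION_ORDER = ["Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"]


@dataclass
class Household:
    income: float       # numerical, continuous
    family_size: int    # numerical, discrete
    pet: str            # categorical, nominal
    satisfaction: str   # categorical, ordinal


households = [
    Household(4200.50, 3, "Dog", "Neutral"),
    Household(6100.00, 5, "Cat", "Very Satisfied"),
    Household(3900.75, 2, "Fish", "Dissatisfied"),
    Household(5000.00, 4, "Bird", "Satisfied"),
    Household(7200.25, 6, "Dog", "Very Satisfied"),
]


def satisfaction_rank(h: Household) -> int:
    """Position of the household's satisfaction label in the assumed order."""
    return SATISFACTION_ORDER.index(h.satisfaction)


# Ordinal labels can be sorted meaningfully by rank.
by_satisfaction = [h.satisfaction for h in sorted(households, key=satisfaction_rank)]

# Nominal labels can only be counted per category.
pet_counts = Counter(h.pet for h in households)

print(by_satisfaction)
print(pet_counts)
```

Sorting pet names alphabetically would also run, but that order carries no statistical meaning. That is exactly what makes pet type nominal.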

Mathematical Notation for Data

Expressing data with mathematical symbols allows for concise and precise analysis.

  • List Notation: An ordered list of observations, e.g., $\{x_1, x_2, \ldots, x_n\}$.

  • Example: Monthly highest temperatures of Davis in Fahrenheit: {56.1, 61, 66.5, 72.8, 81.2, 88.8, 93.5, 92.9, 90.1, 80.2, 65.7, 56.1}

  • General notation: $x_i$ for $i = 1, 2, \ldots, n$, where $i$ is the observation index and $n$ is the total number of observations (for the Davis data, $n = 12$).

  • For multiple datasets, use different letters: e.g., $y_1, y_2, \ldots, y_n$ for monthly lowest temperatures.
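In Python the Davis data can be stored as a list. One detail is worth sketching: the notes index observations from $i = 1$, while Python lists start at 0. A hypothetical helper, `obs`, converts between the two conventions.

```python
# Monthly highest temperatures for Davis (°F), in the order given in the notes.
x = [56.1, 61, 66.5, 72.8, 81.2, 88.8, 93.5, 92.9, 90.1, 80.2, 65.7, 56.1]
n = len(x)  # total number of observations


def obs(data, i):
    """Return x_i using the 1-based index i of the notes (Python lists are 0-based)."""
    if not 1 <= i <= len(data):
        raise IndexError(f"i must be between 1 and {len(data)}")
    return data[i - 1]


print(n)          # 12
print(obs(x, 1))  # 56.1
print(obs(x, 7))  # 93.5
```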

Summation Notation

Summation is a key operation in statistics, used to aggregate data values.

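  • Capital-Sigma Notation: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$

  • Variants: For observations $x_1, \ldots, x_n$ and a function $f$: $\sum_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + \cdots + f(x_n)$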
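Capital-sigma notation maps directly onto a loop that accumulates $f(x_i)$. The function `sigma` below is a hypothetical illustration; Python's built-in `sum` does the same job. The data `[1, 2, 3, 4]` and the choice $f(x) = x^2$ are arbitrary examples.

```python
from typing import Callable, Sequence


def sigma(data: Sequence[float], f: Callable[[float], float] = lambda v: v) -> float:
    """Compute sum_{i=1}^{n} f(x_i); with the default identity f this is sum x_i."""
    total = 0.0
    for value in data:
        total += f(value)
    return total


x = [1, 2, 3, 4]
print(sigma(x))                    # 1 + 2 + 3 + 4 = 10.0
print(sigma(x, lambda v: v ** 2))  # 1 + 4 + 9 + 16 = 30.0
```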

Properties of Summation

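Summation follows several algebraic properties, which are useful for simplifying expressions. For data $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ and a constant $c$:

$$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i, \qquad \sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} c = nc$$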

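Note: In general, $\frac{1}{n}\sum_{i=1}^{n} f(x_i) \neq f\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)$ (the average of the $f(x_i)$ differs from $f$ applied to the average) unless $f$ is linear.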

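Linear Function: $f(x) = a + bx$ for constants $a$ (intercept) and $b$ (slope). For such a function, $\sum_{i=1}^{n} (a + b x_i) = na + b \sum_{i=1}^{n} x_i$.

Linear functions have straight-line graphs; nonlinear functions, such as $f(x) = x^2$, have curved graphs.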
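The summation rules and the contrast between linear and nonlinear functions can be checked numerically. This sketch uses made-up data and two illustrative functions: $f(x) = 1 + 2x$ (linear) and $f(x) = x^2$ (nonlinear). The names `average`, `linear`, and `square` are hypothetical helpers.

```python
import math

x = [2.0, 4.0, 6.0]
y = [1.0, 3.0, 5.0]
c = 3.0
n = len(x)

# Rule 1: a sum of sums splits into separate sums.
assert math.isclose(sum(xi + yi for xi, yi in zip(x, y)), sum(x) + sum(y))
# Rule 2: a constant factor moves outside the sum.
assert math.isclose(sum(c * xi for xi in x), c * sum(x))
# Rule 3: adding the constant c to itself n times gives n * c.
assert math.isclose(sum(c for _ in range(n)), n * c)


def average(values):
    return sum(values) / len(values)


def linear(v):
    return 1.0 + 2.0 * v  # f(x) = a + bx with a = 1, b = 2 (illustrative)


def square(v):
    return v ** 2  # a nonlinear function


xbar = average(x)
print(average([linear(v) for v in x]), linear(xbar))  # equal: 9.0 9.0
print(average([square(v) for v in x]), square(xbar))  # differ: about 18.67 vs 16.0
```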

Descriptive Statistics

Measures of Central Tendency

Central tendency describes where the center of a dataset lies.

  • Mean (Arithmetic Average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

  • Median: The middle value when the data are sorted in increasing order. If $n$ is even, the median is the average of the two middle values.

  • Mode: The value that appears most frequently in the dataset.

Examples:

  • Mean of 2, 4, 6: $(2 + 4 + 6)/3 = 4$

  • Median of 1, 3, 7, 8, 9: 7

  • Median of 1, 2, 3, 4: $(2 + 3)/2 = 2.5$

  • Mode of 2, 2, 3, 4, 4, 4, 5: 4
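Python's standard `statistics` module reproduces these examples. The hand-written `median` function is included only to make the odd/even rule explicit. It is an illustrative sketch, not a replacement for `statistics.median`.

```python
import statistics


def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(statistics.mean([2, 4, 6]))              # 4
print(median([1, 3, 7, 8, 9]))                 # 7
print(median([1, 2, 3, 4]))                    # 2.5
print(statistics.mode([2, 2, 3, 4, 4, 4, 5]))  # 4
```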

Properties of the Mean

  • The mean is the sum divided by the number of observations.

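  • Mean follows the properties of summation: if $y_i = a + b x_i$ for constants $a$ and $b$, then $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} (a + b x_i) = \frac{1}{n}\left(na + b\sum_{i=1}^{n} x_i\right) = a + b\bar{x}$.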
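Here is a quick check of the shift-and-scale property. The constants $a = 10$ and $b = 2$ are illustrative. Transforming each observation and then averaging gives the same result as transforming the mean.

```python
import statistics

x = [2, 4, 6]
a, b = 10, 2                  # illustrative constants for y_i = a + b * x_i
y = [a + b * xi for xi in x]  # [14, 18, 22]

print(statistics.mean(y))          # 18
print(a + b * statistics.mean(x))  # 10 + 2 * 4 = 18
```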

Measures of Spread

Spread measures how much the data values vary.

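  • Range: Difference between the largest and smallest observations, $\max_i x_i - \min_i x_i$.

  • Deviation: $x_i - \bar{x}$, the signed distance of an observation from the mean.

  • Sample Variance: The sum of squared deviations divided by $n - 1$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

  • Sample Standard Deviation: Square root of the variance: $s = \sqrt{s^2}$

Example: For 2, 4, 6, $\bar{x} = 4$, so the deviations are $-2, 0, 2$. Then $s^2 = \frac{(-2)^2 + 0^2 + 2^2}{3 - 1} = \frac{8}{2} = 4$ and $s = \sqrt{4} = 2$.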
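The worked example for 2, 4, 6 can be reproduced step by step. This sketch computes each quantity directly from its definition. It then compares the results against `statistics.variance` and `statistics.stdev`, which also use the $n - 1$ divisor.

```python
import math
import statistics

data = [2, 4, 6]
n = len(data)
xbar = sum(data) / n                             # 4.0
data_range = max(data) - min(data)               # 6 - 2 = 4
deviations = [xi - xbar for xi in data]          # [-2.0, 0.0, 2.0]
s2 = sum(d ** 2 for d in deviations) / (n - 1)   # (4 + 0 + 4) / 2 = 4.0
s = math.sqrt(s2)                                # 2.0

# Deviations from the mean always sum to zero.
assert sum(deviations) == 0
# The standard library agrees (both functions divide by n - 1).
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(s, statistics.stdev(data))

print(data_range, deviations, s2, s)
```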

Why Divide by n - 1 in Sample Variance?

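Dividing by $n - 1$ (instead of $n$) in the sample variance provides an unbiased estimate of the population variance $\sigma^2$.

  • Explanation 1 (Unbiasedness): $E[s^2] = \sigma^2$, so the sample variance is correct on average. Dividing by $n$ instead would systematically underestimate $\sigma^2$ by a factor of $(n-1)/n$.

  • Explanation 2 (Degrees of Freedom): Calculating the mean uses up one piece of information. Because the deviations always satisfy $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, only $n - 1$ of them can vary freely.

For population data, divide by the population size $N$; for sample data, divide by $n - 1$.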

Additional info: This adjustment is known as Bessel's correction.
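Unbiasedness can be verified exactly on a tiny example instead of by simulation. The sketch assumes a made-up population {1, 2, 3} and samples of size $n = 2$ drawn with replacement, so all 9 ordered samples are equally likely. Averaging the $n - 1$ version of the sample variance over every possible sample recovers the population variance exactly. The divide-by-$n$ version falls short by the factor $(n-1)/n$. Exact `Fraction` arithmetic avoids rounding noise, and `spread` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

population = [Fraction(1), Fraction(2), Fraction(3)]  # illustrative population
N = len(population)
mu = sum(population) / N
sigma2 = sum((v - mu) ** 2 for v in population) / N   # population variance: divide by N


def spread(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    xbar = sum(sample) / len(sample)
    return sum((v - xbar) ** 2 for v in sample) / divisor


n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely ordered samples
avg_unbiased = sum(spread(s, n - 1) for s in samples) / len(samples)
avg_biased = sum(spread(s, n) for s in samples) / len(samples)

print(sigma2, avg_unbiased, avg_biased)  # 2/3 2/3 1/3
```

With $n = 2$ the biased version averages to exactly half of $\sigma^2 = 2/3$, matching $(n-1)/n = 1/2$.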
