Preliminaries in Applied Statistics: Data Types, Notation, and Descriptive Statistics
Preliminaries in Statistics
Introduction to Data Types
Understanding the types of data is fundamental in statistics, as it determines the appropriate methods for analysis and interpretation.
Numerical Data: Quantitative values that can be measured or counted.
Continuous: Can take any value within a range (e.g., income).
Discrete: Can only take specific, separate values (e.g., family size).
Categorical Data: Qualitative values that describe categories or groups.
Nominal: Categories without any order (e.g., pet type).
Ordinal: Categories with a meaningful order (e.g., satisfaction level).
| Household Income (continuous) | Family Size (discrete) | Pet (nominal) | Satisfaction (ordinal) |
|---|---|---|---|
| 4200.50 | 3 | Dog | Neutral |
| 6100.00 | 5 | Cat | Very Satisfied |
| 3900.75 | 2 | Fish | Dissatisfied |
| 5000.00 | 4 | Bird | Satisfied |
| 7200.25 | 6 | Dog | Very Satisfied |
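To make the four data types concrete, the table can be loaded into plain Python lists and each column treated according to its type. This is a minimal sketch. The ordering in `SATISFACTION_LEVELS` (least to most satisfied) is an assumption for illustration and lists only the levels that appear in the table. The variable names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Rows from the table: (household income, family size, pet, satisfaction)
rows = [
    (4200.50, 3, "Dog", "Neutral"),
    (6100.00, 5, "Cat", "Very Satisfied"),
    (3900.75, 2, "Fish", "Dissatisfied"),
    (5000.00, 4, "Bird", "Satisfied"),
    (7200.25, 6, "Dog", "Very Satisfied"),
]

income = [r[0] for r in rows]        # numerical, continuous
family_size = [r[1] for r in rows]   # numerical, discrete
pet = [r[2] for r in rows]           # categorical, nominal
satisfaction = [r[3] for r in rows]  # categorical, ordinal

# Assumed order of the ordinal levels, from lowest to highest.
SATISFACTION_LEVELS = ["Dissatisfied", "Neutral", "Satisfied", "Very Satisfied"]

# Numerical data: arithmetic such as averaging is meaningful.
avg_income = mean(income)
avg_family_size = mean(family_size)

# Nominal data: only counting is meaningful; the categories have no order.
pet_counts = Counter(pet)

# Ordinal data: levels can be ranked and compared, but the gaps between
# levels are not numerical distances.
ranks = [SATISFACTION_LEVELS.index(s) for s in satisfaction]
highest = SATISFACTION_LEVELS[max(ranks)]
```

Averaging `pet` or `satisfaction` directly would not make sense. That is why the sketch counts the nominal column and only ranks the ordinal one.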
Mathematical Notation for Data
Expressing data with mathematical symbols allows for concise and precise analysis.
List Notation: An ordered collection of elements, e.g., $\{x_1, x_2, \ldots, x_n\}$.
Example: Monthly high temperatures in Davis, in degrees Fahrenheit: {56.1, 61, 66.5, 72.8, 81.2, 88.8, 93.5, 92.9, 90.1, 80.2, 65.7, 56.1}
General notation: $x_i$, $i = 1, 2, \ldots, n$, where $i$ is the observation index and $n$ is the total number of observations (here $n = 12$).
For multiple datasets, use different letters: e.g., $y_1, y_2, \ldots, y_n$ for monthly low temperatures.
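In code, a dataset like the Davis highs maps naturally onto a Python list. One detail to watch in this sketch: the notation numbers observations from $i = 1$, but Python lists start at index 0, so $x_i$ corresponds to `x[i - 1]`. The helper `obs` is hypothetical and exists only to make that translation explicit.

```python
# Monthly high temperatures for Davis (°F), in the order listed above.
x = [56.1, 61, 66.5, 72.8, 81.2, 88.8, 93.5, 92.9, 90.1, 80.2, 65.7, 56.1]

n = len(x)  # total number of observations


def obs(data, i):
    """Return x_i for a 1-based index i, as in the notation (hypothetical helper)."""
    if not 1 <= i <= len(data):
        raise IndexError(f"i must be between 1 and {len(data)}")
    return data[i - 1]


first, last = obs(x, 1), obs(x, n)  # x_1 and x_n
```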
Summation Notation
Summation is a key operation in statistics, used to aggregate data values.
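Capital-Sigma Notation: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Variants: For indices $1 \le k \le m \le n$ and a function $f$: $\sum_{i=k}^{m} x_i = x_k + x_{k+1} + \cdots + x_m$ and $\sum_{i=1}^{n} f(x_i) = f(x_1) + f(x_2) + \cdots + f(x_n)$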
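Capital-sigma notation corresponds directly to Python's built-in `sum` over a range of indices. This minimal sketch uses the Davis highs as $x_i$. It takes $f(x) = x^2$ purely as an illustrative function. The names `sigma` and `square` are hypothetical.

```python
# Davis monthly highs (°F), used as x_1, ..., x_n.
x = [56.1, 61, 66.5, 72.8, 81.2, 88.8, 93.5, 92.9, 90.1, 80.2, 65.7, 56.1]


def sigma(data, k, m, f=lambda v: v):
    """Return sum_{i=k}^{m} f(x_i), using 1-based, inclusive indices k..m."""
    return sum(f(data[i - 1]) for i in range(k, m + 1))


def square(v):
    """Illustrative choice of f: f(x) = x^2."""
    return v ** 2


n = len(x)
total = sigma(x, 1, n)                   # x_1 + x_2 + ... + x_n
first_three = sigma(x, 1, 3)             # x_1 + x_2 + x_3
sum_of_squares = sigma(x, 1, n, square)  # f(x_1) + ... + f(x_n)
```

The default `f` is the identity, so `sigma(x, 1, n)` is the plain sum $\sum_{i=1}^{n} x_i$.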
Properties of Summation
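Summation follows several algebraic properties, which are useful for simplifying expressions. For datasets $x_i$, $y_i$ and a constant $c$:
$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
$\sum_{i=1}^{n} c\,x_i = c\sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} c = nc$
Note: In general, $\sum_{i=1}^{n} f(x_i) \neq f\left(\sum_{i=1}^{n} x_i\right)$; for example, $\sum_{i=1}^{n} x_i^2 \neq \left(\sum_{i=1}^{n} x_i\right)^2$. Equality holds when $f$ is linear with no intercept, $f(x) = bx$.
Linear Function: $f(x) = a + bx$, where $a$ is the intercept and $b$ is the slope. Summing over the data gives $\sum_{i=1}^{n} (a + bx_i) = na + b\sum_{i=1}^{n} x_i$.
Graphs of linear functions are straight lines; graphs of nonlinear functions are curves.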
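A quick numerical check makes the summation properties concrete. This sketch uses small hypothetical integer lists, chosen so every result is exact. The constants $a = 3$, $b = 2$, $c = 5$ are arbitrary.

```python
x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
n = len(x)
a, b, c = 3, 2, 5

# The sum of a sum is the sum of the sums.
assert sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)
# A constant factor moves outside the sum.
assert sum(c * xi for xi in x) == c * sum(x)
# Adding the constant c, n times, gives n * c.
assert sum(c for _ in range(n)) == n * c
# Linear function f(x) = a + b*x: the sum equals n*a + b * sum(x).
assert sum(a + b * xi for xi in x) == n * a + b * sum(x)

# Nonlinear f(x) = x^2: the sum of squares differs from the square of the sum.
sum_sq = sum(xi ** 2 for xi in x)  # 1 + 4 + 9 + 16
sq_sum = sum(x) ** 2               # 10 ** 2
assert sum_sq != sq_sum

# f(x) = b*x (no intercept) commutes with summation ...
assert sum(b * xi for xi in x) == b * sum(x)
# ... but adding an intercept breaks that: f(sum) adds a once, sum of f adds it n times.
assert a + b * sum(x) != sum(a + b * xi for xi in x)
```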
Descriptive Statistics
Measures of Central Tendency
Central tendency describes where the center of a dataset lies.
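Mean (Arithmetic Average): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median: The middle value when the data are sorted from smallest to largest. If $n$ is odd, the median is the $\frac{n+1}{2}$-th ordered value; if $n$ is even, it is the average of the two middle values.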
Mode: The value that appears most frequently in the dataset.
Examples:
Mean of 2, 4, 6: $\bar{x} = \frac{2 + 4 + 6}{3} = 4$
Median of 1, 3, 7, 8, 9: 7
Median of 1, 2, 3, 4: $\frac{2 + 3}{2} = 2.5$
Mode of 2, 2, 3, 4, 4, 4, 5: 4
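The definitions translate into short functions. This sketch is written directly from the definitions, and the function names are illustrative. Python's standard `statistics` module provides `mean`, `median`, and `mode`, so the last line cross-checks against it.

```python
import statistics
from collections import Counter


def sample_mean(data):
    """Sum of the observations divided by n."""
    return sum(data) / len(data)


def sample_median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def sample_mode(data):
    """Most frequent value; on a tie, the value encountered first is returned."""
    return Counter(data).most_common(1)[0][0]


print(sample_mean([2, 4, 6]))              # 4.0
print(sample_median([1, 3, 7, 8, 9]))      # 7
print(sample_median([1, 2, 3, 4]))         # 2.5
print(sample_mode([2, 2, 3, 4, 4, 4, 5]))  # 4

# Cross-check against the standard library on the same inputs.
assert sample_median([1, 2, 3, 4]) == statistics.median([1, 2, 3, 4])
```

Sorting first is what makes the median an order statistic. The mode needs no ordering at all, which is why it also applies to nominal data.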
Properties of the Mean
The mean is the sum divided by the number of observations.
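Mean follows the properties of summation:
If $y_i = a + bx_i$, then $\bar{y} = a + b\bar{x}$.
The mean of a sum is the sum of the means: $\overline{x + y} = \bar{x} + \bar{y}$.
Deviations from the mean sum to zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$.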
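The mean inherits the summation rules because it is a sum multiplied by $1/n$. This sketch checks that numerically on the example values 2, 4, 6. It also uses a hypothetical second dataset and the Celsius-to-Fahrenheit conversion $a = 32$, $b = 1.8$, chosen only as a familiar linear transformation. Comparisons use a small tolerance because 1.8 is not exact in floating point.

```python
def mean(data):
    """Arithmetic mean: sum divided by n."""
    return sum(data) / len(data)


x = [2, 4, 6]
y = [1, 5, 9]    # hypothetical second dataset of the same length
a, b = 32, 1.8   # F = 32 + 1.8 * C, an example linear transformation

# Mean of a linear transformation: mean(a + b*x) = a + b * mean(x).
transformed = [a + b * xi for xi in x]
assert abs(mean(transformed) - (a + b * mean(x))) < 1e-9

# Mean of a sum: mean(x + y) = mean(x) + mean(y).
assert abs(mean([xi + yi for xi, yi in zip(x, y)]) - (mean(x) + mean(y))) < 1e-9

# Deviations from the mean sum to zero.
x_bar = mean(x)
deviation_total = sum(xi - x_bar for xi in x)
assert abs(deviation_total) < 1e-12
```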
Measures of Spread
Spread measures how much the data values vary.
Range: Difference between the largest and smallest observations.
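Deviation: $x_i - \bar{x}$, the signed distance of an observation from the mean.
Sample Variance: Sum of squared deviations divided by $n - 1$: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Standard Deviation: Square root of the sample variance: $s = \sqrt{s^2}$
Example: For 2, 4, 6, $\bar{x} = 4$, so the deviations are $-2, 0, 2$. Then $s^2 = \frac{(-2)^2 + 0^2 + 2^2}{3 - 1} = \frac{8}{2} = 4$ and $s = \sqrt{4} = 2$.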
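The worked example can be reproduced step by step. This sketch follows the formulas in this section, dividing by $n - 1$. The function names are illustrative. The standard library's `statistics.variance` and `statistics.stdev`, which also divide by $n - 1$, serve as a cross-check.

```python
import math
import statistics


def data_range(data):
    """Largest observation minus smallest observation."""
    return max(data) - min(data)


def deviations(data):
    """x_i - x_bar for each observation."""
    x_bar = sum(data) / len(data)
    return [xi - x_bar for xi in data]


def sample_variance(data):
    """Sum of squared deviations divided by n - 1 (needs at least two values)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    return sum(d ** 2 for d in deviations(data)) / (n - 1)


def sample_sd(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))


x = [2, 4, 6]
print(data_range(x))       # 4
print(deviations(x))       # [-2.0, 0.0, 2.0]
print(sample_variance(x))  # 4.0
print(sample_sd(x))        # 2.0

assert math.isclose(sample_variance(x), statistics.variance(x))
assert math.isclose(sample_sd(x), statistics.stdev(x))
```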
Why Divide by n - 1 in Sample Variance?
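Dividing by $n - 1$ (instead of $n$) in the sample variance provides an unbiased estimate of the population variance $\sigma^2$.
Explanation 1 (Unbiasedness): $E[s^2] = \sigma^2$, so on average over repeated random samples, $s^2$ equals the population variance. Dividing by $n$ instead gives an estimator whose expectation is $\frac{n-1}{n}\sigma^2$, which systematically underestimates $\sigma^2$.
Explanation 2 (Degrees of Freedom): Calculating $\bar{x}$ uses up one piece of information. Because the deviations must satisfy $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$, once $n - 1$ of them are known the last one is determined, leaving only $n - 1$ independent deviations.
For population data, divide by $n$; for sample data, divide by $n - 1$.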
Additional info: This adjustment is known as Bessel's correction.
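Unbiasedness can be checked exactly on a small case, without random simulation. This sketch assumes a hypothetical population: the faces of a fair six-sided die, whose population variance is $35/12$. It enumerates every sample of size $n = 3$ drawn independently with replacement, which gives $6^3 = 216$ equally likely samples. It then averages both variance formulas with exact `Fraction` arithmetic, so the averages are the true expected values.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]  # hypothetical population: faces of a fair die
N = len(population)
mu = Fraction(sum(population), N)
# Population variance: divide by the population size.
sigma2 = sum((Fraction(v) - mu) ** 2 for v in population) / N


def sum_sq_dev(sample):
    """Exact sum of squared deviations from the sample mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((Fraction(v) - x_bar) ** 2 for v in sample)


n = 3
samples = list(product(population, repeat=n))  # all 6**3 equally likely samples

# Averaging an estimator over every equally likely sample gives its expected value.
expected_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
expected_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(sigma2)                  # 35/12
print(expected_div_n_minus_1)  # 35/12, unbiased
print(expected_div_n)          # 35/18, too small by the factor (n - 1) / n
```

Exact enumeration is used instead of simulation so that the equality $E[s^2] = \sigma^2$ shows up exactly rather than approximately.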