Introduction to the Practice of Statistics: Key Concepts, Data Types, Study Designs, and Sampling Methods
Introduction to the Practice of Statistics
Objectives
Define Statistics
Explain the process of statistics
Distinguish between Qualitative and Quantitative variables
Distinguish between Discrete and Continuous variables
Definition of Statistics
Statistics is the science of collecting, organizing, summarizing, and analyzing information to draw conclusions or answer questions.
Four-Step Statistical Process
Plan (Identify a question): Formulate a statistical question that can be answered with data. The question determines what data must be collected and how they will be analyzed.
Collect (Produce Data): Design and implement a plan to collect appropriate data. Data can be gathered through observations, interviews, questionnaires, databases, sampling, or experimentation.
Process (Analyze the Data): Organize and summarize the data using graphical or numerical methods. Examples include histograms, dot plots, and box plots.
Conclusion (Interpret the Results): Interpret findings in the context of the original question and explain how the data answers it.
Population, Sample, and Types of Statistics
Population: The entire group of individuals to be studied.
Individual: A person or object that is a member of the population.
Sample: A subset of the population being studied.
Descriptive statistics: Methods for organizing and summarizing data, often using tables, graphs, and numerical summaries.
Inferential statistics: Methods that use sample data to make generalizations about a population and measure the reliability of the results.
Examples
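Parameter (a numerical summary of a population): The average score for a class of 28 students taking a calculus midterm exam was 72%. The class is the entire population of interest.
Statistic (a numerical summary of a sample): Interviews of 100 adults found that 44% could state the minimum age required for the office of U.S. president. The 100 adults are a sample of all adults.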
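The distinction between a parameter and a statistic can be illustrated with a short sketch. The scores below are randomly generated, hypothetical values (they are not the data behind the 72% figure), and the names `population_scores` and `sample_scores` are chosen only for this illustration.

```python
import random
import statistics

# Hypothetical exam scores (percent) for an entire class of 28 students.
# The class is the whole group of interest, so it plays the role of the population.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
population_scores = [rng.randint(50, 95) for _ in range(28)]

# Parameter: a numerical summary computed from the entire population.
parameter_mean = statistics.mean(population_scores)

# Statistic: the same summary computed from a sample drawn from the population.
sample_scores = rng.sample(population_scores, 10)
statistic_mean = statistics.mean(sample_scores)

print(f"Population mean (parameter): {parameter_mean:.1f}")
print(f"Sample mean (statistic):     {statistic_mean:.1f}")
```

The two means will usually differ somewhat. Inferential statistics is concerned with how far a statistic is likely to fall from the parameter it estimates.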
Types of Data: Qualitative and Quantitative Variables
Qualitative vs. Quantitative Variables
Qualitative (Categorical) variables: Classify individuals based on some attribute or characteristic.
Quantitative variables: Provide numerical measures of individuals. Arithmetic operations can be performed on these values.
Discrete vs. Continuous Variables
Discrete variable: Has a finite or countable number of possible values (e.g., number of children).
Continuous variable: Has an infinite number of possible values that are not countable (e.g., daily intake of whole grains measured in grams).
Example: Classification of Variables
Nationality: Qualitative
Number of children: Quantitative, discrete
Household income in the previous year: Quantitative
Daily intake of whole grains: Quantitative, continuous
Observational Studies versus Designed Experiments
Objectives
Distinguish between an observational study and an experiment
Explain the various types of observational studies
Definitions
Observational study: Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.
Designed experiment: The researcher assigns individuals to groups, intentionally changes the value of the explanatory variable, and records the value of the response variable for each group.
Lurking variable: An explanatory variable that was not considered in a study but that affects the value of the response variable.
Key Point
Observational studies do not allow a researcher to claim causation, only association. Only a well-designed experiment can establish cause and effect.
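A small simulation can show why association alone does not establish causation. In the sketch below, a lurking variable drives two recorded variables that have no direct effect on each other, yet the two are strongly correlated. The scenario (temperature, ice cream sales, sunburn cases) and all numbers are hypothetical and chosen only for illustration.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical lurking variable: daily temperature, rescaled to the range 0..1.
temperature = [rng.random() for _ in range(1000)]

# Both recorded variables depend on temperature plus noise, not on each other.
ice_cream_sales = [t + rng.gauss(0, 0.1) for t in temperature]
sunburn_cases = [t + rng.gauss(0, 0.1) for t in temperature]

r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"Correlation between sales and sunburns: {r:.2f}")
```

An observational study that recorded only sales and sunburns would see the strong association but could not tell that temperature is responsible for it.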
Types of Observational Studies
Cross-sectional studies: Collect information about individuals at a specific point in time or over a very short period.
Case-control studies: Retrospective studies that require individuals to look back in time or require the researcher to look at existing records. Individuals who have a certain characteristic (cases) are matched with individuals who do not (controls).
Cohort studies: First identify a group of individuals to participate in the study (the cohort), then observe the cohort over a long period. Characteristics of the individuals are recorded over time, and some individuals are exposed to certain factors (not by the researcher's design) while others are not.
Examples
Cross-sectional: Daily coffee consumption and nonmelanoma skin cancer.
Case-control: Tanning and skin cancer (comparing people with and without skin cancer).
Cohort: Doll and Hill cohort study on smoking and lung cancer, which followed a group of male British doctors for 50 years.
Sampling Techniques
Simple Random Sampling
Random sampling is the process of using chance to select individuals from a population to be included in the sample. A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n is equally likely to occur.
Steps for Obtaining a Simple Random Sample
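List the members of the population (for example, in alphabetical order) and number them from 1 to N.
Randomly select n distinct numbers between 1 and N using a random number generator or a table of random digits, skipping repeats and out-of-range values.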
Match the generated random numbers to the corresponding individuals.
Example
From a numbered list of 80 students (01 to 80), select 5 by reading two-digit random numbers: 05, 16, 62, 77, 48.
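The same selection can be sketched with Python's standard `random` module in place of a table of random digits. The seed value is arbitrary and is fixed only to make the run repeatable, so the numbers drawn will not match the five listed in the example.

```python
import random

population_size = 80  # N: students numbered 1 to 80
sample_size = 5       # n: number of students to select

rng = random.Random(2024)  # arbitrary seed, used only for repeatability

# random.sample draws without replacement, so no student is chosen twice
# and every set of 5 students is equally likely.
selected = sorted(rng.sample(range(1, population_size + 1), sample_size))
print(selected)
```

Each number in `selected` is then matched to the student with that number on the list.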
Other Sampling Techniques
Stratified sample: Separate the population into homogeneous, nonoverlapping groups (strata), then obtain a simple random sample from each stratum.
Systematic sample: Select every kth individual from the population, starting with a randomly selected individual between 1 and k.
Cluster sample: Divide the population into clusters and randomly select entire clusters for the sample.
| Sampling Method | Description | Example |
|---|---|---|
| Simple Random Sampling | Every possible sample of size n is equally likely to be selected | Randomly select 5 students from a list of 80 |
| Stratified Sampling | Population divided into strata, simple random sample taken from each stratum | Divide by gender, randomly select from each group |
| Systematic Sampling | Select every kth individual after a random start | Choose every 10th person from a list, starting at a random position from 1 to 10 |
| Cluster Sampling | Divide population into clusters, randomly select entire clusters | Randomly select classrooms, survey all students in selected rooms |
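The three techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a survey-grade implementation: the roster of 80 students, the two strata, the eight classrooms, and the function names are all hypothetical.

```python
import random

def stratified_sample(strata, per_stratum, rng):
    """Take a simple random sample of `per_stratum` individuals from each stratum."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def systematic_sample(population, k, rng):
    """Take every k-th individual, starting at a random position among the first k."""
    start = rng.randrange(k)  # 0-based index of the first individual chosen
    return population[start::k]

def cluster_sample(clusters, num_clusters, rng):
    """Randomly choose whole clusters and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(7)  # arbitrary seed, used only for repeatability
students = [f"S{i:02d}" for i in range(1, 81)]  # hypothetical roster of 80

# Hypothetical strata: the first 40 students are first-years, the rest second-years.
strata = {"first_year": students[:40], "second_year": students[40:]}

# Hypothetical clusters: 8 classrooms of 10 students each.
classrooms = {f"room_{r}": students[r * 10:(r + 1) * 10] for r in range(8)}

stratified = stratified_sample(strata, 3, rng)
systematic = systematic_sample(students, 10, rng)
clustered = cluster_sample(classrooms, 2, rng)

print(stratified)
print(systematic)
print(clustered)
```

Note the structural difference: the stratified sample contains some individuals from every group, while the cluster sample contains every individual from some groups.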
Formulas
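Population size: $N$
Sample size: $n$
Simple random sample probability: each possible sample of size $n$ is selected with probability $1 / \binom{N}{n}$, where $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ is the number of distinct samples of size $n$.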
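As a worked check using the earlier example of selecting 5 students from 80, the sketch below counts the distinct samples of size n that can be drawn from a population of size N. Because simple random sampling makes every such sample equally likely, each one has probability 1 divided by that count. The variable names are chosen only for this illustration.

```python
import math

N = 80  # population size
n = 5   # sample size

num_samples = math.comb(N, n)        # number of distinct samples of size n
p_specific_sample = 1 / num_samples  # probability of any one particular sample
p_individual = n / N                 # probability a given individual is included

print(num_samples)         # 24040016
print(p_specific_sample)
print(p_individual)        # 0.0625
```

So any particular group of 5 students has about a 1 in 24 million chance of being the sample, while each individual student has a 5 in 80 chance of being included.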
Additional info:
The sampling method determines whether conclusions drawn from a sample can be generalized to the population. Methods that rely on chance help make the sample representative.
Stratified sampling increases precision by ensuring representation from all subgroups.
Systematic sampling is efficient but may introduce bias if there is a pattern in the population list.
Cluster sampling is cost-effective for large populations but may increase sampling error, because individuals within the same cluster tend to resemble one another.