This course, part of Harvard University's Professional Certificate Program in Data Science, covers several standard steps of the data wrangling process, like importing data into R, tidying data, string processing, HTML parsing, working with dates and times, and text mining. Rarely are all these wrangling steps necessary in a single analysis, but a data scientist will likely face them all at some point.
Data is rarely easy to access in a data science project. More often it sits in a file or a database, or must be extracted from documents such as web pages, tweets, or PDFs. In these cases, the first step is to import the data into R and tidy the data using the tidyverse package. The steps that convert data from its raw form to the tidy form are called data wrangling.
This process is an essential step for any data scientist. Knowing how to wrangle and clean data will enable you to uncover insights that would otherwise stay hidden.
Topics of study
How to import data into R from different file formats
How to tidy data using the tidyverse to better facilitate analysis
String processing with regular expressions (regex)
Wrangling data using dplyr
How to work with date and time formats
About Harvard University
Harvard University is devoted to excellence in teaching, learning and research, and to developing leaders in many disciplines who make a difference globally. Harvard faculty are engaged with teaching and research to push the boundaries of human knowledge. The University has 12 degree-granting schools in addition to the Radcliffe Institute for Advanced Study.
Established in 1636, Harvard is the oldest institution of higher education in the United States. The University, which is based in Cambridge and Boston, Massachusetts, has an enrollment of over 20,000 degree candidates, including undergraduate, graduate and professional students. Harvard has more than 360,000 alumni around the world.