Introduction - Video Tutorials & Practice Problems
Video duration:
3m
Hello, I'm Jared P. Lander. R and data science are a huge part of my life. In addition to owning a boutique data science firm in New York City that specializes in R, I run the New York Open Statistical Programming Meetup, the world's largest R meetup, teach R programming at Columbia University as part of the intro to data science course, and I'm the author of the best-selling book R for Everyone. In this series of videos, we learn how to use R as a tool for data manipulation, visualization, and presentation.
This LiveLessons video has a broad audience. It is meant both for people learning to program for the first time and seasoned code slingers with knowledge of other languages. Expert Excel users looking to make their lives easier by automating tasks through scripting will greatly benefit from this series. And of course, statisticians old and new will gain a lot from learning R, whether it is their first language or they are transitioning from SAS or Stata.
We start at the very beginning: downloading, installing, and getting used to the R environment, including RStudio. Next come the basics such as variables, data types, vectors, and basic functions. These are the building blocks that make up more advanced structures such as data frames, matrices, and lists. It is the data frame in particular that makes R so natural for working with data, with its spreadsheet-like, mixed-type structure. All of R's analysis capabilities depend on reading data. We explore reading CSVs, databases, Excel, and other sources.
Once we have data, the first step is visualizing it. While we spend a good chunk of time dedicated solely to graphics, particularly ggplot2, we demonstrate it throughout the whole series when appropriate. Visualization is so important to the statistical process that it comes up repeatedly. To really harness the power of R, we need to learn how it handles the basics of programming such as writing functions, if statements, and loops, even if loops are discouraged in favor of vectorization.
Before we can analyze data, we usually have to get it in shape, which is often 80% of the work. We learn all sorts of techniques for munging data, using built-in functions and packages such as plyr, reshape2, data.table, foreach, and dplyr. With all of today's unstructured data, it is important we know how to manipulate strings, both for creating them and extracting information.
At the end of any good analysis, the results must be reported in a compelling fashion. Thanks to knitr and R Markdown, R easily generates great-looking reports, websites, Word documents, and slideshows with statistical results automatically embedded within. This is augmented with any number of JavaScript libraries thanks to htmlwidgets packages like DataTables, Bokeh, and leaflet. For an interactive user experience, we use Shiny, an increasingly popular tool to build web-based data displays and analysis, all in R.
We take a look at building an R package for modular, portable code and giving back to the community. Lastly, we see how to use Rcpp to seamlessly integrate C++ into R code for even more speed. I hope you will find these lessons informative and entertaining as you learn how to use R as a data tool.