Visualisation and data manipulation in R

eBay, San Jose, 8–9 August 2011

Required packages (install before the course): install.packages(c("ggplot2", "plyr", "maps", "lubridate", "stringr", "reshape2", "profr"))

Course outline

    Introductions and course outline.

    ggplot2 basics

    Create informative scatterplots: add extra variables with aesthetics (such as color, shape and size) or with facetting. Create graphics for large data: histograms and bar charts for displaying distributional summaries, boxplots, and scatterplot variations that overcome the overplotting problems associated with large data.
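
    The sketch below illustrates these ideas. It is not taken from the course materials, and it assumes a current ggplot2 release with its bundled mpg and diamonds datasets. Each object is a plot specification, and printing it draws the graphic.

```r
library(ggplot2)

# Extra variable shown through the colour aesthetic
p1 <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# The same relationship split into small multiples by facetting on drive train
p2 <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ drv)

# Large data: a histogram summarises the distribution of all diamond prices
p3 <- ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)

# Large data: heavy transparency reduces overplotting in a dense scatterplot
p4 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(alpha = 0.05)
```

    Transparency is only one way to tackle overplotting. Binning the points in two dimensions is another common option.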

    Data manipulation

    Manipulate and transform data: add extra information to your plots via group-wise summaries and transformations; visualize time series. Introduction to the plyr package.
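
    Group-wise summaries and transformations with plyr might look like the following minimal sketch. It assumes the diamonds data from ggplot2, and the new column names (mean_price, n, price_z) are chosen for illustration only. A summary returns one row per group, while a transformation keeps every original row.

```r
library(plyr)
library(ggplot2)  # only for the diamonds dataset

# Group-wise summary: one row per cut, with the mean price and group size
by_cut <- ddply(diamonds, "cut", summarise,
  mean_price = mean(price),
  n = length(price))

# Group-wise transformation: standardise price within each cut,
# keeping every row of the original data
scaled <- ddply(diamonds, "cut", transform,
  price_z = (price - mean(price)) / sd(price))

by_cut
```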

    Graphics: critique and creation

    Basic tools for critiquing a graphic. Advanced layered techniques. Overlay graphic elements using ggplot layers: combining raw data with statistical summaries and contextual information.
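
    Combining raw data with a statistical summary in a single plot can be sketched as below. This is an illustrative example, not the course's own. It assumes ggplot2's mpg data, and it computes the summary layer with plyr as a separate data frame so that the summary can use its own data source.

```r
library(ggplot2)
library(plyr)

# Summary data: mean highway mpg for each vehicle class
class_means <- ddply(mpg, "class", summarise, hwy = mean(hwy))

p <- ggplot(mpg, aes(x = class, y = hwy)) +
  # Layer 1: raw observations, jittered horizontally to reduce overlap
  geom_jitter(width = 0.2, height = 0, alpha = 0.4) +
  # Layer 2: group means drawn from a different data frame
  geom_point(data = class_means, colour = "red", size = 3)
```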

    Polishing graphics for presentation

    Tweak your plots for maximum presentation impact: an introduction to color theory; labels, legends and axes; and customising plot themes.
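
    A few typical polishing steps are shown in the sketch below, assuming a current ggplot2. It uses a ColorBrewer palette for the color scale, descriptive labels and a built-in theme. The palette, label text and legend placement are illustrative choices, not recommendations from the course.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point() +
  # Qualitative ColorBrewer palette; name sets the legend title
  scale_colour_brewer(palette = "Set1", name = "Drive train") +
  # Axis labels with units, plus a plot title
  labs(x = "Engine displacement (litres)",
       y = "Highway fuel economy (mpg)",
       title = "Fuel economy vs engine size") +
  # Start from a built-in theme, then adjust individual elements
  theme_bw() +
  theme(legend.position = "bottom")
```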

    Tidy data

    Learn about data tidying, the art of getting your data in the right form for visualisation, manipulation and modelling. You’ll learn to use the melt and dcast functions from the reshape2 package to deal with a wide range of untidy datasets.
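
    The melt/dcast round trip is sketched below on a small, hypothetical untidy table with one column per year. The data and the column names (country, year, cases) are invented for illustration. melt gathers the year columns into one row per observation, and dcast spreads the long data back out into a different wide layout.

```r
library(reshape2)

# Hypothetical untidy data: years stored as column names
wide <- data.frame(
  country = c("A", "B"),
  `2010` = c(10, 20),
  `2011` = c(12, 25),
  check.names = FALSE
)

# melt: one row per country-year observation
long <- melt(wide, id.vars = "country",
             variable.name = "year", value.name = "cases")

# dcast: reshape the long data to one row per year, one column per country
by_year <- dcast(long, year ~ country, value.var = "cases")

long
by_year
```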

    Group-wise modelling

    Advanced data aggregation. Build on your knowledge of plyr to fit large ensembles of simple models, then extract coefficients, predictions, residuals, and other summary statistics. Many examples of advanced layering. Key functions: dlply and ldply.
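
    A minimal sketch of this workflow follows, assuming ggplot2's mpg data and a simple linear model of highway mpg on engine displacement. The model itself is an arbitrary illustration. dlply fits one model per group and returns a list. ldply then turns per-model results back into data frames, carrying the grouping variable along.

```r
library(plyr)
library(ggplot2)  # only for the mpg dataset

# One linear model per vehicle class, returned as a list
models <- dlply(mpg, "class", function(df) lm(hwy ~ displ, data = df))

# Coefficients: one row per model
coefs <- ldply(models, function(mod) coef(mod))

# Residuals: one row per original observation, labelled by class
resids <- ldply(models, function(mod) data.frame(resid = resid(mod)))

# A per-model summary statistic: residual standard error
fit_stats <- ldply(models, function(mod) data.frame(sigma = summary(mod)$sigma))
```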

    First-class functions

    Learn how to take advantage of R’s functional programming capabilities to write code that is both simpler and more general.
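
    Two common uses of first-class functions are sketched below in base R: storing functions in a list and applying each one, and closures (functions that build other functions). The names summaries and power are illustrative only.

```r
# Functions are ordinary values: keep them in a list and apply each in turn
summaries <- list(mean = mean, median = median, sd = sd)
x <- c(1, 2, 3, 4, 100)
res <- sapply(summaries, function(f) f(x))
res  # named vector: mean 22, median 3, plus the standard deviation

# A closure: power() returns a new function that remembers its exponent
power <- function(exp) {
  function(x) x ^ exp
}
square <- power(2)
cube <- power(3)
square(4)  # 16
cube(2)    # 8
```

    Because power() captures exp, a single definition generates a whole family of related functions instead of several near-duplicate ones.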

    Development best practices

    How to write code that is correct, maintainable and fast. A survey of development best practices, including code style, commenting, profiling, improving performance and testing. We’ll touch on R’s new byte-code compiler and on writing high-performance code in C++ with the Rcpp package.
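
    The byte-code compiler and a basic correctness check can be sketched as below, using the compiler package that ships with base R (since R 2.13.0). The loop function is a deliberately slow, hypothetical example. Since R 3.4 the just-in-time compiler byte-compiles functions automatically, so on a modern R the two timings may be very similar.

```r
library(compiler)

# A deliberately loop-heavy function (hypothetical example)
sum_loop <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi
  total
}

# Explicitly byte-compile it
sum_loop_c <- cmpfun(sum_loop)

set.seed(1)
x <- runif(1e6)

# Compare run times of the interpreted, compiled and vectorised versions
system.time(sum_loop(x))
system.time(sum_loop_c(x))
system.time(sum(x))

# A simple test: the compiled loop agrees with the built-in vectorised sum
stopifnot(isTRUE(all.equal(sum_loop_c(x), sum(x))))
```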