Instructor
William S. Cleveland, Haas 222, wsc@purdue.edu
Prerequisites
Knowledge of basic probability and statistics, and mathematics through
calculus and linear algebra. No previous knowledge of R, Hadoop,
or RHIPE is needed.
Primary Audience
Graduate students in university departments where data are analyzed.
Credits
3
Time
Tue/Thu 1:30 - 2:45
Location
Lectures and Labs: SC 277
Description
This course has two components: (1) The Divide and Recombine (D&R)
statistical approach to large complex data; (2) The Tessera computational
environment that implements D&R, allowing a data analyst to carry out deep
analysis of big data using D&R. Deep analysis means that the data are
analyzed in detail at their finest granularity, and the analyst has access to
any of the thousands of methods of statistics, machine learning, and visualization
for use in the analysis.
Tessera has R at the front end. All analyst programming is in R. At the back end are the Hadoop Distributed File System (HDFS) and the parallel compute engine (MapReduce). Hadoop runs the analyst's R commands to carry out the D&R computations. Tessera software packages merge R and Hadoop, enabling communication between the two and making D&R programming easy.
Students will have access to a Hadoop cluster, provided by the Rosen Center for Advanced Computing, with the Tessera software stack installed. Reading materials and lectures will be provided electronically.
Participant Responsibilities
Participants are expected to attend class and successfully carry out the class
assignments.
Get R and Login to Hathi | Get Started
R Language: A Living Document | An Introduction to R
R Language: A Living Document | Visualization Methods
Trellis Display | An Introduction to Trellis Display with Lattice Graphics
High Performance Computing for Data Analysis: D&R with Tessera | D&R and Tessera
R Manual | R Manual Written by R Core Team
Introduction of plyr package | plyr package | plyr package (continued)
Chicago Bears PSL Data in Text Format | CSV data
Datasets and Functions that Make Plots in the Book Visualizing Data | R objects