STAT 695T Visualizing Large Complex Data Spring 2015

Course Information

Instructor
William S. Cleveland, Haas 222, wsc@purdue.edu

Time
Tue/Thur 1:30 - 2:45

Prerequisites
Knowledge of basic probability; mathematics through calculus and linear algebra; and basic statistics including least-squares fitting of parametric functions to data. No previous knowledge of data visualization is needed.

Primary Audience
Graduate students in university departments where data are analyzed.

Location
Lectures and Labs: Physics 26. If you are not familiar with the Physics building, the fastest route is to enter through the main entrance on Northwestern, go down the stairs and through the glass doors, and turn left; the room is straight ahead.

Description

BACKGROUND. Visual displays allow us to explore data to see overall patterns and to see detailed behavior; no other approach can compete in revealing the structure of data so thoroughly. Analyses without visualization run the risk of using inappropriate methods and models for the data. They run the risk of missing important unexpected behavior. They do not preserve the information in the data.

CONTENT. The course will focus fundamentally on how to analyze data. Through many case studies, it will present visualization methods alongside a number of standard numerical methods and models for statistical analysis, showing how visualization enhances these methods and models. The case studies illustrate the use of the visualization methods and demonstrate why they are essential to valid analyses that preserve the information in data. In addition, lectures will cover the lattice graphics system in R, which can be used to carry out all methods discussed in the course. To support this, a certain number of classes will consist of labs in which participants will use lattice.
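
As a rough illustration of what a lattice display looks like in R, here is a minimal sketch. It assumes the lattice package is installed (it ships with standard R distributions) and uses the barley data set bundled with lattice; the particular plot settings are illustrative choices, not taken from the course materials.

```r
# Assumption: lattice is installed (it is a recommended package in standard R).
library(lattice)

# 'barley' ships with lattice: yield by variety, year, and site.
# Conditioning on site ("| site") draws one panel per site,
# and 'groups = year' overlays the two years within each panel.
p <- dotplot(variety ~ yield | site, data = barley,
             groups = year,
             auto.key = list(columns = 2),   # legend for the two years
             layout = c(1, 6),               # one column, six panels
             aspect = 0.5,
             xlab = "Barley yield (bushels/acre)")

# Lattice functions return a "trellis" object; printing it draws the plot.
print(p)
```

Because the plot is an ordinary R object, it can be stored, modified with update(), and printed later, which is convenient when comparing several versions of the same display.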

BIG DATA. Toward the end of the course we will address big data. Participants will use a small cluster of two nodes set up exclusively for the course, and learn about visualization of big data. Actually, only small data are used to do this because, yes, it is possible to learn big data ideas using small data.
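
To make the point about small data concrete, the sketch below shows one common pattern in large-data analysis: split the data into subsets, apply the same method to each subset, and recombine the results. That this is the kind of idea meant here is an assumption; the sketch uses only base R and the built-in airquality data, not the course cluster or its software.

```r
# Divide: split a small built-in data set into subsets by month.
subsets <- split(airquality, airquality$Month)

# Apply: fit the same least-squares line to each subset independently.
fit_one <- function(d) {
  fit <- lm(Ozone ~ Temp, data = d)  # lm drops rows with missing Ozone
  data.frame(n = nobs(fit),
             intercept = coef(fit)[[1]],
             slope = coef(fit)[[2]])
}
per_subset <- lapply(subsets, fit_one)

# Recombine: stack the per-subset results into one small table.
result <- do.call(rbind, per_subset)
result$Month <- as.integer(names(subsets))
print(result)
```

Each subset is processed without reference to the others, so the same code structure carries over when the subsets live on different nodes; only the mechanism that distributes the work changes.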

Participant Responsibilities

Participants are expected to attend class and complete all homework assignments. Homework will consist of analyzing data in R using lattice graphics to carry out visualization methods presented in the course. There will be no tests or final exam.

Course Lecture Slides

Lectures on Visualization Methods

Lectures on Lattice Graphics

Lectures on R

Class Reading

Visualization Methods and Their Application

Book: Visualizing Data. Provided by instructor.

Trellis Design and Control

Trellis Display for Modeling Data from Designed Experiments

R and Lattice Graphics

An Introduction to R

An Easy-Going Introduction to Lattice Graphics

lattice.Rdata
An R data file with R dataset objects and R functions that make most of the graphs in the book Visualizing Data.

Chicago Bears PSL Data in Text Format

One week observations from NYC Taxi data 2013

book functions in text
A .txt file containing the definitions of all the book functions. In R, use source() to read the functions into your session.
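
A minimal sketch of how source() reads function definitions from a text file. The file here is a temporary stand-in created by the example itself, and midrange() is a hypothetical toy function, not one of the actual book functions.

```r
# Hypothetical stand-in for the course's text file of book functions.
fn_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# a toy definition, not one of the actual book functions",
  "midrange <- function(x) (min(x) + max(x)) / 2"
), fn_file)

# source() evaluates the file's contents, so every function
# defined in it becomes available in the current session.
source(fn_file)

midrange(c(2, 4, 10))
```

With the real file, the call would simply be source() applied to the path where the downloaded .txt file was saved.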

Housing data installation source file

R code for datadr class

http://ml.stat.purdue.edu/stat695t/writings/sarkar.lattice.book
Deepayan Sarkar: Lattice: Multivariate Data Visualization in R
A book on trellis display in R with Lattice Graphics.

Tessera Software Documentation

RHIPE, datadr, and Trelliscope