Instructor
William S. Cleveland, Haas 222, wsc@purdue.edu
Time
Tuesday and Thursday, 1:30 - 2:45
Prerequisites
Knowledge of basic probability; mathematics through
calculus and linear algebra; and basic statistics including least-squares
fitting of parametric functions to data. No previous knowledge of data
visualization is needed.
Primary Audience
Graduate students in university departments where data are analyzed.
Location
Lectures and Labs: Physics 26.
If you are not familiar with the Physics building, the fastest route is to
enter through the main entrance on Northwestern, go down the stairs and through
the glass doors, and turn left; the room is straight ahead.
Description
BACKGROUND. Visual displays allow us to explore data to see overall patterns and to see detailed behavior; no other approach can compete in revealing the structure of data so thoroughly. Analyses without visualization run the risk of using inappropriate methods and models for the data. They run the risk of missing important unexpected behavior. They do not preserve the information in the data.
CONTENT. The course content will focus fundamentally on how to analyze data. Through many case studies, it will present visualization methods, going through a number of standard numerical methods and models for statistical analysis, showing how visualization enhances these methods and models. This illustrates the use of the visualization methods, and demonstrates why they are essential to valid analyses that preserve the information in data. In addition, lectures will cover the lattice graphics system in R, which can be used to carry out all methods discussed in the course. To support this, a certain number of classes will consist of labs in which participants will use lattice.
BIG DATA. Toward the end of the course we will address big data. Participants will use a small cluster of two nodes set up exclusively for the course, and learn about visualization of big data. Actually, only small data are used to do this because, yes, it is possible to learn big data ideas using small data.
Participant Responsibilities
Participants are expected to attend class and complete all homework assignments. Homework will consist of analyzing data in R using lattice graphics to carry out visualization methods presented in the course. There will be no tests or final exam.
Lectures on Visualization Methods
Visualization Methods and Their Application
Book: Visualizing Data. Provided by instructor.
Trellis Display for Modeling Data from Designed Experiments
R and Lattice Graphics
An Easy-Going Introduction to Lattice Graphics
lattice.Rdata
An R data file with R dataset objects and R functions that make most of the
graphs in the book Visualizing Data.
Chicago Bears PSL Data in Text Format
One week of observations from the 2013 NYC Taxi data
Book functions in text
A .txt file containing the definitions of all the book functions. In R, use
source() to read the functions into the workspace.
Housing data installation source file
http://ml.stat.purdue.edu/stat695t/writings/sarkar.lattice.book
Deepayan Sarkar: Lattice: Multivariate Data Visualization in R
A book on trellis display in R with Lattice Graphics.
Tessera Software Documentation
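
The R data file and the text file of book functions listed above are read into R with load() and source(), respectively. The following is a minimal sketch of both calls, not the course's own setup: it writes small stand-in files to temporary locations so that it runs anywhere. The file contents, the function midrange(), and the object demo_data are hypothetical. For the course materials, pass the path of the downloaded file instead.

```r
# Hypothetical stand-in for a text file of function definitions:
# write one function to a temporary .txt file, then read it in with source().
fun_file <- tempfile(fileext = ".txt")
writeLines("midrange <- function(x) (min(x) + max(x)) / 2", fun_file)
source(fun_file)           # evaluates the file; midrange() is now defined
midrange(c(1, 4, 9))       # 5

# Hypothetical stand-in for an .Rdata file:
# save an object, remove it from the workspace, then restore it with load().
data_file <- tempfile(fileext = ".Rdata")
demo_data <- data.frame(x = 1:3, y = c(2.5, 3.1, 4.8))
save(demo_data, file = data_file)
rm(demo_data)
loaded <- load(data_file)  # load() returns the names of the restored objects
loaded                     # "demo_data"
```

Note that load() restores objects under the names they were saved with, so checking its return value (or calling ls()) is a quick way to see what a data file provided.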