1. Applied and Theoretical Statistics
- ‘‘Categorical Data Analysis’’ by Alan Agresti
Well-written, go-to reference for all things involving categorical data.
More information available through the causal inference reading group and online seminar
Communicating with Data
- ‘‘Communicating with Data The Art of Writing for Data Science’’ by Deborah Nolan and Sara Stoudt
- ‘‘Compositional Data Analysis’’ by Pawlowsky-Glahn and Buccianti
- ‘‘Generalized Linear Models’’ by McCullagh and Nelder
Theoretical take on GLMs. Does not have a lot of concrete data examples.
- ‘‘Statistical Models’’ by David A. Freedman
- ‘‘Linear Models with R’’ by Julian Faraway
Undergraduate-level textbook, has been used previously as a textbook for Stat 151A. Appropriate for beginners to R who would like to learn how to use linear models in practice. Does not cover GLMs.
- ‘‘Design of Comparative Experiments’’ by Rosemary A Bailey
Classic, approachable text, free for download here
Machine Learning (see also Probabilistic Modeling and Sampling)
- ‘‘The Elements of Statistical Learning’’ by Hastie, Tibshirani, and Friedman
Comprehensive but superficial coverage of all modern machine learning techniques for handling data. Introduces PCA, EM algorithm, k-means/hierarchical clustering, boosting, classification and regression trees, random forest, neural networks, etc. …the list goes on. Download the book here.
- ‘‘Computer Age Statistical Inference: Algorithms, Evidence, and Data Science’’ by Hastie and Efron.
- ‘‘Pattern Recognition and Machine Learning’’ by Bishop
- ‘‘Bayesian Reasoning and Machine Learning’’ by Barber
- ‘‘Probabilistic Graphical Models’’ by Koller and Friedman
- ‘‘Deep Learning’’ by Goodfellow, Bengio and Courville
Multiple Testing, Post-Selection Inference and Selective Inference
- ‘‘Multiple Comparisons: theory and methods’’ by Jason Hsu
One of many sources in this field of research. Most of the literature comes from research papers.
More information available through online seminar.
Probabilistic Modeling and Sampling (see also Machine Learning)
- ‘‘Monte Carlo Statistical Methods’’ by Robert and Casella
A comprehensive text on sampling approaches.
- ‘‘Handbook of Approximate Bayesian Computation’’ by Sisson, Fan and Beaumont
- ‘‘Graphical Models, Exponential Families, and Variational Inference’’ by Wainwright and Jordan
Assuming knowledge at the level of Stat 210AB, elucidates how exponential families can be used in large-scale and interpretable probabilistic modeling.
Theory and Foundations
- ‘‘Theoretical Statistics: Topics for a Core Course’’ by Keener
The primary text for Stat 210A. Download from SpringerLink.
- ‘‘Theory of Point Estimation’’ by Lehmann and Casella
A good reference for Stat 210A, covering estimation.
- ‘‘Testing Statistical Hypotheses’’ by Lehmann and Romano
A more advanced reference for Stat 210A, convering testing and a litany of related concepts.
- ‘‘Empirical Processes in M-Estimation’’ by van de Geer
- Some students find this helpful to supplement the material in 210B.
- ‘‘Concentration Inequalities’’ by Boucheron, Lugosi, and Massart
This is also useful to supplement 210B material.
Undergraduate Level Probability
- ‘‘Probability’’ by Pitman
What the majority of Berkeley undergraduates use to learn probability.
- ‘‘Introduction to Probability Theory’’ by Hoel, Port and Stone
This text is more mathematically inclined than Pitman’s, and more concise, but not as good at teaching probabilistic thinking.
- ‘‘Probability and Computing’’ by Upfal and Mitzenmacher
What students in EECS use to learn about randomized algorithms and applied probability.
Measure Theoretic Probability
- ‘‘Probability: Theory and Examples’’ by Durrett
This is the standard text for learning measure theoretic probability. Its style of presentation can be confusing at times, but the aim is to present the material in a manner that emphasizes understanding rather than mathematical clarity. It has become the standard text in Stat 205A and Stat 205B for good reason. Online here.
- ‘‘Foundations of Modern Probability’’ by Olav Kallenberg
This epic tome is the ultimate research level reference for fundamental probability. It starts from scratch, building up the appropriate measure theory and then going through all the material found in 205A and 205B before powering on through to stochastic calculus and a variety of other specialized topics. The author put much effort into making every proof as concise as possible, and thus the reader must put in a similar amount of effort to understand the proofs. This might sound daunting, but the rewards are great. This book has sometimes been used as the text for 205A.
- ‘‘Probability and Measure’’ by Billingsley
This text is often a useful supplement for students taking 205 who have not previously done measure theory. Download here.
- ‘‘Probability with Martingales’’ by David Williams
This delightful and entertaining book is the fastest way to learn measure theoretic probability, but far from the most thorough. A great way to learn the essentials.
Stochastic Calculus is an advanced topic that interested students can learn by themselves or in a reading group. There are three classic texts:
- ‘‘Continuous Martingales and Brownian Motion’’ by Revuz and Yor
- ‘‘Diffusions, Markov Processes and Martingales (Volumes 1 and 2)’’ by Rogers and Williams
- ‘‘Brownian Motion and Stochastic Calculus’’ by Karatzas and Shreve
Random Walk and Markov Chains
These are indispensable tools of probability. Some nice references are
- ‘‘Markov Chain and Mixing Times’’ by Levin, Peres and Wilmer.
- ‘‘Markov Chains’’ by Norris
Starting with elementary examples, this book gives very good hints on how to think about Markov Chains.
- ‘‘Continuous time Markov Processes’’ by Liggett
A theoretical perspective on this important topic in stochastic processes. The text uses Brownian motion as the motivating example.
- ‘‘Convex Optimization’’ by Boyd and Vandenberghe.
Download the book here
- ‘‘Introductory Lectures on Convex Optimization’’ by Nesterov.
- ‘‘The Matrix Cookbook’’ by Petersen and Pedersen: ‘‘Matrix identities, relations and approximations. A desktop reference for quick overview of mathematics of matrices.’’
- ‘‘Matrix Analysis’’ and ‘‘Topics in Matrix Analysis’’ by Horn and Johnson
Second book is more advanced than the first. Everything you need to know about matrix analysis.
- ‘‘A course in Convexity’’ by Barvinok.
A great book for self study and reference. It starts with the basis of convex analysis, then moves on to duality, Krein-Millman theorem, duality, concentration of measure, ellipsoid method and ends with Minkowski bodies, lattices and integer programming. Fairly theoretical and has many fun exercises.
- ‘‘Real Analysis and Probability’’ by Dudley
- ‘‘Probability and Measure Theory’’ by Ash
Nice and easy to digest. Good as companion for 205A
- ‘‘Enumerative Combinatorics Vol I and II’’ by Richard Stanley.
There’s also a course on combinatorics this semester in the math department called Math249: Algebraic Combinatorics. Despite the scary “algebraic” prefix it’s really fun. Download here.
4. Computational Biology
‘Big Picture’ Overview
- ‘‘Modern Statistics for Modern Biology’’ by Susan Holmes and Wolfgang Huber
Accessible ‘data analysis’-focused overview of the field, with numerous motivating examples and plentiful opportunities for hands-on practice. Although written for biologists, can indirectly help with developing an understanding of how to identify problems that impact on biology.
- ‘‘Statistical Methods in Bioinformatics’’ by Ewens and Grant
Great overview of sequencing technology for the unacquainted.
- ‘‘Computational Genome Analysis: An Introduction’’ by Deonier, Tavaré, and Waterman
Great R code examples from computational biology. Discusses the basics, such as the greedy algorithm, etc.
- ‘‘Probability Models for DNA Sequence Evolution’’ by Rick Durrett
- ‘‘Mathematical Population Genetics’’ by Warren Ewens
5. Computer Science
- ‘‘Numerical Analysis’’ by Burden and Faires
This book is a good overview of numerical computation methods for everything you’d need to know about implementing most computational methods you’ll run into in statistics. It is filled with pseudo-code but does use Maple as it’s exemplary language sometimes. It has been a great resource for the Computational Statistics courses (243/244). Depending on what happens with this course, this may be a good place to look when you’re lost in computation.
- ‘‘Introduction to Algorithms’’, Third Edition, by Cormen, Leiserson, Rivest, and Stein.
MIT OpenCourseWare 6.046J / 18.410J ‘‘Introduction to Algorithms’’ (SMA 5503) was taught by one of the authors, Prof. Charles Leiserson, in 2005. This is an undergraduate course and this book was used as the textbook
- ‘‘Algorithm Design’’, by Jon Kleinberg and Éva Tardos.