Chapter 2: The Shape of Data
Populations, samples, and estimation
Probability distributions
Chapter 3: Describing Relationships
Relationships between a categorical and a continuous variable
Relationships between two categorical variables
The relationship between two continuous variables
Chapter 4: Probability
A tale of two interpretations
Sampling from distributions
Chapter 5: Using Data to Reason About the World
The sampling distribution
Chapter 6: Testing Hypotheses
Null Hypothesis Significance Testing
Testing the mean of one sample
Testing more than two means
Testing independence of proportions
What if my assumptions are unfounded?
Chapter 7: Bayesian Methods
The big idea behind Bayesian analysis
Who cares about coin flips
Enter MCMC – stage left
Fitting distributions the Bayesian way
The Bayesian independent samples t-test
Chapter 8: Predicting Continuous Variables
Simple linear regression with a binary predictor
Regression with a non-binary predictor
The bias-variance trade-off
Linear regression diagnostics
Chapter 9: Predicting Categorical Variables
Chapter 10: Sources of Data
Chapter 11: Dealing with Messy Data
Analysis with missing data
Analysis with unsanitized data
Chapter 12: Dealing with Large Data
Using a bigger and faster machine
Using another R implementation
Be smarter about your code
Chapter 13: Reproducibility and Best Practices
Chapter 14: R Graphics
Base graphics using the default package
Trellis graphs using lattice
Graphs inspired by Grammar of Graphics
Chapter 15: Basic Graph Functions
Creating basic scatter plots
Creating histograms and density plots
Adjusting x and y axes' limits
Creating multiple plot matrix layouts
Adding and formatting legends
Creating graphs with maps
Saving and exporting graphs
Chapter 16: Beyond the Basics – Adjusting Key Parameters
Setting colors of points, lines, and bars
Setting plot background colors
Setting colors for text elements – axis annotations, labels, plot titles, and legends
Choosing color combinations and palettes
Setting fonts for annotations and titles
Choosing plotting point symbol styles and sizes
Choosing line styles and width
Adjusting axis annotations and tick marks
Setting graph margins and dimensions
Chapter 17: Creating Scatter Plots
Grouping data points within a scatter plot
Highlighting grouped data points by size and symbol type
Correlation matrix using pairs plots
Using jitter to distinguish closely packed data points
Adding linear model lines
Adding nonlinear model curves
Adding nonparametric model curves with lowess
Creating three-dimensional scatter plots
Creating Quantile-Quantile plots
Displaying the data density on axes
Creating scatter plots with a smoothed density representation
Chapter 18: Creating Line Graphs and Time Series Charts
Adding customized legends for multiple-line graphs
Using margin labels instead of legends for multiple-line graphs
Adding horizontal and vertical grid lines
Adding marker lines at specific x and y values using abline
Plotting functions of a variable in a dataset
Formatting time series data for plotting
Plotting the date or time variable on the x axis
Annotating axis labels in different human-readable time formats
Adding vertical markers to indicate specific time events
Plotting data with varying time-averaging periods
Chapter 19: Creating Bar, Dot, and Pie Charts
Creating bar charts with more than one factor variable
Creating stacked bar charts
Adjusting the orientation of bars – horizontal and vertical
Adjusting bar widths, spacing, colors, and borders
Displaying values on top of or next to the bars
Placing labels inside bars
Creating bar charts with vertical error bars
Modifying dot charts by grouping variables
Making better, readable pie charts with clockwise-ordered slices
Labeling a pie chart with percentage values for each slice
Adding a legend to a pie chart
Chapter 20: Creating Histograms
Visualizing distributions as count frequencies or probability densities
Setting the bin size and the number of breaks
Adjusting histogram styles – bar colors, borders, and axes
Overlaying a density line over a histogram
Multiple histograms along the diagonal of a pairs plot
Histograms in the margins of line and scatter plots
Chapter 21: Box and Whisker Plots
Creating box plots with narrow boxes for a small number of variables
Varying box widths by the number of observations
Creating box plots with notches
Including or excluding outliers
Creating horizontal box plots
Adjusting the extent of plot whiskers outside the box
Showing the number of observations
Splitting a variable at arbitrary values into subsets
Chapter 22: Creating Heat Maps and Contour Plots
Creating heat maps of a single Z variable with a scale
Creating correlation heat maps
Summarizing multivariate data in a single heat map
Creating filled contour plots
Creating three-dimensional surface plots
Visualizing time series as calendar heat maps
Chapter 23: Creating Maps
Plotting global data by countries on a world map
Creating graphs with regional maps
Plotting data on Google maps
Creating and reading KML data
Working with ESRI shapefiles
Chapter 24: Data Visualization Using Lattice
Creating stacked bar charts
Creating bar charts to visualize cross-tabulation
Creating a conditional histogram
Visualizing distributions through a kernel-density plot
Creating a normal Q-Q plot
Visualizing an empirical Cumulative Distribution Function
Creating a conditional scatter plot
Chapter 25: Data Visualization Using ggplot2
Creating multiple bar charts
Creating a bar chart with error bars
Visualizing the density of a numeric variable
Creating a layered plot with a scatter plot and fitted line
Graph annotation with ggplot
Chapter 26: Inspecting Large Datasets
Multivariate continuous data visualization
Multivariate categorical data visualization
Chapter 27: Three-dimensional Visualizations
Three-dimensional scatter plots
Three-dimensional scatter plots with a regression plane
Three-dimensional bar charts
Three-dimensional density plots
Chapter 28: Finalizing Graphs for Publications and Presentations
Exporting graphs in high-resolution image formats – PNG, JPEG, BMP, and TIFF
Exporting graphs in vector formats – SVG, PDF, and PS
Adding mathematical and scientific notations (typesetting)
Adding text descriptions to graphs
Choosing font families and styles under Windows, Mac OS X, and Linux
Choosing fonts for PostScripts and PDFs
Chapter 29: Warming Up
Data attributes and description
Data transformation and discretization
Chapter 30: Mining Frequent Patterns, Associations, and Correlations
An overview of associations and patterns
Hybrid association rules mining
High-performance algorithms
Chapter 31: Classification
Generic decision tree induction
High-value credit card customers classification using ID3
Web spam detection using C4.5
Web key resource page judgment using CART
Trojan traffic identification method and Bayes classification
Identify spam e-mail and Naïve Bayes classification
Rule-based classification of player types in computer games and rule-based classification
Chapter 32: Advanced Classification
Biological traits and the Bayesian belief network
Protein classification and the k-Nearest Neighbors algorithm
Document retrieval and Support Vector Machine
Classification using frequent patterns
Classification using the backpropagation algorithm
Chapter 33: Cluster Analysis
Search engines and the k-means algorithm
Automatic abstraction of document texts and the k-medoids algorithm
Unsupervised image categorization and affinity propagation clustering
News categorization and hierarchical clustering
Chapter 34: Advanced Cluster Analysis
Customer categorization analysis of e-commerce and DBSCAN
Clustering web pages and OPTICS
Visitor analysis in the browser cache and DENCLUE
Recommendation system and STING
Web sentiment analysis and CLIQUE
Opinion mining and WAVE clustering
User search intent and the EM algorithm
Customer purchase data analysis and clustering high-dimensional data
SNS and clustering graph and network data
Chapter 35: Outlier Detection
Credit card fraud detection and statistical methods
Activity monitoring – the detection of fraud involving mobile phones and proximity-based methods
Intrusion detection and density-based methods
Intrusion detection and clustering-based methods
Monitoring the performance of the web server and classification-based methods
Detecting novelty in text, topic detection, and mining contextual outliers
Collective outliers on spatial data
Outlier detection in high-dimensional data
Chapter 36: Mining Stream, Time-series, and Sequence Data
The credit card transaction flow and STREAM algorithm
Predicting future prices and time-series analysis
Stock market data and time-series clustering and classification
Web click streams and mining symbolic sequences
Mining sequence patterns in transactional databases
Chapter 37: Graph Mining and Network Analysis
Mining frequent subgraph patterns
Chapter 38: Mining Text and Web Data
Text mining and TM packages
The question answering system
Genre categorization of web pages
Categorizing newspaper articles and newswires into topics
Web usage mining with web logs
Chapter 39: Time Series Analysis
Multivariate time series analysis
References and reading list
Chapter 40: Factor Models
Chapter 41: Forecasting Volume
The volume forecasting model
Chapter 42: Big Data – Advanced Analytics
Getting data from open sources
Introduction to big data analysis in R
K-means clustering on big data
Big data linear regression analysis
Chapter 43: FX Derivatives
Terminology and notations
Chapter 44: Interest Rate Derivatives and Models
The Cox-Ingersoll-Ross model
Parameter estimation of interest rate models
Chapter 45: Exotic Options
A general pricing approach
The role of dynamic hedging
Greeks – the link back to the vanilla world
Pricing the Double-no-touch option
Another way to price the Double-no-touch option
The life of a Double-no-touch option – a simulation
Exotic options embedded in structured products
Chapter 46: Optimal Hedging
Hedging in the presence of transaction costs
Chapter 47: Fundamental Analysis
The basics of fundamental analysis
Including multiple variables
Separating investment targets
Setting classification rules
Industry-specific investment
Chapter 48: Technical Analysis, Neural Networks, and Logoptimal Portfolios
Chapter 49: Asset and Liability Management
Interest rate risk measurement
Liquidity risk measurement
Modeling non-maturity deposits
Chapter 50: Capital Adequacy
Principles of the Basel Accords
Chapter 51: Systemic Risks
Systemic risk in a nutshell
The dataset used in our examples
Core-periphery decomposition
Possible interpretations and suggestions
Chapter 52: Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
Machine learning in practice
Chapter 53: Managing and Understanding Data
Exploring and understanding data
Chapter 54: Lazy Learning – Classification Using Nearest Neighbors
Understanding nearest neighbor classification
Example – diagnosing breast cancer with the k-NN algorithm
Chapter 55: Probabilistic Learning – Classification Using Naive Bayes
Understanding Naive Bayes
Example – filtering mobile phone spam with the Naive Bayes algorithm
Chapter 56: Divide and Conquer – Classification Using Decision Trees and Rules
Understanding decision trees
Example – identifying risky bank loans using C5.0 decision trees
Understanding classification rules
Example – identifying poisonous mushrooms with rule learners
Chapter 57: Forecasting Numeric Data – Regression Methods
Example – predicting medical expenses using linear regression
Understanding regression trees and model trees
Example – estimating the quality of wines with regression trees and model trees
Chapter 58: Black Box Methods – Neural Networks and Support Vector Machines
Understanding neural networks
Example – modeling the strength of concrete with ANNs
Understanding Support Vector Machines
Example – performing OCR with SVMs
Chapter 59: Finding Patterns – Market Basket Analysis Using Association Rules
Understanding association rules
Example – identifying frequently purchased groceries with association rules
Chapter 60: Finding Groups of Data – Clustering with k-means
Example – finding teen market segments using k-means clustering
Chapter 61: Evaluating Model Performance
Measuring performance for classification
Estimating future performance
Chapter 62: Improving Model Performance
Tuning stock models for better performance
Improving model performance with meta-learning
Chapter 63: Specialized Machine Learning Topics
Working with proprietary files and databases
Working with online data and services
Working with domain-specific data
Improving the performance of R