Learning Data Mining with Python

Harness the power of Python to analyze data and create insightful predictive models

Robert Layton


Book Details

ISBN 13: 9781784396053
Paperback: 344 pages

Book Description

The next step in the information age is to gain insights from the deluge of data coming our way. Data mining provides a way of finding this insight, and Python is one of the most popular languages for data mining, providing both power and flexibility in analysis.

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. Next, we move on to more complex data types including text, images, and graphs. In every chapter, we create models that solve real-world problems.

Python has a rich and varied set of libraries for data mining. This book covers many of them, including the IPython Notebook, pandas, scikit-learn, and NLTK.

Each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have gained considerable insight into using Python for data mining, along with a good understanding of the algorithms and their implementations.

Table of Contents

Chapter 1: Getting Started with Data Mining
  • Introducing data mining
  • Using Python and the IPython Notebook
  • A simple affinity analysis example
  • A simple classification example
  • What is classification?
  • Summary
Chapter 2: Classifying with scikit-learn Estimators
  • scikit-learn estimators
  • Preprocessing using pipelines
  • Pipelines
  • Summary
Chapter 3: Predicting Sports Winners with Decision Trees
  • Loading the dataset
  • Decision trees
  • Sports outcome prediction
  • Random forests
  • Summary
Chapter 4: Recommending Movies Using Affinity Analysis
  • Affinity analysis
  • The movie recommendation problem
  • The Apriori implementation
  • Extracting association rules
  • Summary
Chapter 5: Extracting Features with Transformers
  • Feature extraction
  • Feature selection
  • Feature creation
  • Creating your own transformer
  • Summary
Chapter 6: Social Media Insight Using Naive Bayes
  • Disambiguation
  • Text transformers
  • Naive Bayes
  • Application
  • Summary
Chapter 7: Discovering Accounts to Follow Using Graph Mining
  • Loading the dataset
  • Finding subgraphs
  • Summary
Chapter 8: Beating CAPTCHAs with Neural Networks
  • Artificial neural networks
  • Creating the dataset
  • Training and classifying
  • Improving accuracy using a dictionary
  • Summary
Chapter 9: Authorship Attribution
  • Attributing documents to authors
  • Function words
  • Support vector machines
  • Character n-grams
  • Using the Enron dataset
  • Summary
Chapter 10: Clustering News Articles
  • Obtaining news articles
  • Extracting text from arbitrary websites
  • Grouping news articles
  • Clustering ensembles
  • Online learning
  • Summary
Chapter 11: Classifying Objects in Images Using Deep Learning
  • Object classification
  • Application scenario and goals
  • Deep neural networks
  • GPU optimization
  • Setting up the environment
  • Application
  • Summary
Chapter 12: Working with Big Data
  • Big data
  • Application scenario and goals
  • MapReduce
  • Application
  • Summary

What You Will Learn

  • Apply data mining concepts to real-world problems
  • Predict the outcome of sports matches based on past results
  • Determine the author of a document based on their writing style
  • Use APIs to download datasets from social media and other online services
  • Find and extract good features from difficult datasets
  • Create models that solve real-world problems
  • Design and develop data mining applications using a variety of datasets
  • Set up reproducible experiments and generate robust results
  • Recommend movies, online celebrities, and news articles based on personal preferences
  • Compute on big data, including real-time data from the Internet
