Book Description
You will learn techniques for developing high-performance scrapers and for dealing with cookies, hidden form fields, AJAX-based sites, proxies, and more. You will also explore a number of real-world scenarios in which every part of the development and product life cycle is fully covered. You will not only develop the skills to design and build reliable, performant data flows, but also learn how to deploy your codebase to infrastructure such as AWS and Heroku. If you work in software engineering, product development, or data mining, or are interested in building data-driven products, you will find this book useful, as each recipe has a clear purpose and objective.
From extracting data from websites to writing a sophisticated web crawler, the book's independent recipes will come to your rescue on the job. This book covers the Python libraries requests and BeautifulSoup. You will learn about crawling, spidering, working with AJAX websites, handling paginated items, and more. You will also learn to tackle problems such as 403 errors, working with proxies, scraping images, using lxml, and more.
With this book, you will be able to scrape websites more efficiently, extract more accurate data, and learn how to put that data together.