
G-Theuri/scraping-projects


Scraping Projects

A collection of web scraping and data extraction projects built with Python and various scraping libraries.


Technologies Used

Scraping & Automation

  • Selenium - Browser automation and dynamic content scraping
  • SeleniumBase - Enhanced Selenium with anti-bot bypass
  • Undetected ChromeDriver - Stealth browser automation
  • Selenium Driverless - CDP-based browser control without chromedriver
  • Playwright - Modern browser automation
  • Scrapy - Large-scale web crawling and scraping framework
  • Crawlee - Web scraping and crawling library
  • BeautifulSoup - HTML and XML parsing
  • Requests - HTTP requests and REST API interactions
  • curl_cffi - HTTP client with browser fingerprint impersonation
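
To illustrate how the fetching and parsing tools above typically divide the work, here is a minimal sketch of the parsing step with BeautifulSoup. The HTML snippet, its CSS classes, and the `parse_products` helper are hypothetical and not taken from any project in this repository; in a real scraper the HTML would first be fetched with Requests, curl_cffi, or a browser tool such as Playwright.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page (assumption): a real project would fetch this HTML
# with Requests, curl_cffi, or a browser driver instead of hard-coding it.
SAMPLE_HTML = """
<ul class="products">
  <li class="product"><a href="/items/1">Widget</a><span class="price">$9.99</span></li>
  <li class="product"><a href="/items/2">Gadget</a><span class="price">$24.50</span></li>
</ul>
"""


def parse_products(html: str) -> list[dict]:
    """Extract name, relative link, and numeric price from each product card."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    # CSS selectors are assumptions about the hypothetical page structure.
    for card in soup.select("li.product"):
        link = card.find("a")
        price_text = card.select_one("span.price").get_text(strip=True)
        products.append(
            {
                "name": link.get_text(strip=True),
                "url": link.get("href"),
                # Drop the currency symbol before converting to a float.
                "price": float(price_text.lstrip("$")),
            }
        )
    return products


if __name__ == "__main__":
    for product in parse_products(SAMPLE_HTML):
        print(product)
```

Keeping parsing in a pure function that takes an HTML string makes it easy to test offline against saved pages, independent of whichever client or browser did the fetching.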

Data & Visualization

  • Pandas - Data manipulation and analysis
  • Streamlit - Interactive data dashboards and visualization
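
As a sketch of the cleaning step that usually sits between scraping and a dashboard, the example below uses Pandas to deduplicate scraped listings, convert price strings to numbers, and summarize them per category. The records and the `clean_listings` / `summarize_by_category` helpers are hypothetical; the resulting DataFrame is the kind of object one might then hand to a Streamlit view.

```python
import pandas as pd

# Hypothetical raw records (assumption): prices arrive as strings, and one
# listing was captured twice across paginated requests.
raw_records = [
    {"name": "Widget", "category": "tools", "price": "$9.99"},
    {"name": "Gadget", "category": "tools", "price": "$24.50"},
    {"name": "Gadget", "category": "tools", "price": "$24.50"},
    {"name": "Lamp", "category": "home", "price": "$15.00"},
]


def clean_listings(records: list[dict]) -> pd.DataFrame:
    """Drop exact duplicate rows and convert price strings to floats."""
    df = pd.DataFrame(records).drop_duplicates().reset_index(drop=True)
    df["price"] = df["price"].str.lstrip("$").astype(float)
    return df


def summarize_by_category(df: pd.DataFrame) -> pd.DataFrame:
    """Count listings and compute the average price per category."""
    return (
        df.groupby("category")
        .agg(listings=("name", "count"), avg_price=("price", "mean"))
        .reset_index()
    )


if __name__ == "__main__":
    listings = clean_listings(raw_records)
    print(summarize_by_category(listings))
```

Deduplicating before aggregating matters for scraped data, since retries and overlapping pages can otherwise inflate counts and skew averages.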

Structure

scraping-projects/
  ├── milestone-projects/
  ├── other-projects/
  ├── .gitignore
  └── README.md


Note

Some projects in this repository were built for learning and experimentation purposes. All scraping was done responsibly and in accordance with the respective websites' terms of service.

About

A collection of my data engineering projects spanning web scraping, pipeline orchestration, and personal experiments across the data stack.
