Rozetka scraping
A reliable, production-ready web scraper that extracts laptop data from Rozetka.ua, the largest Ukrainian e-commerce platform. It syncs results to the cloud automatically and is built to get past anti-bot detection.
Key technical features:
Cloudflare and anti-bot bypass: built on Playwright and playwright-stealth, which make the automated browser look like a real user's, so the scraper can get past strict anti-bot systems and Cloudflare Turnstile CAPTCHAs.
Fault-tolerant architecture: scraping progress is tracked page by page in an SQLite database. If the script is interrupted or crashes, it automatically resumes from the page where it left off.
Automatic cloud synchronization: scraped data is written to Google Sheets and formatted in real time through the Google Sheets API, using the gspread library.
Smart filtering: pre-configured backend filters for 1 TB SSD, a price below 25,000 UAH, and a set of leading brands.
Technology stack: Python 3.10+, Playwright, playwright-stealth, SQLite, Google Sheets API (gspread).
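The page-by-page checkpointing in the fault-tolerant architecture feature can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the project's actual code: the `progress` table, its columns, and the helpers `init_db`, `next_page`, `mark_done`, and `run` are hypothetical names, and `scrape_page` stands in for whatever Playwright logic fetches and parses one catalog page.

```python
import sqlite3
from typing import Callable

# Hypothetical schema: one row per scraped category holding the last fully
# completed page number. The real project may store its progress differently.


def init_db(conn: sqlite3.Connection) -> None:
    """Create the progress table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS progress ("
        " category TEXT PRIMARY KEY,"
        " last_done_page INTEGER NOT NULL)"
    )
    conn.commit()


def next_page(conn: sqlite3.Connection, category: str) -> int:
    """Return the first page not yet completed (1 on a fresh start)."""
    row = conn.execute(
        "SELECT last_done_page FROM progress WHERE category = ?", (category,)
    ).fetchone()
    return 1 if row is None else row[0] + 1


def mark_done(conn: sqlite3.Connection, category: str, page: int) -> None:
    """Record that `page` finished and commit at once so a crash cannot lose it."""
    conn.execute(
        "INSERT OR REPLACE INTO progress (category, last_done_page) VALUES (?, ?)",
        (category, page),
    )
    conn.commit()


def run(
    conn: sqlite3.Connection,
    category: str,
    total_pages: int,
    scrape_page: Callable[[int], None],  # hypothetical stand-in for the Playwright step
) -> list[int]:
    """Scrape from the checkpoint onward and return the pages finished in this run."""
    finished = []
    for page in range(next_page(conn, category), total_pages + 1):
        scrape_page(page)  # if this raises, the checkpoint stays at the previous page
        mark_done(conn, category, page)
        finished.append(page)
    return finished
```

Committing after every page means an interruption loses at most the page in progress, and the next call to `run` starts from the first unfinished page. `INSERT OR REPLACE` is used instead of an UPSERT clause so the sketch also works with older SQLite builds.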
This project demonstrates clear code architecture, proper API integration, database state tracking, and professional web automation skills.