Otodom scraping
A reliable, ready-to-use automation tool written in Python for extracting large volumes of listing data from the Otodom.pl website. The project showcases browser automation, database-backed state tracking, and structured data export.
Key technical features:
Persistent extraction state: an SQLite3 database tracks extraction progress, so the script remembers the last processed page and can resume from that point after an interruption instead of starting over. This is critical for long-running extraction jobs.
Advanced browser automation: Playwright handles dynamically loaded content, dismisses cookie consent pop-ups, and simulates human behavior with smooth scrolling and random delays.
Clean data pipeline: the script automatically extracts and cleans complex fields, including title, price, price per m², area, number of rooms, and location.
Continuous export: results are written to an Excel (.xlsx) file after each page, so data collected so far is not lost if the run stops.
Code quality: the code follows an object-oriented (OOP) design that keeps components separate, making it easier to maintain and extend.
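The persistent-state feature described above can be sketched with the standard-library `sqlite3` module alone. This is a minimal sketch, not the project's actual code: the `ProgressStore` class, the `progress` table, and the per-job key are hypothetical, since the source does not describe the schema.

```python
import sqlite3


class ProgressStore:
    """Stores the last fully processed page per job (hypothetical schema)."""

    def __init__(self, db_path: str = "progress.db") -> None:
        self.conn = sqlite3.connect(db_path)
        # One row per scraping job, so several searches can share one database file.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS progress ("
            " job TEXT PRIMARY KEY,"
            " last_page INTEGER NOT NULL)"
        )
        self.conn.commit()

    def last_page(self, job: str) -> int:
        """Return the last finished page for `job`, or 0 if the job is new."""
        row = self.conn.execute(
            "SELECT last_page FROM progress WHERE job = ?", (job,)
        ).fetchone()
        return row[0] if row else 0

    def mark_done(self, job: str, page: int) -> None:
        """Record `page` as finished; the `with` block commits the transaction."""
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO progress (job, last_page) VALUES (?, ?)",
                (job, page),
            )

    def close(self) -> None:
        self.conn.close()
```

On restart, the scraping loop would begin at `store.last_page(job) + 1` and call `store.mark_done(job, page)` only after a page has been fully saved, so an interrupted page is simply processed again.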
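The browser-automation feature (cookie pop-ups, smooth scrolling, random delays) could look roughly like the helpers below, written against Playwright's sync API. This is a sketch under assumptions: the `COOKIE_BUTTON` selector assumes a OneTrust-style consent banner and should be checked against the live page, and the delay ranges and function names are hypothetical. Playwright is imported only inside `accept_cookies`, so the pure helpers work without a browser.

```python
from __future__ import annotations

import random
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # type hints only; Playwright is imported lazily below
    from playwright.sync_api import Page

# Hypothetical selector: assumes a OneTrust-style "accept all" button.
COOKIE_BUTTON = "#onetrust-accept-btn-handler"


def random_delay_ms(low: float = 0.8, high: float = 2.5,
                    rng: random.Random | None = None) -> int:
    """Return a random pause between `low` and `high` seconds, in milliseconds."""
    rng = rng or random.Random()
    return int(rng.uniform(low, high) * 1000)


def scroll_deltas(total_px: int, step_px: int = 300) -> list[int]:
    """Split a scroll distance into small wheel steps for smooth scrolling."""
    full, rest = divmod(total_px, step_px)
    return [step_px] * full + ([rest] if rest else [])


def accept_cookies(page: Page, timeout_ms: int = 5000) -> bool:
    """Click the consent button if it appears; return True if it was clicked."""
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

    try:
        page.locator(COOKIE_BUTTON).click(timeout=timeout_ms)
        return True
    except PlaywrightTimeoutError:
        return False  # banner never appeared, e.g. consent already stored


def scroll_like_human(page: Page, total_px: int = 4000,
                      rng: random.Random | None = None) -> None:
    """Scroll in small wheel steps with short random pauses so lazy content loads."""
    rng = rng or random.Random()
    for dy in scroll_deltas(total_px, step_px=rng.randint(200, 400)):
        page.mouse.wheel(0, dy)
        page.wait_for_timeout(random_delay_ms(0.1, 0.4, rng))
```

Inside a `sync_playwright()` session, `accept_cookies(page)` would run right after `page.goto(...)`, and `scroll_like_human(page)` before reading the listing cards, with `page.wait_for_timeout(random_delay_ms())` between page loads.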
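For the clean data pipeline, most of the work is turning Polish-formatted strings into numbers. The sketch below assumes raw values such as "1 250 000 zł", "22 936 zł/m²", "54,5 m²" and "3 pokoje", with ordinary or non-breaking spaces as thousands separators and a comma as the decimal mark. The input field names and `clean_listing` are hypothetical, not taken from the project.

```python
from __future__ import annotations

import re

# In Python 3 str patterns, \s also matches non-breaking spaces (U+00A0, U+202F).
_SPACES = re.compile(r"\s+")
_NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def parse_number(raw: str | None) -> float | None:
    """Return the first number in `raw` as a float, or None if there is none."""
    if not raw:
        return None
    compact = _SPACES.sub("", raw)  # "1 250 000 zł" -> "1250000zł"
    match = _NUMBER.search(compact)
    if match is None:
        return None  # e.g. a "price on request" label with no digits
    return float(match.group().replace(",", "."))  # "54,5" -> 54.5


def clean_text(raw: str | None) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return _SPACES.sub(" ", raw or "").strip()


def clean_listing(raw: dict[str, str | None]) -> dict[str, object]:
    """Normalize one scraped listing (hypothetical field names)."""
    rooms = parse_number(raw.get("rooms"))
    return {
        "title": clean_text(raw.get("title")),
        "price_pln": parse_number(raw.get("price")),
        "price_per_m2_pln": parse_number(raw.get("price_per_m2")),
        "area_m2": parse_number(raw.get("area")),
        "rooms": int(rooms) if rooms is not None else None,
        "location": clean_text(raw.get("location")),
    }
```

Returning `None` for unparseable values keeps a single odd listing from aborting the whole page; the empty cells are easy to filter later in Excel.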
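The per-page Excel export could be sketched with `openpyxl`. The source does not name the Excel library, so `openpyxl`, the `COLUMNS` list, and `append_page_to_xlsx` are assumptions. The sketch also adds a common safeguard the source does not mention: it saves to a temporary file and swaps it in with `os.replace`, so a crash during saving cannot corrupt rows written for earlier pages.

```python
from __future__ import annotations

import os
from pathlib import Path

from openpyxl import Workbook, load_workbook

COLUMNS = ["title", "price_pln", "price_per_m2_pln", "area_m2", "rooms", "location"]


def append_page_to_xlsx(rows: list[dict], path: str | Path) -> None:
    """Append one page of listings to `path`, creating the file on first use."""
    path = Path(path)
    if path.exists():
        wb = load_workbook(str(path))
        ws = wb.active
    else:
        wb = Workbook()
        ws = wb.active
        ws.title = "listings"
        ws.append(COLUMNS)  # header row
    for row in rows:
        # Missing keys become empty cells.
        ws.append([row.get(col) for col in COLUMNS])
    # Save next to the target, then atomically replace it.
    tmp = path.with_name(path.stem + ".tmp.xlsx")
    wb.save(str(tmp))
    os.replace(tmp, path)
```

Reloading the workbook on every page gets slower as the file grows. For very long runs, keeping rows in the SQLite database and exporting once at the end may be a better trade-off.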
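One way an object-oriented design could tie these features together is an orchestrator class whose collaborators are injected. This is a hypothetical structure, not the project's actual class layout: `OtodomScraper`, `Checkpoint`, and `MemoryCheckpoint` are illustrative names, and `fetch_page` / `export_page` stand in for the Playwright scraping and Excel export steps.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


class Checkpoint(Protocol):
    """Anything that can report and record the last finished page."""

    def last_page(self) -> int: ...
    def mark_done(self, page: int) -> None: ...


@dataclass
class MemoryCheckpoint:
    """In-memory stand-in for the SQLite progress store (useful in tests)."""

    page: int = 0

    def last_page(self) -> int:
        return self.page

    def mark_done(self, page: int) -> None:
        self.page = page


@dataclass
class OtodomScraper:
    """Hypothetical orchestrator: each dependency can be swapped independently."""

    fetch_page: Callable[[int], list[dict]]    # scrape and clean one results page
    export_page: Callable[[list[dict]], None]  # persist one page of rows
    checkpoint: Checkpoint
    max_pages: int = 50

    def run(self) -> int:
        """Resume after the checkpoint; return the number of pages processed."""
        processed = 0
        for page in range(self.checkpoint.last_page() + 1, self.max_pages + 1):
            rows = self.fetch_page(page)
            if not rows:
                break  # no more results
            self.export_page(rows)
            # Mark the page done only after it is exported.
            self.checkpoint.mark_done(page)
            processed += 1
        return processed
```

Marking a page done only after export gives at-least-once behavior: a crash between the two steps may duplicate that page's rows, but never silently skips them. Injecting the collaborators also lets the loop be tested with fakes instead of a real browser.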
Technical stack:
Language: Python
Automation: Playwright (Chromium)
State storage: SQLite3
Export: Excel (.xlsx)