Automated data collection platform Autoria

Data Parsing
A high-performance platform for automated data collection from an automotive marketplace. It monitors listings on a regular schedule, updates stored information automatically, and keeps all data in a central database.

The system is built on an asynchronous architecture using Playwright and AsyncIO. It supports parallel processing of a large number of pages, automatically scheduled runs, and database backups.
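
As a minimal sketch of the parallel page processing described above, the example below runs many page fetches concurrently with AsyncIO while capping how many are in flight at once. The names fetch_listing and fetch_all, the example.com URLs, and the concurrency limit are assumptions made for illustration and are not taken from the project; the fetch itself is simulated with a short sleep so the sketch runs without Playwright installed.

```python
import asyncio


# Hypothetical stand-in for a Playwright page load. A real version would
# open the URL in a browser page and extract fields from the rendered HTML.
async def fetch_listing(url: str) -> dict:
    await asyncio.sleep(0.01)  # simulates network and rendering latency
    return {"url": url, "title": f"Listing at {url}"}


async def fetch_all(urls: list[str], max_concurrency: int = 5) -> list[dict]:
    # The semaphore caps how many fetches run at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> dict:
        async with semaphore:
            return await fetch_listing(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    example_urls = [f"https://example.com/listing/{i}" for i in range(20)]
    pages = asyncio.run(fetch_all(example_urls))
    print(len(pages), "pages processed")
```

Capping concurrency with a semaphore is one common way to keep memory use and load on the target site under control; the actual limit would depend on the hardware and on how many browsers are available.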

Core functionality

• automatic collection of listings on a schedule;
• asynchronous, concurrent data processing;
• parallel launching of multiple browsers;
• automatic detection and skipping of duplicates;
• storing data in PostgreSQL;
• administrative panel for launching and controlling the data collection process;
• automatic database backup;
• project deployment using Docker.
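
Duplicate skipping from the list above can be sketched by letting the database enforce uniqueness. The example is an illustration under stated assumptions, not the project's actual code: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the table name, column names, and the helpers create_schema and save_listings are hypothetical. The ON CONFLICT ... DO NOTHING clause used here is also available in PostgreSQL.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # listing_id is the primary key, so the database itself rejects duplicates.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS listings ("
        "listing_id TEXT PRIMARY KEY, title TEXT, price_usd INTEGER)"
    )


def save_listings(conn: sqlite3.Connection, listings: list[dict]) -> int:
    """Insert listings, silently skipping known IDs; return how many were new."""
    before = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
    conn.executemany(
        "INSERT INTO listings (listing_id, title, price_usd) "
        "VALUES (:listing_id, :title, :price_usd) "
        "ON CONFLICT(listing_id) DO NOTHING",
        listings,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM listings").fetchone()[0]
    return after - before


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    batch = [
        {"listing_id": "a1", "title": "Car A", "price_usd": 5000},
        {"listing_id": "b2", "title": "Car B", "price_usd": 7200},
    ]
    print(save_listings(conn, batch), "new rows on first run")
    print(save_listings(conn, batch), "new rows on repeated run")
```

Pushing the check into a unique key keeps repeated scheduled runs idempotent: the same listing can be collected many times and stored only once.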

Architectural features

• AsyncIO;
• Producer–Consumer Architecture;
• Browser Pool;
• Queue Processing;
• Parallel Workers;
• Scheduled Tasks;
• Docker Deployment.
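
The way the features above fit together can be sketched as follows: a producer puts URLs on a queue, several parallel workers consume them, and each worker borrows a browser from a shared pool for the duration of one task. This is a hedged illustration only. FakeBrowser, producer, worker, and run_pipeline are hypothetical names, the pool and worker counts are arbitrary, and scraping is simulated with a sleep instead of a real Playwright browser.

```python
import asyncio


class FakeBrowser:
    """Hypothetical stand-in for a Playwright browser instance."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def scrape(self, url: str) -> dict:
        await asyncio.sleep(0.01)  # simulates page load and parsing
        return {"url": url, "browser": self.name}


async def producer(queue: asyncio.Queue, urls: list[str]) -> None:
    # put() waits when the queue is full, which applies backpressure.
    for url in urls:
        await queue.put(url)


async def worker(queue: asyncio.Queue, pool: asyncio.Queue, results: list) -> None:
    while True:
        url = await queue.get()
        browser = await pool.get()  # borrow a browser; waits if none is free
        try:
            results.append(await browser.scrape(url))
        finally:
            pool.put_nowait(browser)  # always return the browser to the pool
            queue.task_done()


async def run_pipeline(urls: list[str], n_browsers: int = 2, n_workers: int = 4) -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=10)
    pool: asyncio.Queue = asyncio.Queue()
    for i in range(n_browsers):
        pool.put_nowait(FakeBrowser(f"browser-{i}"))

    results: list = []
    workers = [
        asyncio.create_task(worker(queue, pool, results)) for _ in range(n_workers)
    ]
    await producer(queue, urls)
    await queue.join()  # wait until every queued URL has been processed
    for task in workers:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    example_urls = [f"https://example.com/listing/{i}" for i in range(15)]
    scraped = asyncio.run(run_pipeline(example_urls))
    print(len(scraped), "listings scraped")
```

Using a queue as the browser pool means the number of simultaneously open browsers never exceeds the pool size, regardless of how many workers are running.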

Technologies used

Python • Playwright • AsyncIO • Django • PostgreSQL • Docker • HTML • CSS

Result

The result is a data collection platform that runs automatically, processes pages at high speed, and scales easily to handle large volumes of data.

GitHub:
https://github.com/ShotPuter/autorio_parser