Product parser for import to Prom
Project Description:
I developed a scalable parser for product pages that collects data about products from various e-commerce websites and prepares it in a format suitable for import to the Prom marketplace (or other platforms).
The parser automatically extracts the product name, description, specifications, category, prices, stock levels, article number (SKU), barcode (GTIN/EAN), variants (sizes/colors), and links to media files, then generates valid exports (CSV/Excel/XML plus an archive of images) for quick upload to Prom.
The system is designed for large-scale runs: it supports multiple sources, stays reliable during long-running collection jobs, includes mechanisms to bypass anti-bot protection, and provides convenient tools for mapping fields to the marketplace's requirements.
Functionality:
Automatic collection of products by categories, search queries, and lists of URLs.
Collection of a complete set of fields: name, description, brand, category, specifications (attributes), prices (retail/wholesale), availability/stock, article number (SKU), GTIN/EAN, links to images and galleries.
Collection of product variants (sizes, colors) and generation of separate items or variant combinations for import.
Downloading and caching images; creating a ZIP archive with prepared images.
Mapping fields to the Prom import format (CSV/XML) with customizable templates and transformation rules.
Validation of the export feed: checking mandatory fields, correctness of prices and availability, error reports.
Proxy rotation, User-Agent rotation, request delays, and concurrency limits (semaphores) to minimize the risk of being blocked.
Processing dynamic pages through Playwright/Selenium for websites with JS rendering.
Deduplication by article number (SKU) or URL, plus incremental updates, to avoid duplicate products.
Scheduler/queue for regular updates of prices and stock levels (cron / Celery).
Logs, metrics, and detailed reports for each run (number of processed products, errors, skipped items).
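The deduplication, field mapping, and feed validation steps from the list above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names in FIELD_MAP, the REQUIRED tuple, and all function names are hypothetical, and the real Prom import template defines its own columns.

```python
import csv
import io

# Hypothetical mapping from scraped field names to export column names.
# The real Prom import template defines its own columns; these are placeholders.
FIELD_MAP = {
    "name": "Name",
    "sku": "SKU",
    "price": "Price",
    "stock": "Quantity",
    "gtin": "GTIN",
}
REQUIRED = ("name", "sku", "price")  # assumed mandatory fields


def deduplicate(products):
    """Keep the first product seen for each SKU, falling back to the URL."""
    seen, unique = set(), []
    for product in products:
        key = product.get("sku") or product.get("url")
        if key:
            if key in seen:
                continue
            seen.add(key)
        unique.append(product)
    return unique


def validate(product):
    """Return a list of problems; an empty list means the row is valid."""
    errors = [f"missing {field}" for field in REQUIRED if not product.get(field)]
    price = product.get("price")
    if price:
        try:
            if float(price) <= 0:
                errors.append("price must be positive")
        except (TypeError, ValueError):
            errors.append("price is not a number")
    return errors


def export_csv(products):
    """Write valid rows to CSV text and collect rejected rows for the report."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(FIELD_MAP.values()))
    writer.writeheader()
    report = []
    for product in deduplicate(products):
        errors = validate(product)
        if errors:
            report.append((product.get("sku") or product.get("url"), errors))
            continue
        writer.writerow({col: product.get(src, "") for src, col in FIELD_MAP.items()})
    return buffer.getvalue(), report
```

Calling export_csv(list_of_dicts) returns the CSV text together with a list of (identifier, errors) pairs, so rejected rows end up in the error report instead of silently disappearing from the feed.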
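The request throttling mentioned in the feature list (semaphores, timings, User-Agent rotation) might look something like the sketch below. It assumes an asyncio-based crawler; the crawl function, its parameters, the placeholder USER_AGENTS list, and the injected fetch coroutine are all hypothetical, and proxy rotation is left out for brevity.

```python
import asyncio
import itertools
import random

# Placeholder User-Agent strings; a real run would use a maintained list.
USER_AGENTS = ["example-agent/1.0", "example-agent/2.0"]


async def crawl(urls, fetch, max_concurrent=3, delay_range=(0.5, 1.5)):
    """Fetch every URL with bounded concurrency and a random pause per request.

    `fetch` is a hypothetical coroutine function taking (url, headers).
    Returns (url, result, error) tuples in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    agents = itertools.cycle(USER_AGENTS)

    async def worker(url):
        async with semaphore:
            # Random pause so requests do not arrive at a fixed rhythm.
            await asyncio.sleep(random.uniform(*delay_range))
            headers = {"User-Agent": next(agents)}
            try:
                return url, await fetch(url, headers), None
            except Exception as exc:  # one failed page must not stop the run
                return url, None, exc

    return await asyncio.gather(*(worker(url) for url in urls))
```

Passing the fetch function in as a parameter keeps the throttling logic independent of the HTTP client or browser driver, so the same wrapper could sit in front of a plain HTTP request or a Playwright page load.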