Amazon parser - scraper, Product parser!

C & C++, PHP

223 USD

Technical assignment for the development of a parser (scraper) for Amazon

Task: Develop a reliable, fault-tolerant scraper that collects information from Amazon for a large number of products (millions of ASINs). The scraper must run stably 24/7 and keep HTTP 503 errors (blocks or access restrictions) to a minimum.

Mandatory requirements:

Data parsing:
- Retrieve product information from the product page for a given list of ASINs: name, price, rating, number of reviews, stock availability, description, images, and other details.
- Support a large volume of requests (from 100,000 to several million products).
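
To make the input and output of this requirement concrete, here is a minimal sketch of a product record and an ASIN list cleaner. Python is an assumption, since the assignment does not fix a language. The field names, the `ProductRecord` and `clean_asins` names, and the rule that an ASIN is 10 uppercase alphanumeric characters are also illustrative assumptions, not part of the assignment.

```python
import re
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Assumption: an ASIN is treated as exactly 10 uppercase letters or digits.
ASIN_RE = re.compile(r"^[A-Z0-9]{10}$")


@dataclass
class ProductRecord:
    """One scraped product; every field except `asin` may be missing."""
    asin: str
    name: Optional[str] = None
    price: Optional[str] = None          # kept as text to preserve currency formatting
    rating: Optional[float] = None
    review_count: Optional[int] = None
    in_stock: Optional[bool] = None
    description: Optional[str] = None
    image_urls: List[str] = field(default_factory=list)


def clean_asins(raw_lines):
    """Normalize, validate, and de-duplicate ASINs, preserving input order."""
    seen = set()
    result = []
    for line in raw_lines:
        asin = line.strip().upper()
        if ASIN_RE.match(asin) and asin not in seen:
            seen.add(asin)
            result.append(asin)
    return result
```

De-duplicating the list before scheduling avoids spending requests on repeated ASINs, which matters at the stated scale of several million products.
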
Stability and scalability:
- The system must operate around the clock (24/7), without regular stops and the need for manual restarts.
- Provide request balancing, proxy servers, IP address rotation, and request delays to minimize the risk of blocks and HTTP 503 errors.
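
One possible shape for the request delay mechanism is sketched below in Python with `asyncio` (both are assumptions; the assignment does not prescribe a stack). The hypothetical `Throttle` class caps the number of concurrent requests and waits a random interval before each one. The limits shown are placeholders to be tuned against observed block rates.

```python
import asyncio
import random


class Throttle:
    """Caps concurrency and adds a randomized delay before each request."""

    def __init__(self, max_concurrent, min_delay, max_delay):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._min_delay = min_delay
        self._max_delay = max_delay

    async def run(self, coro_fn, *args):
        # At most `max_concurrent` calls are inside this block at once.
        async with self._sem:
            # Random jitter makes the request pattern less regular.
            await asyncio.sleep(random.uniform(self._min_delay, self._max_delay))
            return await coro_fn(*args)


async def demo():
    async def fake_fetch(n):  # stand-in for a real HTTP request
        return n * 2

    # Created inside the running event loop for compatibility with older Python versions.
    throttle = Throttle(max_concurrent=2, min_delay=0.0, max_delay=0.01)
    return await asyncio.gather(*(throttle.run(fake_fetch, i) for i in range(5)))
```

Calling `asyncio.run(demo())` runs five fake requests, two at a time, and returns their results in submission order.
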
Bypassing Amazon's protection and restrictions:
- Provide methods for bypassing Amazon's anti-bot protection (CAPTCHA, IP blocking, User-Agent restrictions, etc.).
- Automatically detect and solve CAPTCHAs (for example, using anti-captcha services).
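
Before any block can be handled, it has to be recognized. The Python sketch below (language and all names are assumptions) classifies a response by status code and body. The marker string in `CAPTCHA_MARKERS` is a placeholder, not a verified Amazon page signature, and should be replaced with markers observed in real responses. Treating 403 and 429 as blocks alongside the 503 mentioned in the task is likewise an assumption.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    CAPTCHA = "captcha"
    BLOCKED = "blocked"
    NOT_FOUND = "not_found"
    ERROR = "error"


# Placeholder markers (assumption): replace with strings seen on real CAPTCHA pages.
CAPTCHA_MARKERS = ("captcha",)

# Assumption: these status codes are treated as blocks or throttling.
BLOCK_STATUSES = (403, 429, 503)


def classify_response(status, body):
    """Map an HTTP status code and response body to a coarse outcome."""
    text = body.lower()
    # A CAPTCHA page can arrive with any status, so check the body first.
    if any(marker in text for marker in CAPTCHA_MARKERS):
        return Outcome.CAPTCHA
    if status in BLOCK_STATUSES:
        return Outcome.BLOCKED
    if status == 404:
        return Outcome.NOT_FOUND
    if 200 <= status < 300:
        return Outcome.OK
    return Outcome.ERROR
```

The outcome can then drive the next step: hand a `CAPTCHA` result to a solving service, rotate the proxy on `BLOCKED`, and skip the ASIN on `NOT_FOUND`.
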
Proxy management:
- The system must support proxy servers with automatic rotation and performance monitoring.
- Monitor proxy quality and exclude blocked and slow IPs.
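
A minimal sketch of rotation with quality monitoring follows, in Python (an assumption). The hypothetical `ProxyPool` hands out proxies round-robin and skips any proxy that has failed several times in a row or whose average latency is too high. The thresholds are illustrative defaults, not values from the assignment.

```python
from dataclasses import dataclass


@dataclass
class ProxyStats:
    url: str
    successes: int = 0
    failures: int = 0
    total_latency: float = 0.0
    consecutive_failures: int = 0

    @property
    def avg_latency(self):
        return self.total_latency / self.successes if self.successes else 0.0


class ProxyPool:
    """Round-robin proxy rotation that excludes blocked or slow proxies."""

    def __init__(self, urls, max_consecutive_failures=3, max_avg_latency=5.0):
        self._order = list(urls)
        self._stats = {u: ProxyStats(u) for u in self._order}
        self._next = 0
        self._max_fail = max_consecutive_failures
        self._max_latency = max_avg_latency

    def _healthy(self, stats):
        if stats.consecutive_failures >= self._max_fail:
            return False  # likely blocked
        if stats.successes and stats.avg_latency > self._max_latency:
            return False  # too slow
        return True

    def acquire(self):
        """Return the next healthy proxy URL, or None if none is healthy."""
        for _ in range(len(self._order)):
            url = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self._healthy(self._stats[url]):
                return url
        return None

    def report(self, url, ok, latency=0.0):
        """Record the result of one request made through `url`."""
        stats = self._stats[url]
        if ok:
            stats.successes += 1
            stats.total_latency += latency
            stats.consecutive_failures = 0
        else:
            stats.failures += 1
            stats.consecutive_failures += 1
```

A production version would probably also re-test excluded proxies after a cooldown, since blocks are often temporary. That is omitted here for brevity.
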
Error management and logging:
- Implement logging of all scraper actions: successful requests, errors, blocks, and response times.
- Implement a system for automatic request retries in case of errors, with configurable retry counts and intervals between them.
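
The logging and retry requirements can be combined in one wrapper. The Python sketch below (language, function name, and parameter names are assumptions) retries a failed fetch a configurable number of times, multiplies the wait by a backoff factor after each failure, and logs the outcome and elapsed time of every attempt.

```python
import asyncio
import logging
import time

logger = logging.getLogger("scraper")


async def fetch_with_retries(fetch, asin, max_retries=3, interval=1.0, backoff=2.0):
    """Await `fetch(asin)`, retrying up to `max_retries` times after the first failure.

    `interval` is the wait before the first retry; each later wait is
    multiplied by `backoff`. The last exception is re-raised when retries run out.
    """
    delay = interval
    for attempt in range(max_retries + 1):
        started = time.monotonic()
        try:
            result = await fetch(asin)
            logger.info("ok asin=%s attempt=%d elapsed=%.3fs",
                        asin, attempt + 1, time.monotonic() - started)
            return result
        except Exception as exc:
            logger.warning("fail asin=%s attempt=%d elapsed=%.3fs error=%r",
                           asin, attempt + 1, time.monotonic() - started, exc)
            if attempt == max_retries:
                raise
            await asyncio.sleep(delay)
            delay *= backoff
```

Here `fetch` is any coroutine function that takes an ASIN and raises on failure, so the same wrapper works regardless of which HTTP client is chosen.
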
Data format and storage:
- Export data in convenient formats (CSV, JSON, databases).
- Implement a fast and efficient structure for storing the collected data.
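
As one illustration of storage and export, the Python sketch below (an assumption, as are the table layout and function names) keeps one row per ASIN in SQLite and exports the rows as JSON or CSV using only the standard library. SQLite is chosen here for brevity. At millions of rows with concurrent writers, a server database may be a better fit.

```python
import csv
import io
import json
import sqlite3

# Illustrative column set (assumption); extend as needed.
FIELDS = ["asin", "name", "price", "rating", "review_count", "in_stock"]


def open_store(path=":memory:"):
    """Open (or create) the product table, keyed by ASIN."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS products (
               asin TEXT PRIMARY KEY,
               name TEXT,
               price TEXT,
               rating REAL,
               review_count INTEGER,
               in_stock INTEGER
           )"""
    )
    return conn


def upsert(conn, record):
    """Insert a record, replacing any existing row with the same ASIN."""
    conn.execute(
        "INSERT OR REPLACE INTO products VALUES (?, ?, ?, ?, ?, ?)",
        [record.get(f) for f in FIELDS],
    )
    conn.commit()


def export_rows(conn):
    """Return all rows as a list of dicts, ordered by ASIN."""
    cur = conn.execute("SELECT %s FROM products ORDER BY asin" % ", ".join(FIELDS))
    return [dict(zip(FIELDS, row)) for row in cur]


def to_json(rows):
    return json.dumps(rows, ensure_ascii=False)


def to_csv(rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Because the ASIN is the primary key, re-scraping a product overwrites its previous row instead of creating a duplicate.
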
Management interface (optional):
- Manage tasks and view statistics and scraper status through a web interface or API.

Requirements for the contractor:

Experience with web scraping from Amazon.
Knowledge of technologies and tools for bypassing protection (proxy, anti-captcha).
Experience with large volumes of data and asynchronous requests.

Expected result: A working, stable, and scalable tool that can parse large amounts of data from Amazon around the clock while minimizing the likelihood of blocks and errors.

Andrey Pevkin
Kyiv, Ukraine