Parser for the otomoto.pl website
The script collects data (name, price, phone) from listings on the Otomoto website.
The main challenges were getting past the site's bot protection and decoding phone numbers without a browser, which keeps the script fast and light on resources. The script solves the first by imitating the requests of a real browser; for the second, the site's phone number decoding algorithm was reverse engineered.
Features:
* Multithreaded processing of a list of links from a file
* Support for SOCKS5 proxies and their rotation for each request
* Automatic retries on failed requests
* Flexible configuration via launch parameters
* Saving results to a CSV file and detailed error logging
* Ability to work without proxies and in a single thread
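The features above (concurrent processing, per-request proxy rotation, retries, and a proxy-less single-task mode) can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the helper names `createProxyRotator`, `withRetries`, and `runPool` are hypothetical, and a small hand-rolled worker pool stands in for p-queue so that the example has no dependencies. Note that p-queue schedules promises with a concurrency limit rather than OS threads.

```javascript
// Hypothetical helper: returns a function that cycles through the proxy list.
// With an empty list it always returns null (proxy-less mode).
function createProxyRotator(proxies) {
  let index = 0;
  return () => {
    if (proxies.length === 0) return null;
    const proxy = proxies[index % proxies.length];
    index += 1;
    return proxy;
  };
}

// Hypothetical helper: calls `task` once, then up to `retries` more times
// on failure, asking the rotator for a fresh proxy on every attempt.
async function withRetries(task, nextProxy, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task(nextProxy());
    } catch (error) {
      lastError = error;
    }
  }
  throw lastError;
}

// Hypothetical helper: runs `worker` over `items` with at most `concurrency`
// calls in flight. Failures are recorded per item instead of aborting the run.
async function runPool(items, concurrency, worker) {
  const results = new Array(items.length);
  let next = 0;

  async function lane() {
    while (next < items.length) {
      const i = next;
      next += 1;
      try {
        results[i] = { ok: true, value: await worker(items[i]) };
      } catch (error) {
        results[i] = { ok: false, error: error.message };
      }
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, lane));
  return results;
}
```

In a real run, `worker` would wrap the HTTP request for one listing URL in `withRetries`, and a `concurrency` of 1 with an empty proxy list gives the single-thread, proxy-less mode.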
Technologies used:
* Platform: Node.js (JavaScript)
* Libraries: got-scraping, p-queue, fast-csv, socks-proxy-agent
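To show what a saved result looks like, here is a rough sketch of turning one listing into a CSV row. The project itself writes CSV through fast-csv; the hand-rolled quoting below is swapped in only to keep the example dependency-free. The `csvField` and `toCsvRow` names, the column order, and the sample values are assumptions made for illustration.

```javascript
// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (RFC 4180 style).
function csvField(value) {
  const text = String(value ?? '');
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical row layout: name, price, phone (the three fields the script collects).
function toCsvRow(listing) {
  return [listing.name, listing.price, listing.phone].map(csvField).join(',');
}

// Made-up sample listing, not real data.
const sample = { name: 'Audi A4, 2.0 TDI', price: '45 900 PLN', phone: '+48 600 000 000' };
console.log(toCsvRow(sample));
```

The name is quoted because it contains a comma; the other two fields pass through unchanged.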
#scraping #parsing #Node.js #otomoto #contacts