Python/Selenium data parser for avto.pro
The task was to extract data about sellers, their stores, and auto services from the site avto.pro.
Because most of the required content is loaded dynamically, I used #selenium. The main difficulty in this project was finding the URLs of the seller pages: the site has no list or index page where they can be found.
The site is designed (possibly intentionally) so that a seller's page can only be reached by selecting a car or a spare-part category, drilling down to a specific part, and clicking on it. Only then is the part's seller shown.
Therefore, my parser clicks through every part on every page (more than a million in total) and extracts the required information.
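A minimal sketch of how such a crawl could be organised, assuming the pages can be classified into "catalog" pages (car, category, part list, part) and seller pages. `collect_seller_urls` is a breadth-first walk that keeps a visited set and deduplicates sellers, since many parts share the same seller. The page access is injected as a `get_links` function so the traversal logic can be tested on its own. `selenium_link_fetcher` and its CSS selectors are hypothetical: it follows `href` links and waits for JavaScript-rendered content with `WebDriverWait`. If the site only reveals the seller after a real click, that adapter would need to click elements instead of reading links.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

# A fetcher returns the links found on one page, already classified as
# ("catalog", url) for pages that lead deeper (car, category, part list, part)
# or ("seller", url) for seller/store pages.
LinkFetcher = Callable[[str], Iterable[Tuple[str, str]]]


def collect_seller_urls(start_urls: Iterable[str], get_links: LinkFetcher) -> Set[str]:
    """Breadth-first walk over catalog pages, returning unique seller URLs."""
    visited = set(start_urls)     # never load the same catalog page twice
    queue = deque(visited)
    sellers: Set[str] = set()     # many parts share a seller, so deduplicate
    while queue:
        url = queue.popleft()
        for kind, link in get_links(url):
            if kind == "seller":
                sellers.add(link)
            elif kind == "catalog" and link not in visited:
                visited.add(link)
                queue.append(link)
    return sellers


def selenium_link_fetcher(driver, catalog_css: str, seller_css: str,
                          timeout: float = 15) -> LinkFetcher:
    """Wrap a Selenium WebDriver as a LinkFetcher (selectors are hypothetical)."""
    # Imported here so collect_seller_urls can be used without Selenium installed.
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    def get_links(url: str) -> List[Tuple[str, str]]:
        driver.get(url)
        try:
            # Content is rendered by JavaScript, so wait until links appear.
            WebDriverWait(driver, timeout).until(EC.presence_of_element_located(
                (By.CSS_SELECTOR, f"{catalog_css}, {seller_css}")))
        except TimeoutException:
            return []  # nothing relevant appeared on this page
        links: List[Tuple[str, str]] = []
        for kind, css in (("catalog", catalog_css), ("seller", seller_css)):
            for el in driver.find_elements(By.CSS_SELECTOR, css):
                href = el.get_attribute("href")
                if href:
                    links.append((kind, href))
        return links

    return get_links
```

Keeping the visited set in memory is workable even for a million pages; for a crawl that must survive restarts, the queue and visited set could instead be persisted in the same database as the results.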
The parsing result is stored in a #sqlite database and exported to an #Excel #xlsx file.
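A sketch of the storage step, assuming a single `sellers` table keyed by the seller page URL. The table name and the columns (`name`, `kind`, `phone`, `address`) are hypothetical, since the actual fields depend on what the seller pages show. `INSERT OR REPLACE` makes a seller reached through many parts occupy one row, and writing in batches inside one transaction keeps inserts fast. The Excel export uses the third-party `openpyxl` package, imported only when exporting.

```python
import sqlite3
from typing import Iterable, Mapping

# Hypothetical column set; the real fields depend on the seller pages.
COLUMNS = ("url", "name", "kind", "phone", "address")

SCHEMA = """
CREATE TABLE IF NOT EXISTS sellers (
    url     TEXT PRIMARY KEY,  -- the seller page URL is a natural unique key
    name    TEXT,
    kind    TEXT,              -- e.g. 'store' or 'auto service'
    phone   TEXT,
    address TEXT
)
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_sellers(conn: sqlite3.Connection, records: Iterable[Mapping]) -> None:
    """Insert a batch of sellers; a URL seen again replaces the earlier row."""
    sql = (f"INSERT OR REPLACE INTO sellers ({', '.join(COLUMNS)}) "
           f"VALUES ({', '.join(':' + c for c in COLUMNS)})")
    with conn:  # one transaction per batch, committed on success
        conn.executemany(sql, ({c: r.get(c) for c in COLUMNS} for r in records))


def export_xlsx(conn: sqlite3.Connection, path: str) -> None:
    """Dump the sellers table to an .xlsx file (needs: pip install openpyxl)."""
    from openpyxl import Workbook  # third-party, imported only when exporting

    wb = Workbook()
    ws = wb.active
    ws.title = "sellers"
    ws.append(COLUMNS)  # header row
    for row in conn.execute(f"SELECT {', '.join(COLUMNS)} FROM sellers ORDER BY url"):
        ws.append(row)
    wb.save(path)
```

Usage: `conn = open_db("avto_pro.db")`, call `save_sellers(conn, batch)` as the crawler produces records, then `export_xlsx(conn, "sellers.xlsx")` once at the end.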