Etsy parser with asynchronous data collection and visualization in Excel
Libraries used:
● playwright: for browser initialization, obtaining cookies, loading Etsy pages, and bypassing protection mechanisms.
● BeautifulSoup (bs4): for parsing HTML pages, finding the required elements (e.g., product links), and filtering them by the required parameters.
● openpyxl: for creating tables, formatting cells, inserting text data and images into .xlsx files.
● Pillow (PIL): for image processing, including resizing images before inserting them into Excel.
● httpx: for asynchronous downloading of product images from their high-resolution URLs.
Main tasks:
● Data collection automation – Searching for products by keywords, filtering by store name, bypassing CAPTCHA.
● Information processing – Parsing HTML to obtain product names, IDs, and images; resizing images.
● Saving results – Creating an Excel table, inserting text and images, formatting the table.
● Asynchronicity – Concurrent processing of requests and image downloads.
● Flexibility of settings – Configuring input parameters, scan depth, and pauses.
Implementation process:
1. Data collection:
Data is collected by the EtsyClient class, which encapsulates the functions for interacting with the Etsy platform: collecting keywords, loading pages, and processing results. BeautifulSoup is used for parsing, and httpx for downloading product images. The collected data is organized into a structure that is ready to be saved to a file.
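The parsing and filtering step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the HTML snippet, the listing-link class, the data-listing-id and data-shop attributes, and the extract_products function are all assumptions made for the example, and real Etsy markup differs and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; real Etsy pages use
# different attributes that change frequently.
SAMPLE_HTML = """
<ul>
  <li><a class="listing-link" data-listing-id="101" data-shop="CozyKnits"
         href="/listing/101/wool-scarf" title="Wool scarf"></a></li>
  <li><a class="listing-link" data-listing-id="202" data-shop="OtherShop"
         href="/listing/202/mug" title="Ceramic mug"></a></li>
  <li><a class="listing-link" data-listing-id="303" data-shop="CozyKnits"
         href="/listing/303/hat" title="Knitted hat"></a></li>
</ul>
"""


def extract_products(html, shop_name):
    """Return id, title, and URL for every listing that belongs to shop_name."""
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for link in soup.select("a.listing-link"):
        # Skip listings from other stores (assumed data-shop attribute).
        if link.get("data-shop") != shop_name:
            continue
        products.append(
            {
                "id": link["data-listing-id"],
                "title": link["title"],
                "url": link["href"],
            }
        )
    return products


products = extract_products(SAMPLE_HTML, "CozyKnits")
for product in products:
    print(product["id"], product["title"], product["url"])
```

Each product ends up as a plain dictionary, which is one simple way to get "a structure ready for saving" before the Excel step.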
2. Data processing and saving:
The openpyxl library is used to save the collected data. An Excel table is created, and both the text data and the image of each product are written into it. Each image is resized automatically before insertion so that it displays correctly in the table.
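A minimal sketch of the resize-and-insert step, under stated assumptions: the 120x120 pixel bounding box, the make_thumbnail helper, the column layout, and the generated placeholder photo are all hypothetical. The pixel-to-column-width and pixel-to-point conversions are rough approximations rather than exact Excel metrics.

```python
from io import BytesIO

from openpyxl import Workbook
from openpyxl.drawing.image import Image as XLImage
from PIL import Image as PILImage

MAX_SIZE = (120, 120)  # assumed bounding box for thumbnails, in pixels


def make_thumbnail(source, max_size=MAX_SIZE):
    """Return (PNG buffer, size) for a copy of source shrunk to fit max_size."""
    thumb = source.copy()
    thumb.thumbnail(max_size)  # preserves aspect ratio, never enlarges
    buffer = BytesIO()
    thumb.save(buffer, format="PNG")
    buffer.seek(0)
    return buffer, thumb.size


# Stand-in for a downloaded product photo.
photo = PILImage.new("RGB", (800, 400), "steelblue")

workbook = Workbook()
sheet = workbook.active
sheet.append(["ID", "Title", "Image"])
sheet.append(["101", "Wool scarf"])

buffer, (width, height) = make_thumbnail(photo)
picture = XLImage(buffer)
sheet.add_image(picture, "C2")  # anchor the picture at cell C2

# Enlarge the cell so the picture fits (approximate unit conversions).
sheet.column_dimensions["C"].width = width / 7  # about 7 px per width unit
sheet.row_dimensions[2].height = height * 0.75  # points = pixels * 0.75

# Saved to memory here; pass a file name such as "products.xlsx" instead.
output = BytesIO()
workbook.save(output)
print(f"thumbnail {width}x{height}, workbook {len(output.getvalue())} bytes")
```

Resizing with Pillow before insertion keeps the workbook small, because the full-size photo is never embedded in the file.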
3. Asynchronicity and efficiency:
Data collection and processing are implemented asynchronously, so multiple requests can be in flight and multiple images can be downloaded at the same time. Because the program does not wait for each response before sending the next request, the total run time is shorter than with sequential processing.
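The effect of running downloads concurrently can be illustrated with a small sketch. To keep it runnable offline, the network call is simulated with asyncio.sleep; in a real client that line would be replaced by a request through an httpx.AsyncClient. The fetch_image and fetch_all functions, the example.com URLs, and the limit of 5 simultaneous downloads are assumptions for the example.

```python
import asyncio
import time

MAX_CONCURRENT = 5  # assumed cap on simultaneous downloads


async def fetch_image(url, limiter):
    """Simulate one image download while respecting the concurrency cap."""
    async with limiter:
        await asyncio.sleep(0.1)  # stands in for the real HTTP request
        return f"bytes of {url}"


async def fetch_all(urls):
    """Start all downloads at once; gather returns results in input order."""
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(fetch_image(url, limiter) for url in urls))


urls = [f"https://example.com/img/{n}.jpg" for n in range(10)]

started = time.perf_counter()
results = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - started

print(f"{len(results)} downloads in {elapsed:.2f} s")
```

Ten simulated downloads of 0.1 s each would take about a second one after another; with five running at a time they finish in roughly two batches. The semaphore is one way to avoid sending too many requests to the site at once.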
4. Flexibility of settings:
The program can be configured to work with different Etsy stores through class variables: parameters such as the store name, scan depth, and pauses are changed in one place, without modifying the parsing logic.
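One possible shape for such class-variable configuration is sketched below. The class names, attribute names, and values are hypothetical and are not taken from the project; the point is only that a second store needs a small subclass rather than changes to the logic.

```python
class EtsyClientSettings:
    """Hypothetical settings holder; all names and values are illustrative."""

    shop_name = "CozyKnits"
    keywords = ("wool scarf", "knitted hat")
    max_pages = 3        # scan depth: result pages per keyword
    pause_seconds = 2.0  # delay between page loads

    @classmethod
    def search_plan(cls):
        """List every (keyword, page) pair the scraper would visit."""
        return [
            (keyword, page)
            for keyword in cls.keywords
            for page in range(1, cls.max_pages + 1)
        ]


class MugShopSettings(EtsyClientSettings):
    """Another store: override only the values that differ."""

    shop_name = "ClayCorner"
    keywords = ("ceramic mug",)
    max_pages = 2


print(EtsyClientSettings.search_plan())
print(MugShopSettings.search_plan())
```

Values that are not overridden, such as pause_seconds here, are inherited from the base class.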
Tags:
#python #parsers #Parsing #playwright #webscraping #Parsers #scrape #beautifulsoup #beautifulsoup4 #bs4 #pillow #openpyxl