Data collection from the Slovak Ministry of Justice register
A Python script was developed to automate data collection from the Commercial Register of the Slovak Ministry of Justice.
The script uses:
requests for fetching web pages,
BeautifulSoup for parsing HTML,
ThreadPoolExecutor for multithreading and speeding up the process,
xlsxwriter and openpyxl for saving data in Excel format.
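
As a rough illustration of how these libraries can fit together, the sketch below fetches pages in parallel threads, extracts table rows, and writes them to a workbook. It is not the original script: the function names (fetch_html, parse_rows, scrape, save_xlsx), the assumption that records sit in HTML table rows, and the worker count are all hypothetical, and only xlsxwriter is shown for the Excel step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

Row = List[str]


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download one page with requests and return its HTML text."""
    import requests  # third-party; imported lazily so scrape() stays stdlib-only

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


def parse_rows(html: str) -> List[Row]:
    """Return the cell text of every non-empty table row (assumed page layout)."""
    from bs4 import BeautifulSoup  # third-party

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


def scrape(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    parse: Callable[[str], List[Row]] = parse_rows,
    workers: int = 8,
) -> List[Row]:
    """Fetch and parse pages in parallel; rows come back in the order of urls."""

    def fetch_and_parse(url: str) -> List[Row]:
        return parse(fetch(url))

    rows: List[Row] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, even if pages finish out of order.
        for page_rows in pool.map(fetch_and_parse, urls):
            rows.extend(page_rows)
    return rows


def save_xlsx(rows: Iterable[Row], path: str) -> None:
    """Write one spreadsheet row per record using xlsxwriter."""
    import xlsxwriter  # third-party

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    for row_index, row in enumerate(rows):
        worksheet.write_row(row_index, 0, row)
    workbook.close()
```

Passing fetch and parse as parameters keeps the threading logic independent of the site, so it can be exercised with stand-in functions before any real request is sent.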
Key Tasks:
Overcome the website’s limitation on the number of records returned per query.
Implement an iterative and optimized data scraping process.
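
The summary does not say how the record limit was bypassed. One common technique, shown here purely as an assumption, is to search by name prefix and split any prefix whose result list reaches the cap into longer prefixes. The names collect_all, search, alphabet, cap and max_depth are hypothetical, and search stands for whatever function queries the register and returns the matching records.

```python
from typing import Callable, List


def collect_all(
    search: Callable[[str], List[str]],
    alphabet: str,
    cap: int,
    max_depth: int = 6,
) -> List[str]:
    """Collect records by refining every prefix whose result list hits the cap."""
    results = set()
    pending = list(alphabet)  # start with all one-character prefixes
    while pending:
        prefix = pending.pop()
        hits = search(prefix)
        results.update(hits)  # even a truncated list contains valid records
        if len(hits) >= cap and len(prefix) < max_depth:
            # The list was probably truncated: retry with longer prefixes.
            pending.extend(prefix + ch for ch in alphabet)
    return sorted(results)
```

The explicit pending list makes the process iterative rather than recursive, and max_depth bounds the number of queries when many records share a long prefix. A record whose name is shorter than a refined prefix is only found if it appeared in the truncated list, so a real run would need a check for that case.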
Results:
Successfully collected and processed over 300,000 records.
The solution demonstrated high scalability and reliability.
Data was prepared in a format convenient for analysis.