TGStat parser with graphical interface
A desktop application written in Python that collects data on Telegram channels and chats from the TGStat website. It has a full graphical interface built with PyQt6 and uses the DrissionPage library to control a Chromium browser and parse web pages.
Key features:
- Graphical User Interface (GUI): An intuitive interface built with PyQt6 lets you configure collection parameters, start and stop the process, and monitor execution logs in real time.
- Browser Management: DrissionPage is used for browser automation, supporting both regular and headless modes.
- Cloudflare Bypass: A dedicated class automatically passes basic Cloudflare checks (the "Just a moment..." page).
- Authentication Support: The application checks for an active session on TGStat and, if necessary, waits for manual user login, saving the profile for future runs.
- Two parsing modes:
  - By Categories: Flexible selection of countries and categories for bulk data collection.
  - By Links: Collecting information from a provided list of direct URLs.
- Filtering: Ability to select type (channels/chats) and set a minimum threshold for the number of subscribers.
- Multithreading: Browser connection and parsing tasks are executed in separate threads (QThread), preventing the interface from freezing.
- Data Export: Collected data (name, subscribers, link, category, etc.) is automatically saved to an .xlsx file using pandas.
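The filtering feature listed above (type selection plus a minimum subscriber threshold) can be illustrated with a minimal sketch. The `Entry` class, its field names, the `apply_filters` function, and the sample rows are all hypothetical and exist only for illustration; the application's real data model may look different.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One collected row; the field names are illustrative, not the app's."""
    name: str
    kind: str          # "channel" or "chat"
    subscribers: int
    link: str


def apply_filters(entries, kinds=("channel", "chat"), min_subscribers=0):
    """Keep entries of the selected kinds that meet the subscriber threshold."""
    allowed = set(kinds)
    return [
        e for e in entries
        if e.kind in allowed and e.subscribers >= min_subscribers
    ]


# Made-up sample data, not real TGStat results.
sample = [
    Entry("News", "channel", 12_500, "https://tgstat.ru/channel/@example_news"),
    Entry("Small talk", "chat", 300, "https://tgstat.ru/chat/@example_chat"),
    Entry("Tiny", "channel", 40, "https://tgstat.ru/channel/@example_tiny"),
]

kept = apply_filters(sample, kinds=("channel",), min_subscribers=1000)
print([e.name for e in kept])
```

With the defaults, nothing is filtered out, which matches the idea that both settings are optional restrictions on top of the collected data.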
Stack: Python, PyQt6, DrissionPage, pandas.
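For the "By Links" mode, the user-supplied list of URLs usually needs some cleanup before parsing starts. The sketch below shows one possible approach using only the standard library. The `clean_links` function and the `TGSTAT_HOSTS` tuple are assumptions made for this example and are not taken from the application's code.

```python
from urllib.parse import urlsplit

# Assumption: hosts treated as TGStat; adjust to the domains actually used.
TGSTAT_HOSTS = ("tgstat.ru", "tgstat.com")


def clean_links(raw_text):
    """Return unique http(s) TGStat URLs from raw text, in original order."""
    seen = set()
    result = []
    for line in raw_text.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        try:
            parts = urlsplit(url)
            host = (parts.hostname or "").lower()
        except ValueError:
            continue  # malformed URL
        # Accept the listed hosts and any of their subdomains.
        is_tgstat = any(host == h or host.endswith("." + h) for h in TGSTAT_HOSTS)
        if parts.scheme not in ("http", "https") or not is_tgstat:
            continue
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


# Made-up input, as it might be pasted into a text field.
raw = """
https://tgstat.ru/channel/@example_news

https://tgstat.ru/channel/@example_news
https://uk.tgstat.com/chat/@example_chat
https://example.org/not-tgstat
not a url
"""

print(clean_links(raw))
```

Keeping the original order means the exported rows follow the order of the list the user provided.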