Viktor Gayoha
Portfolio
Parsing a protected SPA website, Bypassing Cloudflare and anti-bot systems
Data Parsing
Objective: Gather 100% accurate data on over 1000 exhibitors (name, country, booth number, hidden emails and phone numbers, categories) from the official Salone del Mobile website.
Main challenges:
Aggressive anti-bot protection (Cloudflare): Standard requests (requests/httpx) returned 403 Forbidden. Regular headless browsers (Selenium, Playwright) and even frameworks like undetected-chromedriver were instantly blocked.
Complex SPA architecture (React / Next.js): The website did not have standard HTML links. All navigation occurred exclusively through React event handlers (onClick), making traditional URL collection impossible. Additionally, contact details were hidden in non-semantic tags (for example,).
My solution:
To achieve perfect accuracy and bypass protection, I developed a custom hybrid approach:
Connection via Chrome DevTools Protocol (CDP): Instead of launching a new automated browser instance, my script used Playwright to attach to an already running, "live" Google Chrome session (http://localhost:9222). Because that session carried a real user's cookies, history, and Canvas fingerprint, the site treated it as a legitimate visitor, and Cloudflare was passed without solving a single CAPTCHA.
Intelligent navigation: The script mimicked human behavior: it resolved dynamic locators, issued real mouse clicks to trigger React state changes, and used the site's internal router to return to the list while preserving the pagination position.
HTML parsing: The captured page state was processed through BeautifulSoup and complex regular expressions (Regex) for accurate extraction of "broken" or poorly formatted links and phone numbers.
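The CDP connection described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original script: it assumes Chrome was started manually with remote debugging enabled on port 9222, and the helper names `chrome_launch_args` and `attach_and_capture`, the `google-chrome` binary name, and the profile path are hypothetical. `connect_over_cdp` is Playwright's call for attaching to an already running Chromium-based browser.

```python
from typing import List

CDP_ENDPOINT = "http://localhost:9222"  # endpoint quoted in the description


def chrome_launch_args(port: int, profile_dir: str) -> List[str]:
    """Build the command for starting Chrome with remote debugging enabled.

    The binary name "google-chrome" is an assumption; it differs per OS.
    """
    return [
        "google-chrome",
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
    ]


def attach_and_capture(endpoint: str = CDP_ENDPOINT) -> str:
    """Attach to the running Chrome and return the HTML of an open tab."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        # Attach to the existing browser instead of launching a new one.
        browser = p.chromium.connect_over_cdp(endpoint)
        context = browser.contexts[0]  # the browser's existing default context
        # Reuse the first open tab if there is one, otherwise open a new tab.
        page = context.pages[0] if context.pages else context.new_page()
        return page.content()


if __name__ == "__main__":
    # Print the launch command; the profile path is a placeholder.
    print(" ".join(chrome_launch_args(9222, "/tmp/chrome-profile")))
```

In this sketch `attach_and_capture` is meant to be called only after Chrome is running and the target page has been opened by hand, so the captured HTML comes from the live session.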
Technologies used:
Python 3.12
Playwright (Sync API): interaction with the DOM and connection via CDP.
BeautifulSoup4 & Regex: precise searching and data extraction.
Pandas: structuring and exporting data into clean CSV (UTF-8 with BOM) and Excel.
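As a rough illustration of the extraction and export steps listed above, the sketch below pulls emails and phone numbers out of an HTML fragment and serializes rows to CSV with a UTF-8 BOM. It deliberately swaps in the standard library (`re`, `html`, `csv`) for BeautifulSoup and Pandas so that it runs without dependencies. The patterns, the function names, and the sample markup are assumptions, not the project's actual code.

```python
import csv
import html
import io
import re
from typing import Dict, List

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional '+', a digit, at least six digits/separators, ending on a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")
TAG_RE = re.compile(r"<[^>]+>")


def extract_contacts(fragment: str) -> Dict[str, List[str]]:
    """Return de-duplicated emails and normalized phone numbers."""
    raw = html.unescape(fragment)
    # Emails are searched in the raw markup so mailto: attributes are covered.
    emails = sorted(set(EMAIL_RE.findall(raw)))
    # Phones are searched in visible text only, to skip digits in attributes.
    text = TAG_RE.sub(" ", raw)
    phones = sorted({re.sub(r"[^\d+]", "", m) for m in PHONE_RE.findall(text)})
    return {"emails": emails, "phones": phones}


def rows_to_csv_bytes(rows: List[Dict[str, str]]) -> bytes:
    """Serialize rows to CSV bytes with a UTF-8 BOM (assumes at least one row)."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    # The "utf-8-sig" codec prepends the BOM when encoding.
    return buffer.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    sample = (
        '<span class="x1">Mail: <a href="mailto:sales@example.com">write us</a>'
        "</span><span>Tel. +39 02 1234 5678</span>"
    )
    contacts = extract_contacts(sample)
    print(contacts)
    print(rows_to_csv_bytes([{"name": "Example Srl", "email": contacts["emails"][0]}]))
```

With Pandas, the equivalent export would be `DataFrame.to_csv(path, encoding="utf-8-sig")`; the BOM is what lets Excel recognize the file as UTF-8.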
Result:
The script autonomously collected and perfectly formatted data for over 1200 companies. The created architecture allows for scalable parsing without the risk of getting banned by IP.
Scraper for generating B2B leads (Corporate databases)
Data Parsing
Objective: Develop an automated web scraper in Python to collect structured contact and financial data of potential B2B clients from public business directories.
My solution and technical implementation:
Parsing HTML tables: The script efficiently navigates through directory pages and extracts the necessary information from the complex tabular structure of the websites using the BeautifulSoup library.
Operational stability: To prevent blocking by target servers, custom HTTP headers were configured to mimic requests from a real browser. This ensured uninterrupted data collection during long sessions.
Deep data cleaning: The collected "raw" information often contained extraneous characters and formatting artifacts. Using the Pandas library, I implemented logic for automatic cleaning of key metrics. For example, the fields "Company Revenue" and "Number of Employees" were programmatically cleaned of text and converted into strict numerical values.
Preparation for CRM: The final dataset is automatically exported in a valid CSV format with the correct column structure.
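The cleaning step for "Company Revenue" and "Number of Employees" might look something like the sketch below. It is written in plain Python rather than Pandas to stay dependency-free, and the raw formats it handles (thousands separators, currency symbols, and K/M/B suffixes) are assumptions about what a directory could contain, not formats confirmed by the project. The names `to_number` and `clean_record` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed suffix convention: K = thousand, M = million, B = billion.
_MULTIPLIERS = {"k": 1_000, "m": 1_000_000, "b": 1_000_000_000}
# First number in the string, with an optional single-letter suffix.
_NUMBER_RE = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*([kmb])?\b", re.IGNORECASE)


def to_number(raw: Optional[str]) -> Optional[float]:
    """Extract the first number from a messy string, or None if absent."""
    match = _NUMBER_RE.search(raw or "")
    if match is None:
        return None
    value = float(match.group(1).replace(",", ""))
    suffix = (match.group(2) or "").lower()
    return value * _MULTIPLIERS.get(suffix, 1)


def clean_record(record: Dict[str, str]) -> Dict[str, object]:
    """Return a copy of the record with the two metric fields made numeric."""
    revenue = to_number(record.get("Company Revenue"))
    employees = to_number(record.get("Number of Employees"))
    cleaned: Dict[str, object] = dict(record)
    cleaned["Company Revenue"] = revenue
    cleaned["Number of Employees"] = None if employees is None else int(employees)
    return cleaned


if __name__ == "__main__":
    row = {
        "Company": "Example Ltd",
        "Company Revenue": "Revenue: $4.2M (est.)",
        "Number of Employees": "approx. 1,200 employees",
    }
    print(clean_record(row))
```

Values that contain no digits at all (for example "N/A") come back as `None`, so they can be exported as empty cells rather than as misleading zeros.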
Technologies used:
Python, BeautifulSoup, Pandas, HTTP Headers Configuration.
Result:
The client received a fully automated lead generation tool. The output is a perfectly clean CSV file that can be instantly imported into any CRM system without the need for additional manual processing or formatting error corrections.
Extended E-commerce parser (Selenium and bypassing anti-bot protection)
Data Parsing
Objective: Develop a robust web scraper to collect real-time product data from dynamic e-commerce platforms (such as eBay) for price monitoring and analytics.
Main challenges:
Dynamic content: Data was loaded through complex JavaScript/AJAX requests rather than being simply present in HTML.
Anti-bot systems: Platforms used advanced algorithms to block automated actions.
Unstable layout: The structure of the pages (DOM) could change, causing regular hard-coded parsers to break instantly.
My solution:
Bypassing protection: I used Selenium with flexible stealth configurations for the webdriver. To make the script appear like a real person, I added natural behavior simulation (random delays between clicks, scrolling), which allowed data collection without the risk of being blocked.
Code resilience (Fallback Selectors): I implemented a system of dynamic fallback selectors. If the online store slightly changed its design or layout, the script did not crash with an error but automatically switched to a backup method of element searching and continued working.
Automatic navigation: I set up reliable pagination, allowing the autonomous collection of hundreds of listings from multiple pages in a single run.
Deep data cleaning: Raw data from online stores often contains junk. I applied regular expressions (Regex) to clean the text (for example, extracting the pure price without currency and spaces) and used Pandas to sort the final dataset by ascending price.
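The fallback-selector idea and the price cleaning described above can be sketched as follows. This is an illustration under assumptions rather than the original implementation: the lookup is passed in as a plain callable so the logic can be shown without a browser, the selector strings are invented, and the price pattern assumes a dot as the decimal separator and commas as thousands separators.

```python
import re
from typing import Callable, Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")
_PRICE_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def first_match(find: Callable[[str], Optional[T]],
                selectors: Iterable[str]) -> Optional[T]:
    """Try each selector in order and return the first non-empty result."""
    for selector in selectors:
        try:
            found = find(selector)
        except Exception:  # e.g. Selenium's NoSuchElementException
            continue  # this selector failed, fall back to the next one
        if found:
            return found
    return None


def parse_price(raw: str) -> Optional[float]:
    """Extract the numeric price, dropping currency text and separators."""
    match = _PRICE_RE.search(raw)
    return float(match.group(0).replace(",", "")) if match else None


def sort_by_price(items: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Return the items sorted by ascending price."""
    return sorted(items, key=lambda item: item["price"])


if __name__ == "__main__":
    # A dict stands in for the page: only the backup selector exists.
    fake_page = {".price-v2": "US $1,299.99"}
    text = first_match(fake_page.get, [".price", ".price-v2"])
    print(text, parse_price(text or ""))
```

With Selenium, `find` could be something like `lambda css: driver.find_element(By.CSS_SELECTOR, css).text`, which raises `NoSuchElementException` when a selector no longer matches and so triggers the fallback.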
Technologies used: Python, Selenium (Stealth), Pandas, Regex (Regular expressions).
Result:
The client received not just a script, but a reliable tool. The output consisted of perfectly formatted, sorted, and production-ready CSV files that could be immediately uploaded to analytical systems or databases.
Reviews and compliments on completed projects 2
7 April
188 USD
Parsing product images for an online store
Incredibly satisfied with the collaboration! Very cool approach, the performer does not just wait for instructions, but shows initiative and finds optimal solutions to complex issues. Always in touch, responds instantly, communication is top-notch. A professional who truly understands their craft. Completed everything quickly, efficiently, and thoughtfully. I will definitely reach out again!
Thank you very much!
Excellent performer - did everything quickly and clearly
Super support - accommodating - we received even more than was specified in the terms of reference
We will work together again!
Activity
| Latest proposals (10) | Budget | Added | Deadlines | Proposal |
|---|---|---|---|---|
| Parsing PDF bank statements | 68 USD | | | |
| PDF book parser (text + images) | 225 USD | | | |
| Development of an AI assistant for automated call monitoring and analytics | 394 USD | | | |
| Telegram Script | 150 USD | | | |
| Telegram chatbot for booking detailing studio | 68 USD | | | |
| It is necessary to collect and launch 10 websites using AI. | 56 USD | | | |
| Parsing product images for an online store | 188 USD | | | |
| Parsing product data from a supplier's website | 45 USD | | | |
| Automation/Software for reading bank PUSH notifications (P2P, crypto, banks) | 101 USD | | | |
| Create a parser with Allegro for the niche of special equipment. | 338 USD | | | |