Python Web Scraping / Data Extraction Specialist
We are looking for an outsourcing specialist for tasks related to parsing catalogs, manufacturer websites, and OEM sources.
We need to collect structured information from various websites: product catalogs, equipment models, parts compatibility, OEM part numbers, product names, source links, and other fields depending on the specific site.
Main tasks:
- analyzing websites and finding the optimal way to obtain data: API, HTML, JSON, CSV/XLSX, PDF, or other available sources;
- writing scripts for data collection;
- cleaning, normalizing, and structuring data;
- delivering the result in CSV, Google Sheets, or an agreed structure for further import into our database;
- implementing re-runs without duplicating records;
- implementing data-update logic that marks each record as new / updated / unchanged;
- logging runs, errors, and the number of collected/updated records;
- brief documentation: how to run the script, what dependencies are needed, what fields are collected, what unique key is used.
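The re-run, update-status, and logging tasks above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `parts` table, the `(source, oem_number)` unique key, and the `sync` function are hypothetical names, and SQLite stands in for whatever database the data is finally imported into.

```python
import hashlib
import json
import logging
import sqlite3
from collections import Counter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("catalog_sync")

# Hypothetical schema: one row per (source, oem_number) unique key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS parts (
    source     TEXT NOT NULL,
    oem_number TEXT NOT NULL,
    payload    TEXT NOT NULL,
    row_hash   TEXT NOT NULL,
    PRIMARY KEY (source, oem_number)
)
"""


def row_hash(record: dict) -> str:
    """Stable content hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync(conn: sqlite3.Connection, source: str, records: list) -> Counter:
    """Upsert records and count how many were new, updated, or unchanged."""
    conn.execute(SCHEMA)
    stats = Counter()
    for rec in records:
        key = (source, rec["oem_number"])
        digest = row_hash(rec)
        found = conn.execute(
            "SELECT row_hash FROM parts WHERE source = ? AND oem_number = ?", key
        ).fetchone()
        if found is None:
            status = "new"
        elif found[0] != digest:
            status = "updated"
        else:
            status = "unchanged"
        if status != "unchanged":
            # The primary key guarantees a re-run replaces the row instead of duplicating it.
            conn.execute(
                "INSERT OR REPLACE INTO parts (source, oem_number, payload, row_hash) "
                "VALUES (?, ?, ?, ?)",
                (*key, json.dumps(rec, sort_keys=True, ensure_ascii=False), digest),
            )
        stats[status] += 1
    conn.commit()
    log.info("source=%s new=%d updated=%d unchanged=%d",
             source, stats["new"], stats["updated"], stats["unchanged"])
    return stats
```

Running `sync` twice with the same input should report every record as unchanged on the second run, which is a quick way to check that re-runs do not create duplicates.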
Required skills:
- Python or another relevant language for scraping / data extraction;
- requests, BeautifulSoup, lxml, pandas;
- Selenium or Playwright for JavaScript websites;
- working with APIs, JSON, CSV, XLSX;
- basic understanding of SQL or data preparation for database import;
- Git / GitHub;
- ability to work with regular data updates and deduplication / upsert logic;
- attention to data structure and script stability.
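As an illustration of the cleaning, deduplication, and CSV delivery skills listed above, here is a small standard-library sketch. The field names (`brand`, `oem_number`, `name`) and the normalization rule (uppercase, with spaces, dots, hyphens, and slashes removed) are assumptions chosen for the example; the real rule depends on the source and should be agreed per site.

```python
import csv
import io
import re


def normalize_oem(raw: str) -> str:
    """Uppercase and drop spaces, dots, hyphens, and slashes (assumed rule)."""
    return re.sub(r"[\s.\-/]+", "", raw).upper()


def dedupe(rows: list) -> list:
    """Keep the last row seen for each (brand, normalized OEM number) key."""
    unique = {}
    for row in rows:
        cleaned = {field: value.strip() for field, value in row.items()}
        cleaned["oem_number"] = normalize_oem(cleaned["oem_number"])
        key = (cleaned["brand"].casefold(), cleaned["oem_number"])
        unique[key] = cleaned
    return list(unique.values())


def to_csv(rows: list, fieldnames: list) -> str:
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Some manufacturers treat hyphens or slashes in part numbers as significant, so it is usually safer to keep the raw value in a separate column next to the normalized key.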
Nice to have:
- experience in parsing e-commerce websites, parts catalogs, OEM catalogs, or technical documentation;
- experience working with Google Sheets API;
- experience processing PDF catalogs or tables;
- experience setting up regular script runs;
- ability to describe source limitations and risks of maintaining the parser.
Collaboration format:
We plan to work hourly. For each new site, a brief technical discovery needs to be done first: analyze the source, understand the data retrieval method, assess complexity, risks, and estimated implementation time.
After that, we agree on the scope of work and the limit of hours for implementation.
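Part of such a discovery step can be automated. The sketch below is a rough heuristic, not a complete method: given a response's `Content-Type` header and body, it guesses which kind of source was returned and whether an HTML page carries embedded JSON-LD that could be read instead of scraping markup. The function name `classify_source` and its labels are hypothetical.

```python
import json
import re

# Matches <script type="application/ld+json"> ... </script> blocks in HTML.
JSON_LD = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def classify_source(content_type: str, body: bytes) -> str:
    """Guess the source kind from the Content-Type header and the first bytes."""
    ctype = content_type.split(";")[0].strip().lower()
    if body.startswith(b"%PDF-"):
        return "pdf"
    if body.startswith(b"PK\x03\x04"):
        # XLSX files are ZIP archives, so the ZIP signature is only a hint.
        return "xlsx-or-zip"
    if ctype == "application/json" or ctype.endswith("+json"):
        return "json"
    if ctype == "text/csv":
        return "csv"
    text = body.decode("utf-8", errors="replace")
    for match in JSON_LD.finditer(text):
        try:
            json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
        return "html-with-json-ld"
    return "html"
```

A result of `html` only means that no structured shortcut was detected in this response; a JavaScript-driven site may still expose an internal JSON API that has to be found by inspecting network requests.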
In your response, please send:
- examples of scraping / data extraction projects;
- GitHub or code samples, if available;
- your optimal hourly rate;
- what tools you usually use.
-
I will start with a technical discovery for each source: I will check which of API/HTML/JSON/CSV/XLSX/PDF is available and propose a collection method. I will then write the script and prepare the data structure, deduplication, upsert logic, export, and brief documentation.
Do you already have a reference sample for one catalog, so we can check that the parser does not confuse OEM part numbers, compatibility, and categories, and does not miss any products before import into the database?
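A check against such a reference sample could look roughly like this. It is a minimal sketch that assumes both the scraped data and the reference are lists of dictionaries sharing a unique key; the key `oem_number` and the compared fields `category` and `compatibility` are placeholder names.

```python
def compare_with_reference(scraped, reference, key="oem_number",
                           fields=("category", "compatibility")):
    """Report reference rows that are missing or differ in the scraped data."""
    scraped_by_key = {row[key]: row for row in scraped}
    report = {"missing": [], "mismatched": []}
    for ref in reference:
        got = scraped_by_key.get(ref[key])
        if got is None:
            # The product exists in the reference but the parser never collected it.
            report["missing"].append(ref[key])
            continue
        # Map each differing field to (expected value, scraped value).
        diffs = {
            field: (ref.get(field), got.get(field))
            for field in fields
            if ref.get(field) != got.get(field)
        }
        if diffs:
            report["mismatched"].append((ref[key], diffs))
    return report
```

An empty report for a hand-verified sample does not prove the whole catalog is correct, but it catches swapped columns and skipped products early.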
We can discuss the hourly rate, the hour limit, and the first test site in private messages once I have reviewed the data source.
Similar completed project: Fix 5 problems related to the Facebook API in an OpenCart module
-
Hello!
I have experience directly relevant to your tasks:
- Developed commercial scrapers that collect product catalogs from e-commerce sites (Playwright, BeautifulSoup, requests), including anti-bot bypass and proxy rotation
- Implemented upsert logic (new / updated / unchanged) and deduplication across repeated runs
- Collected and normalized large volumes of data (27,000+ records), stored them in PostgreSQL, and exported them to CSV
- Worked with APIs, JSON, XLSX, and dynamic JS sites through Playwright
- Set up logging of runs, errors, and statistics of collected records
- Wrote brief documentation for each script
Tools: Python, Playwright, BeautifulSoup, requests, pandas, lxml, PostgreSQL, Git
Portfolio and work examples: Freelancehunt
Hourly rate: from $12/hour; the final rate will be set after the technical discovery of the first source.
I am ready to start by analyzing the first site and to provide an estimate of complexity and timelines. Please send the link to the first source!
-
Good day. I see that you need parsers. My hourly rate is 400 UAH. Contact me and I will send an example of a parser with an admin panel that finds the lowest prices for car parts. I have relevant experience.
-
Hello! I regularly build parsers of varying complexity, and there are examples of my work in my portfolio. To clarify the details, please write to me in private messages.
-
Good day. I have extensive experience with many kinds of parsing.
https://freelancehunt.com/showcase/work/p2p-aggregator-agregator-kursiv-7h-kripto/1821723.html
https://freelancehunt.com/showcase/work/nextdoor-parser/1759679.html
Freelancehunt
Rate: 10-15 USD, depending on complexity.
Frameworks: Scrapy, aiohttp, requests, lxml.
I can deliver the output to any database or spreadsheet.
-
Good day. I have built many parsers; here are some examples:
https://freelancehunt.com/project/parsing-massove-stvorennya-storinok-na/1261589.html
https://freelancehunt.com/project/parser-dannyih-dlya-parser-yutub/1266572.html
https://freelancehunt.com/project/parser-saytyi-muzhskoy-kosmetiki-2/1239346.html
I have worked with all the listed technologies.
I prefer payment per project rather than hourly.
You can send a link to the first site, and I will do a technical discovery for you.
-
Hello! I have reviewed the task, and this is my core specialization. I have extensive experience in developing fault-tolerant data collection systems in Python (BeautifulSoup, Playwright/Selenium, asynchronous requests) with proper architecture: deduplication (idempotency), logging of record states (new/updated/unchanged), error handling, and working through proxies to bypass protection.
I fully support the format with a preliminary technical discovery: it is the only professional approach that protects against hidden pitfalls. First, I analyze the API/HTML source, assess the complexity (structure, protection, volume), agree with you on the hour limit, and only then proceed to coding. I deliver the result in a structured format (CSV/Google Sheets/JSON/SQL-ready) along with a concise README explaining how to run it.
My optimal rate for long-term collaboration is $20-25/hour (depending on the volume and regularity of tasks). Tools: Python (asyncio, aiohttp/requests, BS4), Playwright (for JS-heavy sites), Pandas (data normalization), Git. I am ready to show examples of architecture and similar cases in personal messages. Let's discuss the first source!
-
Good day!
I specialize in Python web scraping and data extraction. I have significant experience working with APIs, Google Sheets, deduplication, and information structuring, providing stable and efficient solutions for your needs.
Message me privately, and we will clarify the details.
-
Good day. I am an expert in parsing and write everything in Go and Node.js. If you need complex, high-quality parsing, feel free to contact me.