Freelance projects

Parsing and classification of data

Data Parsing, Web Programming

We are looking for a developer to implement a system for collecting and structuring data from open sources.

We have a database of small business owners in the USA, which contains the person's name, company name, address, and state. It is necessary to build a process for enriching these records with additional information from publicly available sources, primarily LinkedIn and possibly Facebook.

The main task is to search for and verify the profiles of business owners and their corresponding business pages. For each record, it is necessary to find and collect available data, including a profile picture on the LinkedIn social network, email address, links to social networks, company website, and phone number. All this data is available on the LinkedIn business page.

Search engines and search operators can be used for searching, for example:

linkedin.com/in "First Last Name" "Company Name"

site:linkedin.com/in "First Last Name" "Company Name"
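A minimal sketch of how the two query patterns above might be generated from one record. The function name build_queries and the choice to strip double quotes from the input are assumptions made for illustration, not part of the brief.

```python
def build_queries(first_last: str, company: str) -> list[str]:
    """Return the two search-engine query strings for one record.

    Hypothetical helper: double quotes are removed from the inputs so they
    cannot break the quoted phrases in the query.
    """
    name = first_last.strip().replace('"', "")
    firm = company.strip().replace('"', "")
    return [
        f'linkedin.com/in "{name}" "{firm}"',
        f'site:linkedin.com/in "{name}" "{firm}"',
    ]


if __name__ == "__main__":
    for query in build_queries("First Last Name", "Company Name"):
        print(query)
```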

The system should match the found data with existing records by the owner's name, business name, address, state, and other available attributes to minimize false matches.
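One possible way to express such matching is a weighted similarity score, sketched below with the standard-library difflib module. The weights, the field names, and the match_score function are assumptions for illustration; the brief does not prescribe a formula.

```python
from difflib import SequenceMatcher

# Assumed weights; the brief does not specify how attributes are weighted.
WEIGHTS = {"name": 0.4, "company": 0.4, "state": 0.2}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_score(record: dict, candidate: dict) -> float:
    """Combine name, company and state agreement into one score (0.0 to 1.0)."""
    score = WEIGHTS["name"] * similarity(record["name"], candidate["name"])
    score += WEIGHTS["company"] * similarity(record["company"], candidate["company"])
    same_state = record["state"].strip().upper() == candidate["state"].strip().upper()
    score += WEIGHTS["state"] * (1.0 if same_state else 0.0)
    return score
```

A candidate with the same name and company but a different state scores lower than an exact match, which is the behaviour needed to keep namesakes apart.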

A solution is expected that can process large arrays of records across all states in the USA and generate a structured result in JSON or CSV format for further use.

Experience in building data enrichment systems, OSINT solutions, automating data collection, working with Python, Playwright, Selenium, Scrapy, as well as implementing verification and deduplication mechanisms for found data will be an advantage.

In your response, please briefly describe relevant experience in implementing similar projects, the technology stack used, and the approach to searching, verifying, and structuring data from open sources.

Update #1 from 16 June

We will not respond to applications written by AI.

Proposals 43 Rejected 5

Oleg Grigoryev

32 0

Budget: 25000 USD Deadline: 14 days

We can take on such a system. The rough estimate for the first working stage is from 45,000 UAH and 10-14 days. This is not just a parser; the key aspects are the quality of matches, deduplication, control of false profiles, and a proper structure of results in JSON or CSV =)

Based on experience, we have created data enrichment systems, searches through open sources, automation of collection, internal CRMs, and analytical pipelines. For this task, I would use Python, Playwright, or Scrapy, a separate search module through search engines, a processing queue, cache, verification rules, and scoring matches by name, company, address, state, website, and phone number.

I see the approach as follows:
> we take a small sample of your records and create a search prototype
> separately search for personal profiles, business pages, company websites, and available contacts
> each found match receives a trust score to avoid mixing people with the same names
> we deliver the result in a structure with sources, trust level, verification date, and reason for the match
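The delivery structure described in the last step could look roughly like the sketch below. The class name MatchResult and every field name are hypothetical; only the kinds of information (sources, trust level, verification date, reason for the match) come from the proposal.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MatchResult:
    """One enriched record; all field names are illustrative assumptions."""
    owner_name: str
    company_name: str
    profile_url: str
    trust_score: float          # 0.0 to 1.0, higher means more confident
    verified_on: str            # ISO date of the last verification
    match_reason: str           # human-readable explanation of the match
    sources: list[str] = field(default_factory=list)


def to_json(results: list[MatchResult]) -> str:
    """Serialize results to a JSON array of objects."""
    return json.dumps([asdict(r) for r in results], indent=2)
```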

Look, there’s a nuance here - LinkedIn and Facebook have restrictions on automated collection, so I wouldn’t build a solution that depends on a fragile logged-in account. It’s better to combine search results, open pages, company websites, business directories, and attribute verification. This way, the system will be more stable, rather than like a house of cards in the wind.

Please clarify:
> what is the volume of the database at the first stage - 1,000, 50,000, or more records
> what is the acceptable error margin and what is more important - more found contacts or fewer false matches

Relevant examples from Ingello:
> https://business.ingello.com/vorfahr - automation and complex data processing for business processes
> https://business.ingello.com/fractal - agency approach and automation of complex workflows
> https://business.ingello.com/forma-crm - corporate system with data, roles, and structured logic

Main page for FLH - https://systems-fl.ingello.com/ua

After sampling 100-300 records, it will be possible to more accurately estimate the total budget for the entire dataset. Usually, the pilot shows the real quality of sources and prevents spending the budget on beautiful but blind automation.

Shavkatbek Ro'zibekov

1 1

Projects -
Rating -
Rating 328

Budget: 15000 USD Deadline: 6 days

Hello. I have worked on similar data collection and enrichment systems using Python with Playwright and Scrapy: searching for profiles using search operators, parsing LinkedIn, verifying matches by name, company, address, and state, deduplication, and outputting in JSON or CSV. First, I will create a working pilot with a sample of your records so you can see the quality of the matching, then I will scale it to all states. Approximately 15,000 rubles and 6 working days for the pilot, I will provide an exact estimate of the volume after reviewing the structure of your database. I am ready to start immediately.

Polly Pol

100 0

Budget: 100 USD Deadline: 2 days

Good day. We can gather the data in this form:
https://docs.google.com/spreadsheets/d/1UEFtX5ozBW2PQDThucQljxZYdMdY4k8l4gQnF4T34Sg/edit?gid=1776920200#gid=1776920200
Write if you are interested.

Yevgeniy Rybin

0 0

Projects -
Rating -
Rating 561

Budget: 1000 USD Deadline: 20 days

Hello!

My name is Evgeny, and I have been professionally developing mobile applications, websites, web services, and web applications for 7 years.

*The cost mentioned is for 1 hour of work. To provide a more detailed price, I would like to connect/call and discuss the details.

- You can review my portfolio, feedback, and awards in my profile.

Why should you choose me?
- I have taken 1st and 2nd places in international championships and competitions in the IT field.
- I have verified video testimonials and letters of appreciation.
- I am always available, honest, and reasonable.
- I work under a contract.
- I lead my own development team.

I would be happy to talk to you in more detail about the project.

Corporate website for the organization "Ritual 77"

Petro Demchuk

2 1

Projects -
Rating -
Rating 620

Budget: 325 USD Deadline: 10 days

Good day.
I am ready to implement a system for enriching a database from open sources: LinkedIn, Facebook, company websites, and search engines.
I work with Python, Playwright/Selenium, CSV/JSON, parsing, deduplication, and data verification. I can set up profile searches, matching by name, company, address/state, and generate the final result in CSV or JSON.

Oleksandr Stinkovyi

117 0

Budget: 50 USD Deadline: 1 day

Hello.

I develop parsers in NodeJS. I am ready to take on the task. Write to me, and we will discuss.

Oleksandr Mittsykh

11 2

Projects 12
Rating -
Rating 510

Budget: 25 USD Deadline: 1 day

Hello, I am ready to complete your project. If you are interested, we can move to private messages and discuss the details there.

Yevhen Volovyk

0 0

Projects -
Rating -
Rating 475

Budget: 4500 USD Deadline: 7 days

Hello.

I have done something similar before — enriching databases from LinkedIn and other public sources.

Approach: for each entry in your CSV, I perform a Google search like site:linkedin.com/in "Name" "Company" USA, Playwright opens the results, checks for matches by name + state, then visits the profile and collects: photo, email (if public), website, social media, phone. The output will be JSON or CSV ready for use.

For large volumes, I will rotate the user-agent and pause between requests — to avoid getting blocked. If speed is needed — I will use proxies.

Stack: Python + Playwright + rapidfuzz for verifying matches and removing duplicates.

How many entries are in the database? This will determine the exact timeline and price.

Petro Bezsmertnyi

0 0

Projects -
Rating -
Rating 335

Budget: 160 USD Deadline: 5 days

Hello! I saw your project, and it seems I can do it.

I have written bots for CS2 trading and collected data from various websites, matched them, and removed duplicates. The idea here is the same: we search for a person on LinkedIn through Google (site:linkedin.com + name + company), open the page using Playwright, gather what is available, compare it with your database, and put it in CSV.

The only thing I want to say honestly is that LinkedIn really dislikes bots, so proxies and delays are needed; it won't be quick. This is a reality that should be agreed upon from the start.

Vlad Rudenko

0 0

Projects -
Rating -
Rating 112

Budget: 100 USD Deadline: 4 days

Hello! I read your specifications, and I do not plan to use AI (neural networks) for this task, as they often fabricate data where 100% accuracy is required. I will collect contacts exclusively with code: I will write a script in Python + Playwright/Selenium. It will automatically find profiles through Google dorks (site:linkedin.com/in), visit the pages, and extract real emails, phone numbers, and links. I will definitely perform a check by company name and state to ensure the data is not mixed up when there are namesakes (people with the same first and last name). I will deliver the result in a clean CSV or JSON file. I am ready to do a free test for 3-5 companies from your database so you can verify the quality of the collection. Write to me, and we will discuss the details!

Yehor Hohlov

0 0

Projects -
Rating -
Rating 272

Budget: 220 USD Deadline: 3 days

Good day! I have experience in automating data collection and processing in Python: parsing public sources, API integrations, asyncio, validation, and structuring results in JSON/CSV. I have worked on projects where it is necessary to match records by several fields and minimize false matches.

Approach to your task:

Search — Google/Bing with operators site:linkedin.com/in, name + company + state; additionally, public business registries in the USA, the company website from your database.
Matching — scoring by name, title, address, state; confidence threshold (high / medium / low match).
Verification — cross-checking LinkedIn ↔ company website ↔ address; deduplication by profile URL and email.
Stack — Python, asyncio, Playwright (where allowed), pandas, JSON/CSV export, logging, and recovery from failures.
Important: mass automated parsing of LinkedIn/Facebook is limited by their rules and the risk of blocks. I recommend a hybrid approach: searching through search engines + enrichment API (Apollo, Hunter, etc.) + manual verification of low score records — this is more stable for large volumes by state.
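The confidence tiers and the deduplication by profile URL and email mentioned above might be sketched as follows. The threshold values 0.85 and 0.6 and the key names profile_url and email are assumptions; the proposal names only the three tiers and the two deduplication keys.

```python
def confidence_tier(score: float, high: float = 0.85, medium: float = 0.6) -> str:
    """Map a 0.0 to 1.0 match score to a tier; thresholds are assumed values."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


def deduplicate(rows: list[dict]) -> list[dict]:
    """Keep the first row for each profile URL and each email address."""
    seen_urls: set[str] = set()
    seen_emails: set[str] = set()
    unique = []
    for row in rows:
        url = (row.get("profile_url") or "").rstrip("/").lower()
        email = (row.get("email") or "").strip().lower()
        if (url and url in seen_urls) or (email and email in seen_emails):
            continue  # duplicate of a row that was already kept
        if url:
            seen_urls.add(url)
        if email:
            seen_emails.add(email)
        unique.append(row)
    return unique
```

Rows in the "low" tier are the natural candidates for the manual verification step the proposal recommends.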

Relevant experience: Telegram bots with channel parsing (Telethon), external API integrations, working with JSON databases, and data filtering. Portfolio: https://yegor10.github.io/PortFolioWeb3/

I am ready to describe the architecture in more detail after clarifying the volume of the database (number of records) and acceptable sources. Please write in response — we will discuss the terms of reference.

Viacheslav K.

6 0

Budget: 1500 USD Deadline: 7 days

Good day, Roman!

I see that other specialists have already responded to your project. Allow me to help as well.

For now, I will refrain from making an offer, as a clear vision of the final goals of the data collection system is needed for development. To better understand your vision and propose optimal solutions, please clarify:
- The timeline for project implementation and plans for a quick MVP launch.
- Do you have a detailed technical specification or a formed vision of the system?
- Are you considering specific technologies, or can I recommend optimal solutions?
- The volume of records to be processed.
- Examples of similar projects for reference.

The following factors affect the timeline and cost of development:
1. The volume of data and frequency of updates.
2. Availability of ready-made tools for data collection.
3. Integration with other systems.
4. The level of detail in data verification and deduplication.
5. Scalability of the solution for large volumes of data.

At the initial stage, it is important to form and agree on a vision of the final result of the data collection and classification system. I prefer to develop such a vision based on an analysis of existing solutions from competitors and your wishes.

I suggest discussing the project details to understand how well we fit each other. We can document all nuances in correspondence or during a meeting.

I have experience in developing data enrichment systems and automating information collection from open sources. I know how important it is to minimize false matches and ensure data accuracy, especially when working with large volumes of information from platforms like LinkedIn.

Anastasia Safronova

24 0

Budget: 30 USD Deadline: 3 days

Good day.

I have experience in collecting and enriching business data, searching for company contacts, business owners, and verifying information from open sources. I have worked with large datasets for B2B databases, where it was important not only to find information but also to correctly match it with existing records and minimize false matches.

For a similar task, I see the process as follows: searching for potential profiles through LinkedIn and search engines, matching by full name, company name, address, and state, further verification of the found data, and forming a structured result in CSV or JSON. If needed, I can also assist with preparing the logic for deduplication and quality checking of the results.

I work with Python, data collection automation, and processing of tables and structured datasets. For a more accurate assessment, I would like to see an example of the source database and the estimated volume of records.

I would be happy to discuss the project details.

Vladislav R.

3 0

Budget: 1000 USD Deadline: 7 days

I have experience in parsing both regular news aggregators and more secure American auctions. I can already say that there will be difficulties with LinkedIn in terms of its protection and restrictions. If you simply follow the link, we will get limited information, and there will likely be restrictions on the number of pages viewed from the current IP address. However, if you log in, there will be greater access, but there is probably also a limit on the number of pages viewed. I will read more about them later if I win the competition. I can say right away that in the most difficult case, this will involve additional LinkedIn accounts and proxies, possibly premium ones.

Stack: Python, pyTelegramBotAPI, MySQL, Redis, requests, curl_cffi, BeautifulSoup4, lxml, PySocks, possibly Selenium/Playwright, but I would try not to use them to save server resources and increase data processing speed.

How I see the result:
- The employee uploads a document with the appropriate structure to the bot.
- The bot parses and fills in the fields in the database.
- At the set time, it starts searching.
- First, it searches for information for empty fields, while simultaneously updating existing ones with a timestamp of the update.
- If necessary, the employee presses a button, and the bot exports all found data in one of the formats of choice: json, csv, xlsx.

Similar project: Telegram bot for finding new listings

Telegram Auction Monitor — real-time monitoring of Copart + IAAI

Matvii Marchenko

20 0

Projects 20
Rating -
Rating 2 116

Budget: 365 USD Deadline: 14 days

I understood the brief: the input is a database of small business owners in the USA (name, company, address, state). What is needed is an enrichment pipeline from LinkedIn and Facebook using search operators (site:linkedin.com/in "Name" "Company"), verification of found profiles by name plus company plus state, and, as output for each record, a photo, email, social media links, company website, and phone number in JSON or CSV. The scale is all states in the USA, meaning tens of thousands of records.

For a production-grade pipeline, I usually use Python plus Playwright (more stable than Selenium on LinkedIn), Scrapy for massive parallel crawls, a residential proxy pool to reduce the ban rate, deduplication and verification through fuzzy matching (rapidfuzz), and LLM verification for edge cases (one Smith may be in several states). I store the data in PostgreSQL with phased export to CSV or JSON, with source flags and confidence levels for each field.

Realistic coverage on large datasets: the LinkedIn profile of the owner is found 50-70 percent of the time (depending on the uniqueness of the name plus company), email and phone from LinkedIn usually 5-15 percent (closed by most users), and if supplemented through email-finders (Hunter, Apollo, Snov.io), the email can be raised to 25-40 percent. Company website and social media are better — 40-60 percent.
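To make these percentages concrete, the sketch below converts them into expected record counts for a given database size. It assumes every percentage applies to the total number of input records, which the proposal does not state explicitly; the key names are illustrative.

```python
# Coverage ranges in percent, taken from the estimate above.
COVERAGE_PERCENT = {
    "linkedin_profile": (50, 70),
    "email_or_phone_from_linkedin": (5, 15),
    "email_with_finders": (25, 40),
    "website_and_social": (40, 60),
}


def expected_counts(total_records: int) -> dict[str, tuple[int, int]]:
    """Return (low, high) expected record counts for each field."""
    return {
        name: (total_records * low // 100, total_records * high // 100)
        for name, (low, high) in COVERAGE_PERCENT.items()
    }


if __name__ == "__main__":
    for name, (low, high) in expected_counts(50_000).items():
        print(f"{name}: {low} to {high} records")
```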

To provide an accurate cost and timeline, it is important to know: what the volume of the database is (5 thousand, 50 thousand, 500 thousand records), what the budget for proxies and email-finder API is, and what the expected timeline is (it won't be possible in a week, but a month is feasible). Based on experience in the portfolio: I have conducted LinkedIn enrichment on projects with several thousand records, consistently with a low ban rate.

I am ready to discuss the volumes via a call or in correspondence, after which a precise breakdown by days and budget will be available.

Andrii Tyupa

53 0

Budget: 100 USD Deadline: 2 days

I have worked on similar tasks: enriching databases through public sources, Google Maps API, website parsing, and aggregation into a structured format. I see it like this: we take each record, run it through several sources (LinkedIn, Yelp, Google Business, possibly official state registries), normalize it, and store it in a database with a history of updates, so it can be re-enriched. Question: what specific fields need to be added, phone and email or something deeper like revenue, number of employees, social media? I am ready to discuss the scope and approach.

Yaroslav S.

2 1

Projects -
Rating -
Rating 522

Budget: 1000 USD Deadline: 22 days

Hello! I have experience in writing a bot that uses ChromeDriver. It emulated a browser and collected the necessary data. The implementation was done in Rust. I can develop a program in Go that will work and parse the required information. I suggest choosing Go because it best suits your needs; it can handle more requests, requires fewer resources to operate, and is faster than Python. It also has all the libraries needed for this task. If the stack must be only in Python, then FastAPI + httpx. I have experience working with Selenium and have written automated tests.

Aleksandr A.

0 0

Projects -
Rating -
Rating 221

Budget: 350 USD Deadline: 10 days

Hello. I have experience working with OSINT tasks and automating data collection.

Here is a step-by-step implementation plan:

1. Bypassing restrictions: I will use a Playwright-based architecture (or Selenium with proxy rotation) to simulate real user behavior when working with LinkedIn/Facebook, in order to minimize the risk of blocks.

2. Validation and matching: To match the found profiles with the database, I will apply not only text matching of names but also additional attributes: geolocation (state), company name (through fuzzy matching/Levenshtein distance), to filter out irrelevant results.

3. Deduplication: I will implement a check at the database writing stage to avoid duplicates.

As a result, you will receive a structured JSON/CSV file.

I have previously implemented similar data collection systems (worked with parsing contacts for CRM). I am ready to discuss the details of the technical task.

Rumzik Matvey

15 0

Budget: 160 USD Deadline: 1 day

Good day, Roman!

The task is quite clear to me: to enrich the database of small business owners in the USA with data from open sources (LinkedIn/social media) — find the profile, match it with the existing record (name/company/address/state), verify, remove duplicates, and provide structured JSON/CSV for all states. This is exactly my niche.

Relevant experience: I built a bulk scraper/enricher for email marketing (Node.js, 250 parallel processes) that extracted emails and phone numbers from the pages of domain name databases in the CIS and deduplicated against the existing database — this is essentially your record enrichment task;
+ multi-marketplace scraper for boards like vinted, bazos, jofogas, olx with anti-detect proxy rotation and account validation; Python scrapers for real estate OLX/Dom.ria (aiohttp/asyncio + deduplication at the database level); bots on Selenium/Playwright for ticket purchases.

Stack: Python (Playwright/Selenium for dynamics, async HTTP + BeautifulSoup for statics, Scrapy as needed), proxy rotation + throttling, fuzzy matching for matching, export to JSON/CSV.

Approach: search using operators (site:linkedin.com/in "Name" "Company") → extract public data → fuzzy match by name/company/address/state with a confidence score (minimizes false matches) → dedup → structured export.

Honestly about the limits: LinkedIn aggressively cuts bots, and emails/phones are often not public — the actual coverage will not be 100%, and I account for this in the architecture (proxies, throttling, match score, fallback sources).

Real feedback from clients is in my profile: [https://freelancehunt.com/project/vosstanovlenie-podderzhka-dorabotka-telegram-bota-dlya/1596685.html], [https://freelancehunt.com/project/parser-na-node-js/634091.html].

Question: what is the volume of records and which fields are critical? This will affect the range. Details are in the correspondence.

I work on an hourly basis by agreement: approximately $20 per hour.

Alisa S.

1 0

Projects -
Rating -
Rating 387

Budget: 600 USD Deadline: 7 days

I specialize in automating data collection and enrichment using Python, so I will gladly develop a reliable system for you to find contacts of American small businesses. Based on your database, the algorithm will use Scrapy or Playwright to find owner profiles on LinkedIn and Facebook. To completely eliminate false matches due to similar company names, I will set up smart data matching by name, state, and address. For stable operation without blocks, I will connect rotating proxies, and I will clean the final result in JSON or CSV from duplicates and validate the found emails.

Andrii D.

50 2

Budget: 450 USD Deadline: 7 days

Hello! I have developed dozens of parsers, and I can handle this as well, but I want to propose a more stable and potentially cheaper approach: direct scraping of LinkedIn profiles gets banned quickly even with proxies - residential proxies are needed (datacenter ones get blocked instantly). Residential proxies are approximately $3.6-7.35/GB, while paid search APIs like SerpAPI cost $0.001-0.01 per request - at scale, this is significantly cheaper and more stable than direct scraping + proxies. Also, after the recent Cloudflare updates (this has been going on for about six months), it's a bit challenging to set up unique device fingerprints for anti-detection.

Therefore, I suggest using the search API approach instead of direct scraping - lower risk of bans and more predictable costs.
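The comparison can be checked with simple arithmetic. In the sketch below the price ranges come from the proposal, while the record count, the traffic per page, and the number of requests per record are hypothetical inputs that change the outcome considerably.

```python
def proxy_cost_usd(pages: int, mb_per_page: float, usd_per_gb: float) -> float:
    """Traffic cost of loading pages through a metered proxy (1 GB = 1024 MB)."""
    return pages * mb_per_page / 1024 * usd_per_gb


def search_api_cost_usd(requests: int, usd_per_request: float) -> float:
    """Cost of a paid search API billed per request."""
    return requests * usd_per_request


if __name__ == "__main__":
    records = 50_000     # hypothetical database size
    mb_per_page = 2.0    # hypothetical traffic per loaded page
    # One page load or one API request per record is also an assumption.
    low_proxy = proxy_cost_usd(records, mb_per_page, 3.6)
    high_proxy = proxy_cost_usd(records, mb_per_page, 7.35)
    low_api = search_api_cost_usd(records, 0.001)
    high_api = search_api_cost_usd(records, 0.01)
    print(f"proxy traffic: ${low_proxy:.2f} to ${high_proxy:.2f}")
    print(f"search API:    ${low_api:.2f} to ${high_api:.2f}")
```

With heavier pages or several page loads per record the proxy figure grows proportionally, so the ranking depends on these inputs.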

Taras O.

4 0

Budget: 400 USD Deadline: 10 days

Hello!

I have extensive experience in developing solutions for parsing and processing data (various sources, protection against blocking, automation). I am ready to complete the assigned task.

I suggest we discuss the details in private messages.

Yaroslav Kolesnik

6 1

Projects 6
Rating -
Rating 956

Budget: 100 USD Deadline: 4 days

Hello, I have experience with the stack you listed, and I have also worked on similar parsing projects. The most interesting and challenging project was one involving parsing and automating tour bookings, where there were issues with limits and blocking.

Nick Osipov

41 4

Budget: 1000 USD Deadline: 3 days

Good day!

I have extensive experience in developing OSINT solutions and data enrichment systems in Python using Playwright/Selenium/Scrapy. I effectively implement the search, verification, and structuring of data from open sources, ensuring accuracy and scalability.

Message me privately, and we will clarify the details.

Denis Gavrischuk

32 1

Budget: 200 USD Deadline: 1 day

Good afternoon, I have been working in web programming for over 9 years. I work with REST APIs, frameworks, and CMS such as Django, Laravel, Yii2, WordPress, OpenCart, CodeIgniter, etc. This is not AI.

Bohdan Yanishevskyi

7 0

Budget: 333 USD Deadline: 3 days

Feel free to contact me, I am ready to perform. I am waiting for the specifications. The deadline and cost are approximate until I am fully familiar with the specifications.

Artur Boiko

5 0

Budget: 50 USD Deadline: 1 day

Good day!

I have done similar things: enriching contact databases through searching and matching profiles, so I understand the task right away.

Regarding the approach: I run records from your database (name + company + state) through Google with operators like site:linkedin.com/in "Name" "Company" — this way I find the profile itself without immediately hitting LinkedIn's blocking. Next is the matching: I compare the found profile with the original record by name, business name, state, and address to avoid irrelevant matches (this is the main problem with identical names, so I do the matching based on several attributes + a confidence threshold). I clean duplicates at the output.

Stack: Python + Playwright (for rendered pages) and Scrapy/requests where it can be simpler. Proxies are a must — otherwise LinkedIn cuts off on volumes. I deliver the result in JSON or CSV, whichever is more convenient for you.

Honestly about one point, so there are no surprises: photos, links to social networks, and the company website from LinkedIn are retrieved normally, but emails and phone numbers are often hidden there — they are not publicly visible to everyone. I will gather what is open; where contacts are not accessible, the field will be empty (I can additionally pull from other sources if needed — we can discuss).

What volume do you plan to start with and is there an example of your current table? I will look at the structure — and I will give you realistic timelines.

I will provide free consultation on the project in private 🙂

Ilya P.

41 0

Budget: 250 USD Deadline: 10 days

Good day, I can create such a product using Python. Scraping, deduplication, etc.

Maksim Sheptookha

0 0

Projects -
Rating -
Rating 427

Budget: 600 USD Deadline: 7 days

Good day.

Implementing search through the operators "site:linkedin.com/in" is the right choice that will allow enriching the database without the risk of instant account bans on LinkedIn itself. However, when working with large datasets in the USA, there are two critical engineering points that need to be built into the architecture from the very beginning:

1. Bypassing Google and LinkedIn limits
Directly launching a browser emulator for search queries in Google will quickly hit a CAPTCHA (after just a few dozen iterations). For stable system operation in multithreaded mode, I use PHP in conjunction with rotating residential proxies and automation tools (such as Symfony Panther or within Laravel via Spatie Browsershot / headless Chrome). An alternative and more stable option for large volumes of search is integration through the Search API, which completely removes the Google CAPTCHA issue. The profile photos and business data are downloaded through browser emulation to bypass LinkedIn's JS protection.

2. Verification and minimization of false matches (Matching)
To avoid merging namesakes from different states, the system performs multi-level validation using PHP:
- Normalization of company names (cleaning from Ltd, Corp, LLC).
- String comparison using text similarity algorithms (built-in "levenshtein()", "similar_text()" or Jaro-Winkler implementation) for names and business titles.
- Strict geo-filter for matching the state/address specified in your database with the data from the found profile.
Based on these factors, each record is assigned a confidence score. Only results that meet the established accuracy threshold are exported to the final CSV/JSON.
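The normalization and string-comparison steps can be illustrated with a short sketch. It is written in Python rather than the PHP stack named in this proposal, and it uses a hand-written Levenshtein distance instead of PHP's built-in levenshtein(). The suffix list extends the three suffixes from the text with "inc", which is an assumption.

```python
import re

# Ltd, Corp and LLC come from the text above; "inc" is an assumed addition.
LEGAL_SUFFIXES = {"ltd", "corp", "llc", "inc"}


def normalize_company(name: str) -> str:
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]
```

Comparing normalize_company output for the database record and the found profile, rather than the raw names, keeps "Acme Widgets, LLC" and "Acme Widgets" from being treated as different companies.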

Technological stack: PHP (CLI / Laravel), Symfony Panther / Headless Chrome (browser automation), Laravel Queues (Redis) for reliable queuing and multithreading, string-matching algorithms for data cleansing.

Estimated cost of development and setup of such a solution: $400 – $600 (depending on the final volume of data and the need for integration of third-party APIs).
Implementation time: 5–7 working days until the first stable result is delivered.

I am ready to test the logic on a small test sample of your database (for example, 20–50 rows) to demonstrate the accuracy of collection and matching on my stack. I look forward to your feedback in the chat.

Vladyslav B.

1 0

Projects -
Rating -
Rating 514

Budget: 50 USD Deadline: 1 day

Good day!

I am ready to take on the implementation of a data enrichment system from open sources.

I have experience in parsing, data enrichment, search automation, processing large CSV/JSON arrays, deduplication, and data verification based on several attributes.

I propose the stack:
• Python;
• Playwright / Selenium for dynamic pages;
• Scrapy / Requests / BeautifulSoup for static sources;
• pandas for data processing;
• fuzzy matching for matching names, companies, addresses, and states;
• exporting results to CSV / JSON.

I see the approach as follows:

1. Loading the initial database.
2. Generating search queries by name, company, state, and address.
3. Searching for profiles and business pages through open sources.
4. Matching results with records based on several parameters.
5. Verifying matches and assigning a confidence score.
6. Collecting available fields: LinkedIn/Facebook, website, phone, email, profile photo, social networks.
7. Deduplication and forming the final CSV/JSON.

I can also provide for logging, reprocessing of failed records, and manual verification of questionable matches to minimize errors.

I am ready to discuss the volume of the database, an example of the input file, and the desired structure of the result.

Tetyana S.

74 4

Budget: 130 USD Deadline: 2 days

Good day! The task is clear, so I can implement such a system in a couple of days!!! Ready for productive and quality collaboration!!!

Oleksii Manziuk

6 0

Budget: 100 USD Deadline: 1 day

Good day.

I have extensive experience in developing web projects using PHP and Python, automating data processing, integrating with external services, and working with large datasets. I have also worked with data parsing from open sources, processing results, and further structuring them for use in business processes. In the past, I managed networks of mfa websites from scraped data of companies.

For implementing a similar project, I see a solution in the form of a multi-step pipeline:

- searching for potential profiles through search engines and open sources;
- automated data collection using Python (Selenium/Scrapy, we will see what fits);
- verification of matches by full name, company name, address, state, and additional attributes;
- deduplication and assessment of the reliability of the found results;
- formation of structured results in JSON or CSV formats.

In terms of technologies, I have experience with Python, Selenium, SQL, REST API, data processing, and business process automation. I also have significant experience working with legacy systems and projects where it is necessary to quickly understand the logic of processing large volumes of data.

I am ready to discuss the details, expected volumes of records, and requirements for data matching accuracy.

Rostislav Chuvurin

0 0

Projects -
Rating -
Rating 182

Budget: 25 USD Deadline: 2 days

Good day.

I have experience in developing parsers and data collection/enrichment systems in Python (Playwright, Selenium). I have worked with searching and verifying contacts, company profiles, and business owners from open sources.

I can offer a solution for matching data by full name, company, and location, with the results exported to CSV or JSON. If there is a sample database, please send it, and I will quickly assess the complexity and scope of work.

Denis D.

6 1

Budget: 25 USD Deadline: 1 day

Hello. I have relevant experience in Python automation, parsing open sources, OSINT approaches, deduplication, and data structuring.

I have worked on tasks involving data collection from websites, social networks, Telegram/web sources, profile processing, matching searches, filtering irrelevant results, and exporting to CSV/Excel/JSON.

Stack: Python, Playwright, Selenium, Scrapy/BeautifulSoup, requests/httpx, Pandas, PostgreSQL/SQLite, SQLAlchemy, Docker. If needed, I can add queues, proxies, rate limits, logging, and a resume mechanism for large volumes.
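
A resume mechanism of the kind mentioned above could, for example, append each finished record to a JSON Lines file and skip already-written IDs on restart. The sketch below is an illustration under assumptions: every input record is assumed to carry a unique `id` field, and `enrich` is a hypothetical callable standing in for the real search and collection step.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def load_done_ids(output_path: Path) -> set[str]:
    """Read the IDs of records already written to the JSON Lines output."""
    if not output_path.exists():
        return set()
    done: set[str] = set()
    with output_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                done.add(json.loads(line)["id"])
    return done


def process_all(
    records: Iterable[dict],
    output_path: Path,
    enrich: Callable[[dict], dict],
) -> int:
    """Enrich records not yet present in the output; return how many were processed."""
    done = load_done_ids(output_path)
    processed = 0
    with output_path.open("a", encoding="utf-8") as fh:
        for rec in records:
            if rec["id"] in done:
                continue  # finished in an earlier run
            result = enrich(rec)
            fh.write(json.dumps({"id": rec["id"], **result}) + "\n")
            fh.flush()  # limit data loss if the run is interrupted
            processed += 1
    return processed
```

Because results are appended one line at a time, an interrupted run loses at most the record in progress, and rerunning the same command continues from where it stopped.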

I see the approach as follows:

1. We take the input records: name, company, address, state.
2. We generate search queries through Google/Bing with operators `site:linkedin.com/in`, `site:linkedin.com/company`, as well as searching the company website.
3. We collect candidates: LinkedIn profile, company page, website, phone, email, social links.
4. We perform verification scoring: matching name, company, state, address/city, position, company domain.
5. We filter out weak matches, duplicates, and suspicious results.
6. We form a structured result in CSV or JSON with confidence score and sources.
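
Steps 2 and 4 above can be sketched as follows. This is a rough illustration, not the bidder's code: the query templates follow the `site:` operators quoted in the project description, the weights are arbitrary assumptions that would need tuning on real data, and the fuzzy comparison uses `difflib.SequenceMatcher` from the standard library as a stand-in for whatever matcher is actually chosen.

```python
from difflib import SequenceMatcher

# Assumed weights; they sum to 1.0 so the result stays in the range 0..1.
WEIGHTS = {"name": 0.4, "company": 0.4, "state": 0.2}


def build_queries(name: str, company: str) -> list[str]:
    """Build search-engine queries for the owner profile and the company page."""
    return [
        f'site:linkedin.com/in "{name}" "{company}"',
        f'site:linkedin.com/company "{company}"',
    ]


def similarity(a: str, b: str) -> float:
    """Case-insensitive, whitespace-normalized similarity between 0 and 1."""
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def confidence(record: dict, candidate: dict) -> float:
    """Weighted score of how well a found candidate matches an input record."""
    name_score = similarity(record["name"], candidate.get("name", ""))
    company_score = similarity(record["company"], candidate.get("company", ""))
    same_state = record["state"].upper() == candidate.get("state", "").upper()
    total = (
        WEIGHTS["name"] * name_score
        + WEIGHTS["company"] * company_score
        + WEIGHTS["state"] * (1.0 if same_state else 0.0)
    )
    return round(total, 3)
```

Candidates below a chosen threshold would then be dropped in step 5, and the score itself can be written to the output as the confidence value from step 6.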

I can implement an MVP that processes part of the database, shows the quality of matching, and then scale it for large volumes across all states in the USA.

Daria Kratofil

0 0

Rating 196

Budget: 25000 USD Deadline: 16 days

We have an almost ready solution for enriching databases and classifying found profiles. We can quickly adapt it to your records and discuss the details here; I am available.
I see the first working stage lasting 16 days, at a rate of 65000 UAH for a pilot that covers search, match verification, deduplication, and export to JSON or CSV.
Technically, I would build this in Python with Playwright or Scrapy, with task queues, result caching, and match scoring by name, company, address, state, domain, and phone number.
I would also include rate limiting, re-checks, a log of match reasons, and a manual review list of questionable records, because in such tasks it is better to measure seven times than to clean the entire database manually later.
I have relevant experience in automating data collection, structuring, and verification for business processes.
https://business.ingello.com/vorfahr - close in terms of the logic of automating search and working with data.
https://business.ingello.com/fractal - an example of agency automation and complex information processing.
Our profile and approach for FLH - https://systems-fl.ingello.com/ua
I just want to clarify two things:
What is the volume of the first batch: 1000, 10000, 100000 records, or more?
Should the profile photo be stored as a link or uploaded as a file?
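
The log of match reasons and the manual list of questionable records mentioned in this proposal could look roughly like the sketch below. It is an assumption-laden illustration: the field weights, the two thresholds, and the `MatchDecision` and `triage` names are hypothetical, and fields are compared by exact match after simple normalization rather than by any particular fuzzy method.

```python
from dataclasses import dataclass, field

# Assumed per-field weights (sum to 1.0) and thresholds; tune on real data.
FIELD_WEIGHTS = {
    "name": 0.30,
    "company": 0.30,
    "state": 0.10,
    "domain": 0.15,
    "phone": 0.15,
}
ACCEPT_AT = 0.75
REVIEW_AT = 0.40


@dataclass
class MatchDecision:
    status: str  # "accept", "review", or "reject"
    score: float
    reasons: list[str] = field(default_factory=list)


def _norm(value: object) -> str:
    """Lowercase and collapse whitespace; missing values become an empty string."""
    return " ".join(str(value or "").lower().split())


def triage(record: dict, candidate: dict) -> MatchDecision:
    """Score a candidate field by field and record why each field counted or not."""
    score = 0.0
    reasons: list[str] = []
    for name, weight in FIELD_WEIGHTS.items():
        expected, found = _norm(record.get(name)), _norm(candidate.get(name))
        if not expected or not found:
            reasons.append(f"{name}: missing")
        elif expected == found:
            score += weight
            reasons.append(f"{name}: match")
        else:
            reasons.append(f"{name}: mismatch")
    score = round(score, 2)
    if score >= ACCEPT_AT:
        status = "accept"
    elif score >= REVIEW_AT:
        status = "review"  # goes to the manual list of questionable records
    else:
        status = "reject"
    return MatchDecision(status, score, reasons)
```

Storing `reasons` next to each result makes it possible to audit later why a profile was accepted, and the `review` bucket keeps borderline cases out of the final export until a person has checked them.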

Dmytro Parkhomenko

20 0

Budget: 50 USD Deadline: 1 day

Good day, I am ready to complete your task quickly and efficiently. I have extensive experience in creating various parsers. Please write to me in private messages to discuss the details. I would be happy to help :)

The list does not show proposals concealed by the client or freelancer with a Plus profile, as well as proposals violating rules

Andzhey R.

8 0

Budget: 25 USD Deadline: 1 day

Good day.
Our team has many years of experience in developing ERP, CRM, CMS, and specialized software for businesses. We create effective digital solutions that help automate processes, increase productivity, and scale companies.

We already have a ready-made solution for a parser.

We work with modern technologies — from bots and scripts to AI agents and analytical systems. We develop websites of varying complexity. In our portfolio, we have implemented ERP solutions for the hotel business, as well as for companies engaged in the import and sale of goods, and our own product XFitness — an ERP system created specifically for fitness clubs.

We are ready to implement your project and offer the best solution tailored to your needs.
Our portfolio: Freelancehunt

We specialize in the following areas:
- Development of ERP Systems
- Development of CRM Systems
- Development of Websites of any complexity
- Development of CMS Systems
- Support for Websites
- Development of OpenCart
- Support for OpenCart
- Modification of OpenCart
- Enhancement of OpenCart
- Development of WordPress
- Support for WordPress
- Modification of WordPress
- Enhancement of WordPress
- Development of ECommerce
- Support for ECommerce
- Modification of ECommerce
- Enhancement of ECommerce
- Development of Web Applications
- Support for 1C Servers
- Support for Web Servers
- Development of mobile applications
- Data parsing
- Development of bots
- Development of AI agents

and on the following technologies:
- Python
- PHP
- Laravel
- Symfony
- Yii2
- JS
- NodeJS
- jQuery
- TypeScript
- MySQL
- HTML
- CSS
- Vue
- Nuxt.js
- React
- React Native
- C++

Maksym Potashov

6 2

Budget: 25 USD Deadline: 1 day

Good day.
I have experience in developing data collection and enrichment systems, parsers, and automating work with large data sets. For such tasks, I usually use Python, Playwright, Selenium, Scrapy, PostgreSQL, and tools for deduplication and data verification.
I can implement the process of searching and matching business owner profiles based on name, company name, address, state, and other attributes to minimize false matches. The result can be formatted in JSON or CSV with the necessary data structure for further processing.

I also have experience in building data enrichment pipelines, where it is important not just to find information but to verify its relevance and quality before saving it to the database.
Please let me know:
* What is the estimated size of the database at the start (thousands or tens of thousands of records)?
* Is one-time processing needed, or regular data updates?
* Is there an example of the desired JSON/CSV format for the final result?

Roman Sovan
United States

Projects 313
Rating 5.0
Rating 22 179


Current freelance projects in the category Data Parsing