Commission: Scraper / Parser for Land and Mortgage Registers
I want to commission a program, either as a Windows application or as software running on a server (I would prefer the latter), that includes:
- downloads land and mortgage registers
- saves 5 subpages/sections of each register to an HTML file
- Complete View (i.e., a complete overview of the situation regarding a given property)
- a parser that extracts data from the registers (the saved HTML files) and creates a structured database that is searchable by various parameters
- during parsing, the parser analyzes the content and assigns tags to each section in a specified manner
The program should run continuously and use multiple threads in parallel.
It must process at least 150,000 records in 24 hours.
The program should also be able to monitor changes in selected records (land and mortgage registers); the list of monitored records is specified by the user.
-
1 day, 20 USD
Hello, I am interested in your project and have a lot of experience with parsing tasks of this kind. Can you provide an example of one register for a more detailed study? We can also discuss all the details in private messages. I will be glad to cooperate.
-
14 days, 21 USD
Hello!
I have many years of experience in developing web applications, parsers, and databases, as well as in performance optimization for large volumes of data. I can take on your task: a program that downloads and parses land and mortgage registers, builds a structured database, and monitors changes.
Proposed resources and technologies:
Programming language:
Python is a strong choice for building parsers and concurrent applications. Its libraries, such as BeautifulSoup or lxml (for parsing HTML) and asyncio or threading (for running downloads concurrently), should be able to deliver the required throughput.
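As a minimal sketch of the parsing step, the example below extracts label/value pairs from one saved section. It uses only the standard library's `html.parser` so that it runs without extra packages; BeautifulSoup or lxml would make the same logic shorter. The table markup, the field names, and the `parse_section` helper are assumptions for illustration, since the real register HTML has not been shared yet.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Collect the cell texts of every table row in a document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell texts
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_section(html):
    """Return {label: value} for every two-cell row (hypothetical layout)."""
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return {row[0]: row[1] for row in parser.rows if len(row) == 2}


# Hypothetical sample; the register number is a placeholder, not a real one.
SAMPLE = """
<table>
  <tr><td>Register number</td><td>PLACEHOLDER-0001</td></tr>
  <tr><td>Location</td><td>Example   Town</td></tr>
</table>
"""

fields = parse_section(SAMPLE)
print(fields)
```

Once a real sample register is available, the row filter and the field names would need to be adapted to its actual structure.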
Database:
For fast storage and retrieval, I recommend using SQL databases such as MySQL or PostgreSQL, or NoSQL solutions like MongoDB (for flexible storage of unstructured data). The choice depends on the volume of data and the required access speed.
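To make the storage idea concrete, here is a small sketch of a relational layout with one table for registers, one for their sections, and one for the tags assigned during parsing. It uses the standard library's `sqlite3` with an in-memory database purely as a stand-in for MySQL or PostgreSQL; the table names, column names, and the helpers `add_section` and `registers_with_tag` are assumptions, not an agreed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE register (
    id     INTEGER PRIMARY KEY,
    number TEXT NOT NULL UNIQUE
);
CREATE TABLE section (
    id          INTEGER PRIMARY KEY,
    register_id INTEGER NOT NULL REFERENCES register(id),
    name        TEXT NOT NULL,
    body        TEXT NOT NULL
);
CREATE TABLE section_tag (
    section_id INTEGER NOT NULL REFERENCES section(id),
    tag        TEXT NOT NULL,
    PRIMARY KEY (section_id, tag)
);
CREATE INDEX idx_section_tag_tag ON section_tag(tag);
""")


def add_section(conn, number, name, body, tags):
    """Store one parsed section and its tags, creating the register if needed."""
    conn.execute("INSERT OR IGNORE INTO register(number) VALUES (?)", (number,))
    (register_id,) = conn.execute(
        "SELECT id FROM register WHERE number = ?", (number,)
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO section(register_id, name, body) VALUES (?, ?, ?)",
        (register_id, name, body),
    )
    conn.executemany(
        "INSERT INTO section_tag(section_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, tag) for tag in tags],
    )


def registers_with_tag(conn, tag):
    """Return the numbers of all registers that have a section with this tag."""
    rows = conn.execute(
        """
        SELECT DISTINCT r.number
        FROM register r
        JOIN section s ON s.register_id = r.id
        JOIN section_tag t ON t.section_id = s.id
        WHERE t.tag = ?
        ORDER BY r.number
        """,
        (tag,),
    )
    return [number for (number,) in rows]


# Placeholder data, only to show the query working.
add_section(conn, "REG-A", "ownership", "Owner: Example Person", ["owner"])
add_section(conn, "REG-A", "mortgages", "Mortgage entry present", ["mortgage"])
add_section(conn, "REG-B", "mortgages", "No entries", ["no-entries"])
print(registers_with_tag(conn, "mortgage"))
```

Keeping tags in their own indexed table is one way to make "search by various parameters" a plain join; a document store such as MongoDB would hold the same information as nested fields instead.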
…
Multithreading:
To achieve a performance of 150 thousand records in 24 hours, I will use an asynchronous or multithreaded model, which will allow loading and parsing to be done in parallel.
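The multithreaded variant could look roughly like the sketch below, which overlaps downloads in a thread pool and parses each result as it comes back. `fetch_register` is a stand-in that sleeps instead of contacting any server, and `parse_register`, `run_batch`, the worker count, and the register numbers are all hypothetical; the real numbers would depend on how fast the source site responds and what request rate it tolerates.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_register(number):
    """Stand-in for an HTTP download; sleeps instead of touching the network."""
    time.sleep(0.05)
    return number, f"<html>{number}</html>"


def parse_register(item):
    """Stand-in for real parsing; just measures the downloaded page."""
    number, html = item
    return number, len(html)


def run_batch(numbers, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; downloads overlap because
        # the worker threads spend their time waiting on I/O.
        downloaded = pool.map(fetch_register, numbers)
        return [parse_register(item) for item in downloaded]


numbers = [f"REG-{i:04d}" for i in range(40)]
start = time.perf_counter()
results = run_batch(numbers)
elapsed = time.perf_counter() - start
print(f"{len(results)} records in {elapsed:.2f}s")
```

Threads suit this workload because it is dominated by waiting on the network; an `asyncio` version would follow the same shape with coroutines instead of pool workers.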
Monitoring changes:
The program can be configured for periodic scanning of selected records and notifying about changes. This can be implemented through background tasks using a scheduler, such as Celery for Python or Quartz for Java.
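One simple way to detect changes during such a periodic scan is to store a hash of each watched register and compare it on the next pass, as sketched below. The helpers `fingerprint` and `detect_changes`, the in-memory `known` dictionary, and the sample pages are assumptions for illustration; in practice the digests would live in the database and the scan would be triggered by the scheduler.

```python
import hashlib


def fingerprint(html):
    """Hash the page content, ignoring differences in whitespace only."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(known, fetch, watched):
    """Return watched register numbers whose content differs from the last scan.

    known   -- dict mapping register number to the last stored digest (updated in place)
    fetch   -- callable returning the current HTML for a register number
    watched -- register numbers selected by the user
    """
    changed = []
    for number in watched:
        digest = fingerprint(fetch(number))
        # A register seen for the first time is recorded, not reported.
        if known.get(number) not in (None, digest):
            changed.append(number)
        known[number] = digest
    return changed


# Placeholder pages standing in for downloaded registers.
pages = {"REG-A": "<p>owner: A</p>", "REG-B": "<p>owner: B</p>"}
known = {}

first_scan = detect_changes(known, pages.get, ["REG-A", "REG-B"])
pages["REG-A"] = "<p>owner:   A</p>"  # whitespace-only difference
pages["REG-B"] = "<p>owner: C</p>"    # real change
second_scan = detect_changes(known, pages.get, ["REG-A", "REG-B"])
print(first_scan, second_scan)
```

Hashing whole pages only says that something changed; reporting what changed would require comparing the parsed fields of the old and new versions.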
Running on a server:
It is better to run the program on a server, which will allow it to operate continuously, process large volumes of data, and easily scale the system as needed. Cloud servers, such as AWS or DigitalOcean, will be an excellent choice for this project.
-
7 days, 2116 USD
Interesting. I have experience in writing parsers and automation. I can implement a server version in Python using multithreading; however, I would like to see the technical specifications to discuss the terms and cost in detail.
-
Hello Jan,
I have a few questions about your project:
- Can you communicate in English, or is only Polish acceptable?
- "The performance of such a program is at least 150,000 records in 24 hours." If I understand correctly, this means the app needs to process roughly 2 records every second. What is the maximum possible number of records?
- Processing 2 records/sec does require some computing power. Does your project budget include spending on computing power?
- How often is the processing of 150k records supposed to happen? Is it a one-time run, after which only the differences need to be handled, or should it recur regularly?
- Is it a time-boxed project?
- May I know the lower (and/or upper) limit of funding for the actual coding of this task?
- Is there a document that holds the full version of the requirements, or is the text we see here all you have for now?
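For reference, the arithmetic behind the throughput question can be checked with a few lines. The only figures used are the 150,000 records per 24 hours and the 5 subpages per register from the description; treating each subpage as one separate download is my assumption.

```python
RECORDS_PER_DAY = 150_000
SECONDS_PER_DAY = 24 * 60 * 60
PAGES_PER_RECORD = 5  # five subpages/sections per register, per the description

records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY
# Assumption: every subpage is fetched with its own request.
pages_per_second = records_per_second * PAGES_PER_RECORD

print(f"{records_per_second:.2f} records/s, {pages_per_second:.2f} page downloads/s")
# prints "1.74 records/s, 8.68 page downloads/s"
```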
Thanks in advance for your answers. Sending them through private messages is also OK.