Freelance projects

Parsing text PDF with tables

Data Parsing, Python

133 USD

I need to parse text-based PDFs that contain tables and build a dynamic object holding all the data in the document. The number of records in the tables varies, and this must be taken into account. The tables may appear at the beginning or at the end of the document, but they can be located easily by their "reference" labels.

There are two main tables that can be merged into one, and for each record in the merged table there is a detailed information table just below the main tables. The first table is ECU SUMMARY INFO and the second is ECU SUMMARY INFO (CONT...). After these tables comes ECU DETAILS, which consists of a more detailed table for each ECU, with parameters given in NAME=VALUE format.

Ideally, I would like to be able to work with this data later using Python.

Thank you in advance.

Proposals 20

Artem Plakha

150 0

Budget: 500 PLN Deadline: 1 day

Good day. We have already discussed this project with you. I am ready to execute it. I will be happy to collaborate.

Looking forward to working with you!

Karlen Abelyan

0 0

Projects -
Rating -
Rating 100

Budget: 500 PLN Deadline: 3 days

Hello Artem, I can help you with your task of outputting data in the required format for further processing. I will be waiting for your message.

Bogdan Kovalenko

6 1

Projects 6
Rating -
Rating 547

Budget: 500 PLN Deadline: 2 days

Good afternoon, Artem. There is a ready-made solution with a web interface that allows you to upload tables in PDF format and parse them. The program works great with your example, and after parsing, the data can be conveniently worked with in Python.

Alla Pankovska

0 0

Projects -
Rating -
Rating 204

Budget: 500 PLN Deadline: 4 days

Good day!
I reviewed your PDF sample. I propose the following approach:

Table extraction

Main tool: pdfplumber (stable text extraction).

Fallback for complex grids: camelot/tabula-py in lattice/stream mode.

Automatic search for section markers: “ECU SUMMARY INFO”, “ECU DETAILS” (works on different pages/positions).
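
The marker search described above could be sketched roughly as follows. This assumes the PDF text has already been extracted into a single string (for example with pdfplumber) and that each marker starts its own line; `split_sections` and the sample text are hypothetical, not taken from the client's PDF.

```python
# Section markers from the task description. The longer marker comes first so that
# "ECU SUMMARY INFO (CONT...)" is not mistaken for "ECU SUMMARY INFO".
MARKERS = ["ECU SUMMARY INFO (CONT...)", "ECU SUMMARY INFO", "ECU DETAILS"]


def split_sections(text):
    """Group the non-empty lines of extracted text under the marker that precedes them."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        marker = next((m for m in MARKERS if line.upper().startswith(m)), None)
        if marker is not None:
            current = marker
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
        # Lines before the first marker (document header) are ignored here.
    return sections


# Made-up sample text, only to show the call.
sample = """VIN: HYPOTHETICAL123
ECU SUMMARY INFO
ABS  Anti Lock Brakes  CAN-CH
ECU SUMMARY INFO (CONT...)
ABS  PART-001
ECU DETAILS
ABS
Param1=Value1
"""
sections = split_sections(sample)
```

Because the lookup depends only on the labels, it does not matter whether the tables sit at the beginning or the end of the document.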

Normalization

Merging broken lines, removing line breaks and extra spaces.

Correct merging of multi-line cells and columns.

Aligning parameter names NAME=VALUE in ECU DETAILS.
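
A minimal sketch of the NAME=VALUE normalization, assuming one pair per line and that a line without "=" is a wrapped continuation of the previous value. Both are assumptions about the PDF layout, and `parse_params` is a hypothetical helper name.

```python
import re

# One NAME=VALUE pair per line; the name may contain spaces, the value may be empty.
PAIR_RE = re.compile(r"^\s*([^=]+?)\s*=\s*(.*?)\s*$")


def parse_params(lines):
    """Turn NAME=VALUE lines into a dict, gluing continuation lines onto the previous value."""
    params = {}
    last = None
    for line in lines:
        match = PAIR_RE.match(line)
        if match:
            last = match.group(1)
            params[last] = match.group(2)
        elif last is not None and line.strip():
            # No "=" on this line: treat it as the wrapped tail of the previous value.
            params[last] = (params[last] + " " + line.strip()).strip()
    return params


params = parse_params(["Param1 = Value1", "Param2=long value", "  continued", "Empty="])
```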

Single data model

{
  "vin": "...",
  "publication_date": "...",
  "summary": [
    {"ecu":"ABS","name":"Anti Lock Brakes","bus_type":"CAN-CH", "flash_part":"...", "current_vin":"...", "original_vin":"...", "part":"..."},
    ...
  ],
  "details": [
    {"ecu":"ABS","params":{"Param1":"Value1","Param2":"Value2", ...}},
    ...
  ]
}

Export to CSV/Excel (separate sheets Summary / Details) and/or SQLite.
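
The CSV part of that export might look like the sketch below, using only the standard library. It assumes the data model shown above; the function names are hypothetical, and details are flattened to one (ecu, name, value) row per parameter, which is one possible layout rather than the proposal's confirmed one.

```python
import csv
import io


def summary_to_csv(summary):
    """Write summary rows as CSV text; columns are the union of keys in first-seen order."""
    columns = []
    for row in summary:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(summary)
    return buffer.getvalue()


def details_to_csv(details):
    """Flatten {"ecu": ..., "params": {...}} records into one row per parameter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ecu", "name", "value"])
    for record in details:
        for name, value in record["params"].items():
            writer.writerow([record["ecu"], name, value])
    return buffer.getvalue()


summary_csv = summary_to_csv([{"ecu": "ABS", "name": "Anti Lock Brakes", "bus_type": "CAN-CH"}])
details_csv = details_to_csv([{"ecu": "ABS", "params": {"Param1": "Value1", "Param2": "Value2"}}])
```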

Quality control

Validations (mandatory columns, number of rows, unique ECUs).

Logs and small unit tests to easily maintain the process.
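
The validations listed above could be sketched like this. The set of mandatory columns in `REQUIRED` is an assumption chosen for illustration, and `validate` is a hypothetical helper that returns a list of human-readable problems instead of raising.

```python
REQUIRED = ("ecu", "name", "bus_type")  # assumed mandatory summary columns


def validate(document):
    """Return a list of problems found in a parsed document (an empty list means OK)."""
    errors = []
    seen = set()
    for index, row in enumerate(document.get("summary", [])):
        missing = [col for col in REQUIRED if not row.get(col)]
        if missing:
            errors.append(f"summary row {index}: missing {', '.join(missing)}")
        ecu = row.get("ecu")
        if ecu in seen:
            errors.append(f"summary row {index}: duplicate ECU {ecu!r}")
        elif ecu:
            seen.add(ecu)
    for record in document.get("details", []):
        if record.get("ecu") not in seen:
            errors.append(f"details: no summary row for ECU {record.get('ecu')!r}")
    return errors


good = {
    "summary": [{"ecu": "ABS", "name": "Anti Lock Brakes", "bus_type": "CAN-CH"}],
    "details": [{"ecu": "ABS", "params": {"Param1": "Value1"}}],
}
bad = {
    "summary": good["summary"] + [{"ecu": "ABS", "name": "", "bus_type": "CAN-CH"}],
    "details": [{"ecu": "XYZ", "params": {}}],
}
problems = validate(bad)
```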

Result: reproducible script + launch instructions, ready files (JSON/CSV/Excel/SQLite).
Ready to complete in 3–4 days. Cost — to be agreed upon after clarifying the format of the final export and possible nuances of other PDF markup.

Thank you!
Alla

Marcin Grzechnik

0 0

Projects -
Rating -
Rating 124

Budget: 500 PLN Deadline: 4 days

Proposed technical approach
1. Tools and libraries:

PyMuPDF (fitz) or pdfplumber for text extraction from PDF
pandas for structuring tabular data
re (regex) for pattern identification and parsing NAME=VALUE formats
Custom functions for merging and normalizing data

2. Solution architecture:

Function identifying sections based on fixed ("reference") labels
Parser for main tables with automatic record count detection
Module merging data from both main tables
Parser for ECU DETAILS section with flexible NAME=VALUE format
Dynamic object generator (dictionary/DataFrame) with complete data structure
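
The module merging the two main tables might be sketched as below. It assumes both tables have already been parsed into lists of dicts and that each row carries the same ECU identifier column; the `ecu` key, the `merge_summary` name, and the sample rows are made up for illustration.

```python
def merge_summary(first, cont, key="ecu"):
    """Merge rows from the two summary tables that share the same key value."""
    merged = {}
    for row in list(first) + list(cont):
        # Rows with the same key are folded into one dict; later columns are added to it.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


# Hypothetical rows; the real column names depend on the PDF.
first = [{"ecu": "ABS", "name": "Anti Lock Brakes"}, {"ecu": "ECM", "name": "Engine"}]
cont = [{"ecu": "ECM", "part": "P-2"}, {"ecu": "ABS", "part": "P-1"}]
merged_rows = merge_summary(first, cont)
```

Matching by key rather than by row position keeps the merge correct even if the continuation table lists the ECUs in a different order.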

3. Functionalities:

Support for varying record counts in tables
Flexible positioning of tables in the document
Data validation and cleansing
Export to formats facilitating further work (JSON, CSV, pickle)

My experience
I have experience in:

Processing PDF documents using Python
Parsing and structuring data from various formats
Working with pandas, numpy libraries, and data analysis tools
Creating scalable solutions for document processing automation

I offer:
✅ Complete solution - ready Python script with documentation
✅ Flexibility - code adapting to different document structures
✅ Code quality - readable, commented code with error handling
✅ Tests - usage examples and validation on provided files
✅ Support - assistance with implementation and possible modifications

I am ready to start work immediately.

Yurii Shtibel

8 0

Budget: 500 PLN Deadline: 2 days

If you want to work with the data easily in Python later, the ideal option is to parse it into a database such as SQLite; if you prefer, I can also parse it into Excel (xlsx) format. Write to me to discuss it, I can implement this functionality.
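
As a rough illustration of the SQLite option, the sketch below stores parsed data with the standard `sqlite3` module. The table layout, the column names, and the `save_to_sqlite` helper are assumptions for the example, not part of the proposal.

```python
import sqlite3


def save_to_sqlite(conn, summary, details):
    """Store summary rows and flattened NAME=VALUE details in two tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS summary (ecu TEXT PRIMARY KEY, name TEXT, bus_type TEXT)"
    )
    conn.execute("CREATE TABLE IF NOT EXISTS details (ecu TEXT, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO summary VALUES (:ecu, :name, :bus_type)", summary)
    conn.executemany(
        "INSERT INTO details VALUES (?, ?, ?)",
        [(rec["ecu"], name, value) for rec in details for name, value in rec["params"].items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # an in-memory database, just for the example
save_to_sqlite(
    conn,
    [{"ecu": "ABS", "name": "Anti Lock Brakes", "bus_type": "CAN-CH"}],
    [{"ecu": "ABS", "params": {"Param1": "Value1", "Param2": "Value2"}}],
)
rows = conn.execute(
    "SELECT s.name, d.name, d.value FROM summary s "
    "JOIN details d ON d.ecu = s.ecu ORDER BY d.name"
).fetchall()
```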

Iryna Lytvyn

0 0

Projects -
Rating -
Rating 328

Budget: 500 PLN Deadline: 2 days

Hello!
I have prepared a fully working solution for your task.

🔹 The script **parse_ecu_pdf.py** is written in Python and does exactly what you described:

* Reads PDF (both local and via link) using PyMuPDF.
* Finds tables **ECU SUMMARY INFO** and **ECU SUMMARY INFO (CONT...)**, parsing them line by line.
* Finds blocks **ECU DETAILS** and collects `NAME=VALUE` pairs.
* Combines everything into a dynamic object: each summary line is automatically supplemented with the `details` dictionary.
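
The script itself is not shown in the proposal. The combining step it describes might look roughly like the sketch below, where `attach_details`, the `ecu` key, and the sample data are hypothetical.

```python
import json


def attach_details(summary, details_by_ecu):
    """Return new summary rows, each with a `details` dict (empty if none was found)."""
    return [
        {**row, "details": dict(details_by_ecu.get(row["ecu"], {}))}
        for row in summary
    ]


summary = [{"ecu": "ABS", "name": "Anti Lock Brakes"}, {"ecu": "ECM", "name": "Engine"}]
details_by_ecu = {"ABS": {"Param1": "Value1"}}

result = attach_details(summary, details_by_ecu)
print(json.dumps(result, indent=2))
```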

🔹 The output is a ready JSON structure that is convenient to work with in Python.

📌 Usage:

```bash
python parse_ecu_pdf.py path/to/your_ecu_report.pdf
```

The screen displays JSON with data for each ECU.

The script is universal: the tables can have any number of rows, and their location (at the beginning or end of the PDF) does not matter.

I am ready to connect and help you with running, testing on your PDF, and any modifications.

Ihor Doronin

9 0

Budget: 499 PLN Deadline: 3 days

Good afternoon, Artem!
In general, the task is clear, but for an accurate answer regarding deadlines and price, I would like to clarify some questions that arose after analyzing your task.
Please write in private messages — we will discuss the details and your wishes.
P.S. I am guided by your budget, but I think I can fit into a smaller amount — after clarifying the details, I will offer an exact figure.

Denys Ternopolskyi

0 0

Projects -
Rating -
Rating 309

Budget: 500 PLN Deadline: 1 day

Hello, I am ready to complete your task as a practice exercise. Message me privately and we will discuss all the details.

Tamara Ibrahim Sule A.

4 0

Budget: 600 PLN Deadline: 2 days

Hello!

I can create a tool in Python that reads your PDF files, finds ECU SUMMARY tables regardless of their location in the file, and combines them into one complete dataset. Immediately after that, the script will also gather ECU DETAILS tables and link each set of parameters NAME=VALUE with the corresponding ECU record. This way, you will get one clean object that combines all the information and can be used directly in Python or converted to a DataFrame for analysis.

I will not rely on page numbers or fixed positions. Instead, the script will look for reference labels and section titles, so it will work even if the layout or the number of records changes. The final structure will be flexible, easy to query, and exportable to JSON or CSV for later use.

Thank you!

Andrii-Serhii Pavlenko

1 0

Projects -
Rating -
Rating 232

Budget: 500 PLN Deadline: 1 day

Hello, Artem!

I am a Python developer with a lot of experience working with PDFs.
What output format would you prefer to work with?

Feel free to write, we will discuss your project!

Best regards,
Andriy

Vladimir B

35 1

Budget: 500 PLN Deadline: 2 days

Good evening. I have worked with PDFs and done a similar task, though in PHP on a Linux VPS. There are nuances: I don't know how it is in your files, but sometimes the tables do not follow one another sequentially, and then it is not easy. We would need to try.

Viktor Piven

18 3

Budget: 500 PLN Deadline: 1 day

Good evening, Artem. I work on automation in Python and can develop a parser with the functionality you need. As one option, after processing, the function will return a list of dictionaries that you can work with further in your code. If you are interested, write to me and I will be happy to discuss the details.
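
A list-of-dictionaries result of that kind could be produced along these lines. This is only a sketch: it assumes the table cells are separated by two or more spaces in the extracted text, and `rows_to_dicts` and the header names are hypothetical.

```python
import re


def rows_to_dicts(header, lines):
    """Split each table line on runs of 2+ spaces and pair the cells with the header names."""
    records = []
    for line in lines:
        cells = re.split(r"\s{2,}", line.strip())
        if len(cells) != len(header):
            continue  # skip lines that do not look like a table row
        records.append(dict(zip(header, cells)))
    return records


header = ["ecu", "name", "bus_type"]
lines = ["ABS   Anti Lock Brakes   CAN-CH", "Page 1 of 3"]
records = rows_to_dicts(header, lines)
```

Since rows are collected in a loop, the number of records in a table does not need to be known in advance.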

Oleksandr D.

70 1

Budget: 500 PLN Deadline: 3 days

Hello.
I have experience in automatic data extraction from pdf.
We can discuss.

Nazar Poturayko

1 0

Projects -
Rating -
Rating 176

Budget: 500 PLN Deadline: 1 day

Good day! 👋

I have carefully reviewed your task.
I can complete it quickly and fully according to your requirements.
There are a few points I would like to clarify.

I am ready to start immediately after agreeing on the details.

Roman Z.

7 0

Budget: 600 PLN Deadline: 1 day

Good day!
My name is Roman, and I am among the top 6 developers in the "Artificial Intelligence and Machine Learning" category out of ~1600 specialists on the platform.
I guarantee:
- Fast and quality task execution
- Strict adherence to deadlines
- Regular communication throughout the entire process
I would be happy to discuss the details of your project in private messages.

Gustavo Gaviria Ivanov

0 0

Projects -
Rating -
Rating 219

Budget: 500 PLN Deadline: 1 day

I have already completed your assignment—I can demonstrate it.

Artem Ro
Poland

Projects 1
Rating -
Rating 128
