Parsing text PDF with tables
It is necessary to parse text PDFs with tables and create a dynamic object with all the data present in the document. The tables may vary in the number of records, and this needs to be taken into account. Also, the tables can be either at the beginning or at the end of the document; however, they can be easily located by the "reference" labels.
There are 2 main tables that can be "merged" into one, and then for each record from this table, there is a detailed information table just below the main tables. The first table is ECU SUMMARY INFO and the second table is ECU SUMMARY INFO (CONT...). Then after the tables, there is ECU DETAILS, which consists of more detailed tables for each ECU, known as the format parameters NAME=VALUE.
Ideally, I would like to be able to work with this data later using Python.
Thank you in advance.
Applications 1
-
1 day135 USD1 day135 USD
Good day. We have already discussed this project with you. I am ready to execute it. I will be happy to collaborate.
-
3 days135 USD
100 3 days135 USDHello Artem, I can help you with your task of outputting data in the required format for further processing. I will be waiting for your message.
-
2 days135 USD
571 6 1 1 2 days135 USDGood afternoon, Artem. There is a ready-made solution with a web interface that allows you to upload tables in PDF format and parse them. The program works great with your example, and after parsing, the data can be conveniently worked with in Python.
-
4 days135 USD
180 4 days135 USDGood day!
I reviewed your PDF sample. I propose the following approach:
Table extraction
Main tool: pdfplumber (stable text extraction).
Fallback for complex grids: camelot/tabula-py in lattice/stream mode.
… Automatic search for section markers: “ECU SUMMARY INFO”, “ECU DETAILS” (works on different pages/positions).
Normalization
Merging broken lines, removing line breaks and extra spaces.
Correct merging of multi-line cells and columns.
Aligning parameter names NAME=VALUE in ECU DETAILS.
Single data model
{
"vin": "...",
"publication_date": "...",
"summary": [
{"ecu":"ABS","name":"Anti Lock Brakes","bus_type":"CAN-CH", "flash_part":"...", "current_vin":"...", "original_vin":"...", "part":"..."},
...
],
"details": [
{"ecu":"ABS","params":{"Param1":"Value1","Param2":"Value2", ...}},
...
]
}
Export to CSV/Excel (separate sheets Summary / Details) and/or SQLite.
Quality control
Validations (mandatory columns, number of rows, unique ECUs).
Logs and small unit tests to easily maintain the process.
Result: reproducible script + launch instructions, ready files (JSON/CSV/Excel/SQLite).
Ready to complete in 3–4 days. Cost — to be agreed upon after clarifying the format of the final export and possible nuances of other PDF markup.
Thank you!
Alla
-
4 days135 USD
124 4 days135 USDProposed technical approach
1. Tools and libraries:
PyMuPDF (fitz) or pdfplumber for text extraction from PDF
pandas for structuring tabular data
re (regex) for pattern identification and parsing NAME=VALUE formats
Custom functions for merging and normalizing data
2. Solution architecture:
…
Function identifying sections based on "stubborn" labels
Parser for main tables with automatic record count detection
Module merging data from both main tables
Parser for ECU DETAILS section with flexible NAME=VALUE format
Dynamic object generator (dictionary/DataFrame) with complete data structure
3. Functionalities:
Support for varying record counts in tables
Flexible positioning of tables in the document
Data validation and cleansing
Export to formats facilitating further work (JSON, CSV, pickle)
My experience
I have experience in:
Processing PDF documents using Python
Parsing and structuring data from various formats
Working with pandas, numpy libraries, and data analysis tools
Creating scalable solutions for document processing automation
I offer:
✅ Complete solution - ready Python script with documentation
✅ Flexibility - code adapting to different document structures
✅ Code quality - readable, commented code with error handling
✅ Tests - usage examples and validation on provided files
✅ Support - assistance with implementation and possible modifications
I am ready to start work immediately.
-
2 days135 USD
834 8 0 2 days135 USDIf you need to work easily on Python later, ideally parsing into a database like SQL Lite, if you want, I can parse into xlsx format Excel. Write to me for discussion, I can implement this functionality.
-
2 days135 USD
340 2 days135 USDHello!
I have prepared a fully working solution for your task.
🔹 The script **parse\_ecu\_pdf.py** is written in Python and does exactly what you described:
* Reads PDF (both local and via link) using PyMuPDF.
* Finds tables **ECU SUMMARY INFO** and **ECU SUMMARY INFO (CONT...)**, parsing them line by line.
* Finds blocks **ECU DETAILS** and collects `NAME=VALUE` pairs.
* Combines everything into a dynamic object: each summary line is automatically supplemented with the `details` dictionary.
…
🔹 The output is a ready JSON structure that is convenient to work with in Python.
📌 Usage:
```bash
python parse_ecu_pdf.py path/to/your_ecu_report.pdf
```
The screen displays JSON with data for each ECU.
The script is universal — the number of rows in the tables can be any, and the location of the tables (at the beginning or end of the PDF) does not matter.
I am ready to connect and help you with running, testing on your PDF, and any modifications.
-
3 days134 USD
656 9 0 3 days134 USDGood afternoon, Artem!
In general, the task is clear, but for an accurate answer regarding deadlines and price, I would like to clarify some questions that arose after analyzing your task.
Please write in private messages — we will discuss the details and your wishes.
P.S. I am guided by your budget, but I think I can fit into a smaller amount — after clarifying the details, I will offer an exact figure.
-
1 day135 USD
309 1 day135 USDHello, I am ready to complete your task as a training practice, message me privately and we will discuss all the details.
-
2 days161 USD
1117 4 0 2 days161 USDHello!
I can create a tool in Python that reads your PDF files, finds ECU SUMMARY tables regardless of their location in the file, and combines them into one complete dataset. Immediately after that, the script will also gather ECU DETAILS tables and link each set of parameters NAME=VALUE with the corresponding ECU record. This way, you will get one clean object that combines all the information and can be used directly in Python or converted to a DataFrame for analysis.
I will not rely on page numbers or fixed positions. Instead, the script will look for reference labels and section titles, so it will work even if the layout or the number of records changes. The final structure will be flexible, easy to query, and exportable to JSON or CSV for later use.
Thank you!
-
1 day135 USD
232 1 0 1 day135 USDHello, Artem!
I am a Python developer with a lot of experience working with PDFs.
In what format would you prefer to work on the output?
Feel free to write, we will discuss your project!
Best regards,
Andriy
-
2 days135 USD
1328 35 1 2 days135 USDGood evening. I worked with PDF and did a similar task. But in PHP, on a VPS on Linux. There are nuances, I don't know how it is for you, but sometimes the tables do not go sequentially, and then it will not be easy. We need to try.
-
1 day135 USD
2248 18 3 1 day135 USDGood evening, Artem. I am working on automation in Python. I can develop a parser for you with the necessary functionality, as one of the options, after processing the function will return a list of dictionaries []{} that you can work with further in the code. If you are interested - write to me, I will be happy to discuss the details.
-
3 days135 USD
3298 70 1 3 days135 USDHello.
I have experience in automatic data extraction from pdf.
We can discuss.
-
1 day135 USD
200 1 0 1 day135 USDGood day! 👋
I have carefully reviewed your task.
I can complete it quickly and fully according to your requirements.
There are a few points I would like to clarify.
I am ready to start immediately after agreeing on the details.
-
1 day161 USD
1562 7 0 1 day161 USDGood day!
My name is Roman, and I am among the top 6 developers in the "Artificial Intelligence and Machine Learning" category out of ~1600 specialists on the platform.
I guarantee:
- Fast and quality task execution
- Strict adherence to deadlines
- Regular communication throughout the entire process
I would be happy to discuss the details of your project in private messages.
-
1 day135 USD
267 1 day135 USDI have already completed your assignment—I can demonstrate it.
Current freelance projects in the category Data Parsing
Consultation on parsing Instagram account subscribersHello. It is necessary to conduct a preliminary assessment of the feasibility of the following task. I have a list of Instagram accounts. The goal is to obtain contact information (primarily email addresses) of users who follow these accounts. Previously, I encountered companies… Data Parsing ∙ 3 days 14 hours back ∙ 12 proposals |
A specialist is needed to find contacts of decision-makers in Ukraine.It is necessary to gather a database (or ready database) of contacts of decision-makers (DMs) in companies in Ukraine. Information Gathering, Data Parsing ∙ 3 days 19 hours back ∙ 17 proposals |
Need to scrape data from LinkedInWe need to scrape data from LinkedIn based on our list. For each entry, we need to find and collect available data if it exists on the LinkedIn profile, including the profile picture on the LinkedIn social network, email address, links to social media, company website, and… Data Parsing ∙ 4 days back ∙ 28 proposals |
Parsing and classification of dataWe are looking for a developer to implement a system for collecting and structuring data from open sources. We have a database of small business owners in the USA, which contains the person's name, company name, address, and state. It is necessary to build a process for… Web Programming, Data Parsing ∙ 4 days 1 hour back ∙ 41 proposals |
Svitlahata
17 USD
It is necessary to import 1819 products from the XML/YML feed of Prom.ua to OpenCart 3. A ready XML file is available, which contains product names, descriptions, prices, photos, specifications, manufacturers, and categories. Requirements: import all products to OpenCart… Content Management Systems, Data Parsing ∙ 5 days 4 hours back ∙ 34 proposals |