Switch to English?
Yes
Переключитись на українську?
Так
Переключиться на русскую?
Да
Przełączyć się na polską?
Tak
Post your project for free and start receiving proposals from freelancers within minutes after publication!

Parsing text PDF with tables

Translated135 USD

Applications 1

Application viewing is only available registered users.
  1. 10738
     149  0

    1 day135 USD

    Good day. We have already discussed this project with you. I am ready to execute it. I will be happy to collaborate.

  2. 100  
    3 days135 USD

    Hello Artem, I can help you with your task of outputting data in the required format for further processing. I will be waiting for your message.

  3. 571    6  1   1
    2 days135 USD

    Good afternoon, Artem. There is a ready-made solution with a web interface that allows you to upload tables in PDF format and parse them. The program works great with your example, and after parsing, the data can be conveniently worked with in Python.

  4. 180  
    4 days135 USD

    Good day!
    I reviewed your PDF sample. I propose the following approach:

    Table extraction

    Main tool: pdfplumber (stable text extraction).

    Fallback for complex grids: camelot/tabula-py in lattice/stream mode.

    Automatic search for section markers: “ECU SUMMARY INFO”, “ECU DETAILS” (works on different pages/positions).

    Normalization

    Merging broken lines, removing line breaks and extra spaces.

    Correct merging of multi-line cells and columns.

    Aligning parameter names NAME=VALUE in ECU DETAILS.

    Single data model

    {
    "vin": "...",
    "publication_date": "...",
    "summary": [
    {"ecu":"ABS","name":"Anti Lock Brakes","bus_type":"CAN-CH", "flash_part":"...", "current_vin":"...", "original_vin":"...", "part":"..."},
    ...
    ],
    "details": [
    {"ecu":"ABS","params":{"Param1":"Value1","Param2":"Value2", ...}},
    ...
    ]
    }

    Export to CSV/Excel (separate sheets Summary / Details) and/or SQLite.

    Quality control

    Validations (mandatory columns, number of rows, unique ECUs).

    Logs and small unit tests to easily maintain the process.

    Result: reproducible script + launch instructions, ready files (JSON/CSV/Excel/SQLite).
    Ready to complete in 3–4 days. Cost — to be agreed upon after clarifying the format of the final export and possible nuances of other PDF markup.

    Thank you!
    Alla

  5. 124  
    4 days135 USD

    Proposed technical approach
    1. Tools and libraries:

    PyMuPDF (fitz) or pdfplumber for text extraction from PDF
    pandas for structuring tabular data
    re (regex) for pattern identification and parsing NAME=VALUE formats
    Custom functions for merging and normalizing data

    2. Solution architecture:

    Function identifying sections based on "stubborn" labels
    Parser for main tables with automatic record count detection
    Module merging data from both main tables
    Parser for ECU DETAILS section with flexible NAME=VALUE format
    Dynamic object generator (dictionary/DataFrame) with complete data structure

    3. Functionalities:

    Support for varying record counts in tables
    Flexible positioning of tables in the document
    Data validation and cleansing
    Export to formats facilitating further work (JSON, CSV, pickle)

    My experience
    I have experience in:

    Processing PDF documents using Python
    Parsing and structuring data from various formats
    Working with pandas, numpy libraries, and data analysis tools
    Creating scalable solutions for document processing automation

    I offer:
    ✅ Complete solution - ready Python script with documentation
    ✅ Flexibility - code adapting to different document structures
    ✅ Code quality - readable, commented code with error handling
    ✅ Tests - usage examples and validation on provided files
    ✅ Support - assistance with implementation and possible modifications

    I am ready to start work immediately.

  6. 834    8  0
    2 days135 USD

    If you need to work easily on Python later, ideally parsing into a database like SQL Lite, if you want, I can parse into xlsx format Excel. Write to me for discussion, I can implement this functionality.

  7. 340  
    2 days135 USD

    Hello!
    I have prepared a fully working solution for your task.

    🔹 The script **parse\_ecu\_pdf.py** is written in Python and does exactly what you described:

    * Reads PDF (both local and via link) using PyMuPDF.
    * Finds tables **ECU SUMMARY INFO** and **ECU SUMMARY INFO (CONT...)**, parsing them line by line.
    * Finds blocks **ECU DETAILS** and collects `NAME=VALUE` pairs.
    * Combines everything into a dynamic object: each summary line is automatically supplemented with the `details` dictionary.

    🔹 The output is a ready JSON structure that is convenient to work with in Python.

    📌 Usage:

    ```bash
    python parse_ecu_pdf.py path/to/your_ecu_report.pdf
    ```

    The screen displays JSON with data for each ECU.

    The script is universal — the number of rows in the tables can be any, and the location of the tables (at the beginning or end of the PDF) does not matter.

    I am ready to connect and help you with running, testing on your PDF, and any modifications.

  8. 656    9  0
    3 days134 USD

    Good afternoon, Artem!
    In general, the task is clear, but for an accurate answer regarding deadlines and price, I would like to clarify some questions that arose after analyzing your task.
    Please write in private messages — we will discuss the details and your wishes.
    P.S. I am guided by your budget, but I think I can fit into a smaller amount — after clarifying the details, I will offer an exact figure.

  9. 309  
    1 day135 USD

    Hello, I am ready to complete your task as a training practice, message me privately and we will discuss all the details.

  10. 1117    4  0
    2 days161 USD

    Hello!

    I can create a tool in Python that reads your PDF files, finds ECU SUMMARY tables regardless of their location in the file, and combines them into one complete dataset. Immediately after that, the script will also gather ECU DETAILS tables and link each set of parameters NAME=VALUE with the corresponding ECU record. This way, you will get one clean object that combines all the information and can be used directly in Python or converted to a DataFrame for analysis.

    I will not rely on page numbers or fixed positions. Instead, the script will look for reference labels and section titles, so it will work even if the layout or the number of records changes. The final structure will be flexible, easy to query, and exportable to JSON or CSV for later use.

    Thank you!

  11. 232    1  0
    1 day135 USD

    Hello, Artem!

    I am a Python developer with a lot of experience working with PDFs.
    In what format would you prefer to work on the output?

    Feel free to write, we will discuss your project!

    Best regards,
    Andriy

  12. 1328    35  1
    2 days135 USD

    Good evening. I worked with PDF and did a similar task. But in PHP, on a VPS on Linux. There are nuances, I don't know how it is for you, but sometimes the tables do not go sequentially, and then it will not be easy. We need to try.

  13. 2248    18  3
    1 day135 USD

    Good evening, Artem. I am working on automation in Python. I can develop a parser for you with the necessary functionality, as one of the options, after processing the function will return a list of dictionaries []{} that you can work with further in the code. If you are interested - write to me, I will be happy to discuss the details.

  14. 3298    70  1
    3 days135 USD

    Hello.
    I have experience in automatic data extraction from pdf.
    We can discuss.

  15. 200    1  0
    1 day135 USD

    Good day! 👋

    I have carefully reviewed your task.
    I can complete it quickly and fully according to your requirements.
    There are a few points I would like to clarify.

    I am ready to start immediately after agreeing on the details.

  16. 1562    7  0
    1 day161 USD

    Good day!
    My name is Roman, and I am among the top 6 developers in the "Artificial Intelligence and Machine Learning" category out of ~1600 specialists on the platform.
    I guarantee:
    - Fast and quality task execution
    - Strict adherence to deadlines
    - Regular communication throughout the entire process
    I would be happy to discuss the details of your project in private messages.

  17. 267  
    1 day135 USD

    I have already completed your assignment—I can demonstrate it.

  18. Another 3 proposals concealed

Current freelance projects in the category Data Parsing

Consultation on parsing Instagram account subscribers

Hello. It is necessary to conduct a preliminary assessment of the feasibility of the following task. I have a list of Instagram accounts. The goal is to obtain contact information (primarily email addresses) of users who follow these accounts. Previously, I encountered companies…

Data Parsing ∙ 3 days 11 hours back ∙ 12 proposals

A specialist is needed to find contacts of decision-makers in Ukraine.

It is necessary to gather a database (or ready database) of contacts of decision-makers (DMs) in companies in Ukraine.

Information GatheringData Parsing ∙ 3 days 16 hours back ∙ 17 proposals

Need to scrape data from LinkedIn

We need to scrape data from LinkedIn based on our list. For each entry, we need to find and collect available data if it exists on the LinkedIn profile, including the profile picture on the LinkedIn social network, email address, links to social media, company website, and…

Data Parsing ∙ 3 days 21 hours back ∙ 27 proposals

Parsing and classification of data

We are looking for a developer to implement a system for collecting and structuring data from open sources. We have a database of small business owners in the USA, which contains the person's name, company name, address, and state. It is necessary to build a process for…

Web ProgrammingData Parsing ∙ 3 days 23 hours back ∙ 41 proposals

Svitlahata

17 USD

It is necessary to import 1819 products from the XML/YML feed of Prom.ua to OpenCart 3. A ready XML file is available, which contains product names, descriptions, prices, photos, specifications, manufacturers, and categories. Requirements: import all products to OpenCart…

Content Management SystemsData Parsing ∙ 5 days 1 hour back ∙ 34 proposals

Client
Artem Ro
Poland Poland  1  0
Project published
9 months 14 days back
263 views
Tags
  • python
  • PDF