Mapping PDF Data to Excel Columns with Coordinates
Job Description for Freelancer
We are looking for a skilled freelancer to develop a script or tool that extracts specific data fields from a Hebrew-language PDF document and populates these into an Excel file. The task involves utilizing OCR to read the Hebrew text and mapping specific keywords to predefined columns in Excel. The output Excel file must be formatted according to our requirements.
Requirements
1. Develop a script or tool to:
a. Extract text from a PDF file in Hebrew using OCR technology (e.g., Tesseract with Hebrew support).
b. Identify and extract the following data fields based on specific keywords (provided below).
c. Map these fields to specific columns in an Excel file.
d. Save the populated Excel file with the data in the correct format.
2. Ensure the tool/script can:
a. Handle multiple loan plans in the PDF.
b. Perform basic error handling for missing or incorrect data in the PDF.
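As a rough illustration of the "basic error handling" requirement, the sketch below checks one extracted loan plan for missing or malformed values before it is written to Excel. The field names (`loan_type`, `amount`, and so on) and the rule set are assumptions made for this example; they are not part of the job description.

```python
from datetime import datetime

# Hypothetical field names; the posting only names the Hebrew labels.
REQUIRED = ("loan_type", "amount", "interest", "end_date", "monthly_payment")


def validate_plan(plan: dict) -> list[str]:
    """Return a list of problems; an empty list means the plan looks usable."""
    problems = []

    # Report every required field that is absent or blank.
    for key in REQUIRED:
        if not str(plan.get(key, "") or "").strip():
            problems.append(f"missing: {key}")

    # The amount is expected to be digits with optional thousands separators.
    amount = str(plan.get("amount") or "").replace(",", "")
    if amount and not amount.isdigit():
        problems.append(f"amount is not a number: {plan['amount']!r}")

    # Dates in the posting use the DD/MM/YYYY form, e.g. 15/08/2039.
    end_date = plan.get("end_date")
    if end_date:
        try:
            datetime.strptime(end_date, "%d/%m/%Y")
        except ValueError:
            problems.append(f"end_date is not DD/MM/YYYY: {end_date!r}")

    return problems
```

A caller could log the returned problems per loan plan and still write the fields that were found, so that one bad value does not discard a whole PDF.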
Data Fields and Mapping
The script should extract the following fields from the PDF and insert them into the specified columns in the Excel file.
PDF Field | Excel Column | Notes |
מספר מסלול (Loan No.) | Column A | Sequential numbering for each loan plan. |
סוג ההלוואה (Loan Type) | Column B | Example: ריבית קבועה צמודה למדד - Fixed Interest, Linked to Index. |
קיימת (Exists) | Column C | Optional field, typically indicates a period for variable loans, e.g., '5'. |
צמוד למדד (Index Linkage) | Column D | צמוד למדד - Linked to Index, לא צמוד - Not Linked. |
סכום (Amount) | Column E | Loan amount from the PDF, e.g., 318,857. |
ריבית (Interest Rate) | Column F | The interest rate or margin for Prime loans, e.g., 2.65% or +0.24%. |
תאריך סיום (End Date) | Column G | End date of the loan, e.g., 15/08/2039. |
החזר חודשי (Monthly Payment) | Column H | Monthly payment from the PDF, e.g., 1,879 ₪. |
תשלום חודשי (מחושב) (Monthly Payment Calculated) | Column I | Leave empty; calculated automatically in Excel. |
לוח סילוקין (Repayment Type) | Column J | e.g., שפיצר (Spitzer) or קרן שווה (Equal Principal). |
עמלת תשלום מראש (Prepayment Fee) | Column K | Prepayment fee, if available, e.g., 8,841 ₪. |
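The field-to-column mapping above can be sketched as a small lookup table. In this minimal example the field names, the `plans_to_cells` helper, and the choice of row 2 as the first data row (assuming a header in row 1) are all hypothetical. Column I is deliberately left out so that Excel can calculate it.

```python
# Hypothetical field names mapped to the column letters listed in the table.
COLUMN_FOR_FIELD = {
    "loan_type": "B",
    "period": "C",
    "index_linkage": "D",
    "amount": "E",
    "interest": "F",
    "end_date": "G",
    "monthly_payment": "H",
    # Column I is intentionally absent: it is calculated in Excel.
    "repayment_type": "J",
    "prepayment_fee": "K",
}


def plans_to_cells(plans, first_row=2):
    """Turn a list of loan-plan dicts into a {cell address: value} mapping."""
    cells = {}
    for offset, plan in enumerate(plans):
        row = first_row + offset
        # Column A holds sequential numbering for each loan plan.
        cells[f"A{row}"] = offset + 1
        for field_name, column in COLUMN_FOR_FIELD.items():
            value = plan.get(field_name)
            if value is not None:  # optional fields are simply skipped
                cells[f"{column}{row}"] = value
    return cells
```

If a library such as openpyxl were used for the output (an assumption, the posting does not name one), each entry could be written with `worksheet[address] = value` before saving the workbook.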
Steps for Implementation
1. Use OCR technology to extract the Hebrew text from the PDF, for example Tesseract with Hebrew language support.
2. Identify the listed fields using Hebrew keywords:
- סוג הלוואה (Loan Type): search the text to find the loan type.
- סכום (Amount): search for keywords near a large number, e.g., 318,857.
- ריבית (Interest Rate): search for keywords near a percent sign (%).
- תאריך סיום (End Date): search for keywords near a date format (e.g., 09/10/2047).
- החזר חודשי (Monthly Payment): search for keywords near a number with the ₪ currency symbol.
- צמוד למדד (Index Linkage): search for the terms צמוד למדד (linked to index) or לא צמוד (not linked).
3. Insert the extracted data into the matching columns in Excel.
4. Save the Excel file in the specified format.
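Step 2 can be sketched with regular expressions that look for a value shortly after each keyword. This is a minimal sketch under stated assumptions: the OCR text is already available as a plain string, the Hebrew labels appear exactly as spelled below, and the value follows the label within about 20 characters. The names `PATTERNS` and `extract_fields` are hypothetical, and real OCR output would likely need more tolerant matching.

```python
import re

# Each pattern: a Hebrew keyword, a short gap, then the value to capture.
PATTERNS = {
    # Amount: a number, optionally with thousands separators (318,857).
    "amount": re.compile(r"סכום\D{0,20}?(\d{1,3}(?:,\d{3})+|\d+)"),
    # Interest: a number followed by %, with an optional sign (+0.24%).
    "interest": re.compile(r"ריבית[^\d+\-]{0,20}([+-]?\d+(?:\.\d+)?)\s*%"),
    # End date in DD/MM/YYYY form.
    "end_date": re.compile(r"תאריך סיום\D{0,20}?(\d{2}/\d{2}/\d{4})"),
    # Monthly payment: a number followed by the shekel sign.
    "monthly_payment": re.compile(r"החזר חודשי\D{0,20}?(\d{1,3}(?:,\d{3})+|\d+)\s*₪"),
}


def extract_fields(text: str) -> dict:
    """Return the captured value for each field, or None when not found."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[name] = match.group(1) if match else None

    # Check the negative phrase first so it is never reported as linked.
    if "לא צמוד" in text:
        found["index_linkage"] = "לא צמוד"
    elif "צמוד למדד" in text:
        found["index_linkage"] = "צמוד למדד"
    else:
        found["index_linkage"] = None
    return found
```

Fields that come back as `None` are the ones a later error-handling step would need to report.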
I have a few PDF files, each in a different format, and I want a Windows app. Each PDF contains the same type of data that I need, and the data should be exported to the same Excel file.
In total I have 5 PDF files, each in a different format.
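One possible way to handle several PDF layouts that carry the same data is to keep a small profile per layout and pick the profile by a marker string found in the extracted text. Everything in this sketch is hypothetical: the profile names, the marker strings, and the alternative label are placeholders, since the five actual formats are not described here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PdfProfile:
    name: str
    marker: str  # text assumed to appear only in this layout
    keywords: dict = field(default_factory=dict)  # field name -> label in this layout


# Placeholder profiles; real markers and labels would come from the sample PDFs.
PROFILES = [
    PdfProfile("format_a", "BANK-A", {"amount": "סכום"}),
    PdfProfile("format_b", "BANK-B", {"amount": "LABEL-FOR-AMOUNT-IN-B"}),
]


def pick_profile(text: str, profiles=PROFILES) -> Optional[PdfProfile]:
    """Return the first profile whose marker occurs in the text, else None."""
    for profile in profiles:
        if profile.marker in text:
            return profile
    return None
```

With this layout, all profiles feed the same field names, so the same Excel export can be used for every PDF, and a PDF that matches no profile can be reported instead of being parsed incorrectly.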
Applications 2
-
1 day, 150 USD
Hello, Adi Yancher
Hope you are well.
A nice challenge for a modern programmer.
The simplest way is to use Python libraries for your case.
I investigated this in Python. It works nicely but has some drawbacks, including distribution size.
I also investigated Apache PDFBox for Java, which gave more concise results.
… There is no need for OCR. I have not investigated Apache POI yet, though.
In any case, there should be a graphical user interface, parsing/mapping rules, text-to-Excel mapping, common PDF document templates, and so on.
It may be worth thinking ahead to a web service for other users.
Solution:
OS platforms: any on which Java runs.
Java, Apache PDFBox 3, Apache POI.
Optionally: tesseract-ocr.
Optionally: extra training of the tesseract-ocr model.
I will be glad to hear your thoughts.
With regards.
-
7 days, 200 USD
Hello!
I am excited to assist with your project of creating a powerful and efficient script/tool for extracting Hebrew-language data from PDF files and populating it into Excel. Here's why I'm the perfect fit for this task:
Why Choose Me?
Expertise in OCR and Automation:
I have extensive experience with Tesseract OCR, including working with Hebrew-language support, ensuring high accuracy in text extraction.
Proven track record in creating automated tools for complex data extraction and mapping.
… Flawless Data Mapping:
I specialize in designing scripts that accurately identify keywords in PDFs and map them to the correct Excel columns, following predefined structures.
I can implement error handling for missing or incorrect data, ensuring clean and reliable output.
Attention to Detail:
I understand the importance of handling multiple loan plans and parsing complex fields like dates, interest rates, and monthly payments (including symbols like ₪).
I'll make sure your Excel output is professionally formatted and meets your requirements.
Efficient Workflow and Communication:
I work quickly without compromising quality. The task will be delivered on time with updates at every stage.
I value clear communication and will ensure the tool/script is easy to use and customizable for future needs.
My Plan to Execute Your Task
OCR Setup:
Configure Tesseract with Hebrew language support to extract text efficiently from PDF files.
Data Extraction and Mapping:
Develop a robust script to identify specific fields like Loan Type, Amount, Interest Rate, and map them to their respective Excel columns.
Error Handling and Formatting:
Build error-checking mechanisms to handle missing data gracefully.
Format the output Excel file with precision, ensuring it aligns with your specifications.
Delivery and Support:
Provide a fully functional and tested script or tool.
Offer post-delivery support to ensure seamless integration and use.
Let's Get Started!
I'm confident that I can deliver a high-quality solution tailored to your needs. Let's discuss your requirements further, and I'll make sure this project exceeds your expectations. I look forward to collaborating with you!
-
7 days, 200 USD
Hello,
I can implement a solution for your project as a .exe program for Windows.
However, I have a few questions to discuss:
- Do all PDF files follow the same template as the attached file?
- To better understand the information connections, could you record a video showing how you manually fill in an Excel file based on a PDF file?