Mapping PDF Data to Excel Columns with Coordinates
Job Description for Freelancer
We are looking for a skilled freelancer to develop a script or tool that extracts specific data fields from a Hebrew-language PDF document and populates these into an Excel file. The task involves utilizing OCR to read the Hebrew text and mapping specific keywords to predefined columns in Excel. The output Excel file must be formatted according to our requirements.
Requirements
1. Develop a script or tool to:
a. Extract text from a PDF file in Hebrew using OCR technology (e.g., Tesseract with Hebrew support).
b. Identify and extract the following data fields based on specific keywords (provided below).
c. Map these fields to specific columns in an Excel file.
d. Save the populated Excel file with the data in the correct format.
2. Ensure the tool/script can:
a. Handle multiple loan plans in the PDF.
b. Perform basic error handling for missing or incorrect data in the PDF.
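A minimal sketch of the OCR step in item 1a. It assumes the third-party packages `pytesseract` and `pdf2image`, with Tesseract's `heb` language data and Poppler installed. The posting names only Tesseract, so the other choices and the helper names `clean_ocr_text` and `ocr_pdf_to_text` are illustrative assumptions, not a required design:

```python
import re
import unicodedata

# Bidirectional control marks that OCR output may contain around Hebrew text.
_BIDI_MARKS = re.compile("[\u200e\u200f\u202a-\u202e]")


def clean_ocr_text(text: str) -> str:
    """Drop bidi marks, collapse runs of spaces/tabs, and remove empty lines."""
    text = unicodedata.normalize("NFC", text)
    text = _BIDI_MARKS.sub("", text)
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


def ocr_pdf_to_text(pdf_path: str, lang: str = "heb", dpi: int = 300) -> str:
    """Render each PDF page to an image, OCR it, and return the cleaned text."""
    # Third-party imports are kept local so clean_ocr_text works without them.
    from pdf2image import convert_from_path  # assumption: Poppler is installed
    import pytesseract  # assumption: Tesseract with 'heb' data is installed

    pages = convert_from_path(pdf_path, dpi=dpi)
    raw = "\n".join(pytesseract.image_to_string(page, lang=lang) for page in pages)
    return clean_ocr_text(raw)
```

If a PDF already contains a text layer, OCR may be unnecessary and a plain text extractor could replace `ocr_pdf_to_text` while keeping the same cleaning step.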
Data Fields and Mapping
The script should extract the following fields from the PDF and insert them into the specified columns in the Excel file.
PDF Field | Excel Column | Notes |
מספר מסלול (Loan No.) | Column A | Sequential numbering for each loan plan. |
סוג ההלוואה (Loan Type) | Column B | Example: ריבית קבועה צמוד למדד (Fixed Interest, Linked to Index). |
קיימת (Exists) | Column C | Optional field, typically indicates a period for variable loans, e.g., '5'. |
צמוד למדד (Index Linkage) | Column D | צמוד למדד: Linked to Index; לא צמוד: Not Linked. |
סכום (Amount) | Column E | Loan amount from the PDF, e.g., 318,857. |
ריבית (Interest Rate) | Column F | The interest rate or margin for Prime loans, e.g., 2.65% or +0.24%. |
תאריך סיום (End Date) | Column G | End date of the loan, e.g., 15/08/2039. |
החזר חודשי (Monthly Payment) | Column H | Monthly payment from the PDF, e.g., 1,879 ₪. |
תשלום חודשי (מחושב) (Monthly Payment Calculated) | Column I | Leave empty; calculated automatically in Excel. |
לוח סילוקין (Repayment Type) | Column J | e.g., שפיצר (Spitzer) or קרן שווה (Equal Principal). |
עמלת תשלום מראש (Prepayment Fee) | Column K | Prepayment fee, if available, e.g., 8,841 ₪. |
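The table above can be expressed as a field-to-column mapping. The sketch below is one possible shape, not the required implementation: the English field keys, the function names, the header row assumed in row 1, and the reading of the last table row as Column K are all assumptions. Writing the file relies on the third-party `openpyxl` package.

```python
# Field key -> Excel column letter, following the mapping table.
FIELD_COLUMNS = {
    "loan_no": "A",
    "loan_type": "B",
    "exists": "C",
    "index_linkage": "D",
    "amount": "E",
    "interest_rate": "F",
    "end_date": "G",
    "monthly_payment": "H",
    # Column I is intentionally absent: Excel calculates it.
    "repayment_type": "J",
    "prepayment_fee": "K",  # assumption: last table row maps to Column K
}


def plan_to_cells(record: dict, row: int) -> dict:
    """Return {cell reference: value} for one loan plan, skipping empty fields."""
    cells = {}
    for field, column in FIELD_COLUMNS.items():
        value = record.get(field)
        if value not in (None, ""):
            cells[f"{column}{row}"] = value
    return cells


def write_plans(plans: list, xlsx_path: str, first_row: int = 2) -> None:
    """Write one row per loan plan; row 1 is assumed to hold the headers."""
    from openpyxl import Workbook  # third-party; imported only when writing

    workbook = Workbook()
    sheet = workbook.active
    for offset, record in enumerate(plans):
        # Column A gets sequential numbering, as the Notes column requires.
        numbered = {**record, "loan_no": offset + 1}
        for ref, value in plan_to_cells(numbered, first_row + offset).items():
            sheet[ref] = value
    workbook.save(xlsx_path)
```

Optional fields such as `exists` or `prepayment_fee` simply produce no cell when they are missing, which leaves the corresponding column blank.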
Steps for Implementation
1. Use an OCR tool or technology to extract the Hebrew text from the PDF, for example Tesseract with Hebrew language support.
2. Identify the specified fields using Hebrew keywords:
- Loan Type (סוג ההלוואה): search the text for the wording that states the loan type.
- Amount (סכום): search for this keyword or a large number, e.g., 318,857.
- Interest Rate (ריבית): search for this keyword or a percent sign (%).
- End Date (תאריך סיום): search for this keyword or a date format (e.g., 09/10/2047).
- Monthly Payment (החזר חודשי): search for this keyword or a number with the ₪ currency sign.
- Index Linkage (צמוד למדד): search for the terms צמוד למדד (linked to index) or לא צמוד (not linked).
3. Insert the extracted data into the corresponding Excel columns.
4. Save the Excel file in the specified format.
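The keyword search in step 2 could be sketched with regular expressions as follows. This is an illustration under several assumptions: the Hebrew keyword spellings are a best reading of the posting and should be checked against the real PDFs, each value is assumed to sit on the same line as its keyword, and the names `FIELDS`, `find_value` and `extract_fields` are hypothetical. Fields that cannot be found are reported instead of raising, as a basic form of error handling.

```python
import re

# Field key -> (assumed Hebrew keyword, pattern for the value on that line).
FIELDS = {
    "amount": ("סכום", re.compile(r"\d{1,3}(?:,\d{3})+")),
    "interest_rate": ("ריבית", re.compile(r"[+-]?\d+(?:\.\d+)?\s*%")),
    "end_date": ("תאריך סיום", re.compile(r"\d{2}/\d{2}/\d{4}")),
    "monthly_payment": ("החזר חודשי", re.compile(r"\d{1,3}(?:,\d{3})*\s*₪")),
}


def find_value(text: str, keyword: str, pattern: re.Pattern):
    """Return the first pattern match on a line containing the keyword."""
    for line in text.splitlines():
        if keyword in line:
            match = pattern.search(line)
            if match:
                return match.group(0).strip()
    return None


def extract_fields(text: str):
    """Return (record, missing): found values and the keys not found."""
    record, missing = {}, []
    for name, (keyword, pattern) in FIELDS.items():
        value = find_value(text, keyword, pattern)
        if value is None:
            missing.append(name)
        else:
            record[name] = value
    # Check the negative phrase first: "לא צמוד למדד" contains both terms.
    if "לא צמוד" in text:
        record["index_linkage"] = "לא צמוד"
    elif "צמוד למדד" in text:
        record["index_linkage"] = "צמוד למדד"
    else:
        missing.append("index_linkage")
    return record, missing
```

A PDF with several loan plans would need the text split into one block per plan before calling `extract_fields`; how to split depends on the layout of each document.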
I have a few PDF files, each in a different format, and I want a Windows app. Each PDF contains the same type of data that I need, and the data should be exported to the same Excel file.
In total I have 5 PDF files, each in a different format.
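One way to handle several layouts that feed the same Excel file is a small set of per-format profiles. The sketch below is purely illustrative: the profile names, marker strings and keyword placeholders are hypothetical, since the actual five formats are not described in the posting.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    name: str       # label for the layout
    marker: str     # text assumed to appear only in this layout
    keywords: dict  # field key -> keyword as printed in this layout


# Hypothetical profiles; real markers and keywords must come from the PDFs.
PROFILES = [
    PdfProfile("format_a", "FORMAT-A-MARKER", {"amount": "<amount keyword, format A>"}),
    PdfProfile("format_b", "FORMAT-B-MARKER", {"amount": "<amount keyword, format B>"}),
]


def pick_profile(text: str, profiles=PROFILES):
    """Return the first profile whose marker occurs in the text, else None."""
    for profile in profiles:
        if profile.marker in text:
            return profile
    return None
```

Each profile supplies its own keywords, while the Excel column mapping stays shared, so all five formats end up in the same columns.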
Applications 2
-
1 day, 150 USD
Hello, Adi Yancher
Hope you are well.
A nice challenge for a modern programmer.
The simplest way for your case is to use Python libraries.
I investigated this in Python. It works nicely but has some drawbacks, including distribution size.
I investigated Apache PDFBox for Java, which gives more concise results.
… There is no need for OCR. I have not investigated Apache POI yet.
Anyway, there should be a graphical user interface, parsing/mapping rules, text-to-Excel mapping, common PDF document templates, and so on.
It may be worth thinking ahead about a web service for other users.
Solution:
OS platforms: any on which Java runs.
Java, Apache PDFBox 3, Apache POI.
Optionally: tesseract-ocr.
Optionally: extra training of the tesseract-ocr model.
I will be glad to hear your thoughts.
With regards.
-
7 days, 200 USD
Hello!
I am excited to assist with your project of creating a powerful and efficient script/tool for extracting Hebrew-language data from PDF files and populating it into Excel. Here's why I'm the perfect fit for this task:
Why Choose Me?
Expertise in OCR and Automation:
I have extensive experience with Tesseract OCR, including working with Hebrew-language support, ensuring high accuracy in text extraction.
Proven track record in creating automated tools for complex data extraction and mapping.
… Flawless Data Mapping:
I specialize in designing scripts that accurately identify keywords in PDFs and map them to the correct Excel columns, following predefined structures.
I can implement error handling for missing or incorrect data, ensuring clean and reliable output.
Attention to Detail:
I understand the importance of handling multiple loan plans and parsing complex fields like dates, interest rates, and monthly payments (including symbols like ₪).
I'll make sure your Excel output is professionally formatted and meets your requirements.
Efficient Workflow and Communication:
I work quickly without compromising quality. The task will be delivered on time with updates at every stage.
I value clear communication and will ensure the tool/script is easy to use and customizable for future needs.
My Plan to Execute Your Task
OCR Setup:
Configure Tesseract with Hebrew language support to extract text efficiently from PDF files.
Data Extraction and Mapping:
Develop a robust script to identify specific fields like Loan Type, Amount, Interest Rate, and map them to their respective Excel columns.
Error Handling and Formatting:
Build error-checking mechanisms to handle missing data gracefully.
Format the output Excel file with precision, ensuring it aligns with your specifications.
Delivery and Support:
Provide a fully functional and tested script or tool.
Offer post-delivery support to ensure seamless integration and use.
Let's Get Started!
I'm confident that I can deliver a high-quality solution tailored to your needs. Let's discuss your requirements further, and I'll make sure this project exceeds your expectations. I look forward to collaborating with you!
-
7 days, 200 USD
Hello,
I can implement a solution for your project as a .exe program for Windows.
However, I have a few questions to discuss:
- Do all PDF files follow the same template as the attached file?
- To better understand the information connections, could you record a video showing how you manually fill in an Excel file based on a PDF file?
-
7 days, 200 USD
Hello, the task is clear, and I'm ready to take it on. I look forward to collaborating with you!