Convert pdf to json
Hello, there is a task, you need to convert 2 pdf files into json. Specifically, on the page https://hsc.gov.ua/index/poslugi/vidacha-posvidchennya-vodiya/pitannya-ta-ispit-z-pdr/ in the first paragraph, there are Examination Questions and answers. These need to be converted into json.
Questions in json should have the following structure:
{
"topics": [
{
"topicId": 1,
"topicTitle": "1. GENERAL PROVISIONS",
"questions": [
{
"questionId": 1,
"text": "1. Tramway track – an element of the road intended for the movement of rail vehicles, which is limited in width:",
"image": "1_1.png",
"options": [
{
"optionId": 1,
"text": "1) Specially designated paving of the tram line."
},
{ "optionId": 2, "text": "2) Road marking." },
{
"optionId": 3,
"text": "3) Answers indicated in items 1 and 2."
}
]
},
{
"questionId": 2,
"text": "2. An element of the road intended for the movement of rail vehicles, which is limited in width by specially designated paving of the tram line or road marking is:",
"image": null,
"options": [
{
"optionId": 1,
"text": "1) Tramway track."
},
{ "optionId": 2, "text": "2) Improved surface." },
{
"optionId": 3,
"text": "3) Lane."
}
]
}
]
},
{
"topicId": 2,
"topicTitle": "2. DUTIES AND RIGHTS OF DRIVERS OF MECHANICAL VEHICLES",
"questions": [
{
"questionId": 1,
"text": "1. If it is impossible to take measures to provide first aid to the victim and to call an emergency medical team, as well as if there is no possibility to send the victim to a medical institution with the help of others from the accident scene, the driver is obliged:",
"image": "2_1.png",
"options": [
{
"optionId": 1,
"text": "1) answer"
},
{ "optionId": 2, "text": "2) answer" },
{
"optionId": 3,
"text": "3) answer"
},
{
"optionId": 4,
"text": "4) answer"
}
]
}
]
}
]
}
So nothing complicated, repeat the structure from the pdf, where there are 63 main topics (with subtopics like 16, 16.1, and 16.2, maybe more, subtopics in json do not need to be made in topic 16, they should also go as a separate topic, so that they can be processed correctly later.)
Regarding the photos. The photo is the topic number_question number, for example, topic 34 and question 8 has a photo, the name of the photo will be 34_8.png. All photos should be saved in one folder "images" with the corresponding names like 34_8.png so that they can be marked from the processed json.
It is important that the order of all topics, questions, and answers is preserved, as in the second pdf we have answers, they need to be formatted like this:
{
"1": {
"1": 3,
"2": 1,
"3": 4
},
"2": {
"1": 2,
"2": 3
}
}
That is, topic 1, the first question will have answer 3, and so on. Then I will process the questions with answers. Therefore, the structure and sequence are important!.
If you have any questions, feel free to ask! We can also discuss the price. The project is not commercial, but a pet project, so within reason)!
And please, before responding to the task, try to see if you can do it, so the task doesn't get stuck, thank you!
Work results
Client's review of cooperation with Anton T.
Convert pdf to jsonThank you, Anton, for your help; he completed the task very quickly and efficiently, with the desired result!
Freelancer's review of cooperation with Kostyantin Budankov
Convert pdf to jsonThe client quickly reviewed the results, provided clarifications, and after the final version successfully closed the project.
-
I will parse the PDF and convert it to JSON according to your structure using a Python script, preserving all images and the correct sequence of responses. I have extensive experience in data parsing and automation, and I will do everything cleanly and without errors. I will complete the work in 2 days, and a budget of 1000 UAH is acceptable.
Do you have any restrictions on the resolution for the extracted .png images?
-
1090 11 1 I can do it. The format is clear: the 1st PDF - topics/questions/options + image field (topic_question.png), all images in /images with names like 34_8.png, subtopics (16.1/16.2/…) - as separate topics, I will keep the order of topics/questions/options 1:1. The 2nd PDF - a separate JSON with answers in your structure { "topicId": { "questionId": correctOptionId } }. Before starting, I will do a quick test on 1-2 topics and show you a piece of JSON + 2-3 images with correct names, so you can confirm that everything is readable and matches.
-
5928 345 0 I played around with PDF files, wrote test scripts, one extracts text in a structured form (topics, questions, and answer options) and images into separate files linked to the topic and question from the PDF, the other converts all of this into JSON. In some places, the PDF is somewhat crooked (or maybe I am crooked, anything is possible), hence the work is in two stages, between which some things are checked and corrected manually, but still, the vast majority of the data is extracted correctly.
-
580 11 0 Good day! I have experience, I once did a similar task. I will complete it quickly!
-
3012 73 4 2 Hello! I can do it in this format!!! Feel free to contact me!!!!!!!!!
-
2556 38 0 Good day! I have reviewed the task, I will do it quickly today. I have already had experience converting to json from pdf.
-
184 Good day!
I have reviewed both PDFs and the JSON structure. I am ready to convert the questions and answers while fully preserving the order of topics and numbering.
I will place all images in a separate folder with correct file names for further markup.
I guarantee compliance with the structure and sequence of data.
The deadline for completion is 2 days.
-
1101 7 1 Hello!
I have experience in processing PDF files and converting them to JSON. Recently, I worked on a project where I converted documents into machine-readable formats using Python and the PyPDF2 or pdfplumber library.
I implement PDF file parsing, extract information, and structure it in JSON format as specified. I will use parsing libraries to ensure data accuracy and save images in the "images" folder with the correct names.
My work guarantees convenience for further processing and the correct format for your project. I am ready to start!
-
841 26 4 1 It is possible to analyze the original document, and it is even interesting; however, the proposed reward is clearly too low, don't you think? I would analyze it and recode it. The price is not realistic for now. It will take 3-5 days to try to complete the task several times; success is predetermined.
-
654 2 0 Hello!
I can convert the PDF from the website hsc.gov.ua into JSON with the required structure, including images in the images folder.
I will preserve the exact order of topics and questions.
Execution: 3–5 days, cost: 1000 UAH.
-
Hello, Konstantin! Your project looks interesting and clear, and converting PDF to JSON is important for any of your future applications. As an experienced web designer and specialist in processing various file formats, I am ready to apply my knowledge for the accurate reproduction of data in the required format. My approach involves careful preservation of the order of topics, questions, and answers, which is critical for further work with them. Let's discuss how I can help you implement your project efficiently and on time!
-
Good day. I will complete it within a few hours. Please contact me. I will start immediately.
-
213 1 0 Hello! I am interested in your project.
I have experience in automating the processing of large volumes of data. For your task (539 pages of questions + 11 pages of answers), I have developed a special algorithm in Python that allows:
To guarantee 100% accuracy: to eliminate the human factor when converting thousands of questions.
To automatically name images: to save and link photos according to the mask {topicId}_{questionId}.png exactly according to your structure.
To maintain hierarchy: to correctly process all topics and subtopics in the specified JSON format.
I am ready to perform a demo version (the first topic) for free, so you can verify the quality and speed of my approach. If you are interested in automated processing with guaranteed results — I would be happy to discuss the details.
-
8495 38 0 1 I can run it through GPT. If the prompt works correctly, then everything should be good according to the picture.
-
1860 21 0 Hello. As you requested, I tried to analyze the pdf in advance. The entire difficulty lies in the second pdf (with the answers), which is not just a scan, but also a poor scan, where even some numbers are so unclear that they cannot be visually restored by a person. If we only had the first pdf, it would be cheap and very quick, but due to the second pdf, the price becomes significantly higher (about 70% of the total price is for the second pdf with the answers), but everything is doable. It can be done within a day.
-
3356 70 1 Hello.
I have reviewed the PDF.
I am engaged in writing scripts from scratch for specific tasks. I will be able to complete the project.
-
10130 117 0 Hello.
I can write a script in NodeJS. I am ready to take it on. Write to me, we will discuss.
Current freelance projects in the category Data Parsing
Create a dashboard in https://airtable.com/ for the performance of advertising creatives from Facebook ads.Full specification https://docs.google.com/document/d/1_n_oYRNZWYxalUA---DM5AD1b5ZSrtePw5J4G42svGw/edit?usp=sharing Databases & SQL, Data Parsing ∙ 16 hours 9 minutes back ∙ 13 proposals |
Creation of an Excel file for uploading products to the websites of other partners.I am interested in creating an Excel table with all parameters. Here is the website - https://heiztechnik.com.ua/ And the positions I am interested in to be transferred: Manual boilers: 1) TIS UNI 15-95 kW (10) pcs 2)TIS HARD 150-500 kW (7) pcs Pellet boilers: 1)TIS PELLET… Data Parsing ∙ 20 hours 17 minutes back ∙ 30 proposals |
A developer is required for parsing the catalog and automating data import.Detailed technical specifications in the attached document Please indicate the estimated cost and timeline in your response Do you have experience working with parsing large catalogs What possible difficulties or limitations do you see in this task Databases & SQL, Data Parsing ∙ 23 hours 10 minutes back ∙ 33 proposals |
Find a product feed (Google Merchant XML) for a website on OpenCart
16 USD
It is necessary to find a direct link to the active product feed (XML) of a competitor for Google Merchant Center Platform (CMS): OpenCart / ocStore Find the original feedRequirements for the result: Working link to the XML file Python, Data Parsing ∙ 1 day 4 hours back ∙ 19 proposals |
Parsing products from supermarketsNeed a specialist (parsing + Google Sheets + automation). Goal: Create a system for automatic retrieval and updating of food prices from Silpo and NOVUS supermarkets with data output to Google Sheets. What needs to be implemented: Create a main Google Sheets table with a product… Data Parsing, Information Gathering ∙ 1 day 12 hours back ∙ 43 proposals |