Hello, I have a task: two PDF files need to be converted into JSON. Specifically, the first paragraph of the page https://hsc.gov.ua/index/poslugi/vidacha-posvidchennya-vodiya/pitannya-ta-ispit-z-pdr/ links to the examination questions and the answers. These need to be converted into JSON.

Questions in json should have the following structure:

{
  "topics": [
    {
      "topicId": 1,
      "topicTitle": "1. GENERAL PROVISIONS",
      "questions": [
        {
          "questionId": 1,
          "text": "1. Tramway track – an element of the road intended for the movement of rail vehicles, which is limited in width:",
          "image": "1_1.png",
          "options": [
            { "optionId": 1, "text": "1) Specially designated paving of the tram line." },
            { "optionId": 2, "text": "2) Road marking." },
            { "optionId": 3, "text": "3) Answers indicated in items 1 and 2." }
          ]
        },
        {
          "questionId": 2,
          "text": "2. An element of the road intended for the movement of rail vehicles, which is limited in width by specially designated paving of the tram line or road marking is:",
          "image": null,
          "options": [
            { "optionId": 1, "text": "1) Tramway track." },
            { "optionId": 2, "text": "2) Improved surface." },
            { "optionId": 3, "text": "3) Lane." }
          ]
        }
      ]
    },
    {
      "topicId": 2,
      "topicTitle": "2. DUTIES AND RIGHTS OF DRIVERS OF MECHANICAL VEHICLES",
      "questions": [
        {
          "questionId": 1,
          "text": "1. If it is impossible to take measures to provide first aid to the victim and to call an emergency medical team, as well as if there is no possibility to send the victim to a medical institution with the help of others from the accident scene, the driver is obliged:",
          "image": "2_1.png",
          "options": [
            { "optionId": 1, "text": "1) answer" },
            { "optionId": 2, "text": "2) answer" },
            { "optionId": 3, "text": "3) answer" },
            { "optionId": 4, "text": "4) answer" }
          ]
        }
      ]
    }
  ]
}

So nothing complicated: reproduce the structure of the PDF, which has 63 main topics plus subtopics such as 16.1 and 16.2 (possibly more). Subtopics should not be nested inside topic 16 in the JSON; each one should go as a separate topic so that it can be processed correctly later.
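A quick way to check a finished file against this structure could look like the sketch below. It assumes that question and option ids restart at 1 and increase by 1 without gaps, as in the sample above; the function name find_order_problems is made up for illustration.

```python
def find_order_problems(doc):
    """Return notes for question/option ids that are not 1, 2, 3, ... in order."""
    problems = []
    for topic in doc["topics"]:
        q_ids = [q["questionId"] for q in topic["questions"]]
        # Assumption: question ids restart at 1 in every topic.
        if q_ids != list(range(1, len(q_ids) + 1)):
            problems.append(f"topic {topic['topicId']}: question ids {q_ids}")
        for q in topic["questions"]:
            o_ids = [o["optionId"] for o in q["options"]]
            # Assumption: option ids restart at 1 in every question.
            if o_ids != list(range(1, len(o_ids) + 1)):
                problems.append(
                    f"topic {topic['topicId']}, question {q['questionId']}: "
                    f"option ids {o_ids}"
                )
    return problems


# Tiny hand-made sample in the same shape as the structure above.
sample = {
    "topics": [
        {
            "topicId": 1,
            "topicTitle": "1. GENERAL PROVISIONS",
            "questions": [
                {
                    "questionId": 1,
                    "text": "1. ...",
                    "image": "1_1.png",
                    "options": [
                        {"optionId": 1, "text": "1) ..."},
                        {"optionId": 2, "text": "2) ..."},
                    ],
                },
                {
                    "questionId": 2,
                    "text": "2. ...",
                    "image": None,
                    "options": [{"optionId": 1, "text": "1) ..."}],
                },
            ],
        }
    ]
}

print(find_order_problems(sample))
```

An empty list means the numbering is consistent; anything else points at the topic or question where an item was probably skipped or duplicated during extraction.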

Regarding the photos: the file name is the topic number, an underscore, and the question number. For example, if question 8 of topic 34 has a photo, the file will be named 34_8.png. All photos should be saved in a single folder "images" under these names so that they can be matched to the processed JSON.
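The naming rule can be sketched in a few lines of Python. The helper names image_filename and image_path are hypothetical; only the "images" folder and the topic_question.png pattern come from the description above. How subtopic numbers such as 16.1 should appear in file names is not specified in the task, so the sketch does not try to decide that.

```python
from pathlib import Path

IMAGES_DIR = Path("images")  # folder name taken from the task description


def image_filename(topic_id, question_id):
    """Build the '<topic>_<question>.png' name used in the "image" field."""
    return f"{topic_id}_{question_id}.png"


def image_path(topic_id, question_id):
    """Location of the picture inside the shared images folder."""
    return IMAGES_DIR / image_filename(topic_id, question_id)


print(image_filename(34, 8))  # 34_8.png
```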

It is important that the order of all topics, questions, and answers is preserved, because the second PDF contains the answers. They need to be formatted like this:

{
  "1": {
    "1": 3,
    "2": 1,
    "3": 4
  },
  "2": {
    "1": 2,
    "2": 3
  }
}

That is, in topic 1 the first question has answer 3, and so on. I will then process the questions together with the answers, so the structure and sequence are important!
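To illustrate why the numbering in both files has to line up, here is a minimal Python sketch of how the two JSON documents could be joined later. The function name correct_option and the inline sample data are assumptions for illustration, not part of the task.

```python
import json

# Small inline samples in the two formats described above (contents shortened).
questions_doc = json.loads("""
{"topics": [{"topicId": 1, "topicTitle": "1. GENERAL PROVISIONS", "questions": [
  {"questionId": 1, "text": "1. ...", "image": "1_1.png", "options": [
    {"optionId": 1, "text": "1) ..."},
    {"optionId": 2, "text": "2) ..."},
    {"optionId": 3, "text": "3) ..."}]}]}]}
""")
answers_doc = json.loads('{"1": {"1": 3}}')


def correct_option(topic, question, answers):
    """Return the option marked correct in the answers mapping, or None."""
    # JSON object keys are always strings, so the integer ids are converted.
    per_topic = answers.get(str(topic["topicId"]), {})
    answer_id = per_topic.get(str(question["questionId"]))
    for option in question["options"]:
        if option["optionId"] == answer_id:
            return option
    return None


topic = questions_doc["topics"][0]
question = topic["questions"][0]
print(correct_option(topic, question, answers_doc))
```

If a topic or question is skipped or renumbered in either file, the lookup silently points at the wrong option or returns None, which is why the sequence must match exactly.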

If you have any questions, feel free to ask! We can also discuss the price. This is a non-commercial pet project, so please keep it within reason)

And please, before responding, check that you can actually do the task so that it doesn't get stuck. Thank you!

Proposals 13 Rejected 10

Arseny Antonenko

14 0

Budget: 1000 UAH Deadline: 4 days

I will parse the PDF and convert it to JSON according to your structure using a Python script, preserving all images and the correct sequence of responses. I have extensive experience in data parsing and automation, and I will do everything cleanly and without errors. I will complete the work in 2 days, and a budget of 1000 UAH is acceptable.

Do you have any restrictions on the resolution for the extracted .png images?

Mykhailo Zamryka

11 1

Budget: 1000 UAH Deadline: 2 days

I can do it. The format is clear: the 1st PDF - topics/questions/options + image field (topic_question.png), all images in /images with names like 34_8.png, subtopics (16.1/16.2/…) - as separate topics, I will keep the order of topics/questions/options 1:1. The 2nd PDF - a separate JSON with answers in your structure { "topicId": { "questionId": correctOptionId } }. Before starting, I will do a quick test on 1-2 topics and show you a piece of JSON + 2-3 images with correct names, so you can confirm that everything is readable and matches.

Anton T.

Winning proposal

345 0

Budget: 1000 UAH Deadline: 2 days

I played around with the PDF files and wrote two test scripts. The first extracts the text in a structured form (topics, questions, and answer options) and saves the images as separate files linked to their topic and question in the PDF; the second converts all of this into JSON. In some places the PDF is somewhat crooked (or maybe I am crooked, anything is possible), so the work is done in two stages, with some things checked and corrected manually in between. Still, the vast majority of the data is extracted correctly.

Angelina Popova

13 0

Budget: 1000 UAH Deadline: 2 days

Good day! I have experience, I once did a similar task. I will complete it quickly!

Tetyana S.

75 4

Budget: 1600 UAH Deadline: 2 days

Hello! I can do it in this format!!! Feel free to contact me!!!!!!!!!

Evgeniy Chupakhin

39 0

Budget: 2000 UAH Deadline: 1 day

Good day! I have reviewed the task, I will do it quickly today. I have already had experience converting to json from pdf.

Artem Boldirev

0 0

Projects -
Rating -
Rating 165

Budget: 3000 UAH Deadline: 2 days

Good day!
I have reviewed both PDFs and the JSON structure. I am ready to convert the questions and answers while fully preserving the order of topics and numbering.

I will place all images in a separate folder with correct file names for further markup.

I guarantee compliance with the structure and sequence of data.

The deadline for completion is 2 days.

Maksim N.

8 1

Budget: 1000 UAH Deadline: 1 day

Hello!

I have experience in processing PDF files and converting them to JSON. Recently, I worked on a project where I converted documents into machine-readable formats using Python with the PyPDF2 and pdfplumber libraries.

I will parse the PDF files, extract the information, and structure it in the JSON format you specified. I will use parsing libraries to ensure data accuracy and save the images in the "images" folder with the correct names.

My work guarantees convenience for further processing and the correct format for your project. I am ready to start!

Hryhorii Pelipenko

26 4

Budget: 1000 UAH Deadline: 1 day

It is possible to analyze the original document, and it is even interesting; however, the proposed reward is clearly too low, don't you think? I would analyze it and recode it. The price is not realistic for now. It would take 3-5 days, allowing for several attempts at the task; success is assured.

Oleksandr Zabolotnii

2 0

Projects -
Rating -
Rating 651

Budget: 997 UAH Deadline: 3 days

Hello!
I can convert the PDF from the website hsc.gov.ua into JSON with the required structure, including images in the images folder.
I will preserve the exact order of topics and questions.
Execution: 3–5 days, cost: 1000 UAH.

The list does not show proposals concealed by the client or freelancer with a Plus profile, as well as proposals violating rules

Vladyslav Drobyshev

111 1

Budget: 1750 UAH Deadline: 3 days

Hello, Konstantin! Your project looks interesting and clear, and converting PDF to JSON is important for any of your future applications. As an experienced web designer and specialist in processing various file formats, I am ready to apply my knowledge for the accurate reproduction of data in the required format. My approach involves careful preservation of the order of topics, questions, and answers, which is critical for further work with them. Let's discuss how I can help you implement your project efficiently and on time!

Oleksandr Stinkovyi

117 0

Budget: 2000 UAH Deadline: 1 day

Hello.

I can write a script in NodeJS. I am ready to take it on. Write to me, we will discuss.

Viktor N.

24 1

Budget: 2500 UAH Deadline: 1 day

Good day. I will complete it within a few hours. Please contact me. I will start immediately.

Andrii K.

1 0

Projects -
Rating -
Rating 184

Budget: 4000 UAH Deadline: 3 days

Hello! I am interested in your project.
I have experience in automating the processing of large volumes of data. For your task (539 pages of questions + 11 pages of answers), I have developed a special algorithm in Python that allows:
To guarantee 100% accuracy: to eliminate the human factor when converting thousands of questions.
To automatically name images: to save and link photos according to the mask {topicId}_{questionId}.png exactly according to your structure.
To maintain hierarchy: to correctly process all topics and subtopics in the specified JSON format.

I am ready to perform a demo version (the first topic) for free, so you can verify the quality and speed of my approach. If you are interested in automated processing with guaranteed results — I would be happy to discuss the details.

Aleksandr M.

38 0

Budget: 2400 UAH Deadline: 2 days

I can run it through GPT. If the prompt works correctly, then everything should be good according to the picture.

Oleksandr Ovsiannikov

21 0

Budget: 5000 UAH Deadline: 1 day

Hello. As you requested, I tried to analyze the pdf in advance. The entire difficulty lies in the second pdf (with the answers), which is not just a scan, but also a poor scan, where even some numbers are so unclear that they cannot be visually restored by a person. If we only had the first pdf, it would be cheap and very quick, but due to the second pdf, the price becomes significantly higher (about 70% of the total price is for the second pdf with the answers), but everything is doable. It can be done within a day.

Oleksandr D.

70 1

Budget: 5000 UAH Deadline: 3 days

Hello.
I have reviewed the PDF.
I am engaged in writing scripts from scratch for specific tasks. I will be able to complete the project.



Kostyantin Budankov
Kyiv, Ukraine

Projects 1
Rating -
Rating 40
