Splitting a PDF into pages. AWS Lambda, S3.
S3 Trigger: Launch Lambda function when a new PDF is uploaded to S3.
Lambda Function:
Download PDF from S3.
Split PDF into individual pages.
For each page:
Extract text.
Search the text for the template named in the file name (username_patternname_pdfname.pdf).
Save results to a database or log.
Optionally delete PDF after processing.
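The steps above could be sketched roughly as follows. This is a minimal illustration, not a finished implementation. It assumes the third-party packages boto3 and pypdf are packaged with the function, that user and pattern names contain no underscores, and that templates are regular expressions keyed by pattern name. The `TEMPLATES` registry and all function names are hypothetical.

```python
import io
import re
from pathlib import PurePosixPath
from urllib.parse import unquote_plus

# Hypothetical template registry: pattern name -> regular expression.
TEMPLATES = {"invoice": re.compile(r"Invoice\s+No\.?\s*(\d+)")}


def parse_object_key(key):
    """Split 'username_patternname_pdfname.pdf' into its three parts."""
    stem = PurePosixPath(key).stem  # drops any key prefix and the extension
    parts = stem.split("_", 2)      # assumption: only pdfname may contain "_"
    if len(parts) != 3:
        raise ValueError(f"unexpected file name: {key!r}")
    return tuple(parts)


def iter_s3_objects(event):
    """Yield (bucket, key) for every record in an S3 notification event."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys arrive URL-encoded in S3 event notifications.
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def match_template(pattern_name, text):
    """Return all matches of the named template in the page text."""
    template = TEMPLATES.get(pattern_name)
    return template.findall(text) if template else []


def split_pages(pdf_bytes):
    """Yield (page_number, single_page_pdf_bytes, text) for each page."""
    # Imported lazily so the helpers above work without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(io.BytesIO(pdf_bytes))
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        buffer = io.BytesIO()
        writer.write(buffer)
        yield number, buffer.getvalue(), page.extract_text() or ""


def lambda_handler(event, context):
    import boto3  # available by default in the AWS Lambda Python runtime

    s3 = boto3.client("s3")
    processed = 0
    for bucket, key in iter_s3_objects(event):
        username, pattern_name, pdf_name = parse_object_key(key)
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        for number, _page_pdf, text in split_pages(body):
            matches = match_template(pattern_name, text)
            # Logged here; a database write could replace the print.
            print({"user": username, "pdf": pdf_name,
                   "page": number, "matches": matches})
            processed += 1
        # Optional cleanup: s3.delete_object(Bucket=bucket, Key=key)
    return {"pages_processed": processed}
```

The file-name and event helpers use only the standard library, so they can be tested locally without AWS access.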
Considerations:
Template complexity: The solution should handle both simple and complex templates.
Scalability: Possible use of queues (SQS) for processing large volumes.
Security: Configure access rights for S3 and Lambda.
Database: Choose the appropriate database (DynamoDB, RDS).
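If the SQS option is chosen, S3 would send its notifications to a queue and Lambda would read them in batches. The sketch below shows one possible way to unwrap such a batch. It assumes the queue receives S3 notifications directly (each SQS message body is then the S3 event as JSON) and that partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping. The names `s3_objects_from_sqs`, `handle_batch`, and `process` are hypothetical.

```python
import json
from urllib.parse import unquote_plus


def s3_objects_from_sqs(event):
    """Yield (message_id, bucket, key) for S3 notifications delivered via SQS."""
    for message in event.get("Records", []):
        body = json.loads(message["body"])
        # S3 test messages carry no "Records" key, so default to an empty list.
        for record in body.get("Records", []):
            s3_info = record["s3"]
            yield (message["messageId"],
                   s3_info["bucket"]["name"],
                   unquote_plus(s3_info["object"]["key"]))


def handle_batch(event, process):
    """Call process(bucket, key) per object and report failed messages."""
    failed = []
    for message_id, bucket, key in s3_objects_from_sqs(event):
        try:
            process(bucket, key)
        except Exception:
            # Only the failed messages are returned to the queue for retry.
            if message_id not in failed:
                failed.append(message_id)
    return {"batchItemFailures": [{"itemIdentifier": m} for m in failed]}
```

With this shape, one PDF that fails to parse does not force the whole batch to be retried.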
-
Hello, I'm ready to take this on. I have worked with AWS and have built a parser for PDF invoices, which is almost the same task. I have some ideas about the templates: selecting a template by file name is not a good approach, because it can cause problems that are hard to detect later. I would also like to see what kind of PDFs these are, and I suggest storing the templates in S3 from the start so that it is easy to add your own later.
-
Hello. I am a programmer with 10+ years of experience in commercial development of projects of various levels of complexity.
I have some experience working with AWS and can write a script for manipulating PDF files. I have not done template searching before, so I will need to figure it out.
Feel free to contact me.
-
Good day! Feel free to write to me with any questions; I am always available. I also recommend taking a look at my portfolio!