Parsing the archive from the website
A website publishes new archives every day. I need a script that visits the page daily and downloads the latest archive.
The problem is that, as far as I understand, the list of archives is loaded via JavaScript and there is no direct link to the archive: the link is generated on the fly and contains a code that is valid only for a limited time. Scripts written with GPT download the archive when run from my local computer, but not from the server. More precisely, a file is downloaded on the server, but instead of the expected XML it contains an HTML page with an error.
Here is one of the pages from which the archive needs to be downloaded:
https://data.uspto.gov/bulkdata/datasets/trtdxfap?fileDataFromDate=2025-06-28&fileDataToDate=2025-08-27
There are 2 such pages in total.
Please assess how feasible it is to work around this and download the archive every day.
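For a daily run, the listing page address can be rebuilt for the current date instead of being hard-coded. The following is a minimal sketch, not a confirmed description of how the site works: the base URL and the two query parameter names are copied from the link above, while the function name `listing_url` and the 60-day window (inferred from the dates in that link) are assumptions.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Base URL and parameter names are taken from the link in the task description.
BASE_URL = "https://data.uspto.gov/bulkdata/datasets/trtdxfap"


def listing_url(end: date, days_back: int = 60) -> str:
    """Build the listing page URL for a date window ending on `end`.

    `days_back` is an assumption: the sample link spans 60 days.
    """
    start = end - timedelta(days=days_back)
    query = urlencode({
        "fileDataFromDate": start.isoformat(),
        "fileDataToDate": end.isoformat(),
    })
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # Prints the listing URL for the window ending today.
    print(listing_url(date.today()))
```

Calling `listing_url(date(2025, 8, 27))` reproduces the link given above. This only builds the page address; obtaining the actual archive link is a separate step.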
Client's review of cooperation with Volodimir Nikolsky
Completed the task quickly and was always in touch! I recommend him and will reach out for help again!
Freelancer's review of cooperation with Artur K.
Thank you for the collaboration, wonderful client!
-
Good day. This is not a problem at all. I will write the code in Python with Selenium, and you will be able to run it both on your local machine and on the server. The protection on the site is not very strong, so getting around it will not be an issue. I can complete it by the end of the weekend. I will be happy to collaborate.
-
Hello, I have been developing parsers for about two years. I implement everything quickly and efficiently, and I will also set up the daily run.
-
Hello. I am ready to discuss the details in private messages, including where exactly the file is located at this link and how the links are formed. I have drafted a script for this link and it works.
-
Hello, I will create a parser for you in C# to download the archives. I looked at the linked page and downloading is possible; we can discuss the details in private messages. Here is a small test: https://postimg.cc/z3CkYK9J
-
Yes, the task is doable, but there are nuances, because the link to the archive is generated dynamically via JavaScript and carries a temporary token. Therefore, a simple requests.get() from the server does not work: the server returns HTML with an error, not XML.
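Whatever method ends up fetching the file, the failure described here (an HTML error page saved in place of the archive) can be caught before the file is stored. Below is a minimal sketch under one loud assumption: that the archive is a ZIP file containing XML, which the task does not state explicitly. The function name `looks_like_xml_archive` is hypothetical.

```python
import io
import zipfile


def looks_like_xml_archive(payload: bytes) -> bool:
    """Return True if `payload` is a ZIP holding at least one .xml member.

    Assumption: the published archives are ZIP files with XML inside.
    """
    buffer = io.BytesIO(payload)
    if not zipfile.is_zipfile(buffer):
        # Typical for an HTML error page saved under an archive name.
        return False
    with zipfile.ZipFile(buffer) as archive:
        return any(name.lower().endswith(".xml") for name in archive.namelist())
```

A daily script could run this check on the downloaded bytes and retry or raise an alert when it returns False, rather than silently keeping a broken file.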
-
Hello, I had a brief look at the site. There is an API for interacting with it, and I can help. The script will be in JavaScript. I have a server to host the script on; do you need help with that?
-
I propose a simple service that will check whether new files have been uploaded, once a day or at a specified interval.
Also possible on request: you start it and receive the updates.
It can be exposed as an API for access from your programs.
Stack: Node.js, possibly Docker
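The once-a-day check proposed here comes down to comparing the files currently listed with the ones already handled. A minimal sketch in Python (the language the client's existing scripts appear to use, not the Node.js stack of this proposal) could look as follows. The function name `pick_new_files`, the JSON state file and the example file names are all hypothetical.

```python
import json
from pathlib import Path


def pick_new_files(available: list[str], state_path: Path) -> list[str]:
    """Return names not seen on earlier runs and record them as seen.

    `state_path` is a small JSON file holding the names already handled.
    """
    seen = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    new = [name for name in available if name not in seen]
    # Persist the union so the next run skips everything reported so far.
    state_path.write_text(json.dumps(sorted(seen | set(new))))
    return new
```

In this sketch the state is saved as soon as the names are reported. A real service would probably record a name only after its download has succeeded, so that failed downloads are retried on the next run.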
-
Hello,
I am a developer specializing in AI/ML and web scraping. I can complete your project. Write to me, and we will discuss it.
-
Good afternoon.
I have extensive experience in writing parsers. I will do it quickly and efficiently.
I am waiting for your message.
-
Good afternoon. I looked at the site and found that the archive can easily be downloaded with Python. I don't see any real protection.
They forgot to hide the paths to the API and don't even require a token, so it will be easy.
I need a couple of clarifications and we can get started.
Here you can see that all the files are exposed: https://ibb.co/N8GvqM1
-
Ready to take it on.
I need to clarify the order details, so message me!
I use Python, uv, GitHub, and Docker.
-
Good day. I looked at the structure of the site. It has places where links to the archives can be obtained for any date.
Write to me and we will discuss your task. I will be happy to help.