Parsing OLX
Hello everyone, I have Python code that queries the olx.ua (internal) API and the GraphQL endpoint.
The problem is that new listings show up in the results with a delay of roughly n minutes.
I compared the two endpoints: olx.ua/api/v1 and GraphQL return the same results.
I ran the comparison through 30+ proxies, making batch requests to the URLs I need, and the results still update with a delay.
I suspect the cause is the CDN cache, or something similar.
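To make the caching hypothesis concrete, here is a minimal sketch of how the response headers could be inspected. It is an illustration only: `summarize_cache_headers` and `max_age` are hypothetical helper names, and which of these headers OLX actually sends is an assumption that has to be checked against real responses. `Age` and `Cache-Control` are standard HTTP headers; `X-Cache` is vendor-specific and may be absent.

```python
from typing import Mapping, Optional

# Headers that commonly reveal CDN or proxy caching.
# Assumption: the server sends at least some of them; none are guaranteed.
CACHE_HEADERS = ("age", "cache-control", "x-cache", "etag", "last-modified", "vary")


def summarize_cache_headers(headers: Mapping[str, str]) -> dict:
    """Collect cache-related headers (case-insensitively) into one dict."""
    lowered = {key.lower(): value for key, value in headers.items()}
    summary = {name: lowered[name] for name in CACHE_HEADERS if name in lowered}
    age = lowered.get("age")
    # Age is the number of seconds the response has spent in a cache.
    is_number = age is not None and age.strip().isdigit()
    summary["age_seconds"] = int(age) if is_number else None
    return summary


def max_age(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if it is missing."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return int(value)
    return None
```

The function accepts any mapping of header names to values, so the headers of a response object from an HTTP client can be passed in directly. A non-zero `Age` that grows between identical requests would point to a shared cache rather than to a delay in the backend itself.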
I have already analyzed the APK file: the app calls GraphQL first and uses olx.ua/api/v1 as a fallback. There are no other methods or routes, so it is logical that listings reach the API first and OLX then renders them on the search page.
This is confirmed by the listings in window.__PRERENDERED_STATE__.
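A minimal sketch of how that state could be pulled out of the page HTML for comparison with the API responses. The exact form of the assignment is an assumption: the code handles both a plain JSON object and a JSON-encoded string that itself contains JSON. `extract_prerendered_state` is a hypothetical helper name.

```python
import json
import re
from typing import Any, Optional

# Assumption: the page contains `window.__PRERENDERED_STATE__ = <JSON literal>`.
_STATE_RE = re.compile(r"window\.__PRERENDERED_STATE__\s*=\s*")


def extract_prerendered_state(html: str) -> Optional[Any]:
    """Return the decoded prerendered state, or None if it cannot be found."""
    match = _STATE_RE.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value and ignores whatever follows it.
        value, _ = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    if isinstance(value, str):
        # The state may be a JSON document wrapped in a string literal.
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            return value
    return value
```

The structure of the decoded object (which keys hold the listings) is not covered here and would need to be read off a real page.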
The odd part is that the results depend on the device and IP, at least when checked from a PC.
Five different devices in different locations show different results.
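The differences between devices can be quantified by collecting the listing IDs seen from each vantage point at the same moment and checking which IDs each one is missing. This is a minimal sketch; `compare_vantage_points` and the labels such as "pc" or "phone" are hypothetical, and how the IDs are collected is left out.

```python
from typing import Dict, Iterable, Set


def compare_vantage_points(results: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """For each vantage point, return the listing IDs it lacks
    relative to the union of IDs seen across all vantage points."""
    as_sets = {name: set(ids) for name, ids in results.items()}
    union: Set[str] = set().union(*as_sets.values())
    return {name: union - ids for name, ids in as_sets.items()}
```

For example, `compare_vantage_points({"pc": ["a", "b"], "phone": ["b", "c"]})` reports that "pc" is missing "c" and "phone" is missing "a". If the missing IDs are always the newest listings, that is consistent with caches of different ages; if they differ arbitrarily, personalization or regional ranking is the more likely explanation.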
If anyone knows how to bypass this cache (or whatever it turns out to be), I am willing to pay.
Client's review of cooperation with Roman K.
Parsing OLX
Top performer, I recommend)
Without unnecessary words, he just gave what was needed.
I don't know what I would do next without Roman)
Freelancer's review of cooperation with Mihaylo K.
Parsing OLX
Thank you for the collaboration, I recommend.
-
👋 Hello. The task here is not about parsing itself, but about how to get current data from OLX without getting stuck on caching and GraphQL/CDN limitations.
I would first look at which specific endpoint you are pulling the ads from and where the chain breaks, because in such tasks the difference is made not by the code but by the correct route to the data.
I have experience with similar Python scripts where it was necessary to work around typical bottlenecks in the API and the response structure.
📋 Here's what I'll do: I'll quickly check the requests, reproduce the problem, and then put together a working version of the parser without unnecessary noise.
I can start today. Send over what you already have and I'll work out a solution right away.
-
Good day!
I understand your problem with parsing OLX and the data delay due to CDN caching. I have significant experience working with Python, various APIs (including GraphQL), and proxies to bypass such limitations. I am ready to find an effective solution.
Message me privately, and we will discuss the details.
-
Good day, please clarify which final URL you are using to obtain the list of ads; I can suggest a slightly different approach that yields results faster.
-
Good day, I can fix your software. I have experience with parsing and Python scripts. I await your response.
-
Hello! I have experience in fixing programs. I offer quality and fast work. Write to me.
-
Good day! We have experience working with the OLX API and understand the specifics of data caching on their servers. We handle this by analyzing response headers and optimizing requests to obtain current data without delays. We are ready to analyze your code and configure the parser to work correctly.
-
Hello. I understand the issue with data delays when interacting with the OLX API and GraphQL; this is a typical challenge for systems with distributed caching or a CDN. My approach will focus on a deep analysis of HTTP headers and caching mechanisms and on simulating various client environments, going beyond simple proxy usage. I have existing groundwork and tools for identifying and bypassing such blocks, which will significantly speed up the search for an optimal solution. I suggest discussing all implementation details, the final budget, and timelines in private messages.