Improve the parser in Python
The parser is written in Python. Data collection is implemented with the Readability library.
Documentation: https://pypi.org/project/readability/
The developer recorded the technical specifications as voice notes explaining the problem. They are in the archive: 1 voice note on Readability and 2 on Bootstrap. Once I accept you into the project, I will also pass any of your questions on to the developer.
This was the only way he managed to make it work. It is probably not feasible for beginners, so I am reaching out to professionals.
Regarding Bootstrap: he also tried to implement it, but Bootstrap produced worse results than Readability. It returned a lot of duplicated content and also picked up unnecessary junk code.
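Duplicated blocks and leftover code are common when a whole page is flattened into text. Below is a minimal sketch of one way to reduce both, assuming the goal is plain article text: skip `script`/`style`-type elements and drop text blocks that repeat once whitespace and case are normalized. It uses the standard library's `html.parser`, so it runs without extra dependencies. `CleanTextExtractor` and `clean_text` are hypothetical names that are not part of the existing code. The same idea carries over to BeautifulSoup (bs4), assuming that is what "Bootstrap" refers to here.

```python
from html.parser import HTMLParser
from typing import List

# Elements whose contents are never readable article text.
SKIP_TAGS = {"script", "style", "noscript", "template", "svg"}
# Block-level tags; text is split into separate chunks at these boundaries.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote", "pre"}


class CleanTextExtractor(HTMLParser):
    """Splits page text into chunks, skipping script/style noise and repeats."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []   # unique text chunks, in document order
        self._seen = set()            # normalized chunks already emitted
        self._skip_depth = 0          # > 0 while inside a SKIP_TAGS element
        self._buffer: List[str] = []  # text of the chunk currently being read

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip_depth:
            self._buffer.append(data)

    def _flush(self) -> None:
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        self._buffer = []
        key = text.casefold()  # case-insensitive duplicate check
        if text and key not in self._seen:
            self._seen.add(key)
            self.chunks.append(text)

    def close(self) -> None:
        super().close()
        self._flush()  # emit any trailing text


def clean_text(html: str) -> List[str]:
    parser = CleanTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```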
About the code itself: it is written in Python. Requests to the server are made through aiohttp because the project is asynchronous, meaning requests are sent to the server in parallel rather than sequentially.
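Here is a minimal sketch of the parallel-request pattern described above. It assumes nothing about the existing code beyond its use of aiohttp. `asyncio.gather` runs all fetches concurrently, and a semaphore caps how many are in flight so a long list of sites does not open hundreds of connections at once. `gather_limited`, `fetch_pages`, and the default limit of 10 are illustrative choices and are not taken from the project.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List


async def gather_limited(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = 10,
) -> List[str]:
    """Run fetch(url) for every URL concurrently, at most `limit` at a time.

    Results are returned in the same order as `urls`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def fetch_one(url: str) -> str:
        async with semaphore:  # caps the number of in-flight requests
            return await fetch(url)

    return list(await asyncio.gather(*(fetch_one(u) for u in urls)))


async def fetch_pages(urls: Iterable[str], limit: int = 10) -> List[str]:
    """Download page HTML using one shared aiohttp session."""
    import aiohttp  # third-party; imported here so gather_limited works without it

    async with aiohttp.ClientSession() as session:

        async def fetch(url: str) -> str:
            async with session.get(url) as response:
                response.raise_for_status()
                return await response.text()

        return await gather_limited(urls, fetch, limit)
```

`asyncio.run(fetch_pages(urls))` returns the HTML in the same order as `urls`. With the default `gather` settings, one failing site makes the whole call raise. Passing `return_exceptions=True` to `asyncio.gather` keeps the other results and puts the exception in that site's slot instead.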
The build is done with the PyInstaller library. When I run the .exe, a command-line window opens, and the parser itself opens locally in the browser, at an address like 127:…
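If the browser UI's HTML templates or static files are bundled into the .exe (for example with PyInstaller's `--add-data` option), they have to be located relative to the directory PyInstaller places them in, not the current working directory. The snippet below sketches a common idiom for this. It assumes the project actually bundles such files; `resource_path` and the `templates/index.html` path are hypothetical.

```python
import os
import sys


def resource_path(relative_path: str) -> str:
    """Return an absolute path to a data file bundled with the application.

    In a PyInstaller build, bundled data files live in a directory whose path
    PyInstaller exposes as sys._MEIPASS. When running from source that
    attribute does not exist, so fall back to the current directory.
    """
    base_dir = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_dir, relative_path)
```

`resource_path("templates/index.html")` resolves to the bundled copy when running as an .exe and to `./templates/index.html` when running from source.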
About evaluating the code and the cost of the work: I don't want you to pull a number out of thin air. I understand that you can only give a provisional figure at first. So here is the convenient option. Connect to my PC, look at the code, and judge whether you can improve the parsing results and solve the task so that it picks up not only text but also images from websites. Then you update your bid on the project, I accept you into the project, and I reserve the funds. Only in that order, because if you place a bid without looking at the code, what will the outcome be? My time and money wasted, and a negative review for you. I don't think you need that. I think we have clarified this.
Here is the current result, for example, on 10 websites: it picks up text from all 10, but images from only 5. From the other 5 it gets text only. I think the logic is clear. What is needed is for it to pick up images from all websites, just as it already does with text.
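The acceptance criterion above can be measured the same way on any list of sites. Count how many sites yielded text and how many yielded images, then list the text-only ones to investigate. The sketch below does this. `PageResult` is a hypothetical container standing in for whatever the parser actually returns per site.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PageResult:
    """What the parser extracted from one site (hypothetical structure)."""
    url: str
    text: str = ""
    images: List[str] = field(default_factory=list)


def coverage_report(results: List[PageResult]) -> Dict[str, object]:
    """Count sites that yielded text and sites that yielded images."""
    return {
        "total": len(results),
        "with_text": sum(1 for r in results if r.text.strip()),
        "with_images": sum(1 for r in results if r.images),
        # Sites to look at first: text was found, but no images.
        "text_only": [r.url for r in results if r.text.strip() and not r.images],
    }
```

On the 10-site example above, this would report 10 sites with text, 5 with images, and list the 5 text-only URLs. The goal is for `with_images` to equal `total`.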
I don't care whether it is implemented with Readability or Bootstrap. What matters to me is that the parser collects data more accurately. With Readability, it picks up text from every site, but not images from every site. So the task was to improve it or combine it with another library, algorithm, or technology that would pick up images as well as text.
Or to do it entirely with Bootstrap, as long as it picks up both text and images from all websites. In short, it should work with Bootstrap no worse than with Readability.
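Here is one way to combine Readability with another technique, as described above. Keep Readability's text, and take images from Readability's cleaned article HTML when it contains any. Otherwise, fall back to the `<img>` tags of the original page. This sketch assumes the library is readability-lxml (imported as `readability`), whose `Document(html).summary()` returns the article HTML. The image extraction itself uses only the standard library. It also checks common lazy-loading attributes (`data-src` and similar) and `srcset`, because pages that load images via JavaScript often leave only a placeholder in `src`. All names here are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin

# Checked in order: lazy-loading pages often keep the real URL in a data-*
# attribute and put a placeholder (frequently a data: URI) in src.
IMAGE_ATTRS = ("data-src", "data-lazy-src", "data-original", "src")


def _image_url(attributes: dict) -> Optional[str]:
    """Pick the most likely real image URL from one <img> tag's attributes."""
    for name in IMAGE_ATTRS:
        value = (attributes.get(name) or "").strip()
        if value and not value.startswith("data:"):
            return value
    # srcset looks like "small.jpg 1x, large.jpg 2x"; take the first candidate.
    candidates = (attributes.get("srcset") or "").split(",")[0].split()
    return candidates[0] if candidates else None


class ImageCollector(HTMLParser):
    """Collects absolute, de-duplicated <img> URLs in document order."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.images: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        url = _image_url(dict(attrs))
        if url:
            absolute = urljoin(self.base_url, url)  # resolve relative paths
            if absolute not in self.images:
                self.images.append(absolute)


def extract_images(html: str, base_url: str) -> List[str]:
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.images


def pick_images(summary_html: str, page_html: str, base_url: str) -> List[str]:
    """Prefer images inside the extracted article; otherwise use the whole page."""
    return extract_images(summary_html, base_url) or extract_images(page_html, base_url)
```

The full-page fallback also picks up logos and icons. A real implementation would likely need to filter these, for example by image size or position in the page, and that is probably where most of the per-site tuning would go.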
I can provide access via AnyDesk, and I can compile and build the .exe myself. You just need to log into my PC, evaluate the code, and see whether you can make changes to it using bs4. If you think this will improve data collection and solve my problem, then no questions asked. If we test together and see that your approach works better, I will choose you for the project right away and reserve the funds, and you will make the changes to the code. We will test again, and if the results are better, I will accept the project.
Applications 3
-
Hello,
I am ready to take on your Python parser project for data collection using the Readability library. I have experience developing in Python and using aiohttp for asynchronous requests, and I have also packaged applications for launch with PyInstaller.
To evaluate the code and work out a strategy for collecting both text and images from websites, I invite you to connect to my PC via AnyDesk. After a deeper review of the code and testing, we can make the necessary changes and improvements to achieve the desired result.
My hourly rate is $16. I look forward to your response for further collaboration.
Best regards,
… Maxim
-
Good afternoon, Alexander,
Your program can be improved, but the result will not be exactly what you want.
Correctly parsing absolutely any website is impossible, or close to it, at least for now.
A lot of money and years of work have gone into making readability.
If you have a list of sites (links) that you scrape regularly, send it to me. I will see what percentage of them can be improved.
I am a bit busy right now and will not be able to reply immediately.
-
Current freelance projects in the category Data Parsing
Need a parser for the online store https://www.lcsc.com/
It is necessary to regularly (once a month, or upon script launch) obtain up-to-date information about the products available in the store https://www.lcsc.com/ from the catalog of all sections.… Data Parsing ∙ 4 hours 32 minutes ago ∙ 26 proposals
OpenCart — rental catalog of special equipment
135 USD
Need to launch an equipment rental catalog on OpenCart. Theme: excavators, cherry pickers, forklifts, generators, cranes, scaffolding, and other construction equipment. It is preferable that you already have a ready-made template or developments… Web Programming, Data Parsing ∙ 20 hours 59 minutes ago ∙ 46 proposals
Transfer the program - the server where the program was located has crashed (officially permitted parsing of government data)
47 USD
Hello! My client has encountered the case described below. We need help transferring the program to a new server and testing it. It would be better to have a programmer who understands parsing. Software & Server Configuration, Data Parsing ∙ 1 day ago ∙ 26 proposals
Parsing and classification of a large array of images
It is necessary to implement a project for collecting and structuring a large array of architectural images from open web sources. The task includes: automated collection of images; uploading files in the highest available quality; classification of images by categories:… Python, Data Parsing ∙ 1 day 1 hour ago ∙ 30 proposals
Website parsing
Implementation of 4 parsers (for directory websites) is required. There is a technical specification and a reference code example. The tasks include: writing a parser; integrating a proxy; deduplication logic (transfer the logic from the example); hashing logic based… Data Parsing ∙ 2 days 17 hours ago ∙ 42 proposals