Data parsing
Custom-configure A-Parser, Content Downloader, X-Parser, or any other parser to process a list of article URLs from a single blog
Input data:
- URLs of info blog pages
Output data:
- text with HTML markup in .txt files (1 file = 1 text, see example)
- images saved in a separate /images/ folder inside the folder that holds the .txt files
Parameters:
- save only text, images, and headings (only the article body and the meta tags are of interest). Do not take: the contents block at the beginning, the author, commercial and advertising inserts
- from a slider, take only the first image
- keep these tags and attributes: title, description, h1-h6, i, p, blockquote, ol, ul, alt, strong, b
- save the Description text at the beginning, wrapped as {desc}text{/desc}
- keep text hyperlinks to external sources that appear inside the article text
- save links to the site's own pages in relative form, dropping everything before the last path segment (including the slash itself). For example, for site.ru/category/url/ the link should look like <a href="gripp/">anchor</a>, where "gripp/" is the url part (the site.ru/category/ prefix is not needed, only the URL tail)
- because internal links are saved in relative form, the URL tail of each parsed page must be saved too. For example, when parsing https://site.ru/rubrika/rubcy/, add a tag with the URL tail inside the text: [url]rubcy[/url] (only the tail, without slashes)
- do not save: links with anchors, stray symbols such as curly or square brackets at the end of a sentence (for example [1]), authors, advertisements
- split the output markup into separate lines by paragraph, so that the parsed code is not one long line
- wrap highlighted text blocks in the <blockquote> tag, which WordPress renders as a quote
- the last items to take from the article are the sources and the frequently asked questions
- save categories in tags:
[category]parent category[/category]
[category]category[/category]
take only the first (parent) and the last (regular) category
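The link rules above (internal links reduced to the URL tail, external links kept, anchor links and bracket markers dropped) can be sketched in Python. This is only an illustration under stated assumptions, not a configuration for A-Parser or any other tool: the helper names `url_tail`, `rewrite_href`, and `strip_reference_markers` are hypothetical, `site.ru` stands in for the real host, and "links with anchors" is read here as links that contain a `#` fragment.

```python
import re
from typing import Optional
from urllib.parse import urlparse

SITE_HOST = "site.ru"  # assumption: host of the blog being parsed


def url_tail(url: str) -> str:
    """Return the last non-empty path segment of a URL, without slashes."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else ""


def rewrite_href(href: str, site_host: str = SITE_HOST) -> Optional[str]:
    """Return the href to save, or None when the link should be dropped."""
    if "#" in href:
        return None  # assumption: "links with anchors" means #fragment links
    host = urlparse(href).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host and host != site_host:
        return href  # external source: keep the link unchanged
    tail = url_tail(href)
    # Internal link: keep only the tail; a link with no path is dropped.
    return f"{tail}/" if tail else None


def strip_reference_markers(text: str) -> str:
    """Remove markers such as [1] or {2}, plus the whitespace before them."""
    return re.sub(r"\s*(?:\[\d+\]|\{\d+\})", "", text)
```

With these helpers, `rewrite_href("https://site.ru/category/gripp/")` gives `"gripp/"`, and `url_tail("https://site.ru/rubrika/rubcy/")` gives the value for the [url] tag.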
Example of the finished text: https://share.cleanshot.com/w40l2mwj
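The service tags described in the parameters ({desc}, [url], [category]) can be assembled as in the Python sketch below. The function names `pick_categories` and `build_text` are hypothetical. The order of the tags after {desc} is an assumption, since the specification only fixes that the description comes first. The breadcrumb list is assumed to be already extracted from the page, ordered from parent to last category.

```python
from typing import List
from urllib.parse import urlparse


def pick_categories(breadcrumb: List[str]) -> List[str]:
    """Keep only the first (parent) and the last category."""
    if len(breadcrumb) <= 1:
        return list(breadcrumb)
    return [breadcrumb[0], breadcrumb[-1]]


def build_text(page_url: str, description: str,
               breadcrumb: List[str], body_html: str) -> str:
    """Assemble the content of one .txt file for one article."""
    segments = [s for s in urlparse(page_url).path.split("/") if s]
    tail = segments[-1] if segments else ""  # URL tail without slashes
    lines = [
        f"{{desc}}{description}{{/desc}}",  # description goes first
        f"[url]{tail}[/url]",               # assumption: placed right after it
    ]
    lines += [f"[category]{name}[/category]"
              for name in pick_categories(breadcrumb)]
    lines.append(body_html)  # cleaned article body, one element per line
    return "\n".join(lines)
```

For example, `build_text("https://site.ru/rubrika/rubcy/", "Short summary", ["Health", "Skin", "Scars"], "<p>Body</p>")` starts with `{desc}Short summary{/desc}`, followed by `[url]rubcy[/url]` and the Health and Scars category tags.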