20,000+ recipes from allrecipes.com
The task was to scrape more than 20,000 different recipes, together with their images, from allrecipes.com and then publish them on a site running the WordPress CMS.
During implementation it turned out that some of the recipe pages were duplicates
(for example, in this sample of 1,000 records https://drive.google.com/open?id=1nkKVW-QHG_alIYCuDJ0ipurb1t-vhMLG only 65 rows are unique and the remaining 935 are repeats).
Therefore, the parser had to go through 100,000 pages to reach the target of 20,000+ unique recipes.
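The description does not say how duplicates were detected. As a minimal sketch, assuming each scraped record is a dict with `title` and `ingredients` keys (hypothetical field names), a crawler can fingerprint every record, skip fingerprints it has already seen, and stop once the target count of unique recipes is reached. The helper names `recipe_fingerprint`, `unique_recipes` and `collect_until` are illustrative, not taken from the original parser.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List


def recipe_fingerprint(rec: Dict) -> str:
    # Hypothetical key: normalised title plus normalised ingredient list.
    parts = [rec["title"].strip().lower()]
    parts += [i.strip().lower() for i in rec.get("ingredients", [])]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def unique_recipes(records: Iterable[Dict]) -> Iterator[Dict]:
    # Yield each recipe only the first time its fingerprint appears.
    seen = set()
    for rec in records:
        key = recipe_fingerprint(rec)
        if key not in seen:
            seen.add(key)
            yield rec


def collect_until(records: Iterable[Dict], target: int) -> List[Dict]:
    # Keep pulling records (e.g. from a page generator) until `target`
    # unique recipes have been collected, then stop crawling.
    collected: List[Dict] = []
    for rec in unique_recipes(records):
        collected.append(rec)
        if len(collected) >= target:
            break
    return collected
```

Because `unique_recipes` is a generator, `collect_until` stops requesting pages as soon as the target is met, which matters when most fetched pages turn out to be repeats.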
After that, the recipes and their images were imported into the WordPress site with a custom-written script.
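The original import script is not described. One possible approach, sketched here under assumptions, is the WordPress REST API endpoint `/wp-json/wp/v2/posts` with an application password; images would typically be uploaded first to `/wp-json/wp/v2/media` and the returned ID passed as `featured_media`. The recipe keys (`title`, `ingredients`, `total_minutes`) and the function names are hypothetical, and the sketch only builds the request without sending it.

```python
import base64
import html
import json
import urllib.request
from typing import Dict, Optional


def build_post_payload(recipe: Dict, featured_media_id: Optional[int] = None) -> Dict:
    # Render ingredients as an HTML list; escape text scraped from the source site.
    items = "".join(f"<li>{html.escape(i)}</li>" for i in recipe.get("ingredients", []))
    content = f"<ul>{items}</ul>"
    if "total_minutes" in recipe:
        content += f"<p>Preparation time: {recipe['total_minutes']} minutes</p>"
    payload = {"title": recipe["title"], "content": content, "status": "draft"}
    if featured_media_id is not None:
        # ID of an image previously uploaded via /wp-json/wp/v2/media (assumed).
        payload["featured_media"] = featured_media_id
    return payload


def make_post_request(site: str, user: str, app_password: str,
                      payload: Dict) -> urllib.request.Request:
    # Basic auth with a WordPress application password (assumed setup).
    token = base64.b64encode(f"{user}:{app_password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=site.rstrip("/") + "/wp-json/wp/v2/posts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```

Posting as `draft` lets imported recipes be reviewed before publishing; sending is left to `urllib.request.urlopen(req)` or any HTTP client.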
During the import, some data were transformed according to the customer's requirements. For example, the preparation time was converted from hours and minutes (HH:MM) to total minutes (MMM), so "2 hours 30 minutes" became "150 minutes".
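The conversion itself is simple arithmetic (hours × 60 + minutes). Below is a minimal sketch of it, assuming the scraped time strings look like "2:30", "2 hours 30 minutes", "1 hr 15 mins" or "2h30m"; the exact formats on the source pages are an assumption, and `to_minutes` is a hypothetical helper name.

```python
import re

# Unit spellings are assumptions about the scraped text; longest forms first.
_HOURS = re.compile(r"(\d+)\s*(?:hours|hour|hrs|hr|h)(?![a-z])", re.IGNORECASE)
_MINUTES = re.compile(r"(\d+)\s*(?:minutes|minute|mins|min|m)(?![a-z])", re.IGNORECASE)
_CLOCK = re.compile(r"^\s*(\d+):([0-5]\d)\s*$")


def to_minutes(text: str) -> int:
    # "H:MM" form, e.g. "2:30" -> 2 * 60 + 30.
    clock = _CLOCK.match(text)
    if clock:
        return int(clock.group(1)) * 60 + int(clock.group(2))
    # Worded form, e.g. "2 hours 30 minutes"; findall returns the digit groups.
    hours = [int(h) for h in _HOURS.findall(text)]
    minutes = [int(m) for m in _MINUTES.findall(text)]
    if not hours and not minutes:
        raise ValueError(f"unrecognised duration: {text!r}")
    return sum(hours) * 60 + sum(minutes)
```

For example, `f"{to_minutes('2 hours 30 minutes')} minutes"` yields the "150 minutes" form mentioned above; unrecognised strings raise `ValueError` so they can be reviewed rather than silently imported as zero.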
#Parsing #Web-Parsing #Wordpress #Cms #Import