Scrape the Steam site
Website: https://store.steampowered.com/
Collect all games in Ukrainian and English (regions: Ukraine and USA), excluding adult games (sexual content).
Soundtracks, software, and other content are not needed; only games.
Each category contains subcategories, for example Arcade, Casual, Open World, Shooters, and many others.
No list of links to parse is provided; it must be collected by the scraper itself. The website has links to all categories and to the different operating systems.
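A minimal sketch of how the two required language/region variants of a page could be requested. The query parameter names `l` and `cc`, their values, and the helper `localized_url` are assumptions made for illustration and should be checked against the live site before use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed mapping from the two required locales to store query parameters.
# "l" (language) and "cc" (country code) are assumptions, not confirmed by the task.
LOCALES = {
    "en": {"l": "english", "cc": "us"},
    "uk": {"l": "ukrainian", "cc": "ua"},
}


def localized_url(url: str, locale: str) -> str:
    """Return `url` with the language/region parameters for `locale` added."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keep any parameters already present
    query.update(LOCALES[locale])
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # The /app/10/ path is only an illustrative page address.
    for locale in LOCALES:
        print(localized_url("https://store.steampowered.com/app/10/", locale))
```

Each collected link would then be fetched once per locale, so the same game yields one Ukrainian and one English record.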
1. The data should be delivered as JSON files (100 MB each), so that a script can load them into a MySQL database with our structure.
2. The first file consists of two sections: "categories" and "apps".
- "categories" contains an array of objects, each with "title" (category/section name) and "link" (full link to the category/section)
- "apps" contains an array of objects, which include all information about the application.
Each subsequent file is not split into "categories" and "apps", since the entire list of categories is already in the first file; subsequent files contain only information about the applications.
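The difference between the first file and the following ones can be sketched as below. The helper `build_file_payload` is hypothetical, and representing the later files as a bare array of app objects is one reading of the requirement, not something the task states explicitly.

```python
import json


def build_file_payload(file_index, apps, categories=None):
    """Return the JSON-serializable content for file number `file_index` (1-based).

    Only 1.json carries the "categories" section; later files hold apps only.
    Using a bare list for later files is an assumption about the expected layout.
    """
    if file_index == 1:
        return {"categories": categories or [], "apps": apps}
    return apps


if __name__ == "__main__":
    # Placeholder data for illustration only.
    categories = [{"title": "Indie", "link": "https://example.com/category/indie"}]
    apps = [{"title": "Demo game", "link": "https://example.com/app/1"}]
    print(json.dumps(build_file_payload(1, apps, categories), ensure_ascii=False, indent=2))
    print(json.dumps(build_file_payload(2, apps), ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps Ukrainian text readable in the output instead of escaping it.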
3. The structure of the objects contained in the "apps" array:
link - Full link to the application/game
name_company - Developer
company_link - Full link to the developer
title - Application/game title
content - Description of the application/game (text description and all technical information: version, price, updates, languages, etc.)
categories - All categories to which the application/game belongs, for example ['Indie', 'Strategy']
rated - Age restrictions
update - Last update of the application/game
reviews - Number of reviews for the application/game
rating - Application rating
price - Cost (price)
size - Size of the application, in megabytes
compatibility - Compatibility (Windows, macOS, Linux, SteamOS, etc.), for example ['Windows', 'macOS']
logo_image - Link to the logo image of the application/game, taken from its page
logo_path - File name of the logo image of the application/game
all_image - Links to the screenshots of the application/game, taken from its page (first 3 screenshots; if a game has no images, it is skipped)
all_image_path - File names of the screenshots of the application/game
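The field list above can be sketched as a record builder that also applies the "skip games without images" rule. The input dictionary `raw`, its `screenshots` key, and the helper names are hypothetical; storing `all_image` and `all_image_path` as lists and deriving file names from the URL path are assumptions.

```python
import posixpath
from typing import Optional
from urllib.parse import urlsplit

# Field names exactly as listed in the task.
APP_FIELDS = (
    "link", "name_company", "company_link", "title", "content", "categories",
    "rated", "update", "reviews", "rating", "price", "size", "compatibility",
    "logo_image", "logo_path", "all_image", "all_image_path",
)


def image_name(url: str) -> str:
    """File name taken from the URL path, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


def build_app_record(raw: dict) -> Optional[dict]:
    """Build one "apps" object, or return None when the game has no screenshots."""
    screenshots = list(raw.get("screenshots") or [])[:3]  # first 3 only
    if not screenshots:
        return None  # games without images are skipped
    record = {field: raw.get(field) for field in APP_FIELDS}
    record["all_image"] = screenshots
    record["all_image_path"] = [image_name(url) for url in screenshots]
    logo = raw.get("logo_image")
    record["logo_path"] = image_name(logo) if logo else None
    return record
```

Fields that the page does not provide are left as `None` here; whether the import script expects `null`, an empty string, or a missing key is for the customer to confirm.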
4. These files must be accompanied by a folder:
- With images (logo of the application/game + images from the application/game page)
In summary, we need to have:
- 1.json, 2.json, 3.json... - files with all information about all categories and applications/games
- images_1, images_2, images_3... - folders with images from the application/game pages, which can be split into folders of 5 GB each
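One possible way to split the app records into numbered files of bounded size is sketched below. The function `split_by_size` is hypothetical, and treating 100 MB as 100 * 1024 * 1024 bytes is an assumption. The size estimate is deliberately slightly pessimistic, and it does not account for the "categories" section of the first file, which can be covered by passing a larger `overhead`.

```python
import json


def split_by_size(items, max_bytes, overhead=2):
    """Group items into consecutive batches whose JSON array stays within max_bytes.

    `overhead` covers the enclosing "[" and "]"; each item is charged 2 extra
    bytes for the ", " separator that json.dumps writes between elements.
    A single item larger than max_bytes still gets a batch of its own.
    """
    batches, current, size = [], [], overhead
    for item in items:
        item_size = len(json.dumps(item, ensure_ascii=False).encode("utf-8")) + 2
        if current and size + item_size > max_bytes:
            batches.append(current)
            current, size = [], overhead
        current.append(item)
        size += item_size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    apps = [{"title": "x" * 10} for _ in range(5)]  # placeholder records
    for number, batch in enumerate(split_by_size(apps, max_bytes=60), start=1):
        # In a real run max_bytes would be about 100 * 1024 * 1024 and the
        # batch would be written to f"{number}.json" instead of printed.
        print(f"{number}.json", json.dumps(batch, ensure_ascii=False))
```

The same greedy grouping, applied to image file sizes on disk rather than to serialized JSON, could be used to fill the images_1, images_2, ... folders up to 5 GB each.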