Project Objective: This research project aims to understand Internet presence by developing an Identity Scraper that analyzes and compiles the information available about an individual from publicly accessible sources. The work is structured into multiple phases, each yielding concrete outcomes that can be presented to the board to demonstrate progress and practicality. Inputs may include a person's name, photograph, or phone number; the expected output is one or more profiles, each accompanied by a probability score indicating its authenticity.
Project Phases and Deliverables
Phase 1: Pilot Project - LinkedIn Data Scraping
Objective: Conduct a pilot project that scrapes LinkedIn data to identify individuals matching a provided name. This phase also includes building a scalable database for the collected data. In Phase 1, the database will store the search outputs along with basic metadata such as a record identifier and the data source.
Input Requirement: A person's name (e.g., John Doe).
Expected Output: Profiles corresponding to individuals sharing the provided name, including:
- First Name
- Surname
- Location
- Occupation
- Profile Picture
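To make the Phase 1 record concrete, the following is a minimal sketch of how one profile and its basic metadata might be stored. The table name, column names, the `Profile` class, and the choice of SQLite are assumptions made for illustration only; the brief does not prescribe a database engine or schema. No data collection is performed here, and the sample row is invented placeholder data.

```python
import sqlite3
import uuid
from dataclasses import dataclass, field


@dataclass
class Profile:
    """One stored profile: the five expected output fields plus basic metadata."""
    first_name: str
    surname: str
    location: str
    occupation: str
    profile_picture_url: str
    data_source: str  # e.g. "linkedin" in Phase 1
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)


# Hypothetical schema; column names are illustrative, not specified by the brief.
SCHEMA = """
CREATE TABLE IF NOT EXISTS profiles (
    identifier          TEXT PRIMARY KEY,
    data_source         TEXT NOT NULL,
    first_name          TEXT NOT NULL,
    surname             TEXT NOT NULL,
    location            TEXT,
    occupation          TEXT,
    profile_picture_url TEXT
)
"""


def save_profile(conn: sqlite3.Connection, p: Profile) -> None:
    """Insert one profile row using a parameterized query."""
    conn.execute(
        "INSERT INTO profiles VALUES (?, ?, ?, ?, ?, ?, ?)",
        (p.identifier, p.data_source, p.first_name, p.surname,
         p.location, p.occupation, p.profile_picture_url),
    )
    conn.commit()


def find_by_name(conn: sqlite3.Connection, first_name: str, surname: str) -> list:
    """Return all stored rows whose name matches, ignoring letter case."""
    return conn.execute(
        "SELECT * FROM profiles "
        "WHERE lower(first_name) = lower(?) AND lower(surname) = lower(?)",
        (first_name, surname),
    ).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database for the sketch
conn.execute(SCHEMA)
save_profile(conn, Profile("John", "Doe", "Example City", "Example Occupation",
                           "https://example.com/picture.jpg", "linkedin"))
matches = find_by_name(conn, "john", "doe")
```

A single-file database keeps the pilot simple; the "scalable" requirement would likely call for a server-based engine later, which the same table layout could carry over to.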
Deliverables for Phase 1:
- Data Collection Report:
  - A detailed report on the data scraping process, including the methods and tools used.
  - An explanation of the data sources and the legality of scraping LinkedIn data.
- Profile Compilation:
  - A database of structured profiles matching the provided names.
  - Each profile must include the fields listed under Expected Output.
- Probability Score Algorithm:
  - An algorithm that assigns each profile a probability score indicating its authenticity.
  - Documentation explaining the criteria and logic behind the scoring system.
- Presentation to the Board:
  - A comprehensive presentation summarizing the findings and demonstrating the practicality of the Identity Scraper.
  - Visuals and charts showing the effectiveness and accuracy of the tool in identifying profiles.
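As one possible starting point for the scoring deliverable, the sketch below computes a weighted string similarity between the search input and a candidate profile. The field weights, the function names, and the use of Python's `difflib.SequenceMatcher` are all assumptions for illustration; the brief leaves the actual criteria to be defined and documented in Phase 1. The result is a heuristic in the range 0 to 1, not a calibrated probability.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the real criteria are a Phase 1 deliverable.
WEIGHTS = {"first_name": 0.4, "surname": 0.4, "location": 0.2}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(query: dict, profile: dict) -> float:
    """Weighted average similarity over the fields supplied in the query.

    Fields missing from the query are skipped, so a name-only search
    (the Phase 1 input) is scored on first name and surname alone.
    """
    total = 0.0
    weight_sum = 0.0
    for field_name, weight in WEIGHTS.items():
        if query.get(field_name):
            total += weight * similarity(query[field_name],
                                         profile.get(field_name, ""))
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Placeholder data, not real records.
candidate = {"first_name": "john", "surname": "DOE", "location": "Example City"}
exact = match_score({"first_name": "John", "surname": "Doe"}, candidate)
partial = match_score({"first_name": "Jon", "surname": "Doe"}, candidate)
```

Normalizing by the weights of the fields actually supplied keeps scores comparable as later phases add input parameters such as employer or phone number.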
If Phase 1 succeeds and is approved, additional phases will be commissioned from the developers.
Additional Phases:
Phase 2: Integration of APIs and search engine scraping, expansion to other social media platforms (e.g., Facebook, Twitter, Instagram) and public databases, and incorporation of additional input parameters (e.g., current employer, phone number).
Phase 3: Implementation of a GUI and incorporation of search tables, search optimization, and advanced analytics and reporting features.
Phase 4: Enhancements to the probability scoring algorithm based on feedback and results.
Requirements for Freelancers:
- Proven experience in web scraping and data analysis.
- Proficiency in programming languages such as Python, Java, or relevant alternatives.
- Proficiency in RPA and automation tools such as UiPath, Blue Prism, Pega, Automation Anywhere (AA), or relevant alternatives.
- Familiarity with social media platforms and their APIs.
- Ability to deliver detailed reports and presentations.
How to Apply:
- Provide a brief introduction to yourself and your experience with similar projects.
- Include examples of past projects relevant to web scraping and data analysis.
- Outline your proposed approach for Phase 1 of the project.
- State your availability, expected timeline, and budget.