Need a specialist in Big Data, ML, NLP, RAG
1. Goal
Create a tool for analyzing large volumes of text data (correspondence on a dating site) in order to identify key psychological characteristics of clients.
2. Brief description of the task
The input is a couple of hundred large files containing client correspondence. Each file has several hundred pages, with possibly up to tens of thousands of messages per page.
This data must be analyzed against a number of psychological indicators.
We plan to use the RAG (Retrieval-Augmented Generation) approach and vector databases for efficient search and analysis.
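To make the intended RAG flow concrete, here is a minimal sketch of the retrieval and prompt-assembly steps. It is an illustration under stated assumptions, not the required implementation: the word-count "embedding" is a stand-in for a real embedding model, the in-memory list is a stand-in for a vector database, and the names `embed`, `top_k`, `build_prompt` and the sample messages are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (assumption: stands in for a real model)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, fragments: list, k: int = 2) -> list:
    """Return the k (score, fragment) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(f)), f) for f in fragments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(criterion: str, hits: list) -> str:
    """Assemble the text that would be passed to a language model."""
    context = "\n".join(f"- {text}" for _, text in hits)
    return f"Criterion: {criterion}\nRelevant messages:\n{context}\nAssessment:"


# Hypothetical sample messages, invented for illustration only.
fragments = [
    "I get nervous when meeting new people",
    "My favourite food is pizza",
    "Meeting people at parties makes me anxious",
]
hits = top_k("nervous about meeting people", fragments, k=2)
prompt = build_prompt("social anxiety", hits)
print(prompt)
```

In a real pipeline the similarity search would be delegated to the vector database, and the assembled prompt would be sent to the chosen language model.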
3. Requirements for the specialist
Big Data: understanding of how to work with large volumes of data (Spark/Hadoop or equivalents).
NLP / ML:
Experience in text processing (preprocessing, tokenization, cleaning) and applying ML models.
Knowledge of modern text analysis methods (sentiment, topic modeling, classification).
Vector Databases / RAG:
Practical experience with Pinecone, Milvus, Weaviate (or equivalents) and embedding models.
Ability to build a Retrieval-Augmented Generation pipeline (embedding generation, similarity search, LLM integration).
Additionally:
Ability to document solutions and explain tool choices.
DevOps skills (Docker/Kubernetes) are welcome.
4. Expected tasks
Data preparation: reading, cleaning, structuring correspondence.
Embedding generation: model tuning (BERT, Sentence-BERT, OpenAI Embeddings, etc.).
Using vector DB: loading embeddings, searching for relevant fragments.
RAG analytics: integration with a language model to extract key characteristics of client behavior.
Assessment of psychological criteria: tuning or creating models that highlight the required aspects (there is a ready-made list of criteria against which clients will be assessed).
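The data preparation step above could start from something like the following sketch. It assumes the correspondence has already been parsed into `(client_id, text)` pairs; the actual file format is not specified in this brief, so that record shape, the `clean` and `chunk_by_client` helpers, and the sample records are all hypothetical.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
SPACE_RE = re.compile(r"\s+")


def clean(text: str) -> str:
    """Remove URLs and collapse runs of whitespace."""
    return SPACE_RE.sub(" ", URL_RE.sub("", text)).strip()


def chunk_by_client(records, max_chars=200):
    """Group cleaned messages per client into chunks of roughly max_chars.

    A single message longer than max_chars is kept whole as its own chunk.
    """
    chunks = defaultdict(list)
    buffers = defaultdict(str)
    for client_id, text in records:
        msg = clean(text)
        if not msg:
            continue  # skip messages that are empty after cleaning
        candidate = f"{buffers[client_id]} {msg}".strip()
        if len(candidate) > max_chars and buffers[client_id]:
            # The buffer is full: emit it and start a new chunk with this message.
            chunks[client_id].append(buffers[client_id])
            candidate = msg
        buffers[client_id] = candidate
    for client_id, rest in buffers.items():
        if rest:
            chunks[client_id].append(rest)
    return dict(chunks)


# Hypothetical records, invented for illustration only.
records = [
    ("a", "Hi   there http://x.com/y"),
    ("b", "Hello"),
    ("a", "How are you?"),
    ("a", ""),
]
result = chunk_by_client(records, max_chars=15)
print(result)
```

Each resulting chunk is the unit that would then be embedded and loaded into the vector database. At the volumes described, the same per-client grouping would more likely run in PySpark or a similar framework than in a single Python process.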
5. Key skills and technologies
Language: Python (pandas, scikit-learn, PySpark, HuggingFace Transformers or others).
Infrastructure: knowledge of cloud services (AWS, GCP, Azure or others) or local Big Data clusters.
Vector databases: Pinecone, Milvus, Weaviate or FAISS or others.
NLP libraries: spaCy, NLTK, as well as tools for lemmatization and cleaning.
ML pipeline: knowledge of MLOps tools (Airflow, MLflow, Docker or others).
2 days, 100 USD
Hello, I have experience working with Big Data, and I even once participated in a Kaggle competition for data analysis and won a prize there. I am also familiar with Kubernetes, various machine learning methods, and Python libraries, having practical experience in solving tasks. I look forward to receiving a private message.