Voice AI assistant for Central Asian countries.
Vacancy: Lead AI / Fullstack Engineer — Project "Cataleya" (Voice-to-Voice AI)
Project Name: Cataleya
Format: Project work / Remote (with access to a local cluster)
Stack: PersonaPlex (Moshi), PyTorch, TensorRT-LLM, FastAPI, WebRTC, Telegram Mini App.
Hardware Location: Uzbekistan, Kazakhstan (TAS-IX), clusters based on NVIDIA RTX 4090.
Project Description
Cataleya is a multimodal voice-to-voice (speech-to-speech, S2S) ecosystem that creates the effect of live conversation. We are developing an AI assistant that combines the roles of an expert teacher (chemistry, history, biology), an empathetic conversational partner, and a simultaneous translator. The system works directly with audio tokens rather than passing through intermediate text, which keeps interaction latency very low.
Current Status: The base model (in English) is working stably. It needs to be adapted to regional specifics and packaged into a high-tech application.
Key Tasks
1. Core AI & ML (Adaptation and Intelligence)
Multilingualism: Cross-lingual fine-tuning of the model for native support of Russian, Uzbek (including dialects), and Kazakh.
Low Latency: Optimization of inference to achieve a response delay of 0.07 seconds.
Smart RAG (100 GB): Building a vector knowledge base from educational materials, with a "triple-check" data verification mechanism to eliminate hallucinations.
NVIDIA Stack: Optimization of inference for RTX 4090 (vLLM, TensorRT-LLM, INT4/FP8 quantization).
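To make the 0.07-second target concrete, it can help to split it into a per-stage budget. The sketch below is illustrative only: the stage names and millisecond values are hypothetical placeholders, not measurements from Cataleya, and the helper functions are introduced here purely for the example.

```python
TARGET_MS = 70.0  # the 0.07 s response-delay target, in milliseconds

# Hypothetical per-stage costs in milliseconds (placeholders, not measurements).
example_stages = {
    "network_uplink": 10.0,
    "audio_encode": 5.0,
    "model_step": 35.0,
    "audio_decode": 5.0,
    "network_downlink": 10.0,
}


def remaining_budget(stages, target_ms=TARGET_MS):
    """Return the milliseconds left in the budget (negative means over budget)."""
    return target_ms - sum(stages.values())


def costliest_stage(stages):
    """Return the name of the stage that consumes the most time."""
    return max(stages, key=stages.get)
```

With these placeholder numbers, `remaining_budget(example_stages)` leaves 5 ms of headroom and `costliest_stage(example_stages)` points at the model step, which is where quantization and TensorRT-LLM tuning would be expected to matter most.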
2. Telegram Mini App & Real-time Web
Streaming Audio: Implementation of real-time audio transmission via WebRTC / WebSockets (without using standard voice messages).
Full-Duplex UI: An interface that supports user interruptions (barge-in) with an instant AI response.
Vocal ID: Implementation of voice biometrics for user authorization.
Billing: Integration of payment systems (Payme, Click) for subscription management.
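The interruption behavior described under Full-Duplex UI can be sketched with plain asyncio. This is a minimal illustration, not the project's implementation: the `sent` list stands in for a WebRTC or WebSocket audio stream, `user_speaks` stands in for a voice-activity detector, and all names are hypothetical.

```python
import asyncio


async def play_response(chunks, sent, interrupted):
    """Send audio chunks one by one, stopping as soon as the user interrupts."""
    for chunk in chunks:
        if interrupted.is_set():
            return False  # playback was cut short by the user
        sent.append(chunk)  # stand-in for writing to the outgoing audio stream
        await asyncio.sleep(0)  # yield control so the interrupt detector can run
    return True  # the whole response was played


async def demo():
    interrupted = asyncio.Event()
    sent = []
    chunks = [f"chunk-{i}" for i in range(100)]

    async def user_speaks():
        # Stand-in for a voice-activity detector firing on incoming user audio.
        await asyncio.sleep(0)
        interrupted.set()

    finished, _ = await asyncio.gather(
        play_response(chunks, sent, interrupted),
        user_speaks(),
    )
    return finished, len(sent)
```

Running `asyncio.run(demo())` shows playback stopping after only a few chunks. The design point is that the sender checks for an interruption between every chunk, so smaller chunks give a faster reaction to the user.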
3. Architecture and Optimization
Highload: Designing a system with horizontal scaling capabilities.
AEC & Noise Suppression: Software echo and noise suppression for quality communication in any environment.
Traffic Localization: Optimization of routing for operation within the TAS-IX network.
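Voice sessions are stateful, so horizontal scaling typically needs each session to stay on the same GPU worker. One common way to do this is rendezvous (highest-random-weight) hashing, sketched below. The worker names and the `pick_worker` helper are hypothetical, and the posting does not state that this particular technique is used.

```python
import hashlib


def pick_worker(session_id, workers):
    """Rendezvous hashing: choose the worker with the highest score for this session."""

    def score(worker):
        # Hash the (worker, session) pair and read the first 8 bytes as an integer.
        digest = hashlib.sha256(f"{worker}|{session_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(workers, key=score)


workers = ["gpu-a", "gpu-b", "gpu-c"]  # hypothetical RTX 4090 worker names
assignment = pick_worker("session-123", workers)
```

The same session always maps to the same worker, and removing one worker only reassigns the sessions that were on it, which keeps in-progress conversations elsewhere undisturbed.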
Candidate Requirements
AI / ML Engineer:
Experience with end-to-end speech models (Moshi, AudioLM, or analogs).
Fluency in PyTorch and experience with transformers.
Skills in model fine-tuning for new language groups.
Ability to work with CUDA 12.x and NVIDIA optimization libraries.
Fullstack Developer:
Expert knowledge of WebRTC / WebSockets for audio streaming.
Experience in developing Telegram Mini Apps (TMA).
Professional proficiency in FastAPI and React / Next.js.
Understanding of low-latency system specifics.
Payment: To be agreed upon after discussion.