3333 USD

Voice AI assistant for Central Asian countries.

AI & Machine Learning, Databases & SQL

suspended by administration

Project translated automatically.

Vacancy: Lead AI / Fullstack Engineer — Project "Cataleya" (Voice-to-Voice AI)

Project Name: Cataleya

Format: Project work / Remote (with access to a local cluster)

Stack: PersonaPlex (Moshi), PyTorch, TensorRT-LLM, FastAPI, WebRTC, Telegram Mini App.

Hardware Location: Uzbekistan, Kazakhstan (TAS-IX); clusters based on NVIDIA RTX 4090.

Project Description

Cataleya is a multimodal voice-to-voice (speech-to-speech, S2S) ecosystem designed to feel like live conversation. We are developing an AI assistant that combines the roles of an expert teacher (chemistry, history, biology), an empathetic conversational partner, and a simultaneous translator. The system works directly with audio tokens, which is what makes very fast interaction possible.

Current Status: The English-language base model runs stably. It now needs to be adapted to regional specifics and packaged into a high-tech application.

Key Tasks

1. Core AI & ML (Adaptation and Intelligence)

Multilingual Support: Cross-lingual fine-tuning of the model for native support of Russian, Uzbek (including dialects), and Kazakh.

Low Latency: Optimization of inference to achieve a response delay of 0.07 seconds (70 ms).

Smart RAG (100 GB): Building a vector knowledge base on educational materials with a "triple-check" data mechanism to eliminate hallucinations.

NVIDIA Stack: Optimization of inference for RTX 4090 (vLLM, TensorRT-LLM, INT4/FP8 quantization).
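
To illustrate the INT4 quantization mentioned in the NVIDIA Stack task, here is a minimal sketch of symmetric per-tensor quantization in plain Python. The function names `quantize_int4` and `dequantize` are hypothetical and chosen for this example only. Production toolchains typically use finer-grained scales and calibration, so this shows the idea, not the project's actual pipeline.

```python
def quantize_int4(weights):
    """Symmetric per-tensor INT4 quantization (illustrative sketch).

    Maps each float to an integer in [-8, 7] using one shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    # Use 7 as the positive limit so that +max_abs and -max_abs both fit.
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from INT4 codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    codes, scale = quantize_int4([0.9, -1.0, 0.3, 0.0])
    print(codes, scale)
    print(dequantize(codes, scale))
```

The reconstruction error per weight is at most half the scale, which is why a single outlier weight (a large `max_abs`) degrades precision for the whole tensor and motivates the finer-grained schemes used in practice.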

2. Telegram Mini App & Real-time Web

Streaming Audio: Implementation of real-time audio transmission via WebRTC / WebSockets (without using standard voice messages).

Full-Duplex UI: An interface that lets the user interrupt the AI mid-response (interruptibility, or barge-in), with the AI reacting instantly.

Vocal ID: Implementation of voice biometrics for user authorization.

Billing: Integration of payment systems (Payme, Click) for subscription management.
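
The interruption behavior described in the Full-Duplex UI task can be sketched as a small state machine. This is an assumption about one possible design, not the project's architecture: the class `DuplexSession` and its method names are hypothetical, and voice-activity detection is assumed to happen elsewhere and simply call `on_user_voice`.

```python
from enum import Enum, auto


class State(Enum):
    LISTENING = auto()  # waiting for or receiving user speech
    SPEAKING = auto()   # AI audio is queued for playback


class DuplexSession:
    """Tracks who holds the floor and drops AI audio on interruption."""

    def __init__(self):
        self.state = State.LISTENING
        self.playback_queue = []

    def on_ai_audio(self, frame: bytes) -> None:
        """Queue a frame of generated audio for playback."""
        self.playback_queue.append(frame)
        self.state = State.SPEAKING

    def on_user_voice(self) -> bool:
        """Called when user speech is detected.

        Returns True if this interrupted the AI (barge-in).
        """
        interrupted = self.state is State.SPEAKING
        if interrupted:
            # Discard pending AI audio so playback stops immediately.
            self.playback_queue.clear()
        self.state = State.LISTENING
        return interrupted

    def next_frame(self):
        """Pop the next frame to play, or None when nothing is queued."""
        if not self.playback_queue:
            self.state = State.LISTENING
            return None
        return self.playback_queue.pop(0)
```

Clearing the queue on barge-in is the key design choice here: any audio already generated but not yet played is discarded, so the user never hears a stale response after interrupting.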

3. Architecture and Optimization

High Load: Designing a system that scales horizontally.

AEC & Noise Suppression: Software echo and noise suppression for quality communication in any environment.

Traffic Localization: Optimization of routing for operation within the TAS-IX network.
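
For the AEC & Noise Suppression task, the posting does not name an algorithm. One common approach to software echo cancellation is a normalized least-mean-squares (NLMS) adaptive filter, sketched below in plain Python under that assumption. The function name `nlms_echo_cancel` and the parameters `taps`, `mu`, and `eps` are illustrative choices, not project requirements.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Remove the echo of `far_end` from `mic` with an NLMS adaptive filter.

    far_end: samples played through the loudspeaker (the AI's voice)
    mic:     samples captured by the microphone (user speech plus echo)
    Returns the error signal, i.e. the microphone signal with the
    estimated echo subtracted.
    """
    weights = [0.0] * taps   # estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    output = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        # Predicted echo from the current echo-path estimate.
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalize the step by the input energy for stable adaptation.
        energy = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / energy
                   for w, h in zip(weights, history)]
        output.append(error)
    return output
```

With a microphone signal that contains only a delayed, attenuated copy of the far-end audio, the output should decay toward zero as the filter converges. A real deployment would also need double-talk handling and a much longer filter to cover realistic room echo.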

Candidate Requirements

AI / ML Engineer:

Experience with end-to-end speech models (Moshi, AudioLM, or similar).

Fluency in PyTorch and experience with transformers.

Skills in model fine-tuning for new language groups.

Ability to work with CUDA 12.x and NVIDIA optimization libraries.

Fullstack Developer:

Expert knowledge of WebRTC / WebSockets for audio streaming.

Experience in developing Telegram Mini Apps (TMA).

Professional proficiency in FastAPI and React / Next.js.

Understanding of low-latency system specifics.

Payment to be agreed upon after discussion

Matthew Ts

Almaty (Alma-Ata)