Cataleya (Voice-to-Voice AI)

2200 USD

  1. 148    1  1
    11 days, 2200 USD

    Hello! I am ready to complete this project and I have extensive experience in developing various applications.

  2. 1117    4  0
    20 days, 2500 USD

    Hello!

    Cataleya sounds exciting, and I understand how hard it is to achieve truly natural-sounding speech. I have worked with PyTorch-based models and real-time audio processing pipelines, and I can help your team tune latency and stability across the whole path from microphone to GPU to speaker.

    I would start small and practical. First, I would profile the current English processing path end to end and record how much time goes to capture, token processing, output, and streaming. Then I would address the largest latencies one by one, keeping each change easy to verify and safe to deploy on your 4090 clusters. For Uzbek, Kazakh, and Russian, I would help build a simple test set that includes regional speech patterns, so that fine-tuning is based on real examples rather than general estimates.

    Another simple but useful idea I can add is an internal latency-tracing view for the team. It gives a brief breakdown of each call, showing whether the slowdown comes from WebRTC, the server, or the GPU. This makes the current setup much easier to diagnose without complicating anything for users.

    https://storyai.cc
    https://oscarstories.com

    Thank you!

  3. 12784    4  2
    15 days, 2200 USD

    Hello,

    I am interested in participating in the Cataleya project and clearly understand the technical and architectural complexity of the task. I have hands-on experience with end-to-end speech and multimodal models, low-latency inference pipelines, and large-scale deployment on NVIDIA GPU clusters. I work confidently with PyTorch, Transformer-based architectures, CUDA optimization, quantization, and inference acceleration (including TensorRT-LLM and vLLM), as well as multilingual fine-tuning for non-English language groups.

    On the product and infrastructure side, I have experience building real-time audio systems using WebRTC and WebSockets, developing low-latency full-duplex interfaces, and integrating AI services into production environments via FastAPI. I also understand the specifics of Telegram Mini Apps, subscription logic, and payment integrations, and I approach system design with a strong focus on scalability, fault tolerance, and regional network optimization.

    I work as a product-minded engineer, comfortable with research, adaptation, and production delivery, and I am confident I can contribute to both the core S2S intelligence and the real-time application layer of Cataleya.

    Best regards,
    Jeo Vincent Carretas

  4. 1 proposal hidden
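
The per-call latency breakdown described in proposal 2 (WebRTC vs. server vs. GPU) could be sketched roughly as below. This is only an illustration, not code from the project or from any bidder: the `CallTrace` class, its methods, and the stage names are all hypothetical, and a real pipeline would likely need GPU-side timing (for example CUDA events) rather than wall-clock timing alone.

```python
import time
from contextlib import contextmanager


class CallTrace:
    """Hypothetical helper: collects wall-clock time per stage for one call."""

    def __init__(self, clock=time.perf_counter):
        # The clock is injectable so the trace can be tested deterministically.
        self._clock = clock
        self.stages = {}  # stage name -> accumulated seconds

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and add it to the named stage."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed = self._clock() - start
            self.stages[name] = self.stages.get(name, 0.0) + elapsed

    def slowest(self):
        """Return (stage, seconds) for the stage that took the longest."""
        return max(self.stages.items(), key=lambda kv: kv[1])

    def shares(self):
        """Return each stage's fraction of the total traced time."""
        total = sum(self.stages.values())
        if total == 0:
            return {name: 0.0 for name in self.stages}
        return {name: secs / total for name, secs in self.stages.items()}
```

A call handler would wrap each step in `with trace.stage("webrtc"):`, `with trace.stage("server"):`, and `with trace.stage("gpu"):`, then log `trace.slowest()` and `trace.shares()` once the call ends.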

Client
Tulkin Said
Uzbekistan, Tashkent
Project published
4 months 28 days ago
122 views
Tags
  • webrtc
  • pytorch
  • telegram mini app
  • fastapi
  • TensorRT-LLM
  • NVIDIA RTX 4090
  • Moshi