Need an AI specialist: deployment of a fast local voice assistant (STT + Local LLM + TTS)

AI & Machine Learning, Python

880 USD

Project: Web Panel for AI Outbound Calling with Dynamic Agent Configuration

Core Concept: Develop a fully functional web application to manage outbound calls powered by an AI agent. The system is based on a local LLM (Llama, Deepseek, Gemma) and must feature a configuration panel to tailor the agent's behavior per call (voice, language, prompt), a lead management module, and detailed call analytics.

Key Quality Requirements: End-to-end latency of 800 ms or less, and natural, human-like speech with appropriate pacing and pauses.

Core Modules:

1. Agent Configuration Panel (Web UI)
Allows users to select the following before a call:
- Languages: EN, DE, ES, NL (determines available voices and transcription accuracy)
- STT Model: Choose transcription engine (Deepgram / Cartesia / Gemini)
- TTS Provider & Model: Choose synthesis backend (Cartesia / Deepgram / ElevenLabs)
- Voice Selection: Select specific voice to define tone and style
- Silence Timeout: Set delay before re-prompting/call end (Default 30s)
- First Message Mode: Toggle between Bot Speaks First or Wait for User
- Background Noise: Add ambient sound (office, call center) for realism
- Prompt & Context: Field for custom LLM prompts (full conversation flow), with support for uploading example dialogues for few-shot learning and for exporting them to train or feed the model
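
To make the panel options above concrete, here is a minimal sketch of how a per-call configuration could be represented and validated on the backend. The class name, field names, lowercase provider identifiers, and the default first-message mode are assumptions for illustration only; the option lists and the 30-second default come from the list above.

```python
from dataclasses import dataclass, field
from typing import List

# Option lists taken from the panel description; identifiers are hypothetical.
LANGUAGES = {"EN", "DE", "ES", "NL"}
STT_ENGINES = {"deepgram", "cartesia", "gemini"}
TTS_PROVIDERS = {"cartesia", "deepgram", "elevenlabs"}
FIRST_MESSAGE_MODES = {"bot_speaks_first", "wait_for_user"}


@dataclass
class AgentConfig:
    """Per-call agent settings (hypothetical field names)."""
    language: str
    stt_engine: str
    tts_provider: str
    voice_id: str
    prompt: str
    silence_timeout_s: float = 30.0          # default from the requirements
    first_message_mode: str = "bot_speaks_first"  # assumed default
    background_noise: str = ""               # e.g. "office"; empty means none
    example_dialogues: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the config is usable."""
        errors = []
        if self.language not in LANGUAGES:
            errors.append(f"unsupported language: {self.language}")
        if self.stt_engine not in STT_ENGINES:
            errors.append(f"unknown STT engine: {self.stt_engine}")
        if self.tts_provider not in TTS_PROVIDERS:
            errors.append(f"unknown TTS provider: {self.tts_provider}")
        if self.first_message_mode not in FIRST_MESSAGE_MODES:
            errors.append(f"unknown first message mode: {self.first_message_mode}")
        if self.silence_timeout_s <= 0:
            errors.append("silence timeout must be positive")
        return errors
```

The web UI would submit these fields before each call, and the backend would refuse to start the call if `validate()` returns any problems.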

2. Lead & Call Management (Web UI)
- Upload and delete contact lists (CSV or manual entry)
- Real-time call controls in browser: Start, Pause, Stop
- Automatic call recording linked to each lead
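
As an illustration of the CSV upload, the following sketch parses an uploaded contact list, skips rows without a phone number, and drops duplicates. The column names `name` and `phone` are assumptions; the actual CSV layout is not defined in this description.

```python
import csv
import io
from typing import Dict, List


def parse_leads(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text with hypothetical 'name' and 'phone' columns into leads."""
    reader = csv.DictReader(io.StringIO(csv_text))
    leads: List[Dict[str, str]] = []
    seen_phones = set()
    for row in reader:
        # Missing cells come back as None, so normalise to empty strings.
        phone = (row.get("phone") or "").strip()
        name = (row.get("name") or "").strip()
        if not phone or phone in seen_phones:
            continue  # skip blank or duplicate numbers
        seen_phones.add(phone)
        leads.append({"name": name, "phone": phone})
    return leads
```

Manually entered contacts could go through the same function by building a one-row CSV, so both entry paths share the same cleanup rules.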

3. Reporting & Analytics
Per call data includes:
- AI-generated call summary
- Call duration
- Full audio recording
- Translated transcript (English translation of the conversation)
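
The per-call data above could be stored as one record per call. This is only a sketch with hypothetical field names; how recordings and transcripts are actually stored is left to the developer.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CallReport:
    """One analytics record per call (hypothetical field names)."""
    lead_phone: str
    duration_s: float
    summary: str          # AI-generated call summary
    recording_path: str   # location of the full audio recording
    transcript_en: str    # English translation of the conversation


def report_to_json(report: CallReport) -> str:
    """Serialize a report for the web UI; keep non-ASCII text readable."""
    return json.dumps(asdict(report), ensure_ascii=False)
```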

4. Integrations & Telephony
- WebRTC calling direct from browser
- Integration with external SIP trunks (IP-to-IP, SIP-based) and Asterisk

5. Technical Requirements
- End-to-end latency must be 800ms or less
- Telegram notifications for call start, end, and results delivery
- Server recommendation and setup guidance to meet performance targets
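
One way to reason about the 800 ms target is as a budget split across pipeline stages. The sketch below only shows the arithmetic; the stage names and millisecond values are placeholders, not measurements, and real numbers must come from testing the chosen providers and server.

```python
from typing import Dict, Tuple

TARGET_MS = 800  # end-to-end target from the requirements


def check_budget(stage_ms: Dict[str, int], target_ms: int = TARGET_MS) -> Tuple[int, int]:
    """Return (total latency, remaining headroom) in milliseconds."""
    total = sum(stage_ms.values())
    return total, target_ms - total


# Placeholder values for illustration only.
example_stages = {
    "network_and_telephony": 100,
    "stt_final_transcript": 200,
    "llm_first_token": 250,
    "tts_first_audio": 150,
}

total, headroom = check_budget(example_stages)
print(f"total={total} ms, headroom={headroom} ms")  # total=700 ms, headroom=100 ms
```

A negative headroom means the selected combination of STT, LLM, and TTS cannot meet the target without changing a component or the server.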

Preferred Tech Stack:
- Backend: Python (FastAPI / Django / Flask)
- Frontend: React, Vue, or core HTML/JS
- AI:
  - Local LLM as the core reasoning engine (Llama, Deepseek, Gemma); the developer must select and optimize the most suitable model for speed and quality.
  - Cloud APIs for low-latency STT/TTS (Deepgram, Cartesia, Gemini, ElevenLabs) to ensure performance.
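
A common technique for keeping perceived latency low when a local LLM feeds a cloud TTS is to forward text sentence by sentence instead of waiting for the full reply. The sketch below shows only the chunking step; the function name and the token stream are hypothetical, and the actual provider streaming APIs are not shown.

```python
from typing import Iterable, Iterator

SENTENCE_END = (".", "!", "?")


def chunk_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Group streamed LLM tokens into sentences that can be sent to TTS early."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit as soon as the buffered text ends with sentence punctuation.
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    # Flush whatever is left when the stream ends without punctuation.
    if buffer.strip():
        yield buffer.strip()
```

For example, the tokens `["Hello", " there", ".", " How", " are", " you", "?"]` would yield two chunks, so speech synthesis of the first sentence can begin while the model is still generating the second.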

Ideal Candidate:
An experienced full-stack developer with expertise in orchestrating complex voice pipelines and the ability to choose the fastest and most cost-effective model for each component (STT, local LLM, TTS) based on the specific use case and requirements.

Start: as soon as possible (ASAP)
Fixed budget: $1000 (a justified budget increase is possible), with full source code delivered
Long-term Cooperation:
We are also considering candidates who would be available for paid ongoing support and future project enhancements after the initial MVP is delivered.

Please include in your proposal:

Links or descriptions of similar past work (AI calling, voice bots)
Confirmation that you can independently choose and justify LLM + STT + TTS
Deadline by which you can provide a working pipeline with latency ≤ 800ms

Communication languages: UA, RU, EN

*The LLM names listed are just examples from my experience. If you know better, faster, or cheaper solutions for this task, feel free to suggest them. We're looking for a motivated candidate for long-term collaboration with appropriate financial reward.

Proposals 24 Withdrawn 2 Discussions 3

Taras Tarasovich 5 March

Awesome project, with a million difficulties. But 800 ms of latency is probably asking too much:
ElevenLabs: synthesis latency is at least 200 ms (1 s in my tests).
STT: at least 0.3 s, and definitely not with Whisper.
LLM: 0.5 s or more (for some small model),
but on the upside, that is on CPU, with 4-6 cores per stream.
And if the TTS is local as well, you will not get natural-sounding speech (it is possible, but the latency grows several-fold).

Davlat Rebrakov 6 March

An unrealistic project; close it before the vibe coders show up :) It will be better for you. They will spin you a story that it is all doable.

Taras Tarasovich 6 March

Well, it would be very interesting to see at least a rough result of this undertaking.

Oleh Mirishko
San Jose, United States

Rating 130
