AI Voice Cloning Real-Time

AI & Machine Learning

Real-Time Voice Changing Application

What it does: changes the user's voice on the fly. Whatever the user says into the microphone reaches the other party in a different voice. The target voice is defined by a short audio sample file (1-5 minutes).

How it works from the user's perspective

Launches the application on their computer
Uploads a voice sample (.wav file) they want to imitate
Selects input and output devices
Clicks "Start"
Speaks into the microphone → after ~0.3-0.5 seconds hears their own voice, but sounding like the sample
Can use it in Discord, Zoom, or OBS via a virtual audio cable
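The upload step could be backed by a simple check that the sample is a readable .wav and falls within the 1-5 minute range from the brief. The sketch below assumes the standard-library `wave` module, which reads only PCM .wav files. The function names and exact limits are hypothetical; the brief does not say how validation should work.

```python
import io
import wave
from typing import Tuple

# Limits taken from the brief (1-5 minute sample); hypothetical constants.
MIN_SECONDS = 60.0
MAX_SECONDS = 300.0


def sample_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM .wav file supplied as raw bytes."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def validate_voice_sample(wav_bytes: bytes) -> Tuple[bool, str]:
    """Return (ok, message) for an uploaded target-voice sample."""
    try:
        duration = sample_duration_seconds(wav_bytes)
    except (wave.Error, EOFError, ZeroDivisionError) as exc:
        return False, f"not a readable PCM .wav file: {exc}"
    if duration < MIN_SECONDS:
        return False, f"sample too short: {duration:.1f} s (minimum {MIN_SECONDS:.0f} s)"
    if duration > MAX_SECONDS:
        return False, f"sample too long: {duration:.1f} s (maximum {MAX_SECONDS:.0f} s)"
    return True, f"ok: {duration:.1f} s"
```

Rejecting a bad file on the client gives the user immediate feedback before anything is sent to the GPU server.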

What should be in the interface

Device selection (microphone / headphones / virtual audio cable)
Upload / select voice sample
Voice model training
"Start / Stop" button
Indicators: microphone level, current latency, network status
Quality settings (faster / higher quality)

Technical requirements

Latency from microphone to ear: target ≤ 400 ms
Voice quality: recognizable, without artifacts on normal speech
Client runs on Windows; the server part runs on a separate machine with a GPU
Should be compiled into a single .exe for distribution

Proposals 15 Discussions 4

Oleg Grigoryev

32 0

Budget: 27000 UAH Deadline: 45 days

The estimate for a first working MVP is 320,000 UAH and about 45 days. It would include a Windows client, audio device selection, sample upload, streaming processing via a GPU server, "faster / better" quality modes, packaging into a single .exe, and measurement of actual latency. Whether the 400 ms target is realistic can only be confirmed after testing the model, network, and audio drivers, so we can start with a short engineering prototype.

An important point: we only work with voices for which there are usage rights and the owner's consent. For a product like this, I would add usage-scenario restrictions, logging, and clear labeling, because otherwise the main risk is not technical but legal and reputational. One more nuance: the devil is not in the interface but in the latency and artifacts =)

> On implementation
>> Windows application for microphone, output, and virtual audio cable
>> separate GPU service for real-time voice conversion
>> level, latency, and network status indicators
>> quality modes, test profiles, and packaging into .exe

> Questions
>> Is there already a GPU server, or does one need to be selected and configured?
>> Do you need an MVP built on ready-made models, or a production-grade product tested across different microphones, networks, and voices?

> Similar works by Ingello
>> https://business.ingello.com/tts - related work in voice technologies and speech processing
>> https://business.ingello.com/fractal - related work in complex AI architecture and automation
>> https://systems-fl.ingello.com - Ingello Systems profile for such systems

!!If the goal is public distribution, it's better to start with a prototype and a technical latency audit rather than promise quality blindly!!

Maksym Merkuriev

0 0

Projects: -, Rating: 142

Budget: 3000 UAH Deadline: 2 days

I can do this for 3k with the help of vibe coding. I have already done something similar. The requirement is that you have a powerful graphics card or a budget for cloud AI.

Daria Kratofil

0 0

Projects: -, Rating: 196

Budget: 27000 UAH Deadline: 45 days

we already have a nearly ready architecture for such a voice AI product, it can be quickly adapted and launched for a Windows client, GPU server, and virtual audio cable
we are in touch, we can discuss the details here on the platform

the estimate for the first working stage is 260,000 UAH and about 45 days

We can keep the start simple: I would begin with a technical prototype with measurable latency, and then improve the voice quality
the 0.3-0.5 second goal is achievable only with careful stream processing and tuning of the buffers, model, and network

- two points I would like to clarify
-- do we need a recognizable voice of a specific person or is a change in timbre and speech manner sufficient
-- is the GPU server already available or does it need to be selected and deployed along with the solution

- what we will include in the first stage
-- Windows application with microphone selection, output, and virtual cable
-- uploading a wav sample and preparing a voice profile
-- streaming audio to the GPU server
-- real-time voice transformation
-- start, stop, level indicator, latency, and connection status
-- packaging into a single .exe for test distribution

- similar Ingello cases
-- https://business.ingello.com/tts - AI voice and speech solutions
-- https://business.ingello.com/fractal - server architecture for complex AI processes
-- https://business.ingello.com/vorfahr - strong example of a product with automation and integrations

main landing page for Freelancehunt - https://systems-fl.ingello.com

I think we should first test the prototype on 1-2 target voices in real Discord or OBS sessions
here !!low latency matters more than a pretty demo!!; the hardware will show the truth better than any presentation ))

Matvii Marchenko

20 0

Projects: 20, Rating: 2,077

Budget: 26000 UAH Deadline: 22 days

I understood the technical specifications: a Windows application, real-time voice conversion (microphone → target voice → virtual audio cable), target latency ≤ 400 ms, server side on a GPU. Target voice sample: one file of 1-5 minutes. An .exe for distribution, and a UI with device selection, model training, and level and latency indicators.

Stack as I see it.

Voice model. For real-time voice conversion at 400 ms latency with artifact-free quality, the best options in 2026 are RVC (Retrieval-based Voice Conversion) or the newer Seed-VC. RVC trains on short samples and supports real-time GPU inference on cards with 12 GB+ of VRAM. Alternatives for voice cloning are F5-TTS or MyShell's OpenVoice v2. However, they are geared more toward batch generation, and keeping them within 400 ms in real time is harder. RVC inference on an RTX 3060/4060 comfortably delivers 200-300 ms per chunk, which fits the budget.

Architecture. A thin Windows client (Python + Qt or C# WPF) captures the microphone via WASAPI/PyAudio. It splits the stream into 100-150 ms chunks and sends them to the GPU server over a WebSocket configured for low latency (ping-pong keepalive, no buffering). The server runs inference and returns the processed audio chunk. The client writes it to the virtual audio cable (VB-Audio Virtual Cable is the de facto standard on Windows). Latency budget: 30 ms capture + 50 ms network round-trip (if on the same network) + 200 ms GPU inference + 30 ms playback = ~310 ms. If the server is remote (cloud GPU), the network round-trip can grow to 80-150 ms, and latency also depends on connection stability.
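The latency budget quoted here is a plain sum of per-stage delays, so it is easy to recompute for other scenarios. The sketch below takes its numbers from the proposal; they are estimates, not measurements. The `chunk_fill_ms` field is an added assumption: with 100-150 ms chunks, the client has to wait for a chunk to fill before sending it, and the four-term sum may or may not already count that under "capture".

```python
from dataclasses import dataclass


@dataclass
class LatencyBudget:
    """Per-stage latency estimates in milliseconds (proposal figures, not measurements)."""

    capture_ms: float
    network_rtt_ms: float
    inference_ms: float
    playback_ms: float
    # Assumption: time spent waiting for one chunk to fill before it can be sent.
    chunk_fill_ms: float = 0.0

    def total_ms(self) -> float:
        return (self.chunk_fill_ms + self.capture_ms + self.network_rtt_ms
                + self.inference_ms + self.playback_ms)

    def fits(self, target_ms: float = 400.0) -> bool:
        return self.total_ms() <= target_ms


# Same-network server, the four stages listed in the proposal.
local = LatencyBudget(capture_ms=30, network_rtt_ms=50, inference_ms=200, playback_ms=30)
# Remote cloud GPU at the upper end of the quoted 80-150 ms round-trip.
cloud = LatencyBudget(capture_ms=30, network_rtt_ms=150, inference_ms=200, playback_ms=30)
# Same-network server, but also counting a hypothetical 120 ms chunk fill time.
chunked = LatencyBudget(capture_ms=30, network_rtt_ms=50, inference_ms=200,
                        playback_ms=30, chunk_fill_ms=120)

for label, budget in [("local", local), ("cloud", cloud), ("chunked", chunked)]:
    print(f"{label}: {budget.total_ms():.0f} ms, fits 400 ms: {budget.fits()}")
```

With these inputs, the local case gives 310 ms and fits the target. The cloud case (410 ms) and the chunked case (430 ms) both exceed it. Server location and chunk size are therefore worth measuring early.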

UI. Tkinter or PyQt5 for the Windows client (I have production experience with PyQt5 specifically for this class of tasks). Device selection uses PyAudio's get_device_count() / get_device_info_by_index(), filtered by maxInputChannels / maxOutputChannels. Uploading the voice sample, sending it to the server, and model training (the training step can be synchronous or run in the background). A Start/Stop button. Indicators: microphone level (RMS), real-time latency (rolling average over the last 50 chunks), and connection status.
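The client-side pieces described here can be sketched as follows, under stated assumptions:
- Device info comes as the dicts that PyAudio's `get_device_info_by_index()` returns (keys `name`, `maxInputChannels`, `maxOutputChannels`).
- Audio chunks are sequences of int16 samples.
- The 50-chunk window is the one mentioned in the text.

`split_devices`, `list_pyaudio_devices`, `rms_dbfs` and `RollingLatency` are hypothetical helper names. A virtual cable appears in these device lists like any other device.

```python
import math
from collections import deque
from typing import Iterable, List, Sequence, Tuple


def split_devices(infos: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Split PyAudio device-info dicts into input and output device names."""
    inputs: List[str] = []
    outputs: List[str] = []
    for info in infos:
        if info.get("maxInputChannels", 0) > 0:
            inputs.append(info["name"])
        if info.get("maxOutputChannels", 0) > 0:
            outputs.append(info["name"])
    return inputs, outputs


def list_pyaudio_devices() -> List[dict]:
    """Enumerate audio devices with PyAudio (requires the pyaudio package)."""
    import pyaudio  # lazy import: the other helpers work without PyAudio installed

    pa = pyaudio.PyAudio()
    try:
        return [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()


def rms_dbfs(samples: Sequence[int]) -> float:
    """Level of one chunk of int16 samples in dBFS; silence gives -inf."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / 32768) if rms > 0 else float("-inf")


class RollingLatency:
    """Average round-trip latency over the most recent `window` chunks."""

    def __init__(self, window: int = 50) -> None:
        # deque(maxlen=...) silently drops the oldest value once full.
        self._values: deque = deque(maxlen=window)

    def add(self, latency_ms: float) -> None:
        self._values.append(latency_ms)

    def average_ms(self) -> float:
        return sum(self._values) / len(self._values) if self._values else 0.0
```

A dBFS value is usually easier to map onto a level meter than raw RMS. A half-scale chunk reads about -6 dBFS.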

Server. FastAPI or an aiohttp WebSocket server with the model loaded in memory and a GPU-bound worker queue. If you plan for many simultaneous users, you will need a load balancer and several GPU instances, but for an MVP one machine with an RTX 3090 or 4090 can handle ~5-10 simultaneous users.
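A "GPU-bound worker queue" can be sketched with plain asyncio. The sketch assumes one model instance per GPU that processes chunks one at a time. `GpuWorker` and the `convert` callable are hypothetical names standing in for the real model call. The WebSocket layer (FastAPI or aiohttp) is left out; each connection handler would simply `await worker.submit(chunk)`. Requires Python 3.9+ for `asyncio.to_thread`.

```python
import asyncio
from typing import Callable, Optional

# Hypothetical stand-in for model inference (e.g. one voice-conversion forward pass).
ConvertFn = Callable[[bytes], bytes]


class GpuWorker:
    """Funnel chunks from many client connections into one GPU-bound worker."""

    def __init__(self, convert: ConvertFn, max_pending: int = 64) -> None:
        self._convert = convert
        # Bounded queue: submit() waits when the GPU falls behind (back-pressure).
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_pending)
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        self._task = asyncio.create_task(self._run())

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass

    async def submit(self, chunk: bytes) -> bytes:
        """Called by each connection handler; resolves when its chunk is converted."""
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((chunk, fut))
        return await fut

    async def _run(self) -> None:
        while True:
            chunk, fut = await self._queue.get()
            try:
                # The blocking model call runs in a thread so the event loop keeps
                # serving sockets; chunks are still processed one at a time.
                result = await asyncio.to_thread(self._convert, chunk)
                if not fut.cancelled():
                    fut.set_result(result)
            except Exception as exc:
                if not fut.cancelled():
                    fut.set_exception(exc)
            finally:
                self._queue.task_done()


async def demo() -> list:
    worker = GpuWorker(convert=lambda pcm: pcm[::-1])  # toy "model": reverses bytes
    worker.start()
    results = await asyncio.gather(*(worker.submit(bytes([i, i + 1])) for i in range(3)))
    await worker.stop()
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

For live audio, a late chunk is of little use. A real server might therefore drop stale chunks instead of queueing them, but that policy is a design choice the proposal does not specify.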

Building into .exe: PyInstaller with bundled dependencies, or Nuitka for production-grade compilation. I have experience with PyInstaller on desktop projects, and it builds the .exe reliably.
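PyInstaller can also be driven from Python through `PyInstaller.__main__.run()`, which takes the same arguments as the command line. A minimal sketch follows; the script name `voice_client.py` and the app name `VoiceChanger` are hypothetical.

```python
from typing import List


def pyinstaller_args(script: str, name: str, windowed: bool = True) -> List[str]:
    """Command-line arguments for a single-file PyInstaller build."""
    args = [script, "--onefile", "--name", name, "--noconfirm"]
    if windowed:
        # GUI app: do not open a console window alongside it.
        args.append("--windowed")
    return args


def build_exe(script: str = "voice_client.py", name: str = "VoiceChanger") -> None:
    """Run PyInstaller from Python; requires `pip install pyinstaller`.

    "voice_client.py" and "VoiceChanger" are hypothetical names.
    """
    import PyInstaller.__main__  # imported lazily so this module loads without PyInstaller

    PyInstaller.__main__.run(pyinstaller_args(script, name))
```

`--onefile` unpacks the bundle to a temporary folder on every launch, which adds some startup time; `--onedir` avoids this but ships a folder instead. In the thin-client architecture described above, inference runs on the server. The client bundle would then presumably not need PyTorch or CUDA, which keeps the .exe small.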

Honestly: real-time voice conversion at this latency is a niche ML task, and I haven't done it in production. I have strong backend and ASR/TTS experience (Whisper,

Nikita Rumyantsev

5 1

Budget: 16000 UAH Deadline: 14 days

Hi, message me privately. I think I can handle it; I've done something similar, but I need a more detailed technical specification. I will estimate how many tokens it will take, etc.

Ivan Danyleiko

20 0

Budget: 25000 UAH Deadline: 6 days

Hello. A year ago, I built a similar real-time voice conversion solution for Windows, packaged as an .exe. I have working code; now I need to update the packages, adapt it to your requirements, and test the connection between the Windows client and the GPU server. I believe I can quickly bring this to an MVP.

Rumzik Matvey

15 0

Budget: 27000 UAH Deadline: 7 days

Good afternoon.
I am currently working with TTS systems such as Cartesia and Inworld, and with local models such as XTTS-v2 (Coqui).
It's not as simple as it seems: TTS is one thing, STT is another, and a combined solution doesn't always give acceptable results. Sometimes the TTS is poor, or the STT latency is unsuitable, or the recognition quality doesn't meet your goals. Reaching your 400 ms target will require some adjustments. I am currently focused on exactly this and am trying to get latency down to 1 second or less.
I am a senior developer and would work at an hourly rate of 30 EUR for this task.
It's hard to say how long the core will take; it could be 10 hours or as much as 40 hours, plus a Windows wrapper.
If my rate works for you, you're welcome to reach out. I always deliver quality work.
Once we talk, I will give a more accurate cost estimate for the project.

Andrii Y.

0 0

Projects: -, Rating: 180

Budget: 27000 UAH Deadline: 50 days

We have experience developing real-time AI/audio solutions, including voice conversion, audio streaming, GPU inference, and low-latency sound processing.

We understand the specifics of the real-time voice-changing task:
— capturing and processing audio streams;
— voice cloning from a short sample;
— minimizing latency;
— integration with Discord / Zoom / OBS via virtual audio devices;
— building a desktop application for Windows in .exe.

We can implement:
• desktop client;
• server GPU component;
• voice conversion pipeline;
• training/fine-tuning of the voice model;
• realtime streaming;
• quality/latency settings;
• UI/UX interface of the application.

We have worked with the AI audio stack:
RVC, XTTS, So-VITS-SVC, Whisper, PyTorch, WebRTC, CUDA, realtime audio pipelines.

We pay special attention to:
— stability of realtime operation;
— voice quality without strong artifacts;
— optimization for regular PCs;
— architecture for further scaling.

We are ready to discuss the stack, architecture, and showcase relevant experience.

Sincerely, Benefit Studio

Ganna K.

1 0

Projects: -, Rating: 556

Budget: 11111 UAH Deadline: 30 days

Hello! I build real-time, low-latency voice conversion with a Windows client and a GPU inference server.

I have experience with AI integrations and real-time systems (WebRTC/streaming/low-latency processing), so I can design the architecture for this case.

Architecture:

* Windows desktop client (UI + audio stream)
* Virtual audio driver / loopback (VB-Cable or similar)
* Backend server with GPU (model inference)
* Streaming via WebSocket / gRPC
* Buffering for latency ≤ 300–400ms
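To stream chunks and show a latency indicator, each chunk needs some framing over WebSocket/gRPC. The sketch below assumes a hypothetical wire format: a little-endian uint32 sequence number and a float64 client send time, followed by raw PCM16 bytes. The proposals do not specify one. If the server echoes the header back unchanged, the client can measure round-trip latency against its own monotonic clock without synchronizing clocks with the server. Sequence numbers also reveal dropped or out-of-order chunks.

```python
import struct
import time
from typing import Optional, Tuple

# Hypothetical header: uint32 sequence number + float64 send time (seconds), 12 bytes.
HEADER = struct.Struct("<Id")


def chunk_samples(sample_rate: int, chunk_ms: int) -> int:
    """Samples per chunk, e.g. 48 kHz and 120 ms give 5760 samples."""
    return sample_rate * chunk_ms // 1000


def pack_chunk(seq: int, pcm: bytes, sent_at: Optional[float] = None) -> bytes:
    """Prefix a PCM chunk with its sequence number and send timestamp."""
    if sent_at is None:
        sent_at = time.monotonic()
    return HEADER.pack(seq, sent_at) + pcm


def unpack_chunk(frame: bytes) -> Tuple[int, float, bytes]:
    """Split a frame back into (sequence number, send time, PCM payload)."""
    seq, sent_at = HEADER.unpack_from(frame)
    return seq, sent_at, frame[HEADER.size:]


def round_trip_ms(sent_at: float, now: Optional[float] = None) -> float:
    """Latency of a chunk whose header the server echoed back unchanged.

    Both timestamps come from the client's own monotonic clock.
    """
    if now is None:
        now = time.monotonic()
    return (now - sent_at) * 1000.0
```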

ML part:

* voice conversion model (RVC / so-vits-svc / similar)
* loading reference voice (1–5 minutes)
* caching voice embeddings
* optimization for real-time inference
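Caching voice embeddings can be as simple as keying on a hash of the reference audio, so the expensive encoding runs once per distinct sample. The sketch below is in memory only; `compute` is a hypothetical stand-in for the model's speaker encoder. This caching applies to zero-shot models that condition on a reference clip. For models trained per voice, RVC for example, the same keying idea could apply to the trained model files instead.

```python
import hashlib
from typing import Callable, Dict, List

Embedding = List[float]


class EmbeddingCache:
    """Compute a speaker embedding once per distinct reference sample and reuse it."""

    def __init__(self, compute: Callable[[bytes], Embedding]) -> None:
        self._compute = compute  # hypothetical encoder call
        self._cache: Dict[str, Embedding] = {}

    @staticmethod
    def key(wav_bytes: bytes) -> str:
        # Content hash: re-uploading the same file hits the cache.
        return hashlib.sha256(wav_bytes).hexdigest()

    def get(self, wav_bytes: bytes) -> Embedding:
        k = self.key(wav_bytes)
        if k not in self._cache:
            self._cache[k] = self._compute(wav_bytes)
        return self._cache[k]

    def __len__(self) -> int:
        return len(self._cache)
```

A production server would likely persist this cache to disk so it survives restarts; that part is not sketched here.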

Client:

* selection of input/output devices
* loading voice sample
* start/stop streaming button
* latency/load/audio level indicator
* integration with Discord / Zoom via virtual audio device

Work stages:

1. Architecture + pipeline prototype
— checking pipeline latency, selecting model
Deadline: 5 days
Cost: 400 USD

2. Backend GPU inference
— real-time voice conversion API
— latency optimization
Deadline: 10 days
Cost: 800 USD

3. Windows client
— UI + audio routing + stream management
Deadline: 8 days
Cost: 700 USD

4. Integration + testing
— stability, latency tuning, packaging into .exe
Deadline: 5 days
Cost: 400 USD

Total duration: 4 weeks
Budget: 2300 USD (MVP → stable version)

Important: the key risk here is the latency and stability of the real-time model. Therefore, I will first create a pipeline prototype to confirm the achievable latency, and only then we will finalize the client.

Andrii Y.

1 1

Projects: -, Rating: 246

Budget: 2500 UAH Deadline: 2 days

Good day, I am ready to take on the project; I have experience building similar solutions.

Nikita Rumyantsev 26 May

There are already existing alternatives; building something like this will be very expensive.

Nikita Rumyantsev 28 May

We can roughly estimate how much it would cost in tokens, etc.

Pavlo B. 31 May

It needs to be done manually.

Yevhen Melnik 5 June

There are cases where the speech, the line of work, or the product is confidential and requires its own build on its own servers, friend)

Odd Man
Kyiv, Ukraine

Projects: -, Rating: 20
