This microservice must operate independently from the main Optizium core, communicating via HTTP POST requests. All text messages and GPT responses will be processed by your main API.

⚙️ Architecture

🧱 Components:

Microservice (Python + FastAPI or Node.js + Express)
OpenAI Whisper API for STT
OpenAI TTS for voice synthesis
Interaction with Optizium through:
- /api/chats/send — sending text to chat
- /api/chats/chat — retrieving history
- /api/integrations/integration — WebHook (optional)
- /api/leads/leads — processing contact forms (if needed)

📤 Data Transmission

🔽 Incoming request to the microservice (from the website frontend or mobile application):

1. Incoming audio (STT)

POST /speech-to-text
Content-Type: multipart/form-data

Form-data:
- audio_file: .mp3/.ogg/.wav
- bot_id: string
- chat_room: string
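
The form fields above could be checked before any audio is sent to Whisper. The following is a minimal sketch using only the standard library; the function name `validate_stt_form` and the error messages are illustrative assumptions, not part of the specification.

```python
from pathlib import PurePosixPath
from typing import List

# Formats listed in the specification for audio_file.
ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".ogg", ".wav"}


def validate_stt_form(filename: str, bot_id: str, chat_room: str) -> List[str]:
    """Return validation errors; an empty list means the form is acceptable.

    Hypothetical helper: the name and messages are assumptions.
    """
    errors = []
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in ALLOWED_AUDIO_EXTENSIONS:
        errors.append(f"unsupported audio format: {extension or '(none)'}")
    if not bot_id.strip():
        errors.append("bot_id is required")
    if not chat_room.strip():
        errors.append("chat_room is required")
    return errors
```

Checking only the extension is a cheap first filter; a production service would likely also inspect the file contents (for example with ffmpeg) before trusting the upload.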

2. Incoming text for speech synthesis (TTS)

POST /text-to-speech
Content-Type: application/json

{
  "text": "Your product is available",
  "language": "uk-UA",
  "voice": "female",
  "bot_id": "...",
  "chat_room": "..."
}
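
One possible way to parse and validate this JSON body is sketched below with the standard library only. The `TtsRequest` class and `parse_tts_request` function are hypothetical names; treating every field as a required non-empty string is an assumption, since the specification does not say which fields are optional.

```python
import json
from dataclasses import dataclass

# Field names taken from the example body; all treated as required (assumption).
REQUIRED_FIELDS = ("text", "language", "voice", "bot_id", "chat_room")


@dataclass(frozen=True)
class TtsRequest:
    text: str
    language: str
    voice: str
    bot_id: str
    chat_room: str


def parse_tts_request(raw_body: str) -> TtsRequest:
    """Parse the JSON body; raise ValueError on malformed or incomplete input."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    missing = [
        name for name in REQUIRED_FIELDS
        if not isinstance(data.get(name), str) or not data[name].strip()
    ]
    if missing:
        raise ValueError("missing or empty fields: " + ", ".join(missing))
    return TtsRequest(**{name: data[name] for name in REQUIRED_FIELDS})
```

With FastAPI, a Pydantic model would normally play this role; the dataclass version is shown only to keep the sketch dependency-free.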

🔁 Microservice Behavior

🟡 STT:

Receives audio file
Recognizes text via OpenAI Whisper

Sends it to your API:

POST /api/chats/send
headers: {Authorization, Content-Type}
body:
{
  "bot_id": "...",
  "chat_room": "...",
  "author": "user",
  "message": "recognized text"
}
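
The forwarding step could look roughly like the sketch below, which only builds the request object and does not send it. The base URL, the Bearer scheme, and the `application/json` content type are assumptions: the specification names the `Authorization` and `Content-Type` headers but not their values. `build_chat_send_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_chat_send_request(
    base_url: str, api_token: str, bot_id: str, chat_room: str, message: str
) -> urllib.request.Request:
    """Build (but do not send) the POST that forwards recognized text to the chat."""
    payload = {
        "bot_id": bot_id,
        "chat_room": chat_room,
        "author": "user",  # fixed value from the specification
        "message": message,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chats/send",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # scheme is an assumption
            "Content-Type": "application/json",      # assumed content type
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=10)` would perform the call; an async service would more likely use an HTTP client such as httpx instead.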

🟢 TTS:

Receives text response from GPT (via your system)
Synthesizes it via TTS system
Returns .mp3 or URL to the file to the frontend

🔐 Security and Privacy

Use of HTTPS
API key is required on the request side (Basic or Bearer)
Audio files are deleted after processing
Do not store history on the microservice side (only transmission)
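
Two of the requirements above, the API key check and deleting audio after processing, can be sketched with the standard library. This is an illustration under assumptions: only the Bearer scheme is handled here (the specification also allows Basic), and `is_authorized` and `temporary_audio` are hypothetical helper names.

```python
import hmac
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator, Optional


def is_authorized(authorization_header: Optional[str], expected_key: str) -> bool:
    """Accept only 'Bearer <key>' and compare the key in constant time."""
    if not authorization_header:
        return False
    scheme, _, presented_key = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    return hmac.compare_digest(
        presented_key.strip().encode("utf-8"), expected_key.encode("utf-8")
    )


@contextmanager
def temporary_audio(data: bytes, suffix: str = ".ogg") -> Iterator[str]:
    """Write an upload to a temp file and delete it afterwards, even on error."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

Wrapping the recognition step in `with temporary_audio(upload_bytes) as path:` keeps the file on disk only for the duration of the call, which matches the "deleted after processing" requirement.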

📦 Result

Expected endpoints:

Method	Endpoint	Purpose
POST	`/speech-to-text`	Speech recognition to text
POST	`/text-to-speech`	Voice synthesis from text
GET	`/status`	Service status (ping)
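
The endpoint table can be expressed as a small dispatch table, sketched below without any web framework. The `{"status": "ok"}` ping payload and the 501 placeholder for the two POST endpoints are assumptions made for illustration; `resolve` is a hypothetical function name.

```python
import json
from typing import Tuple

# Endpoints from the table above, keyed by (method, path).
ROUTES = {
    ("POST", "/speech-to-text"): "Speech recognition to text",
    ("POST", "/text-to-speech"): "Voice synthesis from text",
    ("GET", "/status"): "Service status (ping)",
}


def resolve(method: str, path: str) -> Tuple[int, str]:
    """Map a request line to an HTTP status code and a JSON body."""
    method = method.upper()
    if (method, path) == ("GET", "/status"):
        return 200, json.dumps({"status": "ok"})  # assumed ping payload
    if (method, path) in ROUTES:
        # Real handlers would run STT or TTS here; this sketch stops short.
        return 501, json.dumps({"error": "not implemented in this sketch"})
    if any(known_path == path for _, known_path in ROUTES):
        return 405, json.dumps({"error": "method not allowed"})
    return 404, json.dumps({"error": "not found"})
```

In FastAPI the same table would be three decorated route functions; the explicit dictionary just makes the expected surface of the service easy to check in a test.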

🧪 Testing

Sending a test voice file → checking text in Optizium chat.
GPT response → synthesis → checking playback on the site.
Sending feedback form after voice request.

🧰 Technology Stack (recommended):

Python 3.11+, FastAPI, uvicorn
OpenAI Whisper API; gTTS, Coqui TTS, or Edge TTS
pydub or ffmpeg for audio processing
Docker, Gunicorn (production build)
ngrok / HTTPS proxy (for local testing of WebHook)

🕐 Deadline:

3–5 working days


Yaroslav Stopin

Good day.
I am ready to take on your project.
I can develop such an integration for you using no-code/low-code tools.
Message me privately, we will discuss all possible nuances and can start the implementation.

Oleksii Buglak

9 0

Budget: 10000 UAH Deadline: 5 days

Good day!

Within one working week (5 days) I can build this service in Node.js. First, though, I need to take a closer look at your Optizium service.
Examples of work: https://github.com/axbuglak

Sincerely,
Buhlak Oleksiy

Vasil Savchuk

1 0

Projects -
Rating -
Rating 426

Budget: 5000 UAH Deadline: 4 days

I will create an independent microservice in Python 3.11+ with FastAPI that will process users' voice messages. It will accept an audio file (formats .mp3/.ogg/.wav), convert it to text using the OpenAI Whisper API, and then send this text via an HTTP POST request to the main API /api/chats/send. For the reverse task (TTS), the microservice will accept text, convert it to speech using gTTS, Coqui TTS, or Edge TTS, generate an audio file, and return a link to it. Audio processing will be done using ffmpeg or pydub.

All interactions will occur through secure HTTPS requests with API key authorization. Audio files will not be stored — they will be deleted after processing. I will also implement /status to check the service availability. Testing will include 3 stages: STT (recognition), TTS (synthesis), and full connection with the main API.

For deployment I will use Docker + Gunicorn, and ngrok for local WebHook testing.

Oleksandr S.

Winning proposal

9 0

Budget: 5000 UAH Deadline: 5 days

I have experience building microservices with FastAPI and have worked with OpenAI and with audio files.
I can implement a microservice that will fully comply with the described architecture: fast, secure, independent, and easily scalable. I am ready to discuss the details (authentication, deployment) in private messages. Please write, and we will discuss everything.

Yelena Druzenko

5 0

Budget: 12000 UAH Deadline: 7 days

Good day.
I am ready to complete your task.
The realistic timeframe is 6-7 days.
The cost is 12,000 UAH.

Ievhen Likhachev
Odessa, Ukraine

Projects 63
Rating 5.0
Rating 6 692

Development of a microservice for audio processing (speech to text) with integration of our service via API

Client's review of cooperation with Oleksandr S.

5.0

Ievhen Likhachev

Freelancer's review of cooperation with Ievhen Likhachev

5.0

Oleksandr S.
