Development of a microservice for audio processing (speech to text) with integration of our service via API
Create a separate microservice that processes voice messages from users:
🎙️ transforms audio to text (speech-to-text / STT),
🔊 converts text to speech (text-to-speech / TTS).
This microservice must operate independently of the main Optizium core and communicate with it via HTTP POST requests. All text messages and GPT responses will be processed by the main Optizium API.
⚙️ Architecture
🧱 Components:
Microservice (Python + FastAPI or Node.js + Express)
OpenAI Whisper API for STT
OpenAI TTS for voice synthesis
Interaction with Optizium through:
/api/chats/send: sending text to the chat
/api/chats/chat: retrieving history
/api/integrations/integration: WebHook (optional)
/api/leads/leads: processing contact forms (if needed)
📤 Data Transmission
🔽 Incoming request to the microservice (from the website frontend or mobile application):
1. Incoming audio (STT)
POST /speech-to-text
Content-Type: multipart/form-data
Form-data:
- audio_file: .mp3/.ogg/.wav
- bot_id: string
- chat_room: string
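A minimal sketch of how the form fields above might be checked before the audio is passed to recognition. The function name `validate_stt_form` and the error messages are illustrative assumptions, not part of the brief; only the three extensions and the two string fields come from the request description.

```python
from pathlib import PurePath

# Extensions listed in the brief for the audio_file field.
ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".ogg", ".wav"}


def validate_stt_form(filename: str, bot_id: str, chat_room: str) -> list[str]:
    """Return a list of validation errors; an empty list means the form is acceptable."""
    errors = []
    # Compare the extension case-insensitively so "VOICE.MP3" is accepted.
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_AUDIO_EXTENSIONS:
        errors.append(f"unsupported audio format: {extension or 'none'}")
    if not bot_id.strip():
        errors.append("bot_id is required")
    if not chat_room.strip():
        errors.append("chat_room is required")
    return errors
```

Checking only the extension is a cheap first filter; a real service would probably also inspect the file contents (for example with ffmpeg) before sending them on.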
2. Incoming text for speech synthesis (TTS)
POST /text-to-speech
Content-Type: application/json
{
"text": "Your product is available",
"language": "uk-UA",
"voice": "female",
"bot_id": "...",
"chat_room": "..."
}
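The JSON body above could be parsed along these lines. This is a sketch under one explicit assumption: all five fields are treated as required non-empty strings, which the brief does not state. `TtsRequest` and `parse_tts_request` are hypothetical names.

```python
import json
from dataclasses import dataclass

# Field names taken from the example body in the brief.
REQUIRED_FIELDS = ("text", "language", "voice", "bot_id", "chat_room")


@dataclass(frozen=True)
class TtsRequest:
    text: str
    language: str
    voice: str
    bot_id: str
    chat_room: str


def parse_tts_request(raw_body: bytes) -> TtsRequest:
    """Decode the JSON body and check that every field is a non-empty string.

    Assumption: all five fields are mandatory; adjust if some are optional.
    """
    payload = json.loads(raw_body)
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")
    missing = [
        name for name in REQUIRED_FIELDS
        if not isinstance(payload.get(name), str) or not payload[name].strip()
    ]
    if missing:
        raise ValueError("missing or empty fields: " + ", ".join(missing))
    return TtsRequest(**{name: payload[name] for name in REQUIRED_FIELDS})
```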
🔁 Microservice Behavior
🟡 STT:
Receives audio file
Recognizes text via OpenAI Whisper
Sends it to the Optizium API:
POST /api/chats/send
headers: {Authorization, Content-Type}
body:
{
"bot_id": "...",
"chat_room": "...",
"author": "user",
"message": "recognized text"
}
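The forwarding step could look roughly like this, using only the Python standard library. The base URL `https://optizium.example` is a placeholder, since the brief does not give the real host, and the Bearer scheme is one of the two options the brief allows. `build_send_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request

# Placeholder host: the brief does not specify the real Optizium base URL.
OPTIZIUM_BASE_URL = "https://optizium.example"


def build_send_request(bot_id: str, chat_room: str, recognized_text: str,
                       api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/chats/send request from the brief."""
    body = {
        "bot_id": bot_id,
        "chat_room": chat_room,
        "author": "user",
        "message": recognized_text,
    }
    return urllib.request.Request(
        url=OPTIZIUM_BASE_URL + "/api/chats/send",
        # ensure_ascii=False keeps Ukrainian text readable in the payload.
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={
            # Assumption: Bearer auth; the brief permits Basic or Bearer.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(request, timeout=10)` would perform the actual call; keeping construction separate from sending makes the payload easy to check in tests.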
🟢 TTS:
Receives the text response from GPT (via the Optizium system)
Synthesizes it via TTS system
Returns an .mp3 file or a URL to the file to the frontend
🔐 Security and Privacy
Use of HTTPS
API key is required on the request side (Basic or Bearer)
Audio files are deleted after processing
Do not store history on the microservice side (only transmission)
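Two of the requirements above, the API key check and deleting audio after processing, can be sketched with the standard library. The helper names `is_authorized` and `temporary_audio` are hypothetical, and the Bearer scheme is an assumption (the brief also allows Basic).

```python
import hmac
import os
import tempfile
from contextlib import contextmanager


def is_authorized(authorization_header: str, expected_key: str) -> bool:
    """Check a 'Bearer <key>' header using a constant-time comparison."""
    scheme, _, presented_key = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(presented_key.encode(), expected_key.encode())


@contextmanager
def temporary_audio(audio_bytes: bytes, suffix: str = ".ogg"):
    """Write the upload to a temp file and delete it when processing ends, even on error."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        yield path
    finally:
        # Runs on success and on exceptions, so no audio is left on disk.
        if os.path.exists(path):
            os.remove(path)
```

Recognition would then run inside `with temporary_audio(data) as path:`, so the file is gone as soon as the block exits.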
📦 Result
Expected endpoints:
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /speech-to-text | Speech recognition to text |
| POST | /text-to-speech | Voice synthesis from text |
| GET | /status | Service status (ping) |
🧪 Testing
Sending a test voice file → checking text in Optizium chat.
GPT response → synthesis → checking playback on the site.
Sending feedback form after voice request.
🧰 Technology Stack (recommended):
Python 3.11+, FastAPI, uvicorn
OpenAI Whisper API, gTTS / TTS by Coqui, Edge TTS
pydub or ffmpeg for audio processing
Docker, Gunicorn (production build)
ngrok / HTTPS proxy (for local testing of WebHook)
🕐 Deadline:
3–5 working days
Client's review of cooperation with Oleksandr S.
Everything is great, the project was completed on time and in full. Thank you for the work and I can recommend you to other clients!
Freelancer's review of cooperation with Ievhen Likhachev
Everything is fine. The client treats the brief professionally and listens to advice. I hope to work together again.
-
Good day.
I am ready to take on your project.
I can develop such an integration for you using no-code/low-code tools.
Message me privately, we will discuss all possible nuances and can start the implementation.
-
Good day!
Within a week (5 days) I can build this service in Node.js. But first, I need to take a closer look at your Optizium service.
Examples of work: https://github.com/axbuglak
Sincerely,
Buhlak Oleksiy
-
I will create an independent microservice in Python 3.11+ with FastAPI that will process users' voice messages. It will accept an audio file (formats .mp3/.ogg/.wav), convert it to text using the OpenAI Whisper API, and then send this text via an HTTP POST request to the main API /api/chats/send. For the reverse task (TTS), the microservice will accept text, convert it to speech using gTTS, Coqui TTS, or Edge TTS, generate an audio file, and return a link to it. Audio processing will be done using ffmpeg or pydub.
All interactions will occur through secure HTTPS requests with API key authorization. Audio files will not be stored — they will be deleted after processing. I will also implement /status to check the service availability. Testing will include 3 stages: STT (recognition), TTS (synthesis), and full connection with the main API.
For deployment, I will use Docker + Gunicorn, and ngrok for local WebHook testing.
-
I have experience building microservices with FastAPI and have worked with OpenAI and with audio files.
I can implement a microservice that will fully comply with the described architecture: fast, secure, independent, and easily scalable. I am ready to discuss the details (authentication, deployment) in private messages. Please write, and we will discuss everything.
-
Good day.
I am ready to complete your task.
The realistic completion time is 6-7 days.
The cost is 12,000 UAH.
-
Are you considering other programming languages?