Development of a microservice for audio processing (speech to text) with integration of our service via API
Create a separate microservice that processes voice messages from users:
🎙️ transforms audio to text (speech-to-text / STT),
🔊 converts text to speech (text-to-speech / TTS).
This microservice must operate independently of the main Optizium core and communicate with it via HTTP POST requests. All text messages and GPT responses are processed by the main Optizium API.
⚙️ Architecture
🧱 Components:
Microservice (Python + FastAPI or Node.js + Express)
OpenAI Whisper API for STT
OpenAI TTS for voice synthesis
Interaction with Optizium through:
/api/chats/send: sending text to the chat
/api/chats/chat: retrieving history
/api/integrations/integration: WebHook (optional)
/api/leads/leads: processing contact forms (if needed)
📤 Data Transmission
🔽 Incoming request to the microservice (from the website frontend or mobile application):
1. Incoming audio (STT)
POST /speech-to-text
Content-Type: multipart/form-data
Form-data:
- audio_file: .mp3/.ogg/.wav
- bot_id: string
- chat_room: string
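The form fields above can be checked before any audio reaches the recognizer. A minimal sketch, assuming the service rejects files whose extension is not one of the three listed formats and treats both identifiers as required; the function name `validate_stt_form` and the error messages are hypothetical and not part of the brief.

```python
from pathlib import PurePath

# Extensions listed in the brief for the audio_file field.
ALLOWED_EXTENSIONS = {".mp3", ".ogg", ".wav"}


def validate_stt_form(filename: str, bot_id: str, chat_room: str) -> list[str]:
    """Return a list of validation errors; an empty list means the form is acceptable."""
    errors = []
    # Compare the extension case-insensitively so "VOICE.MP3" is accepted too.
    if PurePath(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append("audio_file must be .mp3, .ogg or .wav")
    # Assumption: both identifiers are mandatory and must not be blank.
    if not bot_id.strip():
        errors.append("bot_id is required")
    if not chat_room.strip():
        errors.append("chat_room is required")
    return errors
```

This checks only the file name. A production service would likely also inspect the actual audio content, for example with ffmpeg.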
2. Incoming text for speech synthesis (TTS)
POST /text-to-speech
Content-Type: application/json
{
"text": "Your product is available",
"language": "uk-UA",
"voice": "female",
"bot_id": "...",
"chat_room": "..."
}
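The JSON body above could be parsed and validated roughly as follows. This is a sketch under the assumption that all five fields are required, non-empty strings; the brief does not say which fields are optional. `TTSRequest` and `parse_tts_request` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class TTSRequest:
    text: str
    language: str
    voice: str
    bot_id: str
    chat_room: str


# Assumption: every field from the example body is mandatory.
REQUIRED_FIELDS = ("text", "language", "voice", "bot_id", "chat_room")


def parse_tts_request(raw_body: str) -> TTSRequest:
    """Parse the JSON body; raise ValueError if it is malformed or a field is missing."""
    data = json.loads(raw_body)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    for name in REQUIRED_FIELDS:
        value = data.get(name)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field '{name}' must be a non-empty string")
    # Unknown extra fields are ignored rather than rejected.
    return TTSRequest(**{name: data[name] for name in REQUIRED_FIELDS})
```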
🔁 Microservice Behavior
🟡 STT:
Receives audio file
Recognizes text via OpenAI Whisper
Sends it to the Optizium API:
POST /api/chats/send
headers: {Authorization, Content-Type}
body:
{
"bot_id": "...",
"chat_room": "...",
"author": "user",
"message": "recognized text"
}
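The forwarding step can be sketched with the standard library alone. The example below only builds the request and does not send it. It assumes a Bearer token (the brief allows Basic or Bearer) and a JSON body; `build_chat_send_request` and the `base_url` parameter are hypothetical.

```python
import json
import urllib.request


def build_chat_send_request(base_url: str, api_key: str, bot_id: str,
                            chat_room: str, recognized_text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that forwards recognized text."""
    payload = {
        "bot_id": bot_id,
        "chat_room": chat_room,
        "author": "user",
        "message": recognized_text,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chats/send",
        # ensure_ascii=False keeps Ukrainian text readable in the encoded body.
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            # Assumption: Bearer scheme and a JSON content type.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=10)` to perform the call.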
🟢 TTS:
Receives the text response from GPT (via the Optizium system)
Synthesizes it via the TTS system
Returns an .mp3 file, or a URL to the file, to the frontend
🔐 Security and Privacy
Use of HTTPS
API key is required on the request side (Basic or Bearer)
Audio files are deleted after processing
Do not store history on the microservice side (only transmission)
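Two of the requirements above, the API key check and deleting audio after processing, could look roughly like this. The sketch assumes a Bearer header and a single shared key. `is_authorized`, `process_and_delete` and the `handler` callback are hypothetical names.

```python
import hmac
import os
import tempfile
from typing import Callable


def is_authorized(authorization_header: str, expected_key: str) -> bool:
    """Check a 'Bearer <key>' header using a constant-time comparison."""
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    return hmac.compare_digest(token.strip().encode(), expected_key.encode())


def process_and_delete(audio_bytes: bytes, suffix: str,
                       handler: Callable[[str], str]) -> str:
    """Write audio to a temporary file, run handler(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(audio_bytes)
        return handler(path)
    finally:
        # Runs on success and on error, so no audio is left on disk.
        os.remove(path)
```

Here `handler` stands in for whatever performs recognition on the file path. The `finally` block removes the file even when recognition raises.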
📦 Result
Expected endpoints:
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /speech-to-text | Speech recognition to text |
| POST | /text-to-speech | Voice synthesis from text |
| GET | /status | Service status (ping) |
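The endpoint table can be expressed as a small routing function. This sketch uses only the standard library, not FastAPI or Express. The `{"status": "ok"}` body is an assumption, since the brief does not define the ping response, and the STT/TTS handlers are deliberately left unimplemented.

```python
import json
from http.server import BaseHTTPRequestHandler


def route(method: str, path: str) -> tuple[int, dict]:
    """Map a request to (HTTP status code, JSON body) for the three expected endpoints."""
    known = {"/speech-to-text": "POST", "/text-to-speech": "POST", "/status": "GET"}
    if path not in known:
        return 404, {"error": "not found"}
    if known[path] != method:
        return 405, {"error": "method not allowed"}
    if path == "/status":
        return 200, {"status": "ok"}
    # The STT/TTS handlers are out of scope for this sketch.
    return 501, {"error": "not implemented in this sketch"}


class Handler(BaseHTTPRequestHandler):
    def _reply(self) -> None:
        code, body = route(self.command, self.path)
        data = json.dumps(body).encode("utf-8")
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    # Both verbs share one reply method; route() decides the outcome.
    do_GET = _reply
    do_POST = _reply
```

For a local check, `Handler` can be passed to `http.server.HTTPServer(("127.0.0.1", 8000), Handler)` and started with `serve_forever()`.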
🧪 Testing
Sending a test voice file → checking text in Optizium chat.
GPT response → synthesis → checking playback on the site.
Sending feedback form after voice request.
🧰 Technology Stack (recommended):
Python 3.11+, FastAPI, uvicorn
OpenAI Whisper API, gTTS / TTS by Coqui, Edge TTS
pydub or ffmpeg for audio processing
Docker, Gunicorn (production build)
ngrok / HTTPS proxy (for local testing of WebHook)
🕐 Deadline:
3–5 working days
Client's review of cooperation with Oleksandr S.
Everything is great, the project was completed on time and in full. Thank you for the work and I can recommend you to other clients!
Freelancer's review of cooperation with Ievhen Likhachev
Everything is fine, the client treats the brief professionally and listens to advice, I hope to work together again.
-
Good day.
I am ready to take on your project.
I can develop such an integration for you using no-code/low-code tools.
Message me privately, we will discuss all possible nuances and can start the implementation.
-
Good day!
I can build this service in Node.js within a week (5 days). But first, I need to take a closer look at your Optizium service.
Examples of work: https://github.com/axbuglak
Sincerely,
Buhlak Oleksiy
-
I will create an independent microservice in Python 3.11+ with FastAPI that will process users' voice messages. It will accept an audio file (formats .mp3/.ogg/.wav), convert it to text using the OpenAI Whisper API, and then send this text via an HTTP POST request to the main API /api/chats/send. For the reverse task (TTS), the microservice will accept text, convert it to speech using gTTS, Coqui TTS, or Edge TTS, generate an audio file, and return a link to it. Audio processing will be done using ffmpeg or pydub.
All interactions will occur through secure HTTPS requests with API key authorization. Audio files will not be stored — they will be deleted after processing. I will also implement /status to check the service availability. Testing will include 3 stages: STT (recognition), TTS (synthesis), and full connection with the main API.
For deployment, I will use Docker + Gunicorn, and ngrok for local WebHook testing.
-
I have experience creating microservices with FastAPI and have worked with OpenAI and with audio files.
I can implement a microservice that will fully comply with the described architecture: fast, secure, independent, and easily scalable. I am ready to discuss the details (authentication, deployment) in private messages. Please write, and we will discuss everything.
-
Good day
I am ready to complete your task
The actual execution time is 6-7 days
The cost is 12,000 UAH
-
Are you considering other programming languages?
-