An automated pipeline for transcribing calls and performing sentiment analysis. Azure Speech streams real-time transcriptions, then an LLM analyzes the call and generates a report on agent performance.
## How it works
- Input. Upload a .wav file or stream live audio.
- Transcription. Azure Speech streams real-time partial and final results.
- Analysis. DeepSeek V3-2 performs call breakdown and identifies coaching opportunities.
- Output. JSON file with full transcript and structured feedback.
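The output step above can be sketched as a small report-assembly function. Field names here are illustrative assumptions, not the project's actual JSON schema:

```python
import json

# Sketch of how the final report might be assembled (field names are
# illustrative; the project's actual JSON schema may differ).

def build_report(segments, analysis):
    """Combine transcript segments and LLM feedback into one JSON report."""
    return json.dumps({
        "transcript": " ".join(seg["text"] for seg in segments),
        "segments": segments,    # per-utterance detail with speaker labels
        "analysis": analysis,    # structured coaching feedback from the LLM
    }, indent=2)

# Mocked pipeline results:
segments = [
    {"speaker": "agent", "text": "Thanks for calling, how can I help?"},
    {"speaker": "caller", "text": "My order never arrived."},
]
analysis = {
    "summary": "Caller reports a missing order.",
    "coaching": ["Ask for the order number earlier in the call."],
}
report = build_report(segments, analysis)
print(report)
```

Keeping per-utterance segments alongside the flat transcript lets downstream consumers attribute feedback to specific speaker turns.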
I used a detailed system prompt to control the LLM output, plus the `max_tokens` parameter to adjust report length. Caching prevents redundant API calls when the same file is uploaded twice.
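The duplicate-upload caching could look like the following minimal sketch, keyed on a SHA-256 of the audio bytes (an assumption; the project's actual cache mechanism may differ, and `analyze_call` stands in for the real transcription + LLM step):

```python
import hashlib
import tempfile
from pathlib import Path

# In-memory cache keyed by a content hash of the uploaded audio.
_cache: dict[str, dict] = {}
api_calls = 0  # counts simulated API round-trips

def analyze_call(wav_path: str) -> dict:
    """Stand-in for the real Azure Speech + LLM analysis step."""
    global api_calls
    api_calls += 1
    return {"report": f"analysis of {Path(wav_path).name}"}

def analyze_with_cache(wav_path: str) -> dict:
    key = hashlib.sha256(Path(wav_path).read_bytes()).hexdigest()
    if key not in _cache:          # only hit the APIs on a cache miss
        _cache[key] = analyze_call(wav_path)
    return _cache[key]

# Uploading the same file twice triggers only one API call.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
    f.write(b"fake audio bytes")
    path = f.name
analyze_with_cache(path)
analyze_with_cache(path)
print(api_calls)  # 1
```

Hashing the bytes rather than the filename means a renamed copy of the same recording still hits the cache.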
## Demos
Terminal demo:
Streamlit interface with real-time transcription streaming:
## What I learned
Pairing specialized Azure services with LLMs works well. Azure Speech handles transcription, the LLM adds analysis. Together they do more than either could alone.
Real-time streaming makes a big difference in user experience. Watching transcription appear word-by-word feels more responsive than waiting for batch processing. The caching was a practical addition that saved API costs on duplicate uploads.
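The partial-then-final flow that drives the word-by-word display can be sketched with callbacks. The event names mirror the Azure Speech SDK's `recognizing`/`recognized` callbacks, but this stub just replays canned events rather than calling the SDK:

```python
# Sketch of how interim (partial) vs final hypotheses can drive a live UI.
# replay_events is a stand-in for the speech SDK firing its callbacks.

def replay_events(on_partial, on_final):
    """Simulate a recognizer emitting growing partials, then a final result."""
    on_partial("thanks for")
    on_partial("thanks for calling")
    on_final("Thanks for calling, how can I help?")

lines = []
replay_events(
    on_partial=lambda text: lines.append(("partial", text)),
    on_final=lambda text: lines.append(("final", text)),
)
print(lines[-1])  # ('final', 'Thanks for calling, how can I help?')
```

In the UI, partial events overwrite the current line in place while final events commit it to the transcript, which is what produces the word-by-word effect.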
## Tech stack
- Python — pipeline orchestration
- Azure Speech Services — real-time transcription
- Azure AI Foundry (DeepSeek V3-2) — call analysis
- Streamlit — interactive UI