Built with AI. The code in this project was written by Claude Code.

An automated pipeline for transcribing calls and performing sentiment analysis. Azure Speech streams real-time transcriptions, then an LLM analyzes the call and generates a report on agent performance.

How it works

  • Input. Upload a .wav file or stream live audio.
  • Transcription. Azure Speech streams real-time partial and final results (see the sketch after this list).
  • Analysis. DeepSeek V3-2 performs call breakdown and identifies coaching opportunities.
  • Output. JSON file with full transcript and structured feedback.
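
A minimal sketch of the streaming transcription step, assuming the Azure Speech SDK (`azure-cognitiveservices-speech`). The environment variable names and the `transcribe_wav` helper are illustrative, not the project's actual code:

```python
import os
import threading

import azure.cognitiveservices.speech as speechsdk


def transcribe_wav(path: str) -> list[str]:
    """Stream a .wav file through Azure Speech and collect the final results.

    Partial (recognizing) results are printed as they arrive; final
    (recognized) results are accumulated into the transcript.
    """
    speech_config = speechsdk.SpeechConfig(
        subscription=os.environ["AZURE_SPEECH_KEY"],   # illustrative env var names
        region=os.environ["AZURE_SPEECH_REGION"],
    )
    audio_config = speechsdk.audio.AudioConfig(filename=path)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    transcript: list[str] = []
    done = threading.Event()

    def on_recognizing(evt):
        # Partial hypothesis, updated continuously as audio streams in
        print(f"\r{evt.result.text}", end="", flush=True)

    def on_recognized(evt):
        # Finalized segment; append it to the transcript
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            transcript.append(evt.result.text)

    recognizer.recognizing.connect(on_recognizing)
    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    done.wait()  # block until the file has been fully processed
    recognizer.stop_continuous_recognition()
    return transcript
```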

I used a detailed system prompt to control the LLM's output, plus the max_tokens parameter to adjust report length. Caching prevents redundant API calls when the same file is uploaded twice.
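
Roughly, the caching and analysis steps could look like the sketch below. The SHA-256 cache key, the `.cache` directory, the `ANALYSIS_SYSTEM_PROMPT` placeholder, and the use of the `azure-ai-inference` client are assumptions for illustration, not necessarily what this project does:

```python
import hashlib
import json
import os
from pathlib import Path

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

CACHE_DIR = Path(".cache")              # illustrative cache location
ANALYSIS_SYSTEM_PROMPT = "..."          # the detailed system prompt goes here


def analyze_call(wav_path: str, transcript: str, max_tokens: int = 1024) -> dict:
    """Return the analysis report, reusing a cached result for duplicate uploads."""
    CACHE_DIR.mkdir(exist_ok=True)
    # Key the cache on the audio file's contents so re-uploading the same
    # .wav skips the API call entirely.
    key = hashlib.sha256(Path(wav_path).read_bytes()).hexdigest()
    cache_file = CACHE_DIR / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())

    client = ChatCompletionsClient(
        endpoint=os.environ["AZURE_AI_ENDPOINT"],        # illustrative env var names
        credential=AzureKeyCredential(os.environ["AZURE_AI_KEY"]),
    )
    response = client.complete(
        messages=[
            SystemMessage(content=ANALYSIS_SYSTEM_PROMPT),
            UserMessage(content=transcript),
        ],
        max_tokens=max_tokens,  # controls report length
    )
    report = {
        "transcript": transcript,
        "analysis": response.choices[0].message.content,
    }
    cache_file.write_text(json.dumps(report, indent=2))
    return report
```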

Demos

Terminal demo:

Streamlit interface with real-time transcription streaming:

What I learned

Pairing specialized Azure services with LLMs works well. Azure Speech handles transcription; the LLM adds the analysis. Together they do more than either could alone.

Real-time streaming makes a big difference to the user experience: watching the transcription appear word by word feels far more responsive than waiting for batch processing. Caching was a practical addition that saved API costs on duplicate uploads.

Tech stack

  • Python — pipeline orchestration
  • Azure Speech Services — real-time transcription
  • Azure AI Foundry (DeepSeek V3-2) — call analysis
  • Streamlit — interactive UI