AI Resume Analysis: Voice Interview Module

Design and API implementation notes for the voice interview module in the interview-guide project

VoiceInterview Module Design and Implementation

This note records how I implemented the VoiceInterview module in the interview-guide project. The goal is a complete voice interview experience: real-time interaction, resumable sessions, and traceable evaluation.

Module Capability Overview

  • Real-time voice interaction: built on WebSocket + Qwen3 Voice Model (shared API key for ASR/TTS/LLM).
  • Streaming experience optimization: sentence-level concurrent TTS, with generation, synthesis, and playback running in parallel; first-packet latency is around 200 ms.
  • Server-side VAD: automatic segmentation with real-time subtitles (including intermediate results).
  • Echo protection: manual submission is supported so that AI playback is not captured as user input.
  • Session continuity: supports pause/resume and multi-turn context memory, with auto-pause on timeout.
  • Observability metrics: Micrometer metrics for TTS/ASR latency, session duration, etc.
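
The sentence-level concurrent TTS idea can be sketched roughly as follows. This is a minimal illustration rather than the project's code: SentencePipeline, splitSentences, and synthesizeInOrder are hypothetical names, the synthesize function is a placeholder for a real TTS call, and the sentence splitter is deliberately naive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch: split generated text into sentences, synthesize them
// concurrently, and collect the audio in the original sentence order.
final class SentencePipeline {

    // Naive splitter on ASCII and CJK terminators; it would also split "3.14".
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            current.append(c);
            if (".!?。！？".indexOf(c) >= 0) {
                String sentence = current.toString().trim();
                if (!sentence.isEmpty()) sentences.add(sentence);
                current.setLength(0);
            }
        }
        String tail = current.toString().trim();
        if (!tail.isEmpty()) sentences.add(tail);
        return sentences;
    }

    // Submit every sentence for synthesis at once, then join in order so
    // playback stays sequential even if later sentences finish first.
    static <A> List<A> synthesizeInOrder(String text, Function<String, A> synthesize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<A>> futures = new ArrayList<>();
            for (String sentence : splitSentences(text)) {
                futures.add(CompletableFuture.supplyAsync(() -> synthesize.apply(sentence), pool));
            }
            List<A> audio = new ArrayList<>();
            for (CompletableFuture<A> future : futures) {
                audio.add(future.join()); // a real pipeline could start playback here
            }
            return audio;
        } finally {
            pool.shutdown();
        }
    }
}
```

Joining the futures in submission order is what keeps playback sequential while synthesis overlaps; the first sentence can be played as soon as its own future completes, which is where the low first-packet latency would come from.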

State Transitions
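
The transitions implied by the API rules in this note (only IN_PROGRESS sessions can be paused, only PAUSED sessions can be resumed, and ending a session moves it to COMPLETED) can be sketched as a small guard table. SessionTransitions and its nested Status enum are hypothetical names, and the sketch assumes a session can be ended from either IN_PROGRESS or PAUSED, which this note does not state explicitly.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the session status transitions described in this note.
final class SessionTransitions {
    enum Status { IN_PROGRESS, PAUSED, COMPLETED }

    // Allowed target states per source state.
    // Assumption: ending is allowed from both IN_PROGRESS and PAUSED.
    private static final Map<Status, Set<Status>> ALLOWED = Map.of(
        Status.IN_PROGRESS, EnumSet.of(Status.PAUSED, Status.COMPLETED),
        Status.PAUSED, EnumSet.of(Status.IN_PROGRESS, Status.COMPLETED),
        Status.COMPLETED, EnumSet.noneOf(Status.class));

    static boolean canTransition(Status from, Status to) {
        return ALLOWED.get(from).contains(to);
    }

    // Returns the new status, or rejects the change if the rule table forbids it.
    static Status transition(Status from, Status to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition: " + from + " -> " + to);
        }
        return to;
    }
}
```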

Key API Design

POST /api/voice-interview/sessions Create Voice Interview Session

Controller entry:

VoiceInterviewController.createSession(@Valid @RequestBody CreateSessionRequest request)

Core call chain:

voiceInterviewService.createSession(request);

Implementation highlights:

  • Fall back to the default skillId when it is missing.
  • Fall back to the default llmProvider when it is empty.
  • Build VoiceInterviewSessionEntity (phase switches, difficulty, resume ID, JD text, planned duration, etc.).
  • Default userId = "default".
  • Set initial phase (the first enabled one in intro/tech/project/hr).
  • Persist to voice_interview_sessions and cache in Redis (with TTL).
  • Return SessionResponseDTO (session ID, status, phase, config, etc.).
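
The fallback and initial-phase steps above can be sketched as follows. SessionDefaults, the two default constants for skillId and llmProvider, and the uppercase phase names are assumptions made for illustration; only the "default" userId and the intro/tech/project/hr order come from this note. Throwing when no phase is enabled is also an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the defaults applied when creating a session.
final class SessionDefaults {
    static final String DEFAULT_SKILL_ID = "default-skill";        // assumed value
    static final String DEFAULT_LLM_PROVIDER = "default-provider"; // assumed value
    static final String DEFAULT_USER_ID = "default";               // stated in this note

    // Use the fallback when the requested value is missing or blank.
    static String orDefault(String value, String fallback) {
        return (value == null || value.isBlank()) ? fallback : value;
    }

    // Return the first enabled phase in the fixed order intro, tech, project, hr.
    static String initialPhase(boolean intro, boolean tech, boolean project, boolean hr) {
        Map<String, Boolean> switches = new LinkedHashMap<>(); // keeps insertion order
        switches.put("INTRO", intro);
        switches.put("TECH", tech);
        switches.put("PROJECT", project);
        switches.put("HR", hr);
        for (Map.Entry<String, Boolean> entry : switches.entrySet()) {
            if (entry.getValue()) return entry.getKey();
        }
        // Assumption: a session with no enabled phase is rejected.
        throw new IllegalArgumentException("At least one phase must be enabled");
    }
}
```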

GET /api/voice-interview/sessions/{sessionId} Get Session Detail by ID

Controller call:

voiceInterviewService.getSessionDTO(sessionId);

Implementation highlights:

  • Read from Redis first and fall back to the database on a cache miss.
  • Build SessionResponseDTO when found.
  • Return unified error when not found: Session not found: {sessionId}.
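
The Redis-first lookup can be sketched with two in-memory maps standing in for Redis and the database. SessionLookup and its methods are hypothetical names; writing the session back to the cache after a database hit is an assumption, since this note does not say whether the cache is repopulated on a miss.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache-first lookup; plain maps stand in for Redis and the database.
final class SessionLookup {
    private final Map<String, String> cache = new HashMap<>();    // stands in for Redis
    private final Map<String, String> database = new HashMap<>(); // stands in for voice_interview_sessions

    void saveToDatabase(String sessionId, String session) { database.put(sessionId, session); }

    boolean isCached(String sessionId) { return cache.containsKey(sessionId); }

    String getSession(String sessionId) {
        String cached = cache.get(sessionId);
        if (cached != null) return cached;            // cache hit
        String stored = database.get(sessionId);      // cache miss: fall back to the database
        if (stored == null) {
            throw new IllegalArgumentException("Session not found: " + sessionId);
        }
        cache.put(sessionId, stored);                 // assumption: repopulate the cache
        return stored;
    }
}
```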

POST /api/voice-interview/sessions/{sessionId}/end End Session and Trigger Async Evaluation

Controller call:

voiceInterviewService.endSession(sessionId.toString());

End + evaluation logic:

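// 1. Mark the session as finished and its evaluation as pending.
session.setEndTime(now);
session.setCurrentPhase(COMPLETED);
session.setStatus(COMPLETED);
session.setEvaluateStatus(PENDING);
sessionRepository.save(session);
// 2. Enqueue the evaluation task.
voiceEvaluateStreamProducer.sendEvaluateTask(sessionId);
// 3. The producer appends the task to a Redis Stream bounded by STREAM_MAX_LEN.
redisService.streamAdd(streamKey(), buildMessage(payload), AsyncTaskStreamConstants.STREAM_MAX_LEN);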

Notes:

  • The API returns Result.success() immediately without waiting for the evaluation to finish.
  • The frontend polls GET /api/voice-interview/sessions/{sessionId}/evaluation for progress.
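
The "end now, evaluate later" flow can be sketched in memory as follows. EndSessionSketch and its methods are hypothetical names, a plain Queue stands in for the Redis Stream, and the consumer step is reduced to a status change; the real consumer and its error handling are not described in this note.

```java
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical in-memory sketch; a Queue stands in for the Redis Stream.
final class EndSessionSketch {
    enum Status { IN_PROGRESS, COMPLETED }
    enum EvaluateStatus { PENDING, PROCESSING, COMPLETED }

    static final class Session {
        Status status = Status.IN_PROGRESS;
        EvaluateStatus evaluateStatus; // null until the session ends
        Instant endTime;
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final Queue<String> evaluateQueue = new ArrayDeque<>();

    void create(String sessionId) { sessions.put(sessionId, new Session()); }

    // Mirrors the end logic: mark finished, mark evaluation pending, enqueue, return at once.
    void endSession(String sessionId) {
        Session session = sessions.get(sessionId);
        session.endTime = Instant.now();
        session.status = Status.COMPLETED;
        session.evaluateStatus = EvaluateStatus.PENDING;
        evaluateQueue.add(sessionId);
    }

    // One consumer step, run later and independently of the HTTP request.
    boolean evaluateNext() {
        String sessionId = evaluateQueue.poll();
        if (sessionId == null) return false; // nothing queued
        Session session = sessions.get(sessionId);
        session.evaluateStatus = EvaluateStatus.PROCESSING;
        // ... the actual evaluation would run here ...
        session.evaluateStatus = EvaluateStatus.COMPLETED;
        return true;
    }

    // What the polling endpoint would report.
    EvaluateStatus pollStatus(String sessionId) { return sessions.get(sessionId).evaluateStatus; }
}
```

A caller that polls right after endSession sees PENDING, and sees COMPLETED only after the consumer has processed the queued task.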

PUT /api/voice-interview/sessions/{sessionId}/pause Pause Session

Core call:

voiceInterviewService.pauseSession(sessionId.toString(), reason);

Implementation highlights:

  • Only IN_PROGRESS sessions can be paused.
  • Set status to PAUSED, record reason, update updatedAt.
  • Persist to the database and sync the Redis cache.

PUT /api/voice-interview/sessions/{sessionId}/resume Resume Session

Core call:

voiceInterviewService.resumeSession(sessionId.toString());

Implementation highlights:

  • Only PAUSED sessions can be resumed.
  • After resume, status becomes IN_PROGRESS without resetting phase/progress.
  • Persist to the database, sync Redis, and return the latest SessionResponseDTO.

GET /api/voice-interview/sessions Get Session List (Filter by userId/status)

Call chain:

voiceInterviewService.getAllSessions(userId, status);
sessionRepository.findByUserIdAndStatusOrderByUpdatedAtDesc(userId, statusEnum);
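
Judging by its name, the repository method filters on both userId and status and orders by updatedAt descending. An in-memory equivalent of that behavior might look like the sketch below; SessionListSketch and the SessionMeta record are hypothetical stand-ins, not the project's SessionMetaDTO.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory equivalent of the derived query's filter and ordering.
final class SessionListSketch {
    record SessionMeta(String id, String userId, String status, Instant updatedAt) {}

    static List<SessionMeta> findByUserIdAndStatus(List<SessionMeta> all, String userId, String status) {
        return all.stream()
            .filter(s -> s.userId().equals(userId) && s.status().equals(status))
            .sorted(Comparator.comparing(SessionMeta::updatedAt).reversed()) // newest first
            .collect(Collectors.toList());
    }
}
```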

Return:

  • Result<List<SessionMetaDTO>>

DELETE /api/voice-interview/sessions/{sessionId} Delete Voice Interview Session

Call chain:

voiceInterviewService.deleteSession(sessionId);

Implementation highlights:

  • Validate session existence.
  • Delete session and related data (messages/evaluation, depending on repository implementation).
  • Clear Redis cache.

GET /api/voice-interview/sessions/{sessionId}/messages Get Conversation History

Call chain:

voiceInterviewService.getConversationHistoryDTO(sessionId);

Return:

  • Result<List<VoiceInterviewMessageDTO>>

GET /api/voice-interview/sessions/{sessionId}/evaluation Get Async Evaluation Status and Result

Implementation highlights:

  • Validate session first (throw VOICE_SESSION_NOT_FOUND if missing).
  • Read evaluateStatus and evaluateError.
  • If status is COMPLETED, load evaluation details:
      evaluationService.getEvaluation(sessionId);
  • Return VoiceEvaluationStatusDTO (includes status and result when completed).
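
The "result only when completed" rule can be sketched as follows. EvaluationStatusSketch and StatusView are hypothetical names rather than the project's VoiceEvaluationStatusDTO, the result is simplified to a String, and the FAILED status is an assumption suggested by the evaluateError field.

```java
import java.util.function.Supplier;

// Hypothetical sketch: the evaluation result is loaded only for COMPLETED sessions.
final class EvaluationStatusSketch {
    enum EvaluateStatus { PENDING, PROCESSING, COMPLETED, FAILED } // FAILED is an assumption

    record StatusView(EvaluateStatus status, String error, String result) {}

    static StatusView build(EvaluateStatus status, String error, Supplier<String> loadResult) {
        // The supplier stands in for evaluationService.getEvaluation(sessionId);
        // it is not invoked unless the evaluation has completed.
        String result = (status == EvaluateStatus.COMPLETED) ? loadResult.get() : null;
        return new StatusView(status, error, result);
    }
}
```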

POST /api/voice-interview/sessions/{sessionId}/evaluation Manually Trigger Async Evaluation

Processing logic:

voiceInterviewService.getSession(sessionId);
evaluationService.getEvaluation(sessionId);
voiceInterviewService.triggerEvaluation(sessionId);

Rules:

  • If already COMPLETED: return existing evaluation result directly.
  • If PENDING/PROCESSING: return current status without duplicate triggering.
  • For other triggerable states: enqueue the evaluation task and return PENDING; the frontend then continues polling.
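
The three rules above amount to an idempotent trigger, which can be sketched like this. TriggerEvaluationSketch is a hypothetical name, a Queue stands in for the task stream, and the NOT_STARTED and FAILED states are assumptions used to represent "other triggerable states".

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the manual trigger rules; a Queue stands in for the task stream.
final class TriggerEvaluationSketch {
    enum EvaluateStatus { NOT_STARTED, PENDING, PROCESSING, COMPLETED, FAILED } // NOT_STARTED and FAILED are assumed

    static final class Session {
        final String id;
        EvaluateStatus evaluateStatus;
        Session(String id, EvaluateStatus evaluateStatus) {
            this.id = id;
            this.evaluateStatus = evaluateStatus;
        }
    }

    private final Queue<String> queue = new ArrayDeque<>();

    // Returns the status the API would report after handling the trigger request.
    EvaluateStatus trigger(Session session) {
        switch (session.evaluateStatus) {
            case COMPLETED:
                return EvaluateStatus.COMPLETED;   // caller returns the existing result
            case PENDING:
            case PROCESSING:
                return session.evaluateStatus;     // already queued or running: no duplicate task
            default:
                session.evaluateStatus = EvaluateStatus.PENDING;
                queue.add(session.id);             // enqueue exactly one task
                return EvaluateStatus.PENDING;
        }
    }

    int queuedTasks() { return queue.size(); }
}
```

Calling trigger twice in a row on a failed session enqueues only one task, because the second call sees PENDING and returns without enqueuing again.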

Summary

The key value of the VoiceInterview module is not just making voice interaction work, but making the entire real-time pipeline and session lifecycle robustly connected. For me, only when the full chain (create, pause, resume, end, evaluate) works reliably can voice interviews become a truly evolvable product capability.