Techhalo · 360Voice Platform

Production architecture · voice360-prod-aks · Azure East US · Last updated June 2026

Application services (AKS)
Managed SaaS (cloud providers)
In-cluster infrastructure
External vendors
CI/CD pipeline
Hardening backlog
Techhalo Voice360 System Architecture
Three-tier architecture showing internet traffic, AKS application cluster, and data services.

Internet / DNS
End users · HTTPS / browser / SMS link
Azure Load Balancer · 20.124.179.20 · port 443 · TLS → ingress-nginx
GitHub Actions · dev → release → main → production · Trivy scan gate · Helm upgrade
Azure ACR · 360voicerelease / 360voiceprod · Managed identity pull
Twilio SMS · ID card link delivery · bic-sky.halo360.ai
DNS / Cloudflare · All hosts → 20.124.179.20 · Kafka ext → 4.156.62.101

Azure Kubernetes Service · voice360-prod-aks · eastus · Standard_D4s_v3 × 3 (autoscale 3–6)
Nodes: 3× Standard_D4s_v3 (4vCPU/16GiB) · autoscale 3–6 · Azure CNI · Pod CIDR 10.244.0.0/16

ingress-nginx
NGINX Ingress · TLS termination · host routing

cert-manager
cert-manager v1.16.2 · Let's Encrypt · auto-renew

namespace: 360voice-prod
voice360 · Python · port 8000 · 1 replica · 360voice-release.techhalo.ai · ClusterIP 10.0.244.91:80→8000 · liveness: GET / :8000 · ⚠ LiveCallConsumer degraded
bic-agent · Angular/NGINX · port 4200 · 2 replicas · halovoice-release.techhalo.ai · ID cards: bic-sky.halo360.ai · ClusterIP →:4200 · liveness: GET / :4200
halo-task-manager-api · .NET API · port 44301 · 1 replica · api-halo-task-release.techhalo.ai · Auth · Billing · Tasks · SMS · ClusterIP →:44301 · liveness: GET /health :44301
Kubernetes Secrets (plain, no Key Vault) · voice360-env · bic-agent-taskmanager-env · halo-task-manager-api-env
Helm releases · voice360-prod (rev 3) · bic-agent-taskmanager-voice360-prod (rev 8) · halo-task-manager-api-prod (rev 5)

namespace: kafka
Kafka (KRaft) · bitnamilegacy · 1 broker · 32Gi · :9092 internal · :9094 external · ⚠ No HA · DR risk
Topics: turn_resolution.v1 · call_trace.v1 · + DLQs
External listener: kafka-prod-new.techhalo.ai:9094 → LB 4.156.62.101

Data services · Managed SaaS + AWS S3
RabbitMQ · CloudAMQP · SaaS · toucan.lmq.cloudamqp.com:5671
ClickHouse Cloud · GCP us-east1 · managed · dbs: voice, analytics
Postgres · UpCloud managed · auto-snapshots · halotasksapidb-release:11550
AWS S3 · halovoiceaudiotranscripts · us-east-1 · IAM key auth

External SaaS vendors
OpenAI · Stripe · PayPal · Twilio · ClickSend · Sentry · Recaptcha · SMTP · Slack · Jira · Google APIs
All credentials in k8s Secrets (voice360-env / halo-task-manager-api-env)

⚠ DR Hardening Backlog
• Kafka: single broker, no replication
• Cluster secrets: manual export only
• No IaC / GitOps / HPA / multi-AZ
Full cluster rebuild RTO: ~4 hours

Traffic flow key: HTTPS / service call · Kafka event stream · Async / data store
Client ID card delivery flow

When a client needs to download their ID card, halo-task-manager-api triggers a Twilio SMS with a unique URL. The client taps the link and lands on the bic-sky.halo360.ai frontend (served by bic-agent), which renders the ID card and allows download — no login required.

halo-task-manager-api

Generates unique ID card link. Calls Twilio API to send SMS.

Twilio SMS

Sends SMS to client's phone number with the link.

Client's phone

Receives SMS, taps the unique link in message.

DNS resolves

bic-sky.halo360.ai → Azure LB 20.124.179.20 → ingress-nginx

bic-agent serves

Angular app renders ID card. Client downloads PDF.

Download

ID card downloaded to client device. Flow complete.
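
The source does not say how the unique link is generated. As a minimal sketch, assuming an unguessable random token in the URL path, the first step of the flow could look like the code below. The `/id-card/` path, the function names, and the message text are all hypothetical. Because the page requires no login, the token would be the only thing protecting the card, so it needs enough entropy to resist guessing.

```python
import secrets

# Production hostname from the environment hostnames list; the URL path is hypothetical.
ID_CARD_HOST = "bic-sky.halo360.ai"


def build_id_card_link(host: str = ID_CARD_HOST, token_bytes: int = 32) -> str:
    """Return an HTTPS link that embeds a random URL-safe token."""
    token = secrets.token_urlsafe(token_bytes)
    return f"https://{host}/id-card/{token}"


def build_sms_body(link: str) -> str:
    """Compose the text that would be handed to the SMS provider (wording is hypothetical)."""
    return f"Your ID card is ready: {link}"


if __name__ == "__main__":
    print(build_sms_body(build_id_card_link()))
```

Passing a different `host` (for example `release-sky.halo360.ai`) yields the equivalent link for another environment.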

Environment hostnames

Production · bic-sky.halo360.ai · live
Release · release-sky.halo360.ai · pre-prod
Staging · staging-sky.halo360.ai · test
Dev · dev-sky.halo360.ai · dev
⚠ Notes (from ARCHITECTURE.md)
No explicit documentation was provided for the bic-sky ID card flow — the above is inferred from the architecture. The appconfig.json baked into each bic-agent image points at the correct api-halo-task-* host per environment. The 0.1.27-r119 release image is correctly configured; the 0.1.27-r121 prod image ships staging URLs and requires a postStart hook or image swap to fix.
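
A check along these lines could catch an image that ships another environment's URLs before it is deployed. This is only a sketch. The structure of appconfig.json is not shown in the source, so the code scans the raw text for api-halo-task-<env> hostnames instead of reading a named key. The function name, the sample key, and the assumption that every environment's host ends in .techhalo.ai are hypothetical.

```python
import re

# Hosts of the form api-halo-task-<env>.techhalo.ai; the domain for non-release
# environments is an assumption based on the release hostname.
API_HOST_RE = re.compile(r"api-halo-task-([a-z0-9]+)\.techhalo\.ai")


def find_wrong_env_hosts(config_text: str, expected_env: str) -> list[str]:
    """Return api-halo-task hosts whose environment suffix differs from expected_env."""
    wrong = []
    for match in API_HOST_RE.finditer(config_text):
        if match.group(1) != expected_env:
            wrong.append(match.group(0))
    return wrong


if __name__ == "__main__":
    # Hypothetical config content; the real key names are unknown.
    sample = '{"apiBaseUrl": "https://api-halo-task-staging.techhalo.ai"}'
    print(find_wrong_env_hosts(sample, "release"))
```

An empty list means no mismatched host was found; it does not prove the config is otherwise correct.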
Branch promotion flow
dev → release → main → production ✓
Branch policy enforced by GitHub Actions
Build & push jobs (dev / release / main)
git push to dev/release/main
build-and-push · docker buildx · tag X.Y.Z-r<RUN>
trivy-scan · CRITICAL+HIGH gate · SARIF → GitHub Security
deploy · az aks get-credentials · kubectl create secret · helm upgrade --install --wait (5m timeout)
Production promotion (no rebuild)
push to production branch (from main)
resolve-prod-image · picks newest non-latest tag from 360voiceprod ACR
Trivy skipped (already scanned on main)
deploy to voice360-prod-aks · letsencrypt-prod · 360voice.techhalo.ai
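
The source does not say how resolve-prod-image decides which tag is "newest". One plausible reading, sketched here as an assumption, is to drop `latest`, parse the X.Y.Z-r<RUN> tags produced by build-and-push, and take the highest (version, run) pair. The real job might instead sort by push timestamp. The function names are hypothetical.

```python
import re
from typing import Optional

# Tag format from the build job: X.Y.Z-r<RUN>, e.g. 0.1.27-r119.
TAG_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-r(\d+)$")


def parse_tag(tag: str) -> Optional[tuple]:
    """Return (major, minor, patch, run) as integers, or None if the tag does not match."""
    m = TAG_RE.match(tag)
    return tuple(int(g) for g in m.groups()) if m else None


def resolve_prod_tag(tags: list) -> Optional[str]:
    """Pick the highest X.Y.Z-r<RUN> tag, ignoring 'latest' and malformed tags."""
    candidates = [t for t in tags if t != "latest" and parse_tag(t) is not None]
    # Comparing parsed integer tuples avoids string-sort mistakes such as "0.1.9" > "0.1.10".
    return max(candidates, key=parse_tag) if candidates else None


if __name__ == "__main__":
    print(resolve_prod_tag(["latest", "0.1.27-r119", "0.1.27-r121", "0.1.26-r200"]))
```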
Build once, promote many
Production always runs the exact image bytes that were scanned and soak-tested on main. Because nothing is rebuilt, promotion cannot introduce new vulnerabilities.
⚠ CI/CD gaps (from DEPLOY.md)
Multi-repo coordination
bic-agent and halo-task live in separate repos with their own CI. No coordinated "prod release manifest" listing all three image tags shipped together.
No smoke test
aks.yml runs helm upgrade --wait but performs no post-deploy curl against the public hostname, so a broken certificate or ingress will not surface automatically.
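
To illustrate the missing smoke test, here is a minimal sketch of a post-deploy check that requests the public hostname over HTTPS and fails on connection, certificate, or server errors. It is not part of aks.yml. The function names and the choice to treat any 2xx or 3xx status as healthy are assumptions. The `fetch` parameter exists only so the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """GET the URL and return its HTTP status; connection and TLS failures raise URLError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth reporting.
        return err.code


def smoke_test(host: str, fetch: Callable[[str], int] = http_status) -> bool:
    """Return True if https://<host>/ answers with a 2xx or 3xx status."""
    try:
        status = fetch(f"https://{host}/")
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and certificate errors.
        return False
    return 200 <= status < 400
```

A deploy job could call `smoke_test("360voice.techhalo.ai")` after helm upgrade returns and exit non-zero when it returns False.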
Deployment targets per branch
Branch · ACR · Cluster · Namespace · Public host
dev · 360voice · aplcommissions-aks · 360voice-dev · 360voice-dev.techhalo.ai
release · 360voicerelease · aplcommissions-aks · 360voice-release · 360voice-release.techhalo.ai
main · 360voiceprod · aplcommissions-aks · 360voice-prod · 360voice.techhalo.ai (legacy)
production ✓ · 360voiceprod · voice360-prod-aks · 360voice-prod · 360voice.techhalo.ai
RTO / RPO targets vs current reality
Component · RTO target · RPO target · Status
Hardening backlog (priority order)

Kafka HA

Move from 1 broker to 3 (replicationFactor=3, min.insync.replicas=2), or migrate to Azure Event Hubs.

P1 · data loss risk

Cluster GitOps

Terraform/Bicep for AKS + Flux/ArgoCD for Helm. Eliminates 4-hour manual rebuild.

P2 · ops risk

Secret backup

Automate export to Azure Blob / Key Vault. Currently manual export to OneDrive.

P3 · security

Multi-AZ

Spread nodepool across AZs 1/2/3. Min 2 replicas per service with anti-affinity.

P4 · availability

Observability

Metrics + alerting + on-call rotation. Currently Sentry + AKS Container Insights only.

P5 · ops
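
To make the Kafka HA item concrete: when producers use acks=all, a partition keeps accepting writes as long as at least min.insync.replicas replicas are in sync, so the number of broker failures it tolerates is the replication factor minus that minimum. The sketch below only encodes that arithmetic. The function name is hypothetical, and it assumes acks=all, which the source does not state.

```python
def tolerated_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Broker losses a partition survives while still accepting acks=all writes."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("need 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    print(tolerated_broker_failures(1, 1))  # current single broker: prints 0
    print(tolerated_broker_failures(3, 2))  # proposed 3-broker setup: prints 1
```

With the proposed settings, one broker can be lost or restarted without blocking producers, whereas the current single broker tolerates none.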