VocaLIFT
Voice-first iOS workout tracker
Voice-first iOS workout tracker. Tap mic, speak, see structured data. Native iOS app that replaces gym logging with voice. Speak naturally ("plate and a 10 each side for 8") and WhisperKit transcribes on-device while GPT-4o-mini structures the utterance into exercise, weight, and reps.
In Development. iOS 17+, Swift 6, Xcode 26.3. Epics 1–7 shipped (voice pipeline, offline queue, PR detection, home, paywall, profile). Private repo.
What it is
Replaces manual set entry in legacy lifting apps (Strong, Hevy) with a voice loop tuned for between-set logging under load. On-device ASR means zero per-transcription cost, zero audio leaves the phone, and no network dependency for the hot path. LLM structuring runs server-side through a Supabase Edge Function with a strict JSON schema and deterministic prompting.
By the numbers
| Metric | Value |
|---|---|
| Transcription cost | $0 per set (on-device) |
| Structuring cost | ~$0.005 per workout |
| Target gross margin | 95%+ on paid tier |
| Auto-confirm window | under 5 seconds |
| Offline resilience | 80%+ |
| Edge Function timeout | 15 seconds |
| Epics shipped | 7 (voice, offline, PR, home, paywall, profile, sync) |
| Audio uploaded | zero bytes (text transcript only) |
Architecture
[Mic tap]
|
v
AVAudioEngine --> AudioBufferProcessor --> WhisperKit (on-device)
|
v transcript
VoiceTranscriptionService
|
v
VoiceLogPipelineManager
|
+---------------------------+--------------------+
v v v
ExerciseHistoryService EdgeFunctionClient AutoConfirmEngine
("last time" context) | |
v v
Supabase Edge Fn: structure-workout commit
(GPT-4o-mini + JSON schema) |
| v
v SwiftData
StructureResponse |
v
PRDetectionService
|
v
SyncService / OfflineQueueProcessor
|
v
Supabase Postgres (reconciliation)Data flow guarantees
- Local write always precedes remote write. Failed sync enqueues; queue replays in submission order.
- Edge Function returns
{ error, retryable, message }with explicit retryability. Client distinguishes transient from terminal failure. - Response schema is versioned (
CURRENT_SCHEMA_VERSION) so the client can refuse unknown shapes. - Idempotency via client-generated IDs carried through sync DTOs.
Key features
- On-device ASR — WhisperKit (argmaxinc/WhisperKit 0.9+) runs Whisper models natively on the Neural Engine. No audio uploaded or stored.
- GPT-4o-mini structuring — Supabase Edge Function
(
supabase/functions/structure-workout) wraps OpenAI with a typed schema, 15s timeout, and versioned response contract. - Auto-confirm ring under 5s —
AutoConfirmEnginecommits a structured set if the user doesn't intervene inside the window. - AI plate math — Model resolves gym vernacular ("plate and a 10 each side") to barbell loading with correct total weight.
- Exercise-specific "last time" context —
ExerciseHistoryServicefeeds the previous session's reps/weight for the current exercise into the prompt and UI for carry-forward prompting. - PR detection with shareable graphics —
PRDetectionServicecomputes 1RM and volume PRs on commit;ShareImageRendererproduces social-ready cards. - Progressive overload tracking — Per-exercise charts, PR accordion, progression history.
- Offline queue —
OfflineQueueProcessor+SyncServicemaintain an ordered, idempotent sync queue with conflict-safe replay when network returns. - SwiftData local-first persistence —
Workout,Exercise,ExerciseSetmodels are the source of truth; server sync is reconciliation, not dependency. - Audio interruption handling — Music ducking, call/interruption recovery, graceful mic-session teardown.
- RevenueCat paywall + A/B testing — Free trial, subscription management, restoration, remote offering variants.
- Design system — Explicit tokens (color/typography/spacing/animation), reduced-motion aware modifiers, haptic engine.
- Test coverage — Unit, integration, and E2E targets covering voice pipeline, auth sync, offline queue, PR detection, inline editing, error recovery, share flow.
What makes it stand out
- Zero audio ever leaves the device. On-device Whisper on the Neural Engine means microphone input stays local; only the text transcript crosses the wire.
- $0 transcription cost, ~$0.005 structuring cost per workout — a 95%+ gross margin target without compromising privacy.
- Local-first, not cloud-dependent — SwiftData is the source of truth. Sync reconciles; it doesn't gate logging.
- Versioned Edge Function contract — the client refuses unknown response shapes, so server rollouts can't silently corrupt local state.
Stack
| Layer | Technology |
|---|---|
| UI | SwiftUI, Swift 6, iOS 17+ |
| Persistence | SwiftData (local-first) |
| ASR | WhisperKit (on-device, Neural Engine) |
| LLM | GPT-4o-mini via Supabase Edge Function (Deno) |
| Backend | Supabase (Postgres, Auth, Edge Functions) |
| Monetization | RevenueCat (paywall, A/B, restoration) |
| Analytics | PostHog |
| Audio | AVFoundation, AVAudioEngine, ObjC exception bridging |
| Lint / format | SwiftLint (pre-build phase) |
| Project gen | XcodeGen (project.yml) |