🎙️ Project Showcase: OpenVoice Flow
For this blog post, I want to share a project I’ve been working on recently called OpenVoice Flow. It’s a cross-platform, open-source real-time voice transcription application.
OpenVoice Flow is now open-source! You can find the repository on GitHub.
The Origins
I’ve always wanted a fast, privacy-focused voice transcription app that just works. I tried things like Aqua Voice, but most solutions out there are either too expensive, don’t respect your privacy, or are just not developer-friendly. So I decided to build my own.
Built with Tauri 2 (Rust backend + webview UI) and a React + TypeScript frontend, it delivers real-time, context-aware speech-to-text with a focus on privacy, developer-friendliness, and extensibility.
The Features
- Real-time transcription with low latency
- Multi-provider support - Soniox (tested), OpenAI Whisper, Ollama (experimental)
- Cross-platform - Targets Windows, macOS, and Linux via Tauri (only macOS is tested so far)
- Modular architecture - Easy to add new transcription providers
- BYO-API-key model - Bring your own API keys, no vendor lock-in
- Privacy-first - Transcripts and API keys are stored locally on your device
- Floating overlay - Live transcription feedback in a draggable bubble
- History view - Access past transcriptions with audio and re-transcription capability
- Personal dictionary - Custom word replacements, formatting rules, and instructions
- Network diagnostics - Latency and stats per provider
- Global hotkeys - Toggle or Push-to-Talk recording modes
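The difference between the two hotkey modes is easiest to see as a tiny state machine. The following is only a sketch under my own assumptions: `HotkeyMode`, `KeyEvent`, and `next_recording_state` are hypothetical names chosen for illustration, not the identifiers used in the repository.

```rust
/// How the global hotkey controls recording (illustrative names, not the app's real types).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HotkeyMode {
    Toggle,
    PushToTalk,
}

/// The two hotkey events a handler would react to.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum KeyEvent {
    Pressed,
    Released,
}

/// Returns whether recording should be active after `event`,
/// given the current mode and whether we are recording right now.
pub fn next_recording_state(mode: HotkeyMode, recording: bool, event: KeyEvent) -> bool {
    match (mode, event) {
        // Toggle: each press flips the state; releases are ignored.
        (HotkeyMode::Toggle, KeyEvent::Pressed) => !recording,
        (HotkeyMode::Toggle, KeyEvent::Released) => recording,
        // Push-to-Talk: record only while the key is held down.
        (HotkeyMode::PushToTalk, KeyEvent::Pressed) => true,
        (HotkeyMode::PushToTalk, KeyEvent::Released) => false,
    }
}
```

Keeping this logic as a pure function makes it easy to unit test without a real keyboard. A real handler would probably also need to ignore auto-repeated press events in Toggle mode, which this sketch does not attempt.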
The Stack
- Frontend: React 18 + TypeScript + Vite
- Backend: Rust (Tauri 2)
- Audio: `cpal` for cross-platform audio capture
- Database: SQLite for transcript storage
- Build System: Vite for frontend, Cargo for Rust
The Technical Challenges
Building this wasn’t without its challenges. Here are some of the things I had to figure out:
- Cross-platform audio capture - Different OSes handle audio differently. Used `cpal` to abstract platform differences, but still had to handle device enumeration, sample rate conversion (44.1kHz vs 16kHz), and channel handling (stereo to mono).
- Real-time streaming - WebSocket connections can be flaky. Implemented a streaming worker pattern with separate threads for audio processing, control channels for start/stop signals, and reconnection logic with exponential backoff.
- Global hotkeys - Need to work even when the app is in the background. Used `tauri-plugin-global-shortcut` which handles macOS (Carbon framework), Windows (RegisterHotKey API), and Linux (X11/Wayland protocols).
- Overlay window management - The floating overlay needs to stay on top and be draggable. Separate Tauri window with `alwaysOnTop`, `skipTaskbar`, and `decorations: false`. Known limitation: doesn’t work correctly in macOS full-screen mode.
- Provider abstraction - Different providers have different APIs. Created a `TranscriptionProvider` trait that all providers implement, making it easy to add new ones.
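To make the provider abstraction and the reconnection logic a bit more concrete, here is a minimal sketch of how the two could fit together. Only the trait name `TranscriptionProvider` comes from the project; the methods (`name`, `connect`, `send_audio`), the helpers `backoff_delay` and `connect_with_backoff`, and the `String` error type are assumptions made for illustration, and the real trait most likely looks different (it is probably async, for one).

```rust
use std::time::Duration;

/// Hypothetical shape of the provider abstraction; the project's real trait may differ.
pub trait TranscriptionProvider {
    /// Human-readable provider name, used in error messages.
    fn name(&self) -> &str;
    /// Open (or re-open) the streaming connection.
    fn connect(&mut self) -> Result<(), String>;
    /// Send one chunk of 16-bit mono PCM and return any newly recognized text.
    fn send_audio(&mut self, pcm: &[i16]) -> Result<Option<String>, String>;
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // Saturate instead of overflowing for very large attempt counts.
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).unwrap_or(max).min(max)
}

/// Try to connect, sleeping with exponential backoff between failures.
/// On success, returns how many attempts were needed.
pub fn connect_with_backoff<P: TranscriptionProvider>(
    provider: &mut P,
    max_attempts: u32,
    base: Duration,
    max: Duration,
) -> Result<u32, String> {
    let mut last_err = String::from("no attempts made");
    for attempt in 0..max_attempts {
        match provider.connect() {
            Ok(()) => return Ok(attempt + 1),
            Err(e) => last_err = e,
        }
        // Do not sleep after the final failed attempt.
        if attempt + 1 < max_attempts {
            std::thread::sleep(backoff_delay(attempt, base, max));
        }
    }
    Err(format!(
        "{}: giving up after {} attempts: {}",
        provider.name(),
        max_attempts,
        last_err
    ))
}
```

Because `connect_with_backoff` is generic over the trait, the same retry logic works for every provider. With a 100 ms base and a 1 s cap, the delays grow as 100, 200, 400, 800 ms and then stay at 1 s. Adding random jitter to each delay is a common refinement that this sketch leaves out.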
The Key Learnings
Working on this project taught me a lot:
- Tauri 2 is powerful - Type-safe IPC, great plugin ecosystem, small bundle size, and native performance.
- Rust’s ownership model prevents bugs - No null pointer dereferences, thread safety enforced at compile time, memory leaks are rare.
- Async in Rust is powerful but complex - Need to understand `Send` and `Sync` bounds, channel types, and `Arc<Mutex<T>>` for shared state.
- Audio processing is tricky - Sample rates matter, channels need conversion, VAD is crucial, and chunking is important.
- TypeScript + React is great for UI - Type safety, component reusability, and simple state management with hooks.
- Testing is hard - Need to mock audio devices and APIs for integration tests, but manual testing is still required.
- Privacy is a competitive advantage - Users care about where their data goes. Local storage, no vendor lock-in, optional audio saving, and the ability to clear history all help.
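Since the audio point above is fairly abstract, here is a rough sketch of the kind of conversion involved: downmixing interleaved stereo to mono, resampling (for example from 44.1 kHz to 16 kHz), and a crude energy measure that could feed a silence check. The function names are made up for this post, and linear interpolation and RMS energy are deliberately simple stand-ins, not necessarily what OpenVoice Flow does internally.

```rust
/// Average interleaved stereo samples (L, R, L, R, ...) into a mono signal.
/// A trailing unpaired sample, if any, is dropped.
pub fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) * 0.5)
        .collect()
}

/// Resample mono audio using linear interpolation between neighboring samples.
/// NOTE: when downsampling, a production pipeline would low-pass filter first
/// to avoid aliasing; that step is omitted here.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let last = input.len() - 1;
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // How far we advance in the input for each output sample.
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos as usize).min(last);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Clamp so the final sample interpolates with itself.
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}

/// Root-mean-square energy of a frame: a crude stand-in for real voice
/// activity detection (compare the result against a tuned threshold).
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    (frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32).sqrt()
}
```

One second of 44.1 kHz stereo input (88,200 interleaved samples) becomes 44,100 mono samples after `stereo_to_mono` and 16,000 samples after `resample_linear(&mono, 44_100, 16_000)`. Those samples can then be split into fixed-size chunks before being streamed to a provider.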
The Current Status
OpenVoice Flow is currently in active development and not yet production-ready.
Known limitations:
- macOS full-screen mode: Overlay doesn’t work correctly (macOS window management restriction)
- Credentials: API keys stored in plain text (encryption planned)
- Provider testing: Only Soniox is properly tested
- Platform support: Only tested on macOS; Windows and Linux testing is planned
The Future
Here are some things I’m planning to work on:
- Windows and Linux testing and bug fixes
- Encryption for API key storage
- More transcription providers (Google, AWS, Azure)
- Improved VAD for better silence detection
- Export/import of transcripts
- Plugin system for custom providers
The End
Building OpenVoice Flow has been a rewarding journey. The combination of Rust’s performance and safety, Tauri’s cross-platform capabilities, and React’s developer experience has proven to be an excellent stack for this type of application.
If you’re interested in trying it out or contributing, check out the GitHub repository. It’s MIT licensed, so feel free to use it for personal or commercial projects.
I hope you enjoyed this blog post and I will see you in the next one!