🎙️ Project Showcase: OpenVoice Flow

1/11/2026 at 1:31:50 PM

For this blog post, I want to share a project I’ve been working on recently called OpenVoice Flow. It’s a cross-platform, open-source real-time voice transcription application.

OpenVoice Flow is now open-source! You can find the repository on GitHub.

The Origins

I’ve always wanted a fast, privacy-focused voice transcription app that just works. I tried things like Aqua Voice, but most solutions out there are either too expensive, don’t respect your privacy, or are just not developer-friendly. So I decided to build my own.

Built with Tauri 2 (Rust backend + Webview UI) and a React + TypeScript frontend, it delivers instant, context-aware speech-to-text with a focus on privacy, developer-friendliness, and extensibility.

The Features

Real-time transcription with low latency
Multi-provider support - Soniox (tested), plus OpenAI Whisper and Ollama (both experimental)
Cross-platform - Targets Windows, macOS, and Linux via Tauri (so far tested only on macOS)
Modular architecture - Easy to add new transcription providers
BYO-API-key model - Bring your own API keys, no vendor lock-in
Privacy-first - Transcripts and API keys are stored locally on your device
Floating overlay - Live transcription feedback in a draggable bubble
History view - Access past transcriptions with audio and re-transcription capability
Personal dictionary - Custom word replacements, formatting rules, and instructions
Network diagnostics - Latency and stats per provider
Global hotkeys - Toggle or Push-to-Talk recording modes

The Stack

Frontend: React 18 + TypeScript + Vite
Backend: Rust (Tauri 2)
Audio: cpal for cross-platform audio capture
Database: SQLite for transcript storage
Build System: Vite for frontend, Cargo for Rust

The Technical Challenges

Building this wasn’t without its challenges. Here are some of the things I had to figure out:

Cross-platform audio capture - Different OSes handle audio differently. I used cpal to abstract the platform differences, but still had to handle device enumeration, sample rate conversion (44.1 kHz vs 16 kHz), and channel handling (stereo to mono).
Real-time streaming - WebSocket connections can be flaky. Implemented a streaming worker pattern with separate threads for audio processing, control channels for start/stop signals, and reconnection logic with exponential backoff.
Global hotkeys - These need to work even when the app is in the background. I used tauri-plugin-global-shortcut, which wraps the native mechanisms on macOS (Carbon framework), Windows (RegisterHotKey API), and Linux (X11; as far as I know, the underlying global-hotkey crate does not support Wayland).
Overlay window management - The floating overlay needs to stay on top and be draggable, so it runs in a separate Tauri window configured with alwaysOnTop, skipTaskbar, and decorations: false. Known limitation: it doesn’t work correctly in macOS full-screen mode.
Provider abstraction - Different providers have different APIs. Created a TranscriptionProvider trait that all providers implement, making it easy to add new ones.
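
To make the reconnection logic from the real-time streaming item a bit more concrete, here is a minimal sketch of exponential backoff that can be interrupted through a control channel. It is not taken from the repository: the Control enum, backoff_delay, and connect_with_backoff are hypothetical names, the connect closure stands in for opening the WebSocket, and only a Stop signal is modelled. It uses only the standard library.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Control messages sent to the streaming worker (hypothetical; only Stop is modelled).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Control {
    Stop,
}

/// Delay before retrying after failed attempt `attempt` (0-based):
/// base * 2^attempt, capped at `max`. Saturating math avoids overflow.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}

/// Calls `connect` until it succeeds, `max_attempts` is exhausted, or a Stop
/// arrives. Returns Some(number of attempts made) on success, None otherwise.
pub fn connect_with_backoff<F>(
    mut connect: F,
    control: &Receiver<Control>,
    base: Duration,
    max: Duration,
    max_attempts: u32,
) -> Option<u32>
where
    F: FnMut() -> Result<(), String>,
{
    for attempt in 0..max_attempts {
        if connect().is_ok() {
            return Some(attempt + 1);
        }
        // Wait on the control channel instead of sleeping, so a Stop
        // (or a dropped sender) ends the retry loop immediately.
        match control.recv_timeout(backoff_delay(attempt, base, max)) {
            Ok(Control::Stop) | Err(RecvTimeoutError::Disconnected) => return None,
            Err(RecvTimeoutError::Timeout) => {} // delay elapsed, try again
        }
    }
    None
}
```

Waiting with recv_timeout rather than thread::sleep is the main design choice here: the worker never blocks longer than it has to when the user stops recording in the middle of a backoff delay.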
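
For the provider abstraction, a trait along the following lines is one way such a design could look. This is an illustrative sketch, not the actual TranscriptionProvider trait from the repository: the method names, the EchoProvider stand-in, and the ProviderRegistry are all assumptions made for the example, and a real provider would stream audio rather than take a single buffer.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the provider trait; the real one may differ.
pub trait TranscriptionProvider {
    /// Short identifier used to look the provider up.
    fn name(&self) -> &str;
    /// Transcribes a buffer of mono PCM samples into text.
    fn transcribe(&mut self, samples: &[i16]) -> Result<String, String>;
}

/// Stand-in provider that just reports how much audio it received.
pub struct EchoProvider;

impl TranscriptionProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }

    fn transcribe(&mut self, samples: &[i16]) -> Result<String, String> {
        if samples.is_empty() {
            return Err("no audio".to_string());
        }
        Ok(format!("{} samples", samples.len()))
    }
}

/// Keeps providers behind trait objects so callers never name a concrete type.
#[derive(Default)]
pub struct ProviderRegistry {
    providers: HashMap<String, Box<dyn TranscriptionProvider>>,
}

impl ProviderRegistry {
    /// Registers a provider under its own name, replacing any previous entry.
    pub fn register(&mut self, provider: Box<dyn TranscriptionProvider>) {
        self.providers.insert(provider.name().to_string(), provider);
    }

    /// Dispatches to the named provider, or fails if it is not registered.
    pub fn transcribe_with(&mut self, name: &str, samples: &[i16]) -> Result<String, String> {
        match self.providers.get_mut(name) {
            Some(provider) => provider.transcribe(samples),
            None => Err(format!("unknown provider: {}", name)),
        }
    }
}
```

With this shape, adding a new backend means writing one impl block and one register call; the rest of the app only ever sees the trait.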

The Key Learnings

Working on this project taught me a lot:

Tauri 2 is powerful - Type-safe IPC, great plugin ecosystem, small bundle size, and native performance.
Rust’s ownership model prevents bugs - No null pointer dereferences in safe Rust, data races are ruled out at compile time, and memory leaks are rare.
Async in Rust is powerful but complex - Need to understand Send and Sync bounds, channel types, and Arc<Mutex<T>> for shared state.
Audio processing is tricky - Sample rates matter, channels need conversion, voice activity detection (VAD) is crucial, and chunking is important.
TypeScript + React is great for UI - Type safety, component reusability, and simple state management with hooks.
Testing is hard - Need to mock audio devices and APIs for integration tests, but manual testing is still required.
Privacy is a competitive advantage - Users care about where their data goes, which is why the app offers local storage, no vendor lock-in, optional audio saving, and the ability to clear history.
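
As a small illustration of the Arc<Mutex<T>> point above, the sketch below shares one piece of recording state between several worker threads. It uses plain std threads instead of an async runtime to stay dependency-free, and RecordingState and run_workers are hypothetical names invented for this example, not types from the project.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical shared state that both the audio side and the UI might read.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RecordingState {
    pub is_recording: bool,
    pub chunks_sent: u64,
}

/// Spawns `workers` threads that each record `chunks_per_worker` updates
/// into the same state, then returns a snapshot of the final state.
pub fn run_workers(workers: usize, chunks_per_worker: u64) -> RecordingState {
    let state = Arc::new(Mutex::new(RecordingState::default()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own Arc handle to the same Mutex.
            let state = Arc::clone(&state);
            thread::spawn(move || {
                for _ in 0..chunks_per_worker {
                    // The lock is held only for this short update.
                    let mut guard = state.lock().expect("state mutex poisoned");
                    guard.is_recording = true;
                    guard.chunks_sent += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Bind the snapshot first so the lock guard is dropped before returning.
    let snapshot = state.lock().expect("state mutex poisoned").clone();
    snapshot
}
```

Because the closure passed to thread::spawn must be Send, the compiler rejects this code if the shared value is wrapped in something that is not thread-safe (for example Rc<RefCell<T>>), which is the compile-time enforcement mentioned in the list.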
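
The audio-processing points (sample rates, channel conversion, VAD, chunking) are easier to see in code. The following is a simplified sketch under a few assumptions: samples are f32, stereo input is interleaved, resampling uses plain linear interpolation with no low-pass filter (so downsampling can alias), and the VAD is a naive per-chunk energy threshold. None of these functions come from the project, which may well use different techniques.

```rust
/// Averages interleaved stereo frames (L, R, L, R, ...) into mono.
/// A trailing unpaired sample is ignored.
pub fn stereo_to_mono(interleaved: &[f32]) -> Vec<f32> {
    interleaved
        .chunks_exact(2)
        .map(|frame| (frame[0] + frame[1]) * 0.5)
        .collect()
}

/// Resamples by linear interpolation between neighbouring input samples.
/// No anti-aliasing filter is applied, so this is only a rough approximation.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    let last = input.len() - 1;
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64;
    (0..out_len)
        .map(|i| {
            // Position of output sample i on the input timeline.
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(last);
            let next = (idx + 1).min(last);
            let frac = (pos - idx as f64) as f32;
            input[idx] + (input[next] - input[idx]) * frac
        })
        .collect()
}

/// Root-mean-square level of a chunk (0.0 for an empty chunk).
pub fn rms(chunk: &[f32]) -> f32 {
    if chunk.is_empty() {
        return 0.0;
    }
    (chunk.iter().map(|s| s * s).sum::<f32>() / chunk.len() as f32).sqrt()
}

/// Naive energy-based VAD: one flag per chunk, true when RMS exceeds `threshold`.
pub fn voiced_chunks(samples: &[f32], chunk_len: usize, threshold: f32) -> Vec<bool> {
    samples
        .chunks(chunk_len.max(1))
        .map(|chunk| rms(chunk) > threshold)
        .collect()
}
```

A typical pipeline would call stereo_to_mono on the captured buffer, pass the result through resample_linear with 44_100 and 16_000, and then use voiced_chunks to skip silent chunks before sending anything to a provider.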

The Current Status

OpenVoice Flow is currently in active development and not yet production-ready.

Known limitations:

macOS full-screen mode: Overlay doesn’t work correctly (macOS window management restriction)
Credentials: API keys stored in plain text (encryption planned)
Provider testing: Only Soniox is properly tested
Platform support: Only tested on macOS so far; Windows and Linux testing is planned

The Future

Here are some things I’m planning to work on:

Windows and Linux testing and bug fixes
Encryption for API key storage
More transcription providers (Google, AWS, Azure)
Improved VAD for better silence detection
Export/import of transcripts
Plugin system for custom providers

The End

Building OpenVoice Flow has been a rewarding journey. The combination of Rust’s performance and safety, Tauri’s cross-platform capabilities, and React’s developer experience has proven to be an excellent stack for this type of application.

If you’re interested in trying it out or contributing, check out the GitHub repository. It’s MIT licensed, so feel free to use it for personal or commercial projects.

I hope you enjoyed this blog post and I will see you in the next one!
