AI voice assistant that lives on your desktop
A pure Rust desktop overlay with GPU-rendered visuals, bidirectional voice via Gemini Live, Groq-powered transcription, and a real-time agent dashboard.
maVoice is a floating AI voice assistant rendered directly on your desktop using GPU shaders. No Electron. No WebView. No browser. Just two transparent windows – an animated AI orb and a waveform strip – that sit on top of everything and respond to your voice in real-time.
It operates in two modes:
- Groq mode – Push-to-talk dictation. Record, transcribe via Groq Whisper, paste to clipboard.
- Gemini mode – Always-on bidirectional voice conversation via Gemini 2.0 Flash Live. The AI can search memory, run shell commands, delegate tasks to Claude, and remember things across sessions.
This repo contains two implementations – the original Tauri/React app and the newer pure Rust native overlay. Both live in the same repo and both work.
| | mavoice-native (new) | src-tauri (original) |
|---|---|---|
| Stack | Pure Rust, wgpu, WGSL shaders | Tauri 2 + React + TypeScript |
| Rendering | GPU shaders on transparent X11 windows | WebKitGTK floating widget |
| Voice | Gemini Live bidirectional + Groq STT | Groq STT only |
| Tools | search_memory, remember, run_command, ask_claude | – |
| Dashboard | WebSocket broadcast to claudegram | – |
| UI | AI orb + waveform strip (shader-rendered) | Floating button with settings panel |
| Size | ~5MB static binary | ~50MB (Tauri + WebKitGTK) |
The native version (mavoice-native/) is the active development target. The Tauri version (src-tauri/) remains in the repo as a fully functional alternative – useful if you want the settings UI, web-based configuration panel, or prefer the widget-style interface.
```
┌────────────────────────────────────────────────────────┐
│ mavoice-native (pure Rust binary)                      │
│                                                        │
│ ┌──────────┐ ┌──────────┐ ┌─────────────────────┐      │
│ │ wgpu/WGSL│ │ cpal     │ │ Gemini Live (WS)    │      │
│ │ renderer │ │ audio    │ │ bidirectional voice │      │
│ │ 2 windows│ │ capture  │ │ + function calling  │      │
│ └──────────┘ └──────────┘ └─────────────────────┘      │
│                                                        │
│ ┌──────────┐ ┌──────────┐ ┌─────────────────────┐      │
│ │ Groq API │ │ Global   │ │ Dashboard WS        │      │
│ │ Whisper  │ │ Hotkeys  │ │ broadcast (3001)    │      │
│ │ STT      │ │ F2 / F3  │ │ → claudegram UI     │      │
│ └──────────┘ └──────────┘ └─────────────────────┘      │
│                                                        │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Tools: search_memory, remember, run_command,       │ │
│ │        ask_claude                                  │ │
│ └────────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
```
Two transparent always-on-top windows rendered with wgpu + WGSL shaders:
- AI Orb (96px) – Animated spiral sphere shader that reacts to voice state (idle pulse, speaking glow, thinking spin)
- Waveform Strip (64px) – Real-time audio level visualization at the bottom of the screen
Both windows use softbuffer for X11 transparency compositing. No toolkit, no DOM, no CSS – raw GPU pixels on a transparent surface.
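The state-to-visual mapping can be sketched as a pure function from voice state to shader uniforms. Everything below is illustrative – the state names and parameters are not the actual uniforms in ai_shader.wgsl:

```rust
// Hypothetical sketch: drive orb shader uniforms from voice state each
// frame. Names and values are illustrative, not the real renderer API.

#[derive(Clone, Copy, PartialEq, Debug)]
enum VoiceState {
    Idle,
    Listening,
    Thinking,
    Speaking,
}

/// Returns (pulse_speed, glow_intensity) for the orb shader.
fn orb_params(state: VoiceState, audio_level: f32) -> (f32, f32) {
    match state {
        VoiceState::Idle => (0.5, 0.2),                    // slow idle pulse
        VoiceState::Listening => (1.0, 0.4 + audio_level), // track the mic
        VoiceState::Thinking => (3.0, 0.6),                // fast spin
        VoiceState::Speaking => (1.5, 0.8 + audio_level),  // bright glow
    }
}

fn main() {
    let (speed, glow) = orb_params(VoiceState::Thinking, 0.0);
    println!("pulse_speed={speed} glow={glow}");
}
```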
Groq mode:

```
Mic (cpal) → WAV buffer → Groq Whisper API → clipboard (xclip) → xdotool paste
```
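The "WAV buffer" step amounts to prepending a 44-byte RIFF/WAVE header to the raw PCM before upload. A minimal sketch, assuming mono 16-bit samples (the function name and sample rate here are illustrative, not the actual recorder code):

```rust
// Build an in-memory WAV file from raw mono 16-bit PCM samples.
// Field layout follows the standard RIFF/WAVE header.

fn wav_from_pcm16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let data_len = (samples.len() * 2) as u32;
    let byte_rate = sample_rate * 2; // mono * 16-bit = 2 bytes per frame
    let mut wav = Vec::with_capacity(44 + data_len as usize);

    wav.extend_from_slice(b"RIFF");
    wav.extend_from_slice(&(36 + data_len).to_le_bytes());
    wav.extend_from_slice(b"WAVE");

    wav.extend_from_slice(b"fmt ");
    wav.extend_from_slice(&16u32.to_le_bytes()); // fmt chunk size
    wav.extend_from_slice(&1u16.to_le_bytes());  // audio format: PCM
    wav.extend_from_slice(&1u16.to_le_bytes());  // channels: mono
    wav.extend_from_slice(&sample_rate.to_le_bytes());
    wav.extend_from_slice(&byte_rate.to_le_bytes());
    wav.extend_from_slice(&2u16.to_le_bytes());  // block align
    wav.extend_from_slice(&16u16.to_le_bytes()); // bits per sample

    wav.extend_from_slice(b"data");
    wav.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        wav.extend_from_slice(&s.to_le_bytes());
    }
    wav
}

fn main() {
    // One second of silence at 16 kHz: 44-byte header + 32,000 data bytes.
    let wav = wav_from_pcm16(&[0i16; 16_000], 16_000);
    println!("{} bytes", wav.len());
}
```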
Gemini mode:

```
Mic (cpal) → PCM 16kHz → WebSocket → Gemini 2.0 Flash Live
                                          ↓
                             ← Audio response (PCM 24kHz)
                             ← Function calls (tools)
                             ← Text responses
```
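The "PCM 16kHz" step boils down to converting float microphone samples into 16-bit little-endian PCM bytes for the wire. A sketch, assuming cpal delivers f32 samples (the real capture code lives in audio/recorder.rs):

```rust
// Convert f32 samples in [-1.0, 1.0] to little-endian 16-bit PCM bytes.
// Clamping first avoids integer wraparound on over-range samples.

fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let v = (s.clamp(-1.0, 1.0) * i16::MAX as f32) as i16;
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}

fn main() {
    // Silence, full-scale positive, full-scale negative.
    println!("{:?}", f32_to_pcm16_le(&[0.0, 1.0, -1.0]));
}
```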
When in Gemini mode, the AI has access to 4 function-calling tools:
| Tool | Description |
|---|---|
| search_memory | FTS5 search over the ShieldCortex memory database (SQLite) |
| remember | Save a new memory to the database for cross-session recall |
| run_command | Execute a shell command with 30s timeout, return stdout/stderr |
| ask_claude | Delegate a task to Claude Code CLI, return the response |
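Tool routing can be sketched as a match on the tool name Gemini sends back. The handler bodies below are stubs – the real implementations (FTS5 queries, SQLite writes, process spawning with timeouts) live in mavoice-native/src/tools/:

```rust
// Hypothetical dispatch sketch for the four function-calling tools.
// Stub bodies stand in for the real handlers.

fn dispatch_tool(name: &str, args: &str) -> Result<String, String> {
    match name {
        "search_memory" => Ok(format!("results for {args}")), // FTS5 query in real code
        "remember" => Ok("saved".to_string()),                // SQLite insert in real code
        "run_command" => Ok("stdout/stderr".to_string()),     // tokio::process + 30s timeout
        "ask_claude" => Ok("claude response".to_string()),    // spawn Claude Code CLI
        other => Err(format!("unknown tool: {other}")),
    }
}

fn main() {
    // Unknown names are rejected rather than executed.
    assert!(dispatch_tool("remember", "{}").is_ok());
    assert!(dispatch_tool("delete_everything", "{}").is_err());
}
```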
A WebSocket broadcast server on ws://localhost:3001 streams real-time events to the claudegram dashboard – a separate Next.js project with a glass-morphism UI that shows:
- Agent status cards (Claude, Gemini, Droid, Groq) with live state indicators
- Kanban board for tracking agent tasks across columns
- Action log with conversation bubbles, per-event copy, and session export
- Tool call timeline with elapsed timers
See the claudegram-dashboard repo for setup and usage.
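The actual event schema is defined by dashboard.rs; purely as an illustration, a message a client on ws://localhost:3001 receives might be shaped like this (field names here are hypothetical):

```rust
// Illustrative only: one plausible JSON shape for a tool-call event
// broadcast to dashboard clients. Not the real claudegram schema.

fn tool_call_event(agent: &str, tool: &str, elapsed_ms: u64) -> String {
    format!(
        r#"{{"type":"tool_call","agent":"{agent}","tool":"{tool}","elapsed_ms":{elapsed_ms}}}"#
    )
}

fn main() {
    println!("{}", tool_call_event("gemini", "search_memory", 42));
}
```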
- Rust 1.75+
- Linux with X11 (Wayland support planned)
- A Groq API key for transcription
- A Google AI API key for Gemini Live voice
sudo apt install -y \
build-essential pkg-config \
libasound2-dev \
xdotool xclip \
  libx11-dev libxcb1-dev

git clone https://github.com/lliWcWill/maVoice-Linux.git
cd maVoice-Linux/mavoice-native
# Build release binary
cargo build --release
# Install to ~/.local/bin
cp target/release/mavoice-native ~/.local/bin/
# Create config
mkdir -p ~/.config/mavoice
cat > ~/.config/mavoice/config.toml << 'EOF'
api_key = "gsk_your_groq_key_here"
gemini_api_key = "your_google_ai_key_here"
model = "whisper-large-v3-turbo"
language = "en"
mode = "gemini"
voice_name = "Aoede"
EOF

# Run it
mavoice-native

# Optional: run as a systemd user service
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/mavoice.service << 'EOF'
[Unit]
Description=maVoice - AI Voice Assistant Overlay
Documentation=https://github.com/lliWcWill/maVoice-Linux
After=graphical-session.target
[Service]
Type=simple
ExecStart=%h/.local/bin/mavoice-native
Restart=on-failure
RestartSec=3
Environment=DISPLAY=:0
Environment=RUST_LOG=info
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now mavoice

The original Tauri version is a floating desktop widget with a React-based settings panel, model selection, and multi-language support.
- Node.js 18+
- Rust 1.70+
- Tauri 2 system dependencies (WebKitGTK, etc.)
git clone https://github.com/lliWcWill/maVoice-Linux.git
cd maVoice-Linux
./install.sh
# Add your Groq API key
echo "VITE_GROQ_API_KEY=your_groq_api_key_here" > src-tauri/aquavoice-frontend/.env
# Launch
npm run dev

Tauri system dependencies (Debian/Ubuntu)
sudo apt install -y \
build-essential pkg-config libgtk-3-dev libwebkit2gtk-4.1-dev \
libsoup-3.0-dev libjavascriptcoregtk-4.1-dev libdbus-1-dev \
libappindicator3-dev librsvg2-dev libasound2-dev \
  xdotool wl-clipboard wtype

WSL2 setup
WSL2 + WSLg works for the Tauri version. Update WSL2 from PowerShell:
wsl --update
wsl --version   # Ensure version 2 with WSLg

Then install dependencies inside your WSL2 distro and run normally.
- Double-click the floating widget to start recording
- Single-click to stop and transcribe
- Right-click or Ctrl+click to drag the widget
- Settings via the gear icon (model selection, language, custom prompts, temperature)
| Key | Action |
|---|---|
| F2 | Toggle Groq dictation (push-to-talk) |
| F3 | Toggle Gemini Live voice conversation |
- Press F2 to start recording
- Speak naturally
- Press F2 again to stop
- Transcription is copied to clipboard and pasted at cursor
- Press F3 to open a Gemini Live session
- Speak naturally – the AI responds with voice in real-time
- The AI can use tools (search memory, run commands, ask Claude)
- Press F3 again to end the session
- Supports barge-in (interrupt the AI mid-sentence)
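The hotkey behavior above can be sketched as a small state machine: each key toggles its own mode and is ignored while the other mode is active. State names here are illustrative, not the actual state_machine.rs types:

```rust
// Sketch of F2/F3 toggle transitions as a pure state machine.

#[derive(Clone, Copy, PartialEq, Debug)]
enum AppState {
    Idle,
    GroqRecording, // F2 push-to-talk session
    GeminiLive,    // F3 bidirectional session
}

fn on_f2(state: AppState) -> AppState {
    match state {
        AppState::Idle => AppState::GroqRecording, // start recording
        AppState::GroqRecording => AppState::Idle, // stop + transcribe
        other => other,                            // ignored during Gemini
    }
}

fn on_f3(state: AppState) -> AppState {
    match state {
        AppState::Idle => AppState::GeminiLive,    // open session
        AppState::GeminiLive => AppState::Idle,    // end session
        other => other,                            // ignored while dictating
    }
}

fn main() {
    let s = on_f2(AppState::Idle);
    println!("{:?} -> {:?}", s, on_f2(s));
}
```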
Edit ~/.config/mavoice/config.toml:
api_key = "gsk_..." # Groq API key
gemini_api_key = "AI..." # Google AI API key
model = "whisper-large-v3-turbo" # Groq model
language = "en" # Transcription language
mode = "gemini" # Default mode: "groq" or "gemini"
voice_name = "Aoede" # Gemini voice: Puck, Charon, Kore, Fenrir, Aoede
system_instruction = "..." # Custom system prompt for Gemini
temperature = 0.0 # Groq transcription temperature
dictionary = ""                   # Custom terms for Groq

| Component | Technology |
|---|---|
| Language | Rust (pure, no WebView) |
| GPU Rendering | wgpu + WGSL shaders |
| Window Management | winit + softbuffer (X11 transparency) |
| Audio Capture | cpal (ALSA) |
| Voice AI | Gemini 2.0 Flash Live (WebSocket) |
| Transcription | Groq Whisper Large v3 Turbo |
| Tool Execution | rusqlite, tokio::process, Claude CLI |
| Dashboard | tokio-tungstenite broadcast server |
| Hotkeys | global-hotkey crate |
| Clipboard | xclip, xdotool |
| Component | Technology |
|---|---|
| Framework | Tauri 2 |
| Frontend | React + TypeScript + Tailwind |
| Transcription | Groq Whisper (via groq-sdk) |
| Audio | Web Audio API |
```
maVoice-Linux/
├── mavoice-native/            # Pure Rust native overlay (active)
│   ├── src/
│   │   ├── main.rs            # Entry point, window creation
│   │   ├── app.rs             # Event loop, state machine, dashboard
│   │   ├── renderer.rs        # wgpu setup, shader pipeline
│   │   ├── shader.wgsl        # Waveform strip shader
│   │   ├── ai_shader.wgsl     # AI orb spiral sphere shader
│   │   ├── config.rs          # TOML config loading
│   │   ├── dashboard.rs       # WebSocket broadcast server
│   │   ├── state_machine.rs   # App state transitions
│   │   ├── api/
│   │   │   ├── gemini.rs      # Gemini Live bidirectional WebSocket
│   │   │   └── groq.rs        # Groq Whisper transcription API
│   │   ├── audio/
│   │   │   ├── recorder.rs    # cpal microphone capture
│   │   │   └── player.rs      # PCM audio playback
│   │   ├── system/
│   │   │   ├── hotkeys.rs     # Global F2/F3 hotkey registration
│   │   │   └── text_inject.rs # xdotool clipboard paste
│   │   └── tools/
│   │       └── mod.rs         # Gemini function calling tools
│   └── Cargo.toml
│
├── src-tauri/                 # Tauri 2 desktop app (legacy)
│   ├── aquavoice-frontend/    # React + TypeScript UI
│   │   └── src/components/
│   │       └── FloatingOverlay.tsx
│   ├── src/main.rs            # Tauri backend
│   ├── Cargo.toml
│   └── tauri.conf.json
│
├── install.sh                 # Tauri dependency installer
├── package.json               # Tauri npm scripts
└── README.md
```
- claudegram-dashboard – Real-time agent monitoring dashboard (Next.js + glass morphism UI). Connects to maVoice's WebSocket broadcast server to display agent status, kanban tasks, conversation logs, and tool call timelines.
MIT License. See LICENSE for details.