maVoice

AI voice assistant that lives on your desktop

A pure Rust desktop overlay with GPU-rendered visuals, bidirectional voice via Gemini Live, Groq-powered transcription, and a real-time agent dashboard.

What is maVoice?

maVoice is a floating AI voice assistant rendered directly on your desktop using GPU shaders. No Electron. No WebView. No browser. Just two transparent windows — an animated AI orb and a waveform strip — that sit on top of everything and respond to your voice in real-time.

It operates in two modes:

Groq mode: Push-to-talk dictation. Record, transcribe via Groq Whisper, then copy the text to the clipboard and paste it at the cursor.
Gemini mode: Always-on bidirectional voice conversation via Gemini 2.0 Flash Live. The AI can search memory, run shell commands, delegate tasks to Claude, and remember things across sessions.

Two Versions

This repo contains two implementations: the original Tauri/React app and the newer pure Rust native overlay. Both work.

	mavoice-native (new)	src-tauri (original)
Stack	Pure Rust, wgpu, WGSL shaders	Tauri 2 + React + TypeScript
Rendering	GPU shaders on transparent X11 windows	WebKitGTK floating widget
Voice	Gemini Live bidirectional + Groq STT	Groq STT only
Tools	search_memory, remember, run_command, ask_claude	—
Dashboard	WebSocket broadcast to claudegram	—
UI	AI orb + waveform strip (shader-rendered)	Floating button with settings panel
Size	~5MB static binary	~50MB (Tauri + WebKitGTK)

The native version (mavoice-native/) is the active development target. The Tauri version (src-tauri/) remains in the repo as a fully functional alternative. It is useful if you want the settings UI or web-based configuration panel, or if you prefer the widget-style interface.

Architecture (Native)

┌──────────────────────────────────────────────────────┐
│          mavoice-native (pure Rust binary)           │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌────────────────────┐  │
│  │ wgpu/WGSL│  │  cpal    │  │ Gemini Live (WS)   │  │
│  │ renderer │  │  audio   │  │ bidirectional voice│  │
│  │ 2 windows│  │ capture  │  │ + function calling │  │
│  └──────────┘  └──────────┘  └────────────────────┘  │
│                                                      │
│  ┌──────────┐  ┌──────────┐  ┌────────────────────┐  │
│  │ Groq API │  │ Global   │  │ Dashboard WS       │  │
│  │ Whisper  │  │ Hotkeys  │  │ broadcast (3001)   │  │
│  │ STT      │  │ F2 / F3  │  │ → claudegram UI    │  │
│  └──────────┘  └──────────┘  └────────────────────┘  │
│                                                      │
│  ┌─────────────────────────────────────────────────┐ │
│  │ Tools: search_memory, remember, run_command,    │ │
│  │        ask_claude                               │ │
│  └─────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘

Rendering

Two transparent always-on-top windows rendered with wgpu + WGSL shaders:

AI Orb (96px) — Animated spiral sphere shader that reacts to voice state (idle pulse, speaking glow, thinking spin)
Waveform Strip (64px) — Real-time audio level visualization at the bottom of the screen

Both windows use softbuffer for X11 transparency compositing. No toolkit, no DOM, no CSS — raw GPU pixels on a transparent surface.
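
To make the "real-time audio level" idea concrete, here is a minimal sketch of how a capture buffer could be reduced to per-bar levels before being handed to a shader. The function names (`rms_level`, `bar_levels`) and the bar-based layout are assumptions for illustration, not the project's actual renderer code.

```rust
/// Root-mean-square level of a block of samples in the range [-1.0, 1.0].
/// Returns 0.0 for an empty block.
pub fn rms_level(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

/// Split a capture buffer into at most `bars` chunks and compute one level
/// per chunk, clamped to [0.0, 1.0] so it can be used as a shader input.
/// (Hypothetical layout: the real waveform shader may take different data.)
pub fn bar_levels(samples: &[f32], bars: usize) -> Vec<f32> {
    if bars == 0 || samples.is_empty() {
        return vec![0.0; bars];
    }
    let chunk = (samples.len() + bars - 1) / bars; // ceiling division
    let mut levels: Vec<f32> = samples
        .chunks(chunk)
        .map(|c| rms_level(c).clamp(0.0, 1.0))
        .collect();
    levels.resize(bars, 0.0); // pad with silence if the chunks did not fill every bar
    levels
}
```

RMS is used here because it tracks perceived loudness more smoothly than peak values; a peak meter would be an equally valid choice.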

Voice Pipeline

Groq mode:

Mic (cpal) → WAV buffer → Groq Whisper API → clipboard (xclip) → xdotool paste
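
The "WAV buffer" step in the pipeline above can be sketched with the standard library alone. This is an illustrative encoder for mono 16-bit PCM; the function name `encode_wav_mono16` is hypothetical, and the project may well use a crate or a different sample format instead.

```rust
/// Wrap mono 16-bit PCM samples in a minimal 44-byte RIFF/WAVE header.
pub fn encode_wav_mono16(samples: &[i16], sample_rate: u32) -> Vec<u8> {
    let channels: u16 = 1;
    let bits_per_sample: u16 = 16;
    let block_align: u16 = channels * bits_per_sample / 8;
    let byte_rate: u32 = sample_rate * block_align as u32;
    let data_len: u32 = (samples.len() * 2) as u32;

    let mut out = Vec::with_capacity(44 + data_len as usize);
    out.extend_from_slice(b"RIFF");
    out.extend_from_slice(&(36 + data_len).to_le_bytes()); // file size minus 8
    out.extend_from_slice(b"WAVE");
    out.extend_from_slice(b"fmt ");
    out.extend_from_slice(&16u32.to_le_bytes()); // size of the fmt chunk
    out.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    out.extend_from_slice(&channels.to_le_bytes());
    out.extend_from_slice(&sample_rate.to_le_bytes());
    out.extend_from_slice(&byte_rate.to_le_bytes());
    out.extend_from_slice(&block_align.to_le_bytes());
    out.extend_from_slice(&bits_per_sample.to_le_bytes());
    out.extend_from_slice(b"data");
    out.extend_from_slice(&data_len.to_le_bytes());
    for s in samples {
        out.extend_from_slice(&s.to_le_bytes());
    }
    out
}
```

The resulting bytes could then be uploaded as the audio file in a transcription request; the request itself is outside the scope of this sketch.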

Gemini mode:

Mic (cpal) → PCM 16kHz → WebSocket → Gemini 2.0 Flash Live
                                          ↓
                              ← Audio response (PCM 24kHz)
                              ← Function calls (tools)
                              ← Text responses
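
Both directions of the Gemini pipeline above carry raw PCM. As a rough sketch, assuming the microphone delivers normalized `f32` samples and the wire format is 16-bit little-endian PCM, the conversions could look like this. The function names are hypothetical, and resampling to 16 kHz and framing for the WebSocket are not shown.

```rust
/// Convert normalized f32 samples to 16-bit little-endian PCM bytes.
/// Values outside [-1.0, 1.0] are clamped rather than wrapped.
pub fn f32_to_pcm16_le(samples: &[f32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(samples.len() * 2);
    for &s in samples {
        let clamped = s.clamp(-1.0, 1.0);
        let v = (clamped * i16::MAX as f32).round() as i16;
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Decode 16-bit little-endian PCM bytes back into i16 samples.
/// A trailing odd byte is ignored.
pub fn pcm16_le_to_i16(bytes: &[u8]) -> Vec<i16> {
    bytes
        .chunks_exact(2)
        .map(|b| i16::from_le_bytes([b[0], b[1]]))
        .collect()
}
```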

Gemini Tools

In Gemini mode, the AI has access to four function-calling tools:

Tool	Description
`search_memory`	FTS5 search over the ShieldCortex memory database (SQLite)
`remember`	Save a new memory to the database for cross-session recall
`run_command`	Execute a shell command with 30s timeout, return stdout/stderr
`ask_claude`	Delegate a task to Claude Code CLI, return the response
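
As an illustration of what a `run_command`-style tool has to do, the sketch below runs a shell command with a deadline and returns its stdout and stderr. It uses only the standard library and polling; the tech stack table lists `tokio::process` for the real tool, so treat this as a simplified stand-in. `CommandOutput` and `run_with_timeout` are hypothetical names.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Captured result of a finished command.
pub struct CommandOutput {
    pub stdout: String,
    pub stderr: String,
    pub success: bool,
}

/// Read a pipe to the end on a helper thread so a chatty child cannot
/// stall on a full pipe while we poll for its exit.
fn drain<R: Read + Send + 'static>(mut pipe: R) -> thread::JoinHandle<Vec<u8>> {
    thread::spawn(move || {
        let mut buf = Vec::new();
        let _ = pipe.read_to_end(&mut buf);
        buf
    })
}

/// Run `command` through `sh -c`, killing the shell if it exceeds `timeout`.
/// Note: only the direct child is killed; grandchildren may outlive it.
pub fn run_with_timeout(command: &str, timeout: Duration) -> Result<CommandOutput, String> {
    let mut child = Command::new("sh")
        .arg("-c")
        .arg(command)
        .stdin(Stdio::null())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| format!("spawn failed: {e}"))?;

    let out_reader = drain(child.stdout.take().ok_or("no stdout pipe")?);
    let err_reader = drain(child.stderr.take().ok_or("no stderr pipe")?);

    let deadline = Instant::now() + timeout;
    let status = loop {
        match child.try_wait().map_err(|e| format!("wait failed: {e}"))? {
            Some(status) => break status,
            None if Instant::now() >= deadline => {
                let _ = child.kill();
                let _ = child.wait(); // reap the killed process
                return Err(format!("timed out after {:?}", timeout));
            }
            None => thread::sleep(Duration::from_millis(20)),
        }
    };

    let stdout = out_reader.join().map_err(|_| "stdout reader panicked".to_string())?;
    let stderr = err_reader.join().map_err(|_| "stderr reader panicked".to_string())?;
    Ok(CommandOutput {
        stdout: String::from_utf8_lossy(&stdout).into_owned(),
        stderr: String::from_utf8_lossy(&stderr).into_owned(),
        success: status.success(),
    })
}
```

With the 30s limit from the table, a call would look like `run_with_timeout("ls -la", Duration::from_secs(30))`. Because the tool executes arbitrary shell commands on behalf of the model, any real deployment should think carefully about what the process is allowed to touch.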

Dashboard

A WebSocket broadcast server on ws://localhost:3001 streams real-time events to the claudegram dashboard — a separate Next.js project with a glass-morphism UI that shows:

Agent status cards (Claude, Gemini, Droid, Groq) with live state indicators
Kanban board for tracking agent tasks across columns
Action log with conversation bubbles, per-event copy, and session export
Tool call timeline with elapsed timers
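
The broadcast pattern behind the dashboard feed can be sketched without any networking: every connected client gets its own copy of each event, and clients that have gone away are dropped. The example below uses standard-library channels as a stand-in for WebSocket connections. `Broadcaster`, `tool_call_event`, and the JSON field names are all invented for illustration; the real event schema lives in the project and the claudegram-dashboard repo.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Fan-out hub: every subscriber receives its own copy of each event.
#[derive(Default)]
pub struct Broadcaster {
    subscribers: Vec<Sender<String>>,
}

impl Broadcaster {
    pub fn new() -> Self {
        Self::default()
    }

    /// Register a subscriber (a connected dashboard client in a real server).
    pub fn subscribe(&mut self) -> Receiver<String> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send `event` to every live subscriber and forget the ones that hung up.
    /// Returns the number of subscribers still registered afterwards.
    pub fn broadcast(&mut self, event: &str) -> usize {
        self.subscribers
            .retain(|tx| tx.send(event.to_string()).is_ok());
        self.subscribers.len()
    }
}

/// Hypothetical event payload. The field names are assumptions, and `tool`
/// is not JSON-escaped, so this only suits simple identifiers.
pub fn tool_call_event(tool: &str, elapsed_ms: u64) -> String {
    format!(r#"{{"type":"tool_call","tool":"{tool}","elapsed_ms":{elapsed_ms}}}"#)
}
```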

See the claudegram-dashboard repo for setup and usage.

Quick Start (Native)

Prerequisites

Rust 1.75+
Linux with X11 (Wayland support planned)
A Groq API key for transcription
A Google AI API key for Gemini Live voice

System Dependencies (Debian/Ubuntu)

sudo apt install -y \
    build-essential pkg-config \
    libasound2-dev \
    xdotool xclip \
    libx11-dev libxcb1-dev

Build & Install

git clone https://github.com/lliWcWill/maVoice-Linux.git
cd maVoice-Linux/mavoice-native

# Build release binary
cargo build --release

# Install to ~/.local/bin
cp target/release/mavoice-native ~/.local/bin/

# Create config
mkdir -p ~/.config/mavoice
cat > ~/.config/mavoice/config.toml << 'EOF'
api_key = "gsk_your_groq_key_here"
gemini_api_key = "your_google_ai_key_here"
model = "whisper-large-v3-turbo"
language = "en"
mode = "gemini"
voice_name = "Aoede"
EOF

Run

mavoice-native

Systemd Service (auto-start)

mkdir -p ~/.config/systemd/user

cat > ~/.config/systemd/user/mavoice.service << 'EOF'
[Unit]
Description=maVoice — AI Voice Assistant Overlay
Documentation=https://github.com/lliWcWill/maVoice-Linux
After=graphical-session.target

[Service]
Type=simple
ExecStart=%h/.local/bin/mavoice-native
Restart=on-failure
RestartSec=3
Environment=DISPLAY=:0
Environment=RUST_LOG=info

[Install]
WantedBy=default.target
EOF

systemctl --user daemon-reload
systemctl --user enable --now mavoice

Quick Start (Tauri — Legacy)

The original Tauri version is a floating desktop widget with a React-based settings panel, model selection, and multi-language support.

Prerequisites

Node.js 18+
Rust 1.70+
Tauri 2 system dependencies (WebKitGTK, etc.)

Install & Run

git clone https://github.com/lliWcWill/maVoice-Linux.git
cd maVoice-Linux
./install.sh

# Add your Groq API key
echo "VITE_GROQ_API_KEY=your_groq_api_key_here" > src-tauri/aquavoice-frontend/.env

# Launch
npm run dev

Tauri system dependencies (Debian/Ubuntu)

sudo apt install -y \
    build-essential pkg-config libgtk-3-dev libwebkit2gtk-4.1-dev \
    libsoup-3.0-dev libjavascriptcoregtk-4.1-dev libdbus-1-dev \
    libappindicator3-dev librsvg2-dev libasound2-dev \
    xdotool wl-clipboard wtype

WSL2 setup

WSL2 + WSLg works for the Tauri version. Update WSL2 from PowerShell:

wsl --update
wsl --version  # Ensure version 2 with WSLg

Then install dependencies inside your WSL2 distro and run normally.

Tauri Usage

Double-click the floating widget to start recording
Single-click to stop and transcribe
Right-click or Ctrl+click to drag the widget
Settings via the gear icon (model selection, language, custom prompts, temperature)

Usage (Native)

Hotkeys

Key	Action
F2	Toggle Groq dictation (push-to-talk)
F3	Toggle Gemini Live voice conversation

Groq Mode (F2)

Press F2 to start recording
Speak naturally
Press F2 again to stop
Transcription is copied to clipboard and pasted at cursor

Gemini Mode (F3)

Press F3 to open a Gemini Live session
Speak naturally — the AI responds with voice in real-time
The AI can use tools (search memory, run commands, ask Claude)
Press F3 again to end the session
Supports barge-in (interrupt the AI mid-sentence)
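
The two hotkey flows above amount to a small state machine. The sketch below is one way to model it; the type names and the rule that a hotkey for the other mode is ignored while one mode is active are assumptions, not taken from state_machine.rs.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Hotkey {
    F2,
    F3,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    GroqRecording,
    GeminiLive,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    StartRecording,
    StopAndTranscribe,
    OpenGeminiSession,
    CloseGeminiSession,
    Ignore,
}

/// Pure transition function: given the current state and a key press,
/// return the next state and the side effect the caller should perform.
pub fn on_hotkey(state: State, key: Hotkey) -> (State, Action) {
    match (state, key) {
        (State::Idle, Hotkey::F2) => (State::GroqRecording, Action::StartRecording),
        (State::GroqRecording, Hotkey::F2) => (State::Idle, Action::StopAndTranscribe),
        (State::Idle, Hotkey::F3) => (State::GeminiLive, Action::OpenGeminiSession),
        (State::GeminiLive, Hotkey::F3) => (State::Idle, Action::CloseGeminiSession),
        // Assumption: pressing the other mode's key while busy does nothing.
        (s, _) => (s, Action::Ignore),
    }
}
```

Keeping the transition function free of side effects makes it easy to unit test separately from the event loop.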

Configuration

Edit ~/.config/mavoice/config.toml:

api_key = "gsk_..."                # Groq API key
gemini_api_key = "AI..."           # Google AI API key
model = "whisper-large-v3-turbo"   # Groq model
language = "en"                    # Transcription language
mode = "gemini"                    # Default mode: "groq" or "gemini"
voice_name = "Aoede"               # Gemini voice: Puck, Charon, Kore, Fenrir, Aoede
system_instruction = "..."         # Custom system prompt for Gemini
temperature = 0.0                  # Groq transcription temperature
dictionary = ""                    # Custom terms for Groq
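
To show how a flat file like this maps to values in code, here is a deliberately simplified reader that handles only `key = value` lines, double-quoted strings, and `#` comments. It is not a full TOML parser, and the project most likely relies on a proper TOML library instead. `parse_flat_config`, `temperature`, and the fallback value of 0.0 are assumptions for illustration.

```rust
use std::collections::HashMap;

/// Parse flat `key = value` lines. Handles double-quoted strings, bare
/// values, blank lines, and `#` comments. Escapes, tables, and arrays
/// are NOT supported.
pub fn parse_flat_config(text: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, rest) = match line.split_once('=') {
            Some(pair) => pair,
            None => continue, // not a key/value line
        };
        let rest = rest.trim();
        let value = if let Some(quoted) = rest.strip_prefix('"') {
            // Quoted string: take everything up to the closing quote.
            match quoted.find('"') {
                Some(end) => &quoted[..end],
                None => quoted,
            }
        } else {
            // Bare value: drop a trailing `# comment`.
            rest.split('#').next().unwrap_or("").trim()
        };
        map.insert(key.trim().to_string(), value.to_string());
    }
    map
}

/// Typed lookup with a fallback (the 0.0 default is an assumption).
pub fn temperature(map: &HashMap<String, String>) -> f32 {
    map.get("temperature")
        .and_then(|v| v.parse().ok())
        .unwrap_or(0.0)
}
```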

Tech Stack

Native (`mavoice-native/`)

Component	Technology
Language	Rust (pure, no WebView)
GPU Rendering	wgpu + WGSL shaders
Window Management	winit + softbuffer (X11 transparency)
Audio Capture	cpal (ALSA)
Voice AI	Gemini 2.0 Flash Live (WebSocket)
Transcription	Groq Whisper Large v3 Turbo
Tool Execution	rusqlite, tokio::process, Claude CLI
Dashboard	tokio-tungstenite broadcast server
Hotkeys	global-hotkey crate
Clipboard	xclip, xdotool

Tauri (`src-tauri/`)

Component	Technology
Framework	Tauri 2
Frontend	React + TypeScript + Tailwind
Transcription	Groq Whisper (via groq-sdk)
Audio	Web Audio API

Project Structure

maVoice-Linux/
├── mavoice-native/              # ← Pure Rust native overlay (active)
│   ├── src/
│   │   ├── main.rs              # Entry point, window creation
│   │   ├── app.rs               # Event loop, state machine, dashboard
│   │   ├── renderer.rs          # wgpu setup, shader pipeline
│   │   ├── shader.wgsl          # Waveform strip shader
│   │   ├── ai_shader.wgsl       # AI orb spiral sphere shader
│   │   ├── config.rs            # TOML config loading
│   │   ├── dashboard.rs         # WebSocket broadcast server
│   │   ├── state_machine.rs     # App state transitions
│   │   ├── api/
│   │   │   ├── gemini.rs        # Gemini Live bidirectional WebSocket
│   │   │   └── groq.rs          # Groq Whisper transcription API
│   │   ├── audio/
│   │   │   ├── recorder.rs      # cpal microphone capture
│   │   │   └── player.rs        # PCM audio playback
│   │   ├── system/
│   │   │   ├── hotkeys.rs       # Global F2/F3 hotkey registration
│   │   │   └── text_inject.rs   # xdotool clipboard paste
│   │   └── tools/
│   │       └── mod.rs           # Gemini function calling tools
│   └── Cargo.toml
│
├── src-tauri/                   # ← Tauri 2 desktop app (legacy)
│   ├── aquavoice-frontend/      # React + TypeScript UI
│   │   └── src/components/
│   │       └── FloatingOverlay.tsx
│   ├── src/main.rs              # Tauri backend
│   ├── Cargo.toml
│   └── tauri.conf.json
│
├── install.sh                   # Tauri dependency installer
├── package.json                 # Tauri npm scripts
└── README.md

Related Projects

claudegram-dashboard — Real-time agent monitoring dashboard (Next.js + glass morphism UI). Connects to maVoice's WebSocket broadcast server to display agent status, kanban tasks, conversation logs, and tool call timelines.

License

MIT License. See LICENSE for details.

maVoice — Pure Rust AI voice on your desktop
