DecisionsAI

DecisionsAI is a voice-controlled digital assistant that combines real-time speech recognition, local language model processing, and text-to-speech synthesis to provide a seamless hands-free computing experience. Built on the Pipecat framework, it efficiently orchestrates audio and text processing with minimal memory overhead.

Offline-first, with optional cloud capabilities. DecisionsAI is designed to work completely offline using local models (Whisper.cpp for speech recognition, Ollama for language processing, and Kokoro for text-to-speech), which keeps your data on your machine and removes the dependency on a network connection. When you enter your API keys, you also gain access to cloud models and services.

Strong support for third-party services. DecisionsAI supports major AI providers including OpenAI, Anthropic, ElevenLabs, and OpenRouter. With the OpenRouter integration, you can use the latest models, including GPT-5.2, Gemini 3 Flash, and Nano Banana, as soon as they become available, without waiting for individual provider integrations.

Remote control from anywhere. DecisionsAI includes full Telegram integration for remote access—send voice messages, text commands, or use the web-based remote control interface to navigate and control your computer screens from anywhere using your mobile device.

Native Google Workspace integration. For optimal performance with Google services, DecisionsAI includes direct integration with Google Workspace (Gmail, Calendar, Drive, Docs, Sheets). This native integration provides faster, more reliable access to your Google data compared to third-party routing. For everything else, the platform supports workflow automation through Rube/Composio, connecting to 500+ additional apps and services including Slack, GitHub, Notion, Microsoft Teams, and more.

Full file control and document processing. DecisionsAI gives you complete control over your files—transcribe audio and video files, process PDFs and Word documents for context, upload documents to Google Drive, and convert Markdown files directly into Google Docs. Create files on command during any conversation, and generate tickets for your development workflow as you go.

Universal vision support across all providers. DecisionsAI now supports image input and vision capabilities across all major LLM providers—OpenAI, Anthropic (Claude), OpenRouter, Groq, KiloCode, and Ollama. Upload images directly in chat and ask questions about screenshots, photos, diagrams, or any visual content. All providers automatically optimize images with WebP compression (typically 50-80% smaller than PNG/JPEG), saving bandwidth and API costs while maintaining quality. Vision tools include built-in screenshot analysis and image processing capabilities that work seamlessly with any vision-capable model.

Beyond voice commands, DecisionsAI includes a chat interface for text-based conversations with full conversation history and an interactive Oracle/globe visual interface. The chat interface also lets you download any generated text-to-speech audio as MP3 files for later use. For automation, the built-in Actions feature lets you record keyboard and mouse input as macros, then replay them on command, which is useful for automating repetitive tasks or building complex workflows that can be triggered with a single voice command.

IDE Integration & Project Workflows

DecisionsAI includes a Visual Studio Code extension that seamlessly integrates with your development workflow. When you're working on a project, simply tell DecisionsAI about it—the extension listens for tickets and instructions you create through voice commands. As you discuss features, bugs, or tasks, DecisionsAI automatically generates structured tickets that your IDE picks up and processes.

The workflow is simple: start a conversation about your project, describe what needs to be done, and DecisionsAI creates tickets with full context. IDEs like Cursor or Visual Studio Code with the extension installed will automatically detect these tickets and can begin working on them. This hands-free approach means you can brainstorm, plan, and delegate tasks to your IDE without ever touching the keyboard—DecisionsAI and your IDE work hand in glove to turn your ideas into code.

Performance & Architecture

DecisionsAI is built on the Pipecat framework, a real-time voice AI pipeline that significantly optimizes memory usage and performance. Pipecat orchestrates the flow of audio, text, and control frames between speech recognition (STT), language models (LLM), and text-to-speech (TTS) services using efficient frame-based communication.

Key Improvements & Optimizations

Memory Efficiency: Pipecat's frame-based architecture eliminates redundant data copying and enables efficient streaming, reducing overall memory footprint by up to 50% compared to traditional approaches
Real-time Processing: Frame-based communication enables low-latency voice interactions with minimal buffering
Interruption Handling: Built-in interruption support allows natural conversation flow with immediate response to user input
Streaming Architecture: Audio and text are processed in chunks, reducing memory spikes and enabling smooth performance on lower-end hardware
Service Coordination: Intelligent frame routing ensures optimal resource utilization across STT, LLM, and TTS services
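
To make the frame-based streaming idea above more concrete, here is a minimal sketch using plain Python generators. It does not use Pipecat's actual API: the `Frame` class and the `fake_stt`, `fake_llm`, and `fake_tts` stages are hypothetical stand-ins, chosen only to show how small frames can flow through chained stages one at a time instead of being buffered as a whole.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    """A small unit of data passed between stages (hypothetical, not Pipecat's class)."""
    kind: str      # "audio", "text", or "speech"
    payload: str


def fake_stt(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for speech recognition: turns each audio frame into a text frame."""
    for frame in frames:
        if frame.kind == "audio":
            yield Frame("text", frame.payload.lower())


def fake_llm(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for the language model: produces one reply per text frame."""
    for frame in frames:
        if frame.kind == "text":
            yield Frame("text", f"reply to: {frame.payload}")


def fake_tts(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Stand-in for speech synthesis: wraps each text frame as a speech frame."""
    for frame in frames:
        if frame.kind == "text":
            yield Frame("speech", f"<spoken>{frame.payload}</spoken>")


def run_pipeline(audio_chunks: Iterable[str]) -> Iterator[Frame]:
    """Chain the stages lazily so only one frame is in flight per stage."""
    source = (Frame("audio", chunk) for chunk in audio_chunks)
    return fake_tts(fake_llm(fake_stt(source)))


if __name__ == "__main__":
    for out in run_pipeline(["HELLO", "WHAT TIME IS IT"]):
        print(out.payload)
```

Because every stage is a generator, no stage holds the full audio or text in memory; each frame is consumed as soon as the previous stage yields it.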

Technology Stack

Offline-First Core:

Whisper.cpp - Efficient offline speech recognition (supports various model sizes)
Kokoro - High-quality offline text-to-speech with natural voice synthesis
Ollama - Local language model inference (supports various models including Llama, Gemma, and more)
Pipecat-ai - Real-time voice AI pipeline orchestration framework with frame-based streaming

Third-Party Services (Optional - Enter API Keys to Enable):

AI Model Providers:

OpenRouter - Unified access to the latest models including GPT-5.2, Gemini 3 Flash, Nano Banana, Claude 3.7, and all cutting-edge model releases as they become available
OpenAI - GPT-5.2, GPT-4 Turbo, GPT-4o, and other OpenAI models
Anthropic - Claude 3.7 Sonnet, Claude 3 Opus, Claude 3 Haiku, and other Claude models
Ollama - Local and remote Ollama instances for self-hosted models

Speech & Voice Services:

ElevenLabs - Cloud-based text-to-speech with high-quality voice synthesis and voice cloning
AssemblyAI - Advanced speech recognition and transcription services

Integration & Automation Platforms:

Rube/Composio - Connect to 500+ apps and services for workflow automation (Slack, GitHub, Gmail, Notion, Google Workspace, Microsoft Teams, and more)

System Requirements

Local/Offline Mode (Default)

When running DecisionsAI in offline mode with local models, you'll need:

Operating System:
- macOS: Fully tested and supported
- Linux/Unix: Intended support (may require additional configuration)
- Windows: Intended support (may require additional configuration)
RAM: Minimum 12GB (16GB recommended for optimal performance)
Python: 3.8 or higher
System Dependencies: PortAudio and FFmpeg
Disk Space: Minimum 6GB free space for model downloads
Internet Connection: Stable connection required for initial setup (model downloads are ~5GB total)

⏱️ Initial Setup & Model Downloads (Offline Mode):

Total download size: ~5.0GB (Kokoro TTS models: ~100MB + Ollama llama3.1:8b: ~4.9GB)
Download time estimates:
- Fast connection (100 Mbps): ~7-10 minutes
- Medium connection (50 Mbps): ~15-20 minutes
- Slow connection (10 Mbps): ~1+ hours
Progress bars will be displayed during downloads. Please be patient and ensure you have a stable internet connection.
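
The download time estimates above follow from simple arithmetic. The sketch below computes the ideal transfer time for the ~5.0GB download, assuming decimal gigabytes and no protocol overhead; the function name `download_minutes` is only illustrative. Real downloads are usually somewhat slower, which is why the listed ranges are higher than these ideal figures.

```python
def download_minutes(size_gb: float, speed_mbps: float) -> float:
    """Ideal transfer time in minutes (decimal GB, no protocol overhead)."""
    size_megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB, 1 byte = 8 bits
    return size_megabits / speed_mbps / 60


if __name__ == "__main__":
    for speed in (100, 50, 10):
        print(f"{speed} Mbps: ~{download_minutes(5.0, speed):.0f} min")
    # 100 Mbps: ~7 min
    # 50 Mbps: ~13 min
    # 10 Mbps: ~67 min
```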

Note: In offline mode, the application uses llama3.1:8b (~4.9GB) which stays loaded in memory for optimal performance. Combined with the operating system, application overhead, and other models (Kokoro TTS, Whisper.cpp), a minimum of 12GB RAM is required. 16GB is recommended for smooth operation, especially when running other applications simultaneously. Thanks to Pipecat's optimized architecture, DecisionsAI efficiently orchestrates audio and text processing with minimal memory overhead beyond the model itself.

Online/Cloud Mode (With API Keys)

Using OpenAI, Anthropic, OpenRouter, or other cloud-based services drastically reduces the system footprint!

When using online services, you can run DecisionsAI with significantly lower requirements:

Operating System:
- macOS: Fully tested and supported
- Linux/Unix: Intended support (may require additional configuration)
- Windows: Intended support (may require additional configuration)
RAM: Minimum 4GB (8GB recommended)
Python: 3.8 or higher
System Dependencies: PortAudio and FFmpeg
Disk Space: Minimum 200MB free space (only for Kokoro TTS models and Whisper.cpp)
Internet Connection: Stable, high-speed connection required for real-time AI interactions

Benefits of Online Mode:

No large model downloads: No need to download 4.9GB Ollama models
Reduced memory usage: Models run on cloud servers, not your local machine
Access to latest models: Get instant access to Gemini 3, GPT-4 Turbo, Claude 3.5 Sonnet, and other cutting-edge models
Better performance on low-end hardware: Perfect for laptops and systems with limited RAM
Faster setup: Only download lightweight local components (~200MB total)

Note: You can mix and match! Use local models for privacy-sensitive tasks and cloud models for complex reasoning. DecisionsAI intelligently routes requests based on your configuration. The application includes cross-platform support with platform-specific optimizations for clipboard operations, keyboard shortcuts, and system paths.
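
As an illustration of mixing local and cloud models, the following sketch shows one possible routing policy. It is an assumption, not the project's actual routing logic: the function `choose_backend`, the `private` flag, the provider key names, and the provider order are all hypothetical.

```python
from typing import Mapping


def choose_backend(private: bool, api_keys: Mapping[str, str]) -> str:
    """Pick a backend name under a hypothetical policy.

    Privacy-sensitive requests always stay local; otherwise the first
    cloud provider with a configured API key is used.
    """
    if private:
        return "ollama"
    for provider in ("openrouter", "openai", "anthropic"):
        if api_keys.get(provider):
            return provider
    # No cloud key configured: fall back to the local model.
    return "ollama"


if __name__ == "__main__":
    keys = {"openai": "sk-example"}
    print(choose_backend(private=False, api_keys=keys))
    print(choose_backend(private=True, api_keys=keys))
```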

Installation & Usage

Quick Start (Recommended)

The easiest way to get started is to use the provided executables, which handle all setup automatically:

Clone the repository:

git clone https://github.com/tensology/decisionsai.git
cd decisionsai

Run the appropriate executable for your platform:

macOS:
- Double-click decisions.app in Finder, or
- Run ./decisions in Terminal

Windows:
- Double-click decisions.bat in File Explorer, or
- Run decisions.bat from Command Prompt

Unix/Linux:
- Run ./decisions in Terminal

These executables will automatically:
- Check and install system dependencies (portaudio, ffmpeg) if missing
- Detect or create a Python virtual environment (prioritizing virtualenvwrapper if available)
- Install all Python dependencies from requirements.txt
- Download required AI models (if not already present) via bin/setup.py
- Start the application via bin/start.py

Note: On Linux/macOS, system dependency installation may require sudo/admin privileges. On Windows, the script will attempt to use winget, Chocolatey, or Scoop if available.

Interact with the assistant using voice commands.

Manual Installation & Setup

If you prefer to set up the project manually or need more control over the installation process, you can use the scripts in the bin/ directory directly.

Prerequisites

Python: 3.8 or higher
System Dependencies: PortAudio and FFmpeg
- macOS: brew install portaudio ffmpeg
- Linux: sudo apt-get install portaudio19-dev ffmpeg (Debian/Ubuntu) or equivalent
- Windows: Install via winget, Chocolatey, or Scoop

Step 1: Set Up Python Environment

Create and activate a virtual environment:

# Using venv (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Or using virtualenvwrapper (if installed)
mkvirtualenv decisions

Step 2: Install Python Dependencies

# For Python 3.13+, set compatibility flag for tiktoken (LlamaIndex dependency)
# For Python 3.12 or earlier, you can skip the export line
export PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1  # Only needed for Python 3.13+
pip install -r requirements.txt
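
If you script the installation, the version check described in the comments above can be automated. This is a minimal sketch under that assumption; the helper names `needs_abi3_flag` and `pip_environment` are hypothetical and do not come from the project.

```python
import os
import sys
from typing import Dict, Mapping, Optional, Sequence


def needs_abi3_flag(version: Sequence[int] = sys.version_info) -> bool:
    """True when the interpreter is Python 3.13 or newer."""
    return tuple(version[:2]) >= (3, 13)


def pip_environment(
    version: Sequence[int] = sys.version_info,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy the environment and add the compatibility flag only if needed."""
    env = dict(os.environ if base is None else base)
    if needs_abi3_flag(version):
        env["PYO3_USE_ABI3_FORWARD_COMPATIBILITY"] = "1"
    return env


if __name__ == "__main__":
    print("Flag needed:", needs_abi3_flag())
```

The returned mapping can be passed as `env=` to `subprocess.run([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"], env=pip_environment(), check=True)`.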

Optional Dependencies:

LlamaIndex (for RAG): Better performance for document indexing and retrieval

LlamaIndex is included in requirements.txt but can be installed separately if needed. See docs/INSTALLATION.md for details.

Step 3: Download AI Models

Run the setup script to download all required AI models:

# Download required models
python bin/setup.py

# Optional: Also install optional dependencies (LlamaIndex)
python bin/setup.py --install-optional

What bin/setup.py does:

The setup script downloads and configures the following AI models:

Kokoro TTS Models (Text-to-Speech):
- Downloads kokoro-v1.0.onnx (~50MB) - The main TTS model
- Downloads voices-v1.0.bin (~10MB) - Voice configuration file
- Location: ./distr/agent/models/
- Source: kokoro-onnx releases
Ollama Language Model:
- Pulls llama3.1:8b model (~4.9GB) via Ollama API
- Better accuracy and lower hallucination rates compared to smaller models like llama3.2:3b
- Checks if model is up-to-date (within 24 hours) before re-downloading
- Requires Ollama to be installed and running
- Source: Ollama

Model Download Process:

The script checks if models already exist before downloading
Progress bars are displayed for large downloads
Failed downloads can be resumed by running the script again
Total download size: ~100MB (Kokoro models) + ~4.9GB (Ollama llama3.1:8b model) = ~5.0GB total
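
The existence check and the 24-hour freshness check described above can be sketched as follows. This is an illustration, not the code in bin/setup.py: the helper names `missing_files` and `is_stale` are hypothetical, and only the file names and the 24-hour window come from this document.

```python
import time
from pathlib import Path
from typing import List, Optional, Sequence

KOKORO_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")  # file names listed above
MAX_AGE_SECONDS = 24 * 60 * 60                          # 24-hour freshness window


def missing_files(model_dir: Path, names: Sequence[str] = KOKORO_FILES) -> List[str]:
    """Return the model files that are not yet present in model_dir."""
    return [name for name in names if not (model_dir / name).is_file()]


def is_stale(last_checked: float, now: Optional[float] = None) -> bool:
    """True when the last check happened more than 24 hours ago."""
    current = time.time() if now is None else now
    return current - last_checked > MAX_AGE_SECONDS


if __name__ == "__main__":
    print("Still to download:", missing_files(Path("./distr/agent/models")))
```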

Optional: Vosk Speech Recognition Model

Whisper.cpp is the default STT (Speech-to-Text) engine and is included with the main setup. However, if you prefer to use Vosk as an alternative STT option, you can install it separately:

python bin/setup_vosk.py

What bin/setup_vosk.py does:

Downloads vosk-model-en-us-0.22.zip (~1.8GB) - English speech recognition model
Extracts it to ./distr/agent/models/vosk-model-en-us-0.22/
Source: Vosk models

Note: Vosk is optional. The application uses Whisper.cpp by default for speech recognition. You can switch between Whisper.cpp and Vosk in the application settings (Settings > AI > Transcription Model) if both are installed.

Step 4: Start the Application

Once all models are downloaded, start the application:

python bin/start.py

What bin/start.py does:

Adds the project root to Python's path for module imports
Suppresses macOS memory logging warnings (if on macOS)
Initializes the AppKit framework for macOS integration (if on macOS)
Launches the main application via distr.app.run()

The application will start and you can begin interacting with the assistant using voice commands.
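
The first step of the startup script, adding the project root to Python's import path, can be sketched like this. It assumes the script lives one directory below the project root (as bin/start.py does); the helper name `add_project_root` is hypothetical.

```python
import sys
from pathlib import Path
from typing import List, Optional


def add_project_root(script_path: str, path_list: Optional[List[str]] = None) -> str:
    """Put the parent of the script's directory at the front of the import path."""
    paths = sys.path if path_list is None else path_list
    # For <root>/bin/start.py, parent is <root>/bin and parent.parent is <root>.
    root = str(Path(script_path).resolve().parent.parent)
    if root not in paths:
        paths.insert(0, root)
    return root


if __name__ == "__main__":
    print("Project root:", add_project_root(__file__))
```

After this call, a package in the project root such as `distr` becomes importable regardless of the directory the script was launched from.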

Contributing

We welcome contributions to DecisionsAI! If you have suggestions or improvements, please open an issue or submit a pull request.

Development Status

This project is actively being developed. Current focus areas include:

Improving voice recognition accuracy
Enhancing offline capabilities
Adding support for additional AI models
Enhanced dictation and transcription features

Code Execution

DecisionsAI includes built-in code execution capabilities, enabling the assistant to execute Python code, perform file operations, and carry out complex tasks on your local machine directly through voice commands or chat interactions.
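
One common way to execute a Python snippet on behalf of an assistant is to run it in a separate interpreter process with a timeout. The sketch below shows that general technique only; it is not DecisionsAI's implementation, and the function name `run_python` and the 10-second default are assumptions.

```python
import subprocess
import sys
from typing import Tuple


def run_python(code: str, timeout: float = 10.0) -> Tuple[int, str, str]:
    """Run code in a child interpreter and return (exit code, stdout, stderr).

    Raises subprocess.TimeoutExpired if the snippet runs longer than timeout.
    """
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    code, out, err = run_python("print(2 + 3)")
    print(code, out.strip())
```

A separate process keeps a crashing or hanging snippet from taking the main application down with it, but it is not a security sandbox: the child still runs with the user's permissions.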

Voice Commands

DecisionsAI responds to a wide range of voice commands.

Here's a comprehensive list of available commands:

Navigation and Window Management

Command	Description
Open / Focus / Focus on	Open or focus on a specific window
Open file menu	Open the file menu
Hide oracle / Hide globe	Hide the oracle/globe interface
Show oracle / Show globe	Show the oracle/globe interface
Change oracle / Change globe	Change the oracle/globe interface
Change previous oracle / Change previous globe	Change to the previous oracle/globe image
Open GPT	Open GPT (Alt+Space shortcut)
Open spotlight / Spotlight search	Open Spotlight search (Cmd+Space)
New tab	Create a new tab (Cmd+T)
Previous tab	Switch to the previous tab (Cmd+Alt+Left)
Next tab	Switch to the next tab (Cmd+Alt+Right)
Close	Close the current window (Cmd+W)
Quit	Quit the current application (Cmd+Q)

Chat Management

Command	Description
New Chat / Start over / New conversation	Start a new chat conversation

Text Editing and Navigation

Command	Description
Copy	Copy selected text (Cmd+C)
Paste	Paste copied text (Cmd+V)
Cut	Cut selected text (Cmd+X)
Select all	Select all text (Cmd+A)
Undo	Undo last action (Cmd+Z)
Redo	Redo last undone action (Cmd+Shift+Z)
Back space / Backspace	Delete character before cursor
Delete	Delete character after cursor
Clear line	Clear the current line
Delete line	Delete the entire line (Cmd+Shift+K)
Force delete	Force delete (Cmd+Backspace)

Caret Movement

Command	Description
Up / Down / Left / Right	Move cursor in specified direction
Page up / Page down	Scroll page up/down (Fn+Up/Down)
Home	Move cursor to beginning of line (Fn+Left)
End	Move cursor to end of line (Fn+Right)

Mouse Control

Command	Description
Mouse up / Mouse down / Mouse left / Mouse right	Move mouse in specified direction
Mouse slow up / Mouse slow down / Mouse slow left / Mouse slow right	Move mouse slowly in specified direction
Move mouse center	Move mouse to center of screen
Move mouse middle	Move mouse to horizontal middle of screen
Move mouse vertical middle	Move mouse to vertical middle of screen
Move mouse top	Move mouse to top of screen
Move mouse bottom	Move mouse to bottom of screen
Move mouse far left	Move mouse to left edge of screen
Move mouse far right	Move mouse to right edge of screen
Right click	Perform a right-click
Click	Perform a left-click
Double click	Perform a double left-click
Scroll up / Scroll down	Scroll the page up/down

Sound Controls

Command	Description
Refresh / Reload	Refresh the current page (Cmd+R)
Pause / Stop / Play	Control media playback
Next track / Previous track	Switch between tracks
Mute	Mute audio
Volume up / Volume down	Adjust volume

Function Keys

Command	Description
Press F1 through Press F12	Press the corresponding function key

Special Keys

Command	Description
Space bar / Space / Spacebar	Press the space bar
Control	Press the Control key
Command	Press the Command key
Enter this	Press the Enter key
Press alt / Alt	Press the Alt key
Press escape / Escape / Cancel	Press the Escape key
Tab	Press the Tab key

AI Assistant Interactions

Command	Description
Dictate	Start dictation mode; types whatever you say, except for ending phrases such as "Enter this"
Transcribe / Listen / Listen to	Start transcription mode; stores whatever you say in the clipboard until you say "Enter this" or "stop listening"
Read / Speak / Recite / Announce	Read out the transcribed text, or if you say "this", it will read out whatever you've selected
Agent / Hey / Jarvis	Activate the AI agent for complex tasks
Explain / Elaborate	Explain or elaborate on the text in the clipboard
Rework this / Reword this	Rework/improve selected text using LLM, updates clipboard, then pastes
Rework from clipboard / Reword from clipboard	Rework/improve clipboard content using LLM, updates clipboard (no paste)
Summarize this	Summarize selected text using LLM, updates clipboard, then pastes
Summarize from clipboard	Summarize clipboard content using LLM, updates clipboard (no paste)
What's in the clipboard / Get the clipboard / Show clipboard	Display current clipboard content in the conversation
Save this as audio	Generate audio from selected text using TTS and save it as a WAV file to the Desktop
Calculate / Figure out / Analyze	Perform calculations or analysis of clipboard content
Translate	Translate text from source language to target language
Type 'text' / Type "text"	Immediately type the specified text as keyboard input (e.g., "type 'hello world'" or "type from clipboard")
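
The two forms of the Type command in the table above (quoted text and "type from clipboard") could be told apart with a small parser. This is a sketch of one possible approach, not the project's parser; the function name `parse_type_command` and its return values are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches: type 'some text'  or  type "some text"
_TYPE_QUOTED = re.compile(r"""^\s*type\s+(['"])(?P<text>.*)\1\s*$""", re.IGNORECASE)
# Matches: type from clipboard
_TYPE_CLIPBOARD = re.compile(r"^\s*type\s+from\s+clipboard\s*$", re.IGNORECASE)


def parse_type_command(utterance: str) -> Optional[Tuple[str, str]]:
    """Return ("clipboard", "") or ("literal", text); None if not a Type command."""
    if _TYPE_CLIPBOARD.match(utterance):
        return ("clipboard", "")
    match = _TYPE_QUOTED.match(utterance)
    if match:
        return ("literal", match.group("text"))
    return None


if __name__ == "__main__":
    print(parse_type_command("type 'hello world'"))
    print(parse_type_command("type from clipboard"))
```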

Control Commands

Command	Description
Start listening / Listen / Listen to	Begin voice command recognition
Stop listening / Stop / Halt	Stop voice command recognition
Stop speaking / Shut up / Be quiet	Stop the AI from speaking
Exit	Exit the application
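
The command tables above pair spoken phrases, including aliases, with actions. A simple way to model this is a lookup table keyed by normalized phrase. The sketch below uses a few entries taken from the tables; the data structure and the names `SHORTCUTS`, `build_lookup`, and `match_command` are assumptions for illustration and do not reflect how DecisionsAI stores its commands.

```python
from typing import Dict, Optional, Tuple

# Each entry maps a group of alias phrases to a key combination (from the tables above).
SHORTCUTS: Dict[Tuple[str, ...], Tuple[str, ...]] = {
    ("new tab",): ("cmd", "t"),
    ("close",): ("cmd", "w"),
    ("quit",): ("cmd", "q"),
    ("copy",): ("cmd", "c"),
    ("open spotlight", "spotlight search"): ("cmd", "space"),
}


def build_lookup(table: Dict[Tuple[str, ...], Tuple[str, ...]]) -> Dict[str, Tuple[str, ...]]:
    """Flatten alias groups into one phrase -> keys dictionary."""
    return {alias: keys for aliases, keys in table.items() for alias in aliases}


def match_command(utterance: str, lookup: Dict[str, Tuple[str, ...]]) -> Optional[Tuple[str, ...]]:
    """Normalize case and whitespace, then look the phrase up exactly."""
    phrase = " ".join(utterance.lower().split())
    return lookup.get(phrase)


LOOKUP = build_lookup(SHORTCUTS)

if __name__ == "__main__":
    print(match_command("Spotlight search", LOOKUP))
    print(match_command("New Tab", LOOKUP))
```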

Telegram Integration

DecisionsAI includes comprehensive Telegram integration for remote control and communication:

Remote Control: Say "remote control" or "remote" in Telegram to receive a link to a web-based remote control interface. This allows you to navigate and control your computer screens through WebSockets using your Telegram chat ID as the subscription identifier. You can view screens, take screenshots, control mouse position, click, double-click, type text, and send keyboard commands directly from the web interface.
Voice & Text Messages: Send voice messages or text to your connected Telegram bot, and DecisionsAI will process them as commands or questions, responding with voice notes, text, and screenshots as appropriate.
Connection: Connect your Telegram account through Settings > Advanced > Telegram Connection. Once connected, you can interact with DecisionsAI remotely via Telegram.
Optimized Performance:
- Screenshots are automatically compressed to WebP format (typically 25-35% smaller than PNG/JPEG) for faster uploads and reduced bandwidth
- Silent connection polling - ping/pong keepalive messages are handled silently without log spam
- Smart auto-reconnect - automatic reconnections don't send notification messages to avoid spam
- Efficient connection status tracking - only logs meaningful connection state changes
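
Logging only meaningful connection state changes, as described in the last item above, can be done by remembering the previous state and ignoring repeats. This is a minimal sketch of that pattern; the class name `ConnectionStatusTracker` and the state strings are hypothetical.

```python
import logging
from typing import List, Optional


class ConnectionStatusTracker:
    """Logs a message only when the connection state actually changes."""

    def __init__(self, logger: Optional[logging.Logger] = None) -> None:
        self._logger = logger or logging.getLogger("telegram")
        self._state: Optional[str] = None
        self.transitions: List[str] = []  # history of distinct states, in order

    def update(self, state: str) -> bool:
        """Record a state report; return True if it was a real change."""
        if state == self._state:
            return False  # repeated keepalive-style report: stay silent
        self._logger.info("connection state: %s -> %s", self._state, state)
        self.transitions.append(state)
        self._state = state
        return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    tracker = ConnectionStatusTracker()
    for report in ["connected", "connected", "disconnected", "connected"]:
        tracker.update(report)
```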

Google Workspace Integration

DecisionsAI includes native Google Workspace integration for direct access to Google services:

Gmail: Read emails, send emails, create drafts, reply to messages, and manage your inbox with natural voice commands
Google Calendar: Create events, check your schedule, and manage appointments
Google Drive: List folders, read files, upload documents, and access PDFs
Google Docs: Create documents directly from markdown files
Google Sheets: Read and interact with spreadsheet data

Setup: Connect your Google account through Settings > Connections > Google Workspace. The integration uses OAuth 2.0 for secure authentication.

Why Native Integration? DecisionsAI prioritizes its native Google Workspace integration over third-party routing (like Rube/Composio) because direct API access provides faster response times and more reliable performance when working with your Google data.

Actions (Macro Recording & Playback)

DecisionsAI includes a powerful action recording system that lets you record keyboard and mouse input, then replay those actions on command:

Record Actions: Say "start recording" to begin capturing your keyboard presses, mouse movements, clicks, and drags. Everything you do is recorded with precise timing.
Stop Recording: Say "stop recording" or click the tray icon to stop. You'll be prompted to name your action.
Run Actions: Say "run action [name]" or "play action [name]" to replay the recorded sequence. DecisionsAI automatically generates trigger words from your action title, so an action named "Open Terminal and SSH" can be triggered by saying "open terminal", "SSH", or the full name.
Stop Playback: Say "stop action" to immediately halt a running action.
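
The trigger-word behavior described above, where "Open Terminal and SSH" can be triggered by "open terminal", "SSH", or the full name, could be approximated by splitting the title on connecting words. The sketch below is an assumption about how such phrases might be derived, not the project's algorithm; the function name `trigger_phrases` and the choice of separators are hypothetical, and phrases are lowercased for case-insensitive matching.

```python
import re
from typing import List


def trigger_phrases(title: str) -> List[str]:
    """Return the full title plus its parts, split on "and", "&", or commas."""
    full = " ".join(title.lower().split())
    parts = [p.strip() for p in re.split(r"\band\b|,|&", full) if p.strip()]
    phrases = [full]
    for part in parts:
        if part not in phrases:
            phrases.append(part)
    return phrases


if __name__ == "__main__":
    print(trigger_phrases("Open Terminal and SSH"))
    # ['open terminal and ssh', 'open terminal', 'ssh']
```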

Use Cases:

Automate repetitive tasks (form filling, file operations, application workflows)
Create keyboard shortcuts for complex multi-step processes
Build macros for applications that don't support native automation

Management: Access the Actions window from the system tray menu to view, edit, rename, or delete your recorded actions.
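
Replaying recorded input with its original timing comes down to storing each event with an offset from the start of the recording and sleeping for the gap between consecutive events. The sketch below shows that idea with injected callbacks so it does not touch a real keyboard or mouse; `RecordedEvent`, `replay`, and the `should_stop` hook are hypothetical names, not the project's API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RecordedEvent:
    offset: float  # seconds since the recording started
    action: str    # description of the input, e.g. "click" or "key a"


def replay(
    events: Iterable[RecordedEvent],
    perform: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    should_stop: Callable[[], bool] = lambda: False,
) -> int:
    """Replay events in time order; return how many were performed."""
    played = 0
    previous = 0.0
    for event in sorted(events, key=lambda e: e.offset):
        if should_stop():  # e.g. set when the user says "stop action"
            break
        sleep(max(0.0, event.offset - previous))  # wait for the recorded gap
        perform(event.action)
        previous = event.offset
        played += 1
    return played


if __name__ == "__main__":
    demo = [RecordedEvent(0.0, "key a"), RecordedEvent(0.25, "click")]
    replay(demo, perform=print)
```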

Note: Voice recognition is currently limited to English. Some features may require internet connectivity depending on your configuration.

Summary

DecisionsAI is an intelligent digital assistant designed to understand and execute various tasks on your computer. It leverages cutting-edge AI technologies to provide voice interaction, automation, and adaptive learning capabilities.

License

This project is licensed under the TENSOLOGY COMMUNITY LICENSE AGREEMENT. See the LICENSE.md file for details.
