Cogency

Streaming agents that resume execution after tool calls

Break free from quadratic context replay. Token usage grows linearly with the number of turns, so deep, tool-heavy conversations stay affordable.

Performance Breakthrough

Traditional frameworks replay the entire context on every tool call:

Turn 8: 31,200 tokens (5.2x cost)
Turn 16: 100,800 tokens (9.3x cost) 
Turn 32: 355,200 tokens (17.4x cost)

Cogency maintains streaming state:

Turn 8: 6,000 tokens
Turn 16: 10,800 tokens  
Turn 32: 20,400 tokens

Result: 94% token reduction at 32 turns. The deeper the conversation, the greater the savings.

Core Innovation

Stream injection with delimiter protocol:

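import asyncio

from cogency import Agent

async def main():
    agent = Agent()
    # Each event is a dict with a "type" and, for most types, "content"
    async for event in agent("Debug this Python script and fix any issues"):
        if event["type"] == "respond":
            print(event["content"])

asyncio.run(main())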

The agent signals execution state explicitly:

§think: I need to examine the code structure first
§call: {"name": "file_read", "args": {"file": "main.py"}}  
§execute
[SYSTEM: Found syntax error on line 15]
§respond: Fixed the missing colon. Code runs correctly now.
§end

Stream pauses for tool execution, then resumes with results injected. No context replay needed.
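To make the delimiter protocol concrete, here is a minimal sketch of how a client might split such a transcript into events. It is not Cogency's parser: the function name `parse_stream`, the `"result"` event type for injected lines, and the assumption that each delimiter starts its own line are all illustrative.

```python
import json

DELIMITER = "§"  # marker that opens each protocol line in the transcript above


def parse_stream(text):
    """Split a delimiter-protocol transcript into a list of event dicts.

    Hypothetical helper: the event shapes here are assumptions, not Cogency's API.
    """
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not line.startswith(DELIMITER):
            # Assumption: lines without a delimiter are injected tool output.
            events.append({"type": "result", "content": line})
            continue
        # Split on the first colon only, so JSON payloads keep their own colons.
        kind, _, payload = line[len(DELIMITER):].partition(":")
        kind, payload = kind.strip(), payload.strip()
        if kind == "call":
            events.append({"type": "call", "content": json.loads(payload)})
        elif kind in ("execute", "end"):
            events.append({"type": kind})
        else:
            events.append({"type": kind, "content": payload})
    return events


SAMPLE = """\
§think: I need to examine the code structure first
§call: {"name": "file_read", "args": {"file": "main.py"}}
§execute
[SYSTEM: Found syntax error on line 15]
§respond: Fixed the syntax error.
§end
"""

for event in parse_stream(SAMPLE):
    print(event)
```

In this sketch, `§execute` is the point where a runtime would pause the stream, run the pending call, and inject the result before generation continues.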

Key Features

🚀 Stream Resumption: WebSocket sessions maintain context across tool calls
💾 Dual Memory: Passive profiles + active recall across conversations
🔒 Layered Security: Semantic reasoning + execution-level validation
🔌 Multi-Provider: OpenAI Realtime, Gemini Live, Claude HTTP
⚡ Real-time Streaming: Word-level or semantic-level event control
🔍 9 Built-in Tools: file_* and web_* operations, plus recall (memory) and shell (system)

Installation

pip install cogency
export OPENAI_API_KEY="your-key"

Verify installation:

python -c "from cogency import Agent; print('✓ Cogency installed')"

Execution Modes

# Resume: WebSocket streaming (default)
agent = Agent(mode="resume")     # Persistent session, O(n) scaling

# Replay: HTTP requests  
agent = Agent(mode="replay")     # Universal compatibility, O(n²) scaling

# Auto: Resume with HTTP fallback
agent = Agent(mode="auto")       # Production recommended
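The auto mode described above can be pictured as "try the persistent session first, fall back to HTTP replay if it is unavailable". The sketch below illustrates that control flow with stand-in functions only; `run_resume`, `run_replay`, `run_auto`, and `ResumeUnavailable` are hypothetical names, and Cogency's actual fallback logic may differ.

```python
import asyncio


class ResumeUnavailable(Exception):
    """Hypothetical error: no persistent WebSocket session could be opened."""


async def run_resume(prompt, websocket_ok):
    # Stand-in for a WebSocket-backed run; fails when the transport is missing.
    if not websocket_ok:
        raise ResumeUnavailable("provider has no WebSocket endpoint")
    return ("resume", prompt)


async def run_replay(prompt):
    # Stand-in for an HTTP run that resends the context on each request.
    return ("replay", prompt)


async def run_auto(prompt, websocket_ok):
    """Prefer resume; fall back to replay if resume cannot start."""
    try:
        return await run_resume(prompt, websocket_ok)
    except ResumeUnavailable:
        return await run_replay(prompt)


print(asyncio.run(run_auto("hello", websocket_ok=True)))
print(asyncio.run(run_auto("hello", websocket_ok=False)))
```

The trade-off is the one the mode comments state: replay works with any provider but pays the quadratic cost, so it is only used when resume cannot start.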

Multi-Provider

agent = Agent(llm="openai")     # GPT-4o Realtime API
agent = Agent(llm="gemini")     # Gemini Live WebSocket  
agent = Agent(llm="anthropic")  # Claude HTTP

Usage

import asyncio

from cogency import Agent

async def main():
    # Basic usage
    agent = Agent()
    async for event in agent("What files are in this directory?"):
        if event["type"] == "respond":
            print(event["content"])

    # Multi-turn conversations
    async for event in agent(
        "Continue our code review",
        user_id="developer",
        conversation_id="review_session",
    ):
        if event["type"] == "respond":
            print(event["content"])

asyncio.run(main())

# 9 built-in tools:
# file_read, file_write, file_edit, file_list, file_search
# web_search, web_scrape, recall, shell

# Custom tools
from cogency import Tool, ToolResult

class DatabaseTool(Tool):
    name = "query_db"
    description = "Execute SQL queries"
    
    async def execute(self, sql: str, user_id: str):
        # Your implementation
        return ToolResult(outcome="Query executed successfully", content="Query results...")

agent = Agent(tools=[DatabaseTool()])

Streaming Control

chunks=False (default): Complete semantic units

async for event in agent("Debug this code", chunks=False):
    if event["type"] == "think":
        print(f"🤔 {event['content']}")  # "I need to analyze this code structure"
    elif event["type"] == "respond":
        print(f"💬 {event['content']}")  # "The syntax error is on line 15"

chunks=True: Real-time event streaming

async for event in agent("Debug this code", chunks=True):
    if event["type"] == "think":
        print(event["content"], end="")  # "I need" " to" " analyze"...
    elif event["type"] == "respond":  
        print(event["content"], end="")  # "The" " syntax" " error"...

See ARCHITECTURE.md for detailed streaming behavior and frontend integration patterns.
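The relationship between the two settings can be sketched as a merge step: consecutive chunk events of the same type are joined into one semantic unit. The helper below is illustrative only. `coalesce` is a hypothetical name, and it assumes chunk contents carry their own leading spaces, as the comments in the `chunks=True` example suggest.

```python
def coalesce(chunk_events):
    """Merge runs of same-type chunk events into one event per run.

    Hypothetical helper, not part of Cogency.
    """
    merged = []
    for event in chunk_events:
        if merged and merged[-1]["type"] == event["type"]:
            # Same type as the previous chunk: extend the current unit.
            merged[-1]["content"] += event["content"]
        else:
            # Type changed: start a new unit (copy so the input is not mutated).
            merged.append({"type": event["type"], "content": event["content"]})
    return merged


chunks = [
    {"type": "think", "content": "I need"},
    {"type": "think", "content": " to"},
    {"type": "think", "content": " analyze"},
    {"type": "respond", "content": "The"},
    {"type": "respond", "content": " syntax"},
    {"type": "respond", "content": " error"},
]

for unit in coalesce(chunks):
    print(unit["type"], "->", unit["content"])
```

A frontend that wants both live typing and a clean final transcript could render the chunks as they arrive and store the coalesced units.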

Performance

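Token savings grow roughly linearly with conversation depth, because replay cost is quadratic in the number of turns while streaming cost is linear:

| Turns | Traditional O(n²) tokens | Streaming O(n) tokens | Efficiency |
|-------|--------------------------|-----------------------|------------|
| 1     | 1,800                    | 1,800                 | 1.0x       |
| 2     | 4,200                    | 2,400                 | 1.8x       |
| 4     | 10,800                   | 3,600                 | 3.0x       |
| 8     | 31,200                   | 6,000                 | 5.2x       |
| 16    | 100,800                  | 10,800                | 9.3x       |
| 32    | 355,200                  | 20,400                | 17.4x      |

Longer conversations = proportionally larger savings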

Mathematical proof: docs/proof.md
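The figures in the table can be reproduced with a simple cost model. The constants below (1,200 base tokens plus 600 tokens per turn) are fitted to the table, not taken from the source, so treat them as an assumption: streaming pays for the current context once, while replay resends the context on every turn and therefore pays the running sum.

```python
BASE_TOKENS = 1200  # assumption: fixed context size, fitted to the table
TURN_TOKENS = 600   # assumption: tokens added per turn, fitted to the table


def streaming_tokens(turns):
    """Linear cost: the context after `turns` turns is processed once."""
    return BASE_TOKENS + TURN_TOKENS * turns


def replay_tokens(turns):
    """Quadratic cost: the context is resent on each turn, so costs add up."""
    return sum(streaming_tokens(k) for k in range(1, turns + 1))


for n in (1, 2, 4, 8, 16, 32):
    replay, stream = replay_tokens(n), streaming_tokens(n)
    print(f"{n:>2} turns: replay={replay:>7,} streaming={stream:>6,} ratio={replay / stream:.1f}x")
```

Under this model the ratio simplifies to n(n + 5) / (2(n + 2)), which approaches n/2 for large n: the advantage grows linearly with depth rather than exponentially.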

License

Apache 2.0
