GitHub - mabel-dev/orso: Orso is a row-based Python DataFrame library

Orso is a shared DataFrame library for Opteryx and Mabel.

Overview

Orso is not intended to compete with Polars or Pandas (or your favorite ~~bear~~ DataFrame technology); instead, it is developed as a common layer for Mabel and Opteryx.

Key Use Cases:

In Opteryx, Orso provides most of the database Cursor functionality
In Mabel, Orso provides the data schema and validation functionality

Orso DataFrames are row-based, driven by their initial target use-case as the WAL for Mabel and Cursor for Opteryx. Each row in an Orso DataFrame can be quickly converted to a Tuple of values, a Dictionary, or a byte representation.
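To make the row-oriented idea concrete, here is a minimal standard-library sketch of one row moving between a tuple, a dictionary, and bytes. It does not use Orso: the `COLUMNS` tuple, the helper names, and the JSON byte encoding are all assumptions chosen for illustration, and Orso's own row type may work quite differently.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical column order for the example rows (not taken from Orso).
COLUMNS: Tuple[str, ...] = ("name", "age", "city")


def row_to_dict(row: Tuple[Any, ...]) -> Dict[str, Any]:
    """Pair each value in the row with its column name."""
    return dict(zip(COLUMNS, row))


def row_to_bytes(row: Tuple[Any, ...]) -> bytes:
    """Serialize the row's values as UTF-8 JSON (an illustrative encoding only)."""
    return json.dumps(list(row)).encode("utf-8")


def row_from_bytes(data: bytes) -> Tuple[Any, ...]:
    """Reverse row_to_bytes, returning the values as a tuple again."""
    return tuple(json.loads(data.decode("utf-8")))


row = ("Alice", 30, "New York")
as_dict = row_to_dict(row)
as_bytes = row_to_bytes(row)
restored = row_from_bytes(as_bytes)

print(as_dict)
print(restored == row)
```

Because each row is already a self-contained unit, every conversion touches only that row's values. That is roughly why a row-based layout suits logs and cursors, which hand out one record at a time.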

Installation

Install Orso from PyPI:

pip install orso

Quick Start

Creating a DataFrame

import orso

# Create from list of dictionaries
df = orso.DataFrame([
    {'name': 'Alice', 'age': 30, 'city': 'New York'},
    {'name': 'Bob', 'age': 25, 'city': 'San Francisco'},
    {'name': 'Charlie', 'age': 35, 'city': 'Chicago'}
])

print(f"Created DataFrame with {df.rowcount} rows and {df.columncount} columns")

Displaying Data

# Display the DataFrame
print(df.display())

# Convert to different formats
arrow_table = df.arrow()  # PyArrow Table
pandas_df = df.pandas()   # Pandas DataFrame

Working with Schema

# Access column names
print("Columns:", df.column_names)

# Access schema information  
print("Schema:", df.schema)

Converting Between Formats

# From PyArrow
import pyarrow as pa
arrow_table = pa.table({'x': [1, 2, 3], 'y': ['a', 'b', 'c']})
orso_df = orso.DataFrame.from_arrow(arrow_table)

# To Pandas
pandas_df = orso_df.pandas()
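Converting between a columnar table and a row-based structure is essentially a transposition. The sketch below shows that step with plain Python containers only; `columns_to_rows` and `rows_to_columns` are hypothetical helpers for illustration, not functions provided by Orso or PyArrow.

```python
from typing import Any, Dict, List, Tuple


def columns_to_rows(columns: Dict[str, List[Any]]) -> Tuple[List[str], List[Tuple[Any, ...]]]:
    """Transpose a column-oriented mapping into column names plus row tuples."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    rows = list(zip(*columns.values()))
    return names, rows


def rows_to_columns(names: List[str], rows: List[Tuple[Any, ...]]) -> Dict[str, List[Any]]:
    """Transpose row tuples back into a column-oriented mapping."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Same data as the PyArrow example above, held in plain lists.
names, rows = columns_to_rows({"x": [1, 2, 3], "y": ["a", "b", "c"]})
round_trip = rows_to_columns(names, rows)

print(names)
print(rows)
```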

Features

Lightweight: Minimal overhead for tabular data operations
Row-based: Optimized for row-oriented operations
Interoperable: Easy conversion to/from PyArrow, Pandas
Schema-aware: Built-in data validation and type checking
Fast serialization: Efficient conversion to bytes, tuples, and dictionaries
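As a rough illustration of what schema-aware validation can mean for row data, the following sketch checks a record against a mapping of column names to Python types. The `SCHEMA` mapping and the `validate` helper are assumptions made for this example; they are not Orso's schema classes, which this README does not describe in detail.

```python
from typing import Any, Dict, List

# Hypothetical schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"name": str, "age": int, "city": str}


def validate(record: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of problems found in the record; an empty list means valid."""
    errors: List[str] = []
    for column, expected in schema.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif not isinstance(record[column], expected):
            actual = type(record[column]).__name__
            errors.append(f"{column}: expected {expected.__name__}, got {actual}")
    for column in record:
        if column not in schema:
            errors.append(f"unexpected column: {column}")
    return errors


good = validate({"name": "Alice", "age": 30, "city": "New York"}, SCHEMA)
bad = validate({"name": "Bob", "age": "25"}, SCHEMA)

print(good)
print(bad)
```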

API Reference

DataFrame Class

The main DataFrame class provides the following key methods:

DataFrame(dictionaries=None, *, rows=None, schema=None) - Constructor
display(limit=5, colorize=True, show_types=True) - Pretty print the DataFrame
arrow(size=None) - Convert to PyArrow Table
pandas(size=None) - Convert to Pandas DataFrame
from_arrow(tables) - Create DataFrame from PyArrow Table(s)
fetchall() - Get all rows as list of Row objects
collect() - Materialize the DataFrame
append(other) - Append another DataFrame
distinct() - Get unique rows
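The semantics of appending and de-duplicating rows can be sketched on plain tuples. This is an illustration only: `append_rows` and `distinct_rows` are hypothetical helpers, and whether Orso's `append` mutates in place or its `distinct` preserves first-seen order is not stated here, so both behaviours below are assumptions.

```python
from typing import Any, List, Tuple

Row = Tuple[Any, ...]


def append_rows(left: List[Row], right: List[Row]) -> List[Row]:
    """Return a new list holding the rows of left followed by the rows of right."""
    return left + right


def distinct_rows(rows: List[Row]) -> List[Row]:
    """Drop duplicate rows, keeping the first occurrence of each (assumed ordering)."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        if row not in seen:  # rows must be hashable, which tuples of scalars are
            seen.add(row)
            unique.append(row)
    return unique


first = [("Alice", 30), ("Bob", 25)]
second = [("Bob", 25), ("Charlie", 35)]

combined = append_rows(first, second)
unique = distinct_rows(combined)

print(len(combined))
print(unique)
```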

Properties

rowcount - Number of rows
columncount - Number of columns
column_names - List of column names
schema - Schema information

Development

Building from Source

# Clone the repository
git clone https://github.com/mabel-dev/orso.git
cd orso

# Install dependencies
pip install -r requirements.txt
pip install -r tests/requirements.txt

# Build Cython extensions
make compile

# Run tests
make test

Contributing

Orso is part of the Mabel ecosystem. Contributions are welcome! Please ensure:

All tests pass: make test
Code follows the project style: make lint
New features include appropriate tests
Documentation is updated for API changes

Performance Benchmarking

Orso includes a comprehensive performance benchmark suite to compare different versions:

# Run full benchmark suite
python tests/test_benchmark_suite.py

# Compare two versions
python tests/test_benchmark_suite.py -o baseline.json
# <switch version>
python tests/test_benchmark_suite.py -o current.json -c baseline.json

See BENCHMARK_SUITE.md for detailed documentation.

License

Orso is licensed under Apache 2.0 unless explicitly indicated otherwise.

Status

Orso is in beta. Beta means different things to different people; to us, being beta means:

Interfaces are generally stable but may still have breaking changes
Unit tests are not reliable enough to capture breaks to functionality
Bugs are likely to exist in edge cases
Code may not be tuned for performance

As such, we really don't recommend using Orso in critical applications.

Related Projects

Opteryx - SQL query engine for data files
Mabel - Data processing framework
