Heterogeneous column types

So, I thought I'd start opening up issues to enable discussion of individual dataframe features we'd like to see. I'd like to start with 'heterogeneous column types': the ability to have a dataframe with columns of different types.

In looking through existing WIPs in #4, I came across a few different methods of implementing this:
1. Using an `enum` for either a column or for individual values. [utah](https://github.com/kernelmachine/utah) (and really any arbitrarily-typed dataframe library) can house enums as values, which allows you to mix types however you want (even within the same column), at the cost of compile-time type safety and some performance. I didn't see any library currently using column-based enums, but I could see having something like
```rust 
enum Column { 
    Float(Vec<f64>),
    Int(Vec<i64>),
    Text(Vec<String>), 
}
```
and in fact I did it this way in an early version of `agnes`.

2. Using `Any`-based storage, along with some metadata for relating columns to data types at run-time. Used by [rust-dataframe](https://github.com/nevi-me/rust-dataframe) and [black-jack](https://github.com/milesgranger/black-jack).
3. Using cons-lists to provide compile-time type safety. Used by [agnes](https://github.com/agnes-rs/agnes) and [frames](https://github.com/jesskfullwood/frames).
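
To make the first two options concrete, here is a minimal sketch of each. The `as_floats` accessor and the `AnyFrame` type (with its `insert` / `get` methods) are hypothetical names made up for illustration; they are not the APIs of utah, rust-dataframe, or black-jack. In both cases the type check happens at run time, so asking for the wrong type gives `None` rather than a compile error.

```rust
use std::any::Any;
use std::collections::HashMap;

// Option 1: a column-level enum (same shape as the `Column` enum above).
#[derive(Debug)]
enum Column {
    Float(Vec<f64>),
    Int(Vec<i64>),
    Text(Vec<String>),
}

impl Column {
    // Hypothetical accessor: the variant is inspected at run time.
    fn as_floats(&self) -> Option<&[f64]> {
        match self {
            Column::Float(values) => Some(values),
            _ => None,
        }
    }
}

// Option 2: `Any`-based storage. Columns are type-erased on insert and
// recovered by downcasting on access.
struct AnyFrame {
    columns: HashMap<String, Box<dyn Any>>,
}

impl AnyFrame {
    fn new() -> Self {
        AnyFrame { columns: HashMap::new() }
    }

    fn insert<T: 'static>(&mut self, name: &str, data: Vec<T>) {
        self.columns.insert(name.to_string(), Box::new(data));
    }

    // Returns `None` if the column is missing or holds a different type.
    fn get<T: 'static>(&self, name: &str) -> Option<&Vec<T>> {
        self.columns.get(name)?.downcast_ref::<Vec<T>>()
    }
}
```

With this sketch, `df.get::<i64>("x")` succeeds only if column `x` was inserted as a `Vec<i64>`; `df.get::<f64>("x")` compiles fine but returns `None`.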

Each of these has its own advantages and disadvantages. For example, *1* and *2* lack compile-time type-checking, but have much cleaner type signatures and potentially cleaner error messages than *3* (where you have something like `DataFrame<Cons<usize, Cons<f64, Cons<String, Nil>>>>` for a relatively simple three-column dataframe).
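
For comparison, a stripped-down sketch of the cons-list approach might look like the following. The `Cons`, `Nil`, `DataFrame`, and `ColumnList` definitions here are simplified stand-ins written for this example and are not the actual types in `agnes` or `frames`; the helper names `first`, `second`, and `build_example` are likewise hypothetical.

```rust
// Hypothetical minimal cons-list: each cell holds one column and the rest of the list.
struct Nil;
struct Cons<H, T> {
    head: Vec<H>,
    tail: T,
}

struct DataFrame<Cols> {
    cols: Cols,
}

// The full column layout is part of the type.
type ThreeCols = Cons<usize, Cons<f64, Cons<String, Nil>>>;

// The number of columns can be computed at compile time from the type alone.
trait ColumnList {
    const WIDTH: usize;
}
impl ColumnList for Nil {
    const WIDTH: usize = 0;
}
impl<H, T: ColumnList> ColumnList for Cons<H, T> {
    const WIDTH: usize = 1 + T::WIDTH;
}

// Accessors return the statically known element type; no downcast is needed.
impl<H, T> DataFrame<Cons<H, T>> {
    fn first(&self) -> &Vec<H> {
        &self.cols.head
    }
}
impl<H, H2, T> DataFrame<Cons<H, Cons<H2, T>>> {
    fn second(&self) -> &Vec<H2> {
        &self.cols.tail.head
    }
}

fn build_example() -> DataFrame<ThreeCols> {
    DataFrame {
        cols: Cons {
            head: vec![1, 2],
            tail: Cons {
                head: vec![0.5, 1.5],
                tail: Cons {
                    head: vec!["a".to_string(), "b".to_string()],
                    tail: Nil,
                },
            },
        },
    }
}
```

Here `build_example().second()` is known to be a `&Vec<f64>` at compile time, and binding it to a `&Vec<String>` is rejected by the compiler. The trade-off is that the nested `Cons<...>` type shows up in signatures and error messages.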

You could also have a combination of the above techniques -- I could see something like a cons-list-based column metadata structure providing the compile-time type checking, while the data itself is stored in some sort of `Any`-based structure.
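
One very rough way such a hybrid could look, purely as an assumption-laden sketch: the schema lives only at the type level (a `PhantomData` cons-list), and the columns themselves sit in type-erased `Box<dyn Any>` storage. The `Frame`, `push`, and `last` names are hypothetical and do not come from any of the libraries mentioned.

```rust
use std::any::Any;
use std::marker::PhantomData;

// Type-level cons-list: carries no data, only the column types.
struct Nil;
struct Cons<H, T>(PhantomData<(H, T)>);

// Type-erased storage; the schema `S` is tracked only in the type.
struct Frame<S> {
    data: Vec<Box<dyn Any>>,
    schema: PhantomData<S>,
}

impl Frame<Nil> {
    fn new() -> Self {
        Frame { data: Vec::new(), schema: PhantomData }
    }
}

impl<S> Frame<S> {
    // Adding a column of type `T` extends the schema type to `Cons<T, S>`.
    fn push<T: 'static>(mut self, col: Vec<T>) -> Frame<Cons<T, S>> {
        self.data.push(Box::new(col));
        Frame { data: self.data, schema: PhantomData }
    }
}

impl<H: 'static, T> Frame<Cons<H, T>> {
    // The most recently pushed column. Its element type comes from the schema,
    // so the downcast only fails if storage and schema have gone out of sync.
    fn last(&self) -> &Vec<H> {
        self.data
            .last()
            .and_then(|col| col.downcast_ref::<Vec<H>>())
            .expect("schema and storage out of sync")
    }
}
```

Callers never name a type when reading a column in this sketch; the downcast is an internal detail, and the schema type decides what `last` returns.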

I'm personally a fan of the compile-time type-checking that cons-lists provide, but they can be hard to work with for those unfamiliar with them. I've started work on a [labeled cons-list library](https://github.com/agnes-rs/lhlist) (which will replace the one I'm using in `agnes`) to hopefully help out with some of these issues.

What are everyone's thoughts / opinions? Are there other options we should consider? I'd love to hear what people think the best approach is!
