
Use column projection during update #322

Merged
eddyxu merged 10 commits into main from lei/update_projection
Nov 21, 2022


eddyxu (Member) commented Nov 21, 2022

Closes #319 and #321
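The PR title and the linked issue ("Support Projection during appending columns") appear to be about reading only the columns that a new-column expression actually needs. A minimal sketch of how such a projection could be derived, assuming a hypothetical toy expression tree (Expr, Field, Literal, Call and ReferencedColumns are illustrative names, not Lance's or Arrow's API): walk the tree and collect every referenced column name. Arrow C++ has a comparable helper, arrow::compute::FieldsInExpression; this sketch does not claim to reproduce what the PR itself does.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy expression node: a column reference, an integer literal,
// or a function call with arguments. This is NOT Arrow's Expression class.
struct Expr {
  enum class Kind { kField, kLiteral, kCall };
  Kind kind;
  std::string name;                          // column name (kField) or function name (kCall)
  long long value = 0;                       // literal value (kLiteral)
  std::vector<std::shared_ptr<Expr>> args;   // call arguments (kCall)
};

using ExprPtr = std::shared_ptr<Expr>;

ExprPtr Field(std::string name) {
  return std::make_shared<Expr>(Expr{Expr::Kind::kField, std::move(name), 0, {}});
}
ExprPtr Literal(long long v) {
  return std::make_shared<Expr>(Expr{Expr::Kind::kLiteral, "", v, {}});
}
ExprPtr Call(std::string fn, std::vector<ExprPtr> args) {
  return std::make_shared<Expr>(Expr{Expr::Kind::kCall, std::move(fn), 0, std::move(args)});
}

// Recursively record every column name the expression reads.
void CollectColumns(const ExprPtr& e, std::set<std::string>* out) {
  if (e->kind == Expr::Kind::kField) out->insert(e->name);
  for (const auto& arg : e->args) CollectColumns(arg, out);
}

// The projection: the set of columns a scan must load to evaluate `e`.
std::set<std::string> ReferencedColumns(const ExprPtr& e) {
  std::set<std::string> cols;
  CollectColumns(e, &cols);
  return cols;
}
```

For an expression such as add(a, multiply(b, 2)), a scan would only need to load columns a and b; a pure literal needs no columns at all.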

@eddyxu eddyxu requested a review from changhiskhan November 21, 2022 16:47
@eddyxu eddyxu self-assigned this Nov 21, 2022
@eddyxu eddyxu added the c++ C++ issues label Nov 21, 2022
changhiskhan (Contributor) left a comment

A couple of questions.

```cpp
}

ARROW_ASSIGN_OR_RAISE(auto datum,
                      ::arrow::compute::ExecuteScalarExpression(expression, *schema(), batch));
```
Contributor

Does this bind the expression, or is it required to be bound before the AddColumn method is called?
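For context on the binding question: in Arrow C++, an expression built with field_ref/call/literal is unbound until Expression::Bind(schema) resolves its field references (and, for calls, the functions and output types) against a schema, and ExecuteScalarExpression returns an Invalid status when given an unbound expression. The toy model below illustrates that bind-then-execute contract with hypothetical names (SumOfColumns, Bind, Execute) and plain std::vector columns; it is not Arrow's implementation and does not show what AddColumn in this PR does.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical toy expression: "add these named columns together".
// Unbound, it only knows column names; Bind() resolves them to indices.
struct SumOfColumns {
  std::vector<std::string> names;   // columns to add together
  std::vector<size_t> indices;      // filled in by Bind()
  bool bound = false;
};

using Schema = std::vector<std::string>;            // column names, in order
using Batch = std::vector<std::vector<long long>>;  // one vector per column

// Resolve every referenced name against the schema; fail on unknown columns.
std::optional<SumOfColumns> Bind(SumOfColumns expr, const Schema& schema) {
  expr.indices.clear();
  for (const auto& name : expr.names) {
    size_t i = 0;
    while (i < schema.size() && schema[i] != name) ++i;
    if (i == schema.size()) return std::nullopt;  // column not in schema
    expr.indices.push_back(i);
  }
  expr.bound = true;
  return expr;
}

// Evaluate row by row. Like Arrow's check, refuse to run an unbound expression.
std::optional<std::vector<long long>> Execute(const SumOfColumns& expr, const Batch& batch) {
  if (!expr.bound) return std::nullopt;
  size_t rows = batch.empty() ? 0 : batch[0].size();
  std::vector<long long> out(rows, 0);
  for (size_t idx : expr.indices)
    for (size_t r = 0; r < rows; ++r) out[r] += batch[idx][r];
  return out;
}
```

In this model the caller chooses where binding happens: either AddColumn binds against the dataset schema itself, or it documents that callers must pass an already-bound expression.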


```cpp
  ARROW_ASSIGN_OR_RAISE(arr, CreateArray(datum.scalar(), batch->num_rows()));
} else if (datum.is_chunked_array()) {
  auto chunked_arr = datum.chunked_array();
  ARROW_ASSIGN_OR_RAISE(arr, ::arrow::Concatenate(chunked_arr->chunks()));
```
Contributor

ExtensionArrays cannot currently be concatenated, though compute expressions can't handle them either, so an ExtensionArray probably won't make it past ExecuteScalarExpression?

Member Author

Let's add a test as a follow-up? Also, no function/kernel is available for extension types right now, so this method would likely fail earlier.
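The thread above turns on flattening a chunked result into a single array before it is stored as a column. A minimal sketch of that step, assuming a hypothetical toy representation in which a chunked column is a std::vector of std::vector<int64_t> chunks (Chunk, ChunkedColumn and ConcatenateChunks are illustrative names): append every chunk in order, which is what arrow::Concatenate does for real Arrow arrays. The row-count check is an extra safeguard added for this sketch, not something the quoted code is known to do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy model: an expression result split into several chunks.
using Chunk = std::vector<int64_t>;
using ChunkedColumn = std::vector<Chunk>;

// Flatten all chunks, in order, into one contiguous array. Returns
// std::nullopt if the total length differs from the batch's row count,
// so the caller can report an error instead of writing a misaligned column.
std::optional<Chunk> ConcatenateChunks(const ChunkedColumn& chunks, size_t expected_rows) {
  Chunk out;
  for (const auto& c : chunks) out.insert(out.end(), c.begin(), c.end());
  if (out.size() != expected_rows) return std::nullopt;
  return out;
}
```

A type that cannot be concatenated (the ExtensionArray concern above) would need its own error path; this toy model only handles int64 values.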

```cpp
                     std::move(new_field));
}

::arrow::Result<std::shared_ptr<LanceDataset>> LanceDataset::AddColumn(
```
Contributor

Again, the problem here is when the compute expression contains aggregates.

Member Author

This can be checked via bool Expression::IsScalarExpression() const.

I can return an Invalid status from AddColumn.
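A sketch of the idea behind the proposed check, using a hypothetical toy expression tree rather than Arrow's Expression class: an expression counts as scalar only if every call in it is element-wise, so a single aggregate anywhere in the tree (for example, a sum nested inside an addition) disqualifies it. FnKind, Node, IsScalarExpression and ValidateNewColumnExpression are illustrative names; the PR itself relies on Arrow's Expression::IsScalarExpression().

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Whether a function is element-wise or an aggregate (sum, mean, ...).
enum class FnKind { kScalar, kAggregate };

// Hypothetical toy expression node (not Arrow's Expression).
struct Node {
  enum class Type { kField, kLiteral, kCall };
  Type type;
  FnKind fn_kind;                            // meaningful only for kCall
  std::vector<std::shared_ptr<Node>> args;   // call arguments
};
using NodePtr = std::shared_ptr<Node>;

NodePtr Field() { return std::make_shared<Node>(Node{Node::Type::kField, FnKind::kScalar, {}}); }
NodePtr Lit() { return std::make_shared<Node>(Node{Node::Type::kLiteral, FnKind::kScalar, {}}); }
NodePtr Call(FnKind kind, std::vector<NodePtr> args) {
  return std::make_shared<Node>(Node{Node::Type::kCall, kind, std::move(args)});
}

// Scalar means "one output value per input row": every argument must be
// scalar, and a call node must itself be an element-wise function.
bool IsScalarExpression(const NodePtr& e) {
  for (const auto& a : e->args) {
    if (!IsScalarExpression(a)) return false;
  }
  return e->type != Node::Type::kCall || e->fn_kind == FnKind::kScalar;
}

// Hypothetical guard at the top of an AddColumn-style entry point.
// Returns an error message, or an empty string if the expression is OK.
std::string ValidateNewColumnExpression(const NodePtr& e) {
  if (!IsScalarExpression(e)) return "Invalid: expression must be a scalar expression";
  return "";
}
```

Rejecting the expression at the entry point means an aggregate such as sum(x), which yields one value for the whole input rather than one per row, never reaches the column-writing code.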

```cpp
ARROW_ASSIGN_OR_RAISE(auto datum,
                      ::arrow::compute::ExecuteScalarExpression(expression, *schema(), batch));
std::shared_ptr<::arrow::Array> arr;
if (datum.is_scalar()) {
```
Contributor

OK, so this is a constant literal value?

Member Author

Yes, this is for cases like AddColumn(field, pc::literal(1234)).
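To illustrate the literal case: evaluating pc::literal(1234) yields a single scalar rather than one value per row, so the result has to be repeated num_rows times before it can be stored as a column. A minimal sketch with a hypothetical Datum stand-in (a std::variant holding either one int64 value or a vector of values). In real Arrow code, arrow::MakeArrayFromScalar(scalar, length) performs the analogous broadcast; CreateArray in the quoted snippet is the PR's own helper, and its exact behavior is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical stand-in for an evaluation result: a single scalar
// (e.g. from a literal such as 1234) or one value per row.
using Column = std::vector<int64_t>;
using Datum = std::variant<int64_t, Column>;

// Produce a column with one entry per row. A scalar is broadcast
// (repeated num_rows times); an array result is returned unchanged.
Column ToColumn(const Datum& d, size_t num_rows) {
  if (const auto* scalar = std::get_if<int64_t>(&d)) {
    return Column(num_rows, *scalar);
  }
  return std::get<Column>(d);
}
```

Broadcasting at this point keeps the rest of the write path uniform: every branch ends with one array whose length matches the batch.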

changhiskhan (Contributor) left a comment

Should be safe if you just add the check/raise on column aggregates.

eddyxu (Member Author) commented Nov 21, 2022

> Should be safe if you just add the check/raise on column aggregates.

Added the IsScalarExpression check.

@eddyxu eddyxu merged commit 1035ed9 into main Nov 21, 2022
@eddyxu eddyxu deleted the lei/update_projection branch November 21, 2022 23:46

Labels

c++ C++ issues


Development

Linked issue: Support Projection during appending columns

2 participants