Skip to content

Conversation

@amaliujia
Copy link
Contributor

@amaliujia amaliujia commented Oct 12, 2022

What changes were proposed in this pull request?

  1. Sync newest proto to python client.
  2. Update Aggregate to match proto change.
  3. Change select to have it accept both column and str
  4. Make sure * pass through the entire path which has been implemented on the server side [SPARK-40587][CONNECT] Support SELECT * in an explicit way in connect proto #38023

Why are the changes needed?

Update python client side to match the change in connect proto.

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@amaliujia
Copy link
Contributor Author

amaliujia commented Oct 12, 2022

R: @HyukjinKwon @zhengruifeng

Copy link
Contributor Author

@amaliujia amaliujia Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best is to support dataframe.schema then check the schema. However the StructType support in proto is under development: #38200, and there could be other blocking issues there.

Once schema is supported in proto, I will follow up to use .schema on python side whenever it makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it can not support selecting struct.* now?

Copy link
Contributor Author

@amaliujia amaliujia Oct 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No not yet. That is on TODO list. We need proto change accordingly to support it.

Right now we don't even have a complete Struct support yet :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did this not break before in the mypy checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. This breaks when code is running.

I should ask you probably as I remember you enabled mypy :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it can not support selecting struct.* now?

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@amaliujia amaliujia changed the title [SPARK-40587][CONNECT][FOLLOW] Make sure python client support select * [SPARK-40587][CONNECT][FOLLOW-UP] Make sure python client support select * Oct 14, 2022
@amaliujia amaliujia force-pushed the select_start_in_python branch from 704b953 to 1af9f98 Compare October 14, 2022 04:21
Copy link
Contributor

@zhengruifeng zhengruifeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending CI

@amaliujia amaliujia force-pushed the select_start_in_python branch 2 times, most recently from 33b16a6 to fefc9cb Compare October 19, 2022 02:11
@amaliujia amaliujia force-pushed the select_start_in_python branch from fefc9cb to b9b9bb3 Compare October 19, 2022 03:08
for c in self._raw_columns:
if isinstance(c, Expression):
proj_exprs.append(c.to_plan(session))
elif c == "*":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about t1.*?

Copy link
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as this is a followup, but we need to think more about how to support star completely in spark connect.

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 3c5bc21 Oct 19, 2022
SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
…ect *

### What changes were proposed in this pull request?

1. Sync newest proto to python client.
2. Update Aggregate to match proto change.
3. Change `select` to have it accept both `column` and `str`
4. Make sure `*` pass through the entire path which has been implemented on the server side apache#38023

### Why are the changes needed?

Update python client side to match the change in connect proto.

### Does this PR introduce _any_ user-facing change?

No
### How was this patch tested?

UT

Closes apache#38218 from amaliujia/select_start_in_python.

Authored-by: Rui Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants