[SPARK-28787][DOC][SQL]Document LOAD DATA statement in SQL Reference #25522
|
Test build #109464 has finished for PR 25522 at commit
|
|
@huaxingao Could you please attach a screenshot of the page ? |
docs/sql-ref-syntax-dml-load.md
Outdated
|
|
||
| **This page is under construction** | ||
| ### Description | ||
| The LOAD DATA statement can be used to load data from a file into a table or a partition in the table. The target table must not be temporary. A partition spec must be provided if and only if the target table is partitioned. The LOAD DATA statement is only supported for tables created using the Hive format. |
Back-tick the reserved words for clarity.
Nit: I'd write "..., or into a partition ..." to make sure it doesn't suggest it's the alternative to "from a file"
docs/sql-ref-syntax-dml-load.md
Outdated
| One or more partition column name and value pairs. | ||
|
|
||
| ##### ***LOCAL***: | ||
| If specified, local file system is used. Otherwise, the default file system is used. |
Maybe say that it causes the INPATH to be resolved against the local file system, instead of the default file system, which is typically distributed storage.
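To make the suggested wording concrete, here is a minimal sketch of what "resolved against the local file system instead of the default file system" could mean. It is an illustration only, not Spark's resolution logic: `resolve_inpath` is a hypothetical helper, and `hdfs://namenode:8020` is a made-up default file system URI.

```python
from urllib.parse import urlparse

# Hypothetical default file system URI (assumption); a real deployment
# would take this value from its own configuration.
DEFAULT_FS = "hdfs://namenode:8020"


def resolve_inpath(path: str, local: bool, default_fs: str = DEFAULT_FS) -> str:
    """Return a fully qualified URI for an INPATH value (illustration only)."""
    if urlparse(path).scheme:
        # The path already names its file system explicitly; leave it alone.
        return path
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    if local:
        # LOCAL given: qualify the path against the local file system.
        return "file://" + path
    # LOCAL omitted: qualify the path against the default file system.
    return default_fs.rstrip("/") + path
```

Under these assumptions, the same `/tmp/data.csv` becomes `file:///tmp/data.csv` with `LOCAL` and `hdfs://namenode:8020/tmp/data.csv` without it.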
|
Test build #109520 has finished for PR 25522 at commit
|
|
Test build #109522 has finished for PR 25522 at commit
|
|
updated |
|
Test build #109535 has finished for PR 25522 at commit
|
|
Test build #109932 has finished for PR 25522 at commit
|
|
@dilipbiswal @srowen @gatorsmile |
docs/sql-ref-syntax-dml-load.md
Outdated
|
|
||
| **This page is under construction** | ||
| ### Description | ||
| `LOAD DATA` loads data from a directory or a file into a table or into a partition in the table. A partition spec should be specified whenever the target table is partitioned. The `LOAD DATA` statement can only be used with tables created using the Hive format. |
@huaxingao What if we say it this way? Please feel free to change it the way you see fit.
LOAD DATA statement loads the data into a table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the LOAD statement takes an optional partition specification. When a partition spec is specified, the data files (when input source is a directory) or the single file (when input source is a file) are loaded into the target table.
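The directory-versus-file rule in the proposed wording can be sketched as follows. This is a hypothetical local-file-system model of the description, not Spark code: `files_to_load` is an invented name, and skipping subdirectories is an assumption the thread does not confirm.

```python
from pathlib import Path
from typing import List


def files_to_load(inpath: str) -> List[Path]:
    """Model which files a load would pick up for a given input path."""
    source = Path(inpath)
    if source.is_dir():
        # Directory given: every regular file directly inside it is loaded.
        # Assumption: subdirectories are not descended into.
        return sorted(p for p in source.iterdir() if p.is_file())
    if source.is_file():
        # File given: only that single file is loaded.
        return [source]
    raise FileNotFoundError(f"INPATH does not exist: {inpath}")
```

A partition spec would not change which files are selected in this model; it only changes where they end up.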
docs/sql-ref-syntax-dml-load.md
Outdated
| ### Parameters | ||
| <dl> | ||
| <dt><code><em>path</em></code></dt> | ||
| <dd>Path of the file system.</dd> |
@huaxingao I was looking up the Hive documentation. It mentions that the load deletes the data from the source directory, i.e. it does a move operation. I think this is an important thing to say if that is the behaviour.
@huaxingao It also seems that this path can be either an absolute path or a relative path. Do we need to mention that here?
@dilipbiswal I checked; the data in the source directory is actually not deleted.
|
Test build #109993 has finished for PR 25522 at commit
|
|
load data should have limitation also because when user gives as below |
srowen
left a comment
@dilipbiswal any final comments, based on your other reviews? this is looking OK.
|
@srowen Actually this PR may need a minor clarification. I have already discussed it with Huaxin. We just need to clarify the "move" vs "copy" behavior. |
srowen
left a comment
What's the move vs copy issue?
docs/sql-ref-syntax-dml-load.md
Outdated
|
|
||
| **This page is under construction** | ||
| ### Description | ||
| `LOAD DATA` statement loads the data into a table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the `LOAD DATA` statement takes an optional partition specification. When a partition is specified, the data files ( when input source is a directory ) or the single file ( when input source is a file ) are loaded into the partition of the target table. |
Nit: remove spaces inside parentheses
If we see https://cwiki.apache.org/confluence/display/Hive/GettingStarted and look for "LOAD DATA" command, we see the following comments under
The question I had was: "should we document our exact behaviour", i.e. do we move the data from the original location to the target location, or do we copy it? Can we move ahead on this PR as is and clarify it in a follow-up? |
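For readers following the "move" vs "copy" question, the observable difference is whether the source file still exists afterwards. The sketch below shows that distinction with plain local files and the Python standard library; it says nothing about which of the two `LOAD DATA` actually does, and `source_survives` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path


def source_survives(transfer) -> bool:
    """Run transfer(src, dst) in a scratch directory; report whether src remains."""
    with tempfile.TemporaryDirectory() as scratch:
        src = Path(scratch) / "source" / "data.txt"
        dst = Path(scratch) / "warehouse" / "data.txt"
        src.parent.mkdir()
        dst.parent.mkdir()
        src.write_text("1,a\n")
        transfer(str(src), str(dst))
        return src.exists()


# Copy semantics leave the source in place; move semantics remove it.
copy_keeps_source = source_survives(shutil.copy)
move_keeps_source = source_survives(shutil.move)
```

The same check, listing the source location before and after a load, is one way the behaviour could be confirmed empirically in a follow-up.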
|
Looks OK as-is then; I have one minor comment above otherwise. |
|
@srowen Thanks a lot. |
b29223f to d7fec55
|
Test build #112410 has finished for PR 25522 at commit
|
|
Merged to master |
|
Thanks a lot! @srowen @dilipbiswal |
What changes were proposed in this pull request?
Document LOAD DATA statement in SQL Reference
Why are the changes needed?
To complete the SQL Reference
Does this PR introduce any user-facing change?
Yes
How was this patch tested?
Tested using jekyll build --serve
Here are the screen shots: