[SPARK-28787][DOC][SQL]Document LOAD DATA statement in SQL Reference #25522

huaxingao · 2019-08-21T05:15:50Z

What changes were proposed in this pull request?

Document LOAD DATA statement in SQL Reference

Why are the changes needed?

To complete the SQL Reference

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Tested using jekyll build --serve

Here are the screen shots:

SparkQA · 2019-08-21T05:30:55Z

Test build #109464 has finished for PR 25522 at commit efea0b8.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dilipbiswal · 2019-08-21T17:51:17Z

@huaxingao Could you please attach a screenshot of the page?

srowen · 2019-08-21T18:22:01Z

docs/sql-ref-syntax-dml-load.md


-**This page is under construction**
+### Description
+The LOAD DATA statement can be used to load data from a file into a table or a partition in the table. The target table must not be temporary. A partition spec must be provided if and only if the target table is partitioned. The LOAD DATA statement is only supported for tables created using the Hive format.


Back-tick the reserved words for clarity.
Nit: I'd write "..., or into a partition ..." to make sure it doesn't suggest it's the alternative to "from a file"

srowen · 2019-08-21T18:22:42Z

docs/sql-ref-syntax-dml-load.md

+One or more partition column name and value pairs.
+
+##### ***LOCAL***:
+If specified, local file system is used. Otherwise, the default file system is used.


Maybe say that it causes the INPATH to be resolved against the local file system, instead of the default file system, which is typically distributed storage.
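
To illustrate the distinction suggested above, here is a minimal Python sketch of how an absolute INPATH might be qualified with and without `LOCAL`. It is not Spark's actual resolution code: the function name `qualify_inpath` and the default file system URI `hdfs://namenode:8020` are hypothetical, and relative paths are deliberately left out.

```python
from urllib.parse import urlparse

# Hypothetical default file system URI. In a real deployment this value
# would come from the Hadoop configuration (fs.defaultFS).
DEFAULT_FS = "hdfs://namenode:8020"


def qualify_inpath(path: str, local: bool, default_fs: str = DEFAULT_FS) -> str:
    """Return a fully qualified URI for an absolute INPATH (illustrative only)."""
    if urlparse(path).scheme:
        # The path already names its file system, e.g. "hdfs://..." or "file:///...".
        return path
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    if local:
        # LOCAL: resolve against the local file system.
        return "file://" + path
    # No LOCAL: resolve against the default (typically distributed) file system.
    return default_fs.rstrip("/") + path
```

With these assumptions, the same path string points at two different places depending on whether `LOCAL` is given, which is the point the parameter description should make clear.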

SparkQA · 2019-08-21T19:23:30Z

Test build #109520 has finished for PR 25522 at commit 19deb6d.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-21T19:52:43Z

Test build #109522 has finished for PR 25522 at commit c680337.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2019-08-21T23:30:25Z

updated

SparkQA · 2019-08-21T23:45:33Z

Test build #109535 has finished for PR 25522 at commit 08ba5f6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2019-08-30T02:30:23Z

Test build #109932 has finished for PR 25522 at commit 8b23eae.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

huaxingao · 2019-08-30T19:40:54Z

@dilipbiswal @srowen @gatorsmile
Could you please review? Thanks!

dilipbiswal · 2019-08-31T20:33:53Z

docs/sql-ref-syntax-dml-load.md


-**This page is under construction**
+### Description
+`LOAD DATA` loads data from a directoy or a file into a table or into a partition in the table. A partition spec should be specified whenever the target table is partitioned. The `LOAD DATA` statement can only be used with tables created using the Hive format.


@huaxingao What if we say it this way? Please feel free to change it the way you see fit.

LOAD DATA statement loads the data into a table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the LOAD statement takes an optional partition specification. When a partition-spec is specified, the data files (when input source is a directory) or the single file (when input source is a file) are loaded into the target table.
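
The directory-versus-file rule in the proposed wording can be sketched in a few lines of Python. This only models the selection of input files on a local file system; the helper name `files_to_load` is hypothetical and nothing here calls Spark.

```python
from pathlib import Path
from typing import List


def files_to_load(inpath: str) -> List[Path]:
    """Model which files a load would pick up for a given input path."""
    source = Path(inpath)
    if source.is_dir():
        # A directory was specified: every file directly inside it is loaded.
        return sorted(p for p in source.iterdir() if p.is_file())
    if source.is_file():
        # A single file was specified: only that file is loaded.
        return [source]
    raise FileNotFoundError(f"input path does not exist: {inpath}")
```

The sketch does not recurse into subdirectories; whether the real statement does is not something this thread settles.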

dilipbiswal · 2019-08-31T20:35:50Z

docs/sql-ref-syntax-dml-load.md

+### Parameters
+<dl>
+  <dt><code><em>path</em></code></dt>
+  <dd>Path of the file system.</dd>


@huaxingao I was looking at the Hive documentation. It says that load deletes the data from the source directory, i.e. it does a move operation. I think this is an important thing to mention if that is the behaviour.

@huaxingao It also seems that this path can be either an absolute path or a relative path. Do we need to mention that here?

@dilipbiswal I checked, actually the data in the source directory is not deleted.

SparkQA · 2019-09-01T07:40:14Z

Test build #109993 has finished for PR 25522 at commit b29223f.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AbhishekNew · 2019-09-10T05:14:31Z

The doc should also mention a limitation of load data. When the user runs a command like the one below:
load data local inpath '/opt/abhi/test/_typeddate.txt' into table wild1;
the command succeeds, but no data is loaded into the table, because Hadoop treats the file as a hidden file (its name starts with an underscore).
The behavior is the same whether loading from the local file system or from HDFS.
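
For context, Hadoop input formats conventionally skip files whose names begin with `_` or `.`, which matches the behavior reported here. A minimal Python sketch of that naming rule follows; the function name `is_hidden_by_hadoop_convention` is hypothetical, and the check is a model of the convention rather than the code Spark or Hadoop runs.

```python
import posixpath


def is_hidden_by_hadoop_convention(path: str) -> bool:
    """Return True if the file name starts with '_' or '.' (treated as hidden)."""
    # Only the last path component matters, not the parent directories.
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith("_") or name.startswith(".")
```

Applied to the path in the report, the check flags `_typeddate.txt` as hidden, whereas the same file renamed without the leading underscore would not be flagged.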

srowen

@dilipbiswal any final comments, based on your other reviews? this is looking OK.

dilipbiswal · 2019-09-17T16:06:43Z

@srowen Actually this pr may need a minor clarification. I have already discussed with Huaxin. We just need to clarify the "move" vs "copy" behavior.

srowen

What's the move vs copy issue?

srowen · 2019-10-21T18:46:30Z

docs/sql-ref-syntax-dml-load.md


-**This page is under construction**
+### Description
+`LOAD DATA` statement loads the data into a table from the user specified directory or file. If a directory is specified then all the files from the directory are loaded. If a file is specified then only the single file is loaded. Additionally the `LOAD DATA` statement takes an optional partition specification. When a partition is specified, the data files ( when input source is a directory ) or the single file ( when input source is a file ) are loaded into the partition of the target table.


Nit: remove spaces inside parentheses

dilipbiswal · 2019-10-21T20:12:02Z

@srowen

What's the move vs copy issue?

If we look at https://cwiki.apache.org/confluence/display/Hive/GettingStarted and search for the "LOAD DATA" command, we see the following comments under NOTES:
NO verification of data against the schema is performed by the load command.

If the file is in hdfs, it is moved into the Hive-controlled file system namespace.
The root of the Hive directory is specified by the option hive.metastore.warehouse.dir in hive-default.xml. We advise users to create this directory before trying to create tables via Hive.

The question I had was whether we should document our exact behaviour, i.e. do we move the data from the original location to the target location, or do we copy it? Can we move ahead on this PR as is and clarify it in a follow-up?

srowen · 2019-10-21T20:17:03Z

Looks OK as-is then; I have one minor comment above otherwise.

dilipbiswal · 2019-10-21T20:19:53Z

@srowen Thanks a lot.
cc @huaxingao

SparkQA · 2019-10-21T20:53:55Z

Test build #112410 has finished for PR 25522 at commit d7fec55.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2019-10-22T13:55:43Z

Merged to master

huaxingao · 2019-10-22T14:49:47Z

Thanks a lot! @srowen @dilipbiswal

dongjoon-hyun added DOCUMENTATION SQL labels Aug 21, 2019

srowen requested changes Aug 21, 2019

huaxingao force-pushed the spark-28787 branch from 08ba5f6 to 8b23eae Compare August 30, 2019 02:07

dilipbiswal reviewed Aug 31, 2019

srowen reviewed Sep 17, 2019

srowen reviewed Oct 21, 2019

huaxingao added 7 commits October 21, 2019 13:39

[SPARK-28787][DOC][SQL]Document LOAD DATA statement in SQL Reference

1c2a0fa

address comments

5c67ad1

add one more back-tick

d79b85b

fix issues

14e92a9

rebase and fix a few issues

750c4fb

address comments

8b254d1

remove spaces inside parentheses

d7fec55

huaxingao force-pushed the spark-28787 branch from b29223f to d7fec55 Compare October 21, 2019 20:41

srowen closed this in 8779938 Oct 22, 2019

huaxingao deleted the spark-28787 branch October 22, 2019 14:49
