[SPARK-28787][DOC][SQL] Document LOAD DATA statement in SQL Reference
### What changes were proposed in this pull request?
Document LOAD DATA statement in SQL Reference

### Why are the changes needed?
To complete the SQL Reference

### Does this PR introduce any user-facing change?
Yes

### How was this patch tested?
Tested using jekyll build --serve

Here are the screen shots:

![image](https://user-images.githubusercontent.com/13592258/64073167-e7cd0800-cc4e-11e9-9fcc-92fe4cb5a942.png)

![image](https://user-images.githubusercontent.com/13592258/64073169-ee5b7f80-cc4e-11e9-9a36-cc023bcd32b1.png)

![image](https://user-images.githubusercontent.com/13592258/64073170-f4516080-cc4e-11e9-9101-2609a01fe6fe.png)

Closes apache#25522 from huaxingao/spark-28787.

Authored-by: Huaxin Gao <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
huaxingao authored and srowen committed Oct 22, 2019
commit 877993847c0baa016003639e16708373e57ca64b
103 changes: 100 additions & 3 deletions docs/sql-ref-syntax-dml-load.md
@@ -1,7 +1,7 @@
---
layout: global
-title: LOAD
-displayTitle: LOAD
+title: LOAD DATA
+displayTitle: LOAD DATA
license: |
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
Expand All @@ -19,4 +19,101 @@ license: |
limitations under the License.
---

-**This page is under construction**
### Description
The `LOAD DATA` statement loads data into a table from a user-specified directory or file. If a directory is specified, all files in the directory are loaded. If a file is specified, only that single file is loaded. Additionally, the `LOAD DATA` statement takes an optional partition specification: when a partition is specified, the data files (when the input source is a directory) or the single file (when the input source is a file) are loaded into that partition of the target table.

### Syntax
{% highlight sql %}
LOAD DATA [ LOCAL ] INPATH path [ OVERWRITE ] INTO TABLE table_name
[ PARTITION ( partition_col_name = partition_col_val [ , ... ] ) ]
{% endhighlight %}

### Parameters
<dl>
<dt><code><em>path</em></code></dt>
<dd>Path in the file system. It can be either an absolute or a relative path.</dd>
</dl>

<dl>
<dt><code><em>table_name</em></code></dt>
<dd>The name of an existing table.</dd>
</dl>

<dl>
<dt><code><em>PARTITION ( partition_col_name = partition_col_val [ , ... ] )</em></code></dt>
<dd>Specifies one or more partition column and value pairs.</dd>
</dl>

<dl>
<dt><code><em>LOCAL</em></code></dt>
<dd>If specified, the <code>INPATH</code> is resolved against the local file system instead of the default file system, which is typically distributed storage.</dd>
</dl>

<dl>
<dt><code><em>OVERWRITE</em></code></dt>
<dd>By default, new data is appended to the table. If <code>OVERWRITE</code> is used, the table is instead overwritten with new data.</dd>
</dl>

### Examples
{% highlight sql %}
-- Example without partition specification.
-- Assuming the students table has already been created and populated.
SELECT * FROM students;

+ -------------- + ------------------------------ + -------------- +
| name | address | student_id |
+ -------------- + ------------------------------ + -------------- +
| Amy Smith | 123 Park Ave, San Jose | 111111 |
+ -------------- + ------------------------------ + -------------- +

CREATE TABLE test_load (name VARCHAR(64), address VARCHAR(64), student_id INT);

-- Assuming the students table is in '/user/hive/warehouse/'
LOAD DATA LOCAL INPATH '/user/hive/warehouse/students' OVERWRITE INTO TABLE test_load;

SELECT * FROM test_load;

+ -------------- + ------------------------------ + -------------- +
| name | address | student_id |
+ -------------- + ------------------------------ + -------------- +
| Amy Smith | 123 Park Ave, San Jose | 111111 |
+ -------------- + ------------------------------ + -------------- +

-- Example with partition specification.
CREATE TABLE test_partition (c1 INT, c2 INT, c3 INT) USING HIVE PARTITIONED BY (c2, c3);

INSERT INTO test_partition PARTITION (c2 = 2, c3 = 3) VALUES (1);

INSERT INTO test_partition PARTITION (c2 = 5, c3 = 6) VALUES (4);

INSERT INTO test_partition PARTITION (c2 = 8, c3 = 9) VALUES (7);

SELECT * FROM test_partition;

+ ------- + ------- + ----- +
| c1 | c2 | c3 |
+ ------- + ------- + ----- +
| 1 | 2 | 3 |
+ ------- + ------- + ----- +
| 4 | 5 | 6 |
+ ------- + ------- + ----- +
| 7 | 8 | 9 |
+ ------- + ------- + ----- +

CREATE TABLE test_load_partition (c1 INT, c2 INT, c3 INT) USING HIVE PARTITIONED BY (c2, c3);

-- Assuming the test_partition table is in '/user/hive/warehouse/'
LOAD DATA LOCAL INPATH '/user/hive/warehouse/test_partition/c2=2/c3=3'
OVERWRITE INTO TABLE test_load_partition PARTITION (c2=2, c3=3);

SELECT * FROM test_load_partition;

+ ------- + ------- + ----- +
| c1 | c2 | c3 |
+ ------- + ------- + ----- +
| 1 | 2 | 3 |
+ ------- + ------- + ----- +


{% endhighlight %}
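
The examples above load a directory with `OVERWRITE`. As a complementary sketch of the file-versus-directory and append-versus-overwrite behavior described earlier, the following loads a single file and omits `OVERWRITE`, so the rows are appended to the existing contents of the table. The file path here is hypothetical and chosen only for illustration; it reuses the `test_load` table created above.

{% highlight sql %}
-- Load a single local file rather than a whole directory.
-- Without OVERWRITE, the loaded rows are appended to the
-- data already present in the table.
LOAD DATA LOCAL INPATH '/tmp/students_extra' INTO TABLE test_load;
{% endhighlight %}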