[SPARK-29174][SQL] Support LOCAL in INSERT OVERWRITE DIRECTORY to data source #27039
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
lol, I didn't know there was such a way.
ok to test
Test build #115948 has finished for PR 27039 at commit
retest this please
Test build #115960 has finished for PR 27039 at commit
gentle ping @HyukjinKwon @dongjoon-hyun
Hi, @ajithme. Could you update your PR description? It seems that you always put some content before the first section, but that is not what Apache Spark recommends. You need to move all content before
@dongjoon-hyun I will make sure of this, thanks for pointing it out. I have updated the PR description.
@ajithme, can you double check how Hive behaves if other schemes are given? The current code seems to always force the local scheme, e.g., hdfs:/a/b/c becomes file:/a/b/c. Does Hive work this way too?
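To make the concern concrete, here is a minimal sketch of what unconditional scheme forcing does to a path. It assumes the rewrite is done with `java.net.URI`; the object and the helper name `forceFileScheme` are hypothetical and are not taken from this PR's diff.

```scala
import java.net.URI

object ForcedSchemeSketch {
  // Hypothetical helper: always rewrites the scheme to "file",
  // silently discarding whatever scheme and authority the user supplied.
  // Assumes an absolute path without characters that need escaping.
  def forceFileScheme(path: String): URI = {
    val uri = new URI(path)
    new URI("file", null, uri.getPath, null, null)
  }
}
```

Under these assumptions, `ForcedSchemeSketch.forceFileScheme("hdfs:/a/b/c")` yields `file:/a/b/c`, so a user who explicitly asked for HDFS would get a local write without any warning.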
Sure, I checked in Hive 2.3.5. It throws an exception if any scheme other than file: is used.
Success scenario:
0: jdbc:hive2://QWE:10000/> insert overwrite local directory 'file:/tmp/sample' select 1;
No rows affected (1.436 seconds)
0: jdbc:hive2://QWE:10000/> insert overwrite local directory '/tmp/sample' select 1;
No rows affected (1.474 seconds)
Failure scenario:
Below is the stack trace from the Hive server, as shown on the client:
0: jdbc:hive2://QWE:10000/> insert overwrite local directory 'hdfs:///tmp/sample' select 1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source file:/tmp/root/28572e2b-63d0-49fd-9938-85b9741c7413/hive_2020-01-17_14-32-02_389_528060968304222996-1/-mr-10000 to destination hdfs:/tmp/sample
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source file:/tmp/root/28572e2b-63d0-49fd-9938-85b9741c7413/hive_2020-01-17_14-32-02_389_528060968304222996-1/-mr-10000 to destination hdfs:/tmp/sample
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:104)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:259)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
... 11 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/tmp/sample, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:630)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1437)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileFromDfsToLocal(MoveTask.java:143)
at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:101)
... 20 more (state=08S01,code=1)
@HyukjinKwon Can I update the PR to follow Hive's behavior instead, i.e., the command fails if a non-local scheme is given?
Updated PR to align behaviour with Hive
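The Hive-aligned behaviour discussed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the object, the helper name `toLocalUri`, and the error message are hypothetical, and the sketch only handles absolute paths without characters that need escaping.

```scala
import java.net.URI

object LocalDirSketch {
  // Hypothetical helper: accept a path with no scheme or with the
  // "file" scheme, and reject every other scheme, mirroring the Hive
  // behaviour observed in the thread.
  def toLocalUri(path: String): URI = {
    val uri = new URI(path)
    Option(uri.getScheme) match {
      // No scheme given: qualify the path with "file" explicitly so it
      // does not fall back to the default file system.
      case None => new URI("file", null, uri.getPath, null, null)
      // Already local: keep the URI unchanged.
      case Some("file") => uri
      // Any other scheme contradicts LOCAL, so fail fast.
      case Some(other) =>
        throw new IllegalArgumentException(
          s"LOCAL is supported only with the file: scheme, got '$other:'")
    }
  }
}
```

With this sketch, `/tmp/sample` and `file:/tmp/sample` both resolve to `file:/tmp/sample`, while `hdfs:///tmp/sample` is rejected up front rather than failing later during the move step, as it did in the Hive trace.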
f7486a8 to 6dfe0dc
Test build #116943 has finished for PR 27039 at commit
gentle ping @HyukjinKwon
retest this please
Test build #118573 has finished for PR 27039 at commit
The doc generation error seems unrelated
retest this please
Test build #118575 has finished for PR 27039 at commit
Test build #118557 has finished for PR 27039 at commit
77f6b8e to 6ddf84a
Rebased to latest master. Retest this please
Test build #118584 has finished for PR 27039 at commit
Merged to master.
…a source

### What changes were proposed in this pull request?
`INSERT OVERWRITE LOCAL DIRECTORY` is supported with ensuring the provided path is always using `file://` as scheme and removing the check which throws exception if we do insert overwrite by mentioning directory with `LOCAL` syntax

### Why are the changes needed?
without the modification in PR,
```
insert overwrite local directory <location> using
```
throws exception
```
Error: org.apache.spark.sql.catalyst.parser.ParseException:
LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, pos 0)
```
which was introduced in apache#18975, but this restriction is not needed, hence dropping the same.
Keep behaviour consistent for local and remote file-system in `INSERT OVERWRITE DIRECTORY`

### Does this PR introduce any user-facing change?
Yes, after this change `INSERT OVERWRITE LOCAL DIRECTORY` will not throw exception

### How was this patch tested?
Added UT

Closes apache#27039 from ajithme/insertoverwrite2.

Authored-by: Ajith <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
What changes were proposed in this pull request?
INSERT OVERWRITE LOCAL DIRECTORY is supported by ensuring that the provided path always uses file:// as its scheme, and by removing the check that throws an exception when the directory is given with the LOCAL syntax.
Why are the changes needed?
Without the modification in this PR,
insert overwrite local directory <location> using
throws a ParseException stating that LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source.
That check was introduced in #18975, but the restriction is not needed, so this PR drops it.
This keeps the behaviour consistent for local and remote file systems in INSERT OVERWRITE DIRECTORY.
Does this PR introduce any user-facing change?
Yes, after this change INSERT OVERWRITE LOCAL DIRECTORY will not throw an exception.
How was this patch tested?
Added UT