Skip to content

Conversation

@ajithme
Copy link
Contributor

@ajithme ajithme commented Dec 29, 2019

What changes were proposed in this pull request?

INSERT OVERWRITE LOCAL DIRECTORY is supported with ensuring the provided path is always using file:// as scheme and removing the check which throws exception if we do insert overwrite by mentioning directory with LOCAL syntax

Why are the changes needed?

without the modification in PR, insert overwrite local directory <location> using

throws exception

Error: org.apache.spark.sql.catalyst.parser.ParseException:

LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, pos 0)

which was introduced in #18975, but this restriction is not needed, hence dropping the same.
Keep behaviour consistent for local and remote file-system in INSERT OVERWRITE DIRECTORY

Does this PR introduce any user-facing change?

Yes, after this change INSERT OVERWRITE LOCAL DIRECTORY will not throw exception

How was this patch tested?

Added UT

@ajithme
Copy link
Contributor Author

ajithme commented Dec 29, 2019

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol .. I didn't know there's such way.

@HyukjinKwon
Copy link
Member

ok to test

@SparkQA
Copy link

SparkQA commented Dec 30, 2019

Test build #115948 has finished for PR 27039 at commit 8917ef7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajithme
Copy link
Contributor Author

ajithme commented Dec 30, 2019

retest this please

@SparkQA
Copy link

SparkQA commented Dec 30, 2019

Test build #115960 has finished for PR 27039 at commit f7486a8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajithme
Copy link
Contributor Author

ajithme commented Jan 8, 2020

gentle ping @HyukjinKwon @dongjoon-hyun

@dongjoon-hyun
Copy link
Member

Hi, @ajithme . Could you update your PR description? It seems that you always put some content before the first section, but it's not an Apache Spark's recommendation. You need to move all content before What changes were proposed in this pull request? into somewhere below.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-29174][SQL] LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source [SPARK-29174][SQL] Support LOCAL in INSERT OVERWRITE DIRECTORY to data source Jan 10, 2020
@ajithme
Copy link
Contributor Author

ajithme commented Jan 10, 2020

Hi, @ajithme . Could you update your PR description? It seems that you always put some content before the first section, but it's not an Apache Spark's recommendation. You need to move all content before What changes were proposed in this pull request? into somewhere below.

@dongjoon-hyun will ensure this, thanks for pointing out. I have updated the PR description

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajithme, Can you double check how Hive works if other scheme are given? The current code seems always forcing to local scheme, e.g., hdfs:/a/b/c will becomes file:/a/b/c. Does Hive works in this way too?

Copy link
Contributor Author

@ajithme ajithme Jan 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, i checked in Hive 2.3.5, it throws a exception if any other scheme is used apart form file:

success scenario

0: jdbc:hive2://QWE:10000/> insert overwrite local directory 'file:/tmp/sample' select 1;
No rows affected (1.436 seconds)
0: jdbc:hive2://QWE:10000/> insert overwrite local directory '/tmp/sample' select 1;
No rows affected (1.474 seconds)

fail scenario

Below is the stack from hive server on client

0: jdbc:hive2://QWE:10000/> insert overwrite local directory 'hdfs:///tmp/sample' select 1;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source file:/tmp/root/28572e2b-63d0-49fd-9938-85b9741c7413/hive_2020-01-17_14-32-02_389_528060968304222996-1/-mr-10000 to destination hdfs:/tmp/sample
        at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
        at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91)
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
        at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source file:/tmp/root/28572e2b-63d0-49fd-9938-85b9741c7413/hive_2020-01-17_14-32-02_389_528060968304222996-1/-mr-10000 to destination hdfs:/tmp/sample
        at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:104)
        at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:259)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
        at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
        ... 11 more
Caused by: java.lang.IllegalArgumentException: Wrong FS: hdfs:/tmp/sample, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:665)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:86)
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:630)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1437)
        at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileFromDfsToLocal(MoveTask.java:143)
        at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:101)
        ... 20 more (state=08S01,code=1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Can i update the PR instead to follow Hive behavior i.e if a non local scheme is mentioned, the command fails.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated PR to align behaviour with Hive

@SparkQA
Copy link

SparkQA commented Jan 17, 2020

Test build #116943 has finished for PR 27039 at commit 6dfe0dc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajithme
Copy link
Contributor Author

ajithme commented Jan 27, 2020

gentle ping @HyukjinKwon

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118573 has finished for PR 27039 at commit 77f6b8e.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajithme
Copy link
Contributor Author

ajithme commented Feb 17, 2020

The doc generation error seems unrelated

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118575 has finished for PR 27039 at commit 77f6b8e.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118557 has finished for PR 27039 at commit 6dfe0dc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@ajithme
Copy link
Contributor Author

ajithme commented Feb 17, 2020

Rebased to latest master. Retest this please

@SparkQA
Copy link

SparkQA commented Feb 17, 2020

Test build #118584 has finished for PR 27039 at commit 6ddf84a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to master.

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020
…a source

### What changes were proposed in this pull request?
`INSERT OVERWRITE LOCAL DIRECTORY` is supported with ensuring the provided path is always using `file://` as scheme and removing the check which throws exception if we do insert overwrite by mentioning directory with `LOCAL` syntax

### Why are the changes needed?
without the modification in PR, ``` insert overwrite local directory <location> using ```

throws exception

```
Error: org.apache.spark.sql.catalyst.parser.ParseException:

LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, pos 0)
```
which was introduced in apache#18975, but this restriction is not needed, hence dropping the same.
Keep behaviour consistent for local and remote file-system in  `INSERT OVERWRITE DIRECTORY`

### Does this PR introduce any user-facing change?
Yes, after this change `INSERT OVERWRITE LOCAL DIRECTORY` will not throw exception

### How was this patch tested?
Added UT

Closes apache#27039 from ajithme/insertoverwrite2.

Authored-by: Ajith <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants