[SPARK-25299] Implement default version of the API for shuffle writes #6
Conversation
mccheah left a comment
Initial review. But the behavior here is tricky - I think we'd get clearer validation that we're doing things correctly if we integrated this into one of the shuffle writers and ran the unit tests.
Hmm, looking at this more carefully I don't think this is going to have the transferTo behavior that we expect. The reason for this is that Channels.newChannel knows how to make a FileChannel from a FileOutputStream specifically - it checks instanceOf. But here, Channels.newChannel would be passed a DefaultShuffleBlockOutputStream, which is not a FileOutputStream.
True, in its current form, it would return a WritableByteChannelImpl
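For reference, a tiny standalone sketch of the JDK behavior being discussed (the BufferedOutputStream below just stands in for DefaultShuffleBlockOutputStream, and the exact special-casing inside Channels.newChannel can vary by JDK version):

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;

public class NewChannelDemo {
  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("shuffle", ".data");
    // A raw FileOutputStream: newChannel hands back the stream's own FileChannel,
    // which is what lets transferTo take its fast path.
    try (FileOutputStream fos = new FileOutputStream(f)) {
      System.out.println(Channels.newChannel(fos) instanceof FileChannel); // true
    }
    // Any wrapping stream (standing in for DefaultShuffleBlockOutputStream):
    // newChannel can only return a generic WritableByteChannel implementation.
    try (OutputStream wrapped = new BufferedOutputStream(new FileOutputStream(f))) {
      System.out.println(Channels.newChannel(wrapped) instanceof FileChannel); // false
    }
  }
}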
Indentation should be 4 spaces from public.
This is rare, but in this case I think it makes more sense for this to be a static inner class of DefaultMapOutputWriter - the interactions between the length stored here and usage in DefaultMapOutputWriter#commitAllPartitions would be clearer this way. But I'd like a second opinion from @yifeih.
Yeah, I am okay with either. I thought this was more isolated.
I don't have a strong preference either way, although I did find myself needing to go to the other class to understand this one, so placing it together could help.
core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java
Closing an underlying output stream I believe implies flushing it, but double check this.
It does, very true. close() I believe always calls flush()
Check both here and in the next clause that the outputTempFile exists first. No point logging a warning trying to delete a file that never existed.
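A minimal sketch of the suggested check (field names follow the surrounding diff):

// Only attempt the delete - and only warn - when the temp file was actually created.
if (outputTempFile != null && outputTempFile.exists() && !outputTempFile.delete()) {
  log.warn("Failed to delete temporary shuffle file at {}", outputTempFile.getAbsolutePath());
}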
Good catch
When aborting, we don't want cleanup to throw an exception that halts the rest of this method - we want to always be able to attempt to delete the files.
Why can't it just be a class with the implementation contents instead of an abstract class? Will there ever need to be other implementations of this?
Do you always want to use a BufferedOutputStream here? I believe BypassMergeSortShuffleWriter doesn't use one at all, though I'm not sure if that's only so it can support transferTo - in which case this might be better.
Hmm, should closing the outputBufferedFileStream close this stream as well?
Good point
core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java
core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java
Force-pushed from c977121 to 460f0ea.
  }
})
when(shuffleExecutorComponents.writes()).thenReturn(writeSupport)
when(writeSupport.createMapOutputWriter(
Why not just use a real DefaultShuffleWriteSupport? This is an integration test so we can feel free to use real objects if they're on the critical path.
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.*;
Don't use * import
public void commitAllPartitions() throws IOException {
  cleanUp();
  blockResolver.writeIndexFileAndCommit(shuffleId, mapId, partitionLengths, outputTempFile);
  if (!successfulWrite) {
There's a number of ways this can be done, right? For instance, can't we check that the file doesn't exist, and create it? That might be easier to read than using a boolean flag.
That happens later? It's easier to ensure that some form of write happened via either the byte stream or the byte channel as an initial filter, I would think.
The action of committing would create the file if any bytes were written. But if no bytes are written, there would be no output file, right? Would be good to verify these behaviors by running BypassMergeSortShuffleWriterSuite locally and seeing what happens.
@Override
public WritableByteChannel openChannel() throws IOException {
  return Channels.newChannel(outputFileStream);
This is strange - we create the newChannel around the outputFileStream, but openStream returns the DefaultShuffleBlockOutputStream. This can lead to an inconsistency in the count returned by this writer, because if we return the FileChannel here, the counter from the DefaultShuffleBlockOutputStream will not be reflected.
I think most of this can be caught by running BypassMergeSortShuffleWriterSuite locally.
SparkConf conf,
ShuffleWriteMetricsReporter writeMetrics) {
ShuffleWriteMetricsReporter writeMetrics,
ShuffleExecutorComponents shuffleExecutorComponents) {
I think you can pass in just the ShuffleWriteSupport because you won't need the reader components here.
if (file.exists()) {
  FileInputStream in = new FileInputStream(file);
  try {
    Utils.copyStream(in, tempOutputStream, false, false);
So using Utils.copyStream actually assumes that the outputStream is a FileOutputStream in order to take advantage of the transferTo logic inside that function. The function also does its own buffering with an array of a fixed size. I wonder if we should have the DefaultShuffleMapOutputWriter just deal with generic FileOutputStreams and wrap them with buffered output writers if necessary? It would make the DefaultShuffleMapOutputWriter much cleaner too, since we wouldn't be keeping track of so many layers of output streams.
Actually, on second thought, you probably just want to fork the code paths here into a "transferTo" block and an "output streams" block, to give the plugins more flexibility to return whatever output streams they want.
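A rough sketch of what that fork could look like; names such as partitionWriter and file follow the surrounding discussion and are illustrative, not the final code:

try (FileInputStream in = new FileInputStream(file)) {
  WritableByteChannel out = partitionWriter.openChannel();
  if (out instanceof FileChannel) {
    // transferTo path: only worthwhile when the plugin really hands back a FileChannel.
    FileChannel inChannel = in.getChannel();
    long pos = 0L;
    long size = inChannel.size();
    while (pos < size) {
      pos += inChannel.transferTo(pos, size - pos, out);
    }
  } else {
    // Generic path: a plain stream copy for plugins that return arbitrary channels.
    Utils.copyStream(in, Channels.newOutputStream(out), false, false);
  }
}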
mapId: Int,
context: TaskContext,
metrics: ShuffleWriteMetricsReporter): ShuffleWriter[K, V] = {
  initializeExecutor()
You don't want to do this at every getWriter call, but only once per executor. Since there's only one SortShuffleManager per executor, I think you can do it as a private variable in this class. You can get rid of the initialized flag then too.
Agreed
core/src/main/java/org/apache/spark/shuffle/sort/io/DefaultShuffleMapOutputWriter.java
private class DefaultShufflePartitionWriter implements ShufflePartitionWriter {

  private final int partitionId;
  private DefaultShuffleBlockOutputStream stream = null;
We might be able to rename these to something a bit clearer for their purpose - maybe partitionWriterStream and partitionWriterChannel. While it's more verbose, it makes it clear that these are only for writing the given partition - given that there's outside context that writes to the map output file. I got confused in https://github.com/bloomberg/apache-spark-on-k8s/pull/6/files#r268301953 because of the naming of these vars.
try {
  channel.close();
} catch (Exception e) {
  log.error("Error with closing channel for partition", e);
Shouldn't we throw if we fail to close here?
try {
  stream.close();
} catch (Exception e) {
  log.error("Error with closing stream for partition", e);
Similarly might want to consider throwing here (need to make sure that interrupting the control flow retains correctness though).
}

if (outputFileStream != null) {
  outputFileStream.close();
Does this close the file channel as well?
Yea, looking at the implementation of FileOutputStream.close(), calling close on the outputFileStream will also close the outputFileChannel
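A quick standalone way to convince ourselves (not part of the patch):

FileOutputStream fos = new FileOutputStream(File.createTempFile("shuffle", ".data"));
FileChannel channel = fos.getChannel();
fos.close();
// The channel obtained from the stream is closed along with it.
System.out.println(channel.isOpen()); // false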
if (!outputFile.delete() && outputFile.exists()) {
  log.warn("Failed to delete outputshuffle file at {}", outputFile.getAbsolutePath());
}
cleanUp();
cleanUp will close the streams, but the files will already have been deleted - will that throw an exception? I think what we want is something like this:

try {
  cleanup();
} catch (IOException e) {
  logger.error(...., e);
}
deleteFiles();
private void cleanUp() throws IOException {
  if (outputBufferedFileStream != null) {
    outputBufferedFileStream.close();
Closing the buffered output stream will close the underlying file stream. Is closing the same file output stream twice prone to throwing an error? If it doesn't throw an error then this is fine (would rather be explicit about all the resources we're closing).
yifeih left a comment
Overall, looks great! Left a few comments and questions.
partitionWriters[partitioner.getPartition(key)].write(key, record._2());
}
ShuffleMapOutputWriter mapOutputWriter = shuffleWriteSupport
  .createMapOutputWriter(appId, shuffleId, mapId, numPartitions);
Quick question: can the appId actually just be passed to the ShuffleMapOutputWriter through the ShuffleDataIO? It should be part of the SparkConf and shouldn't change in the executors, right?
It can, but it doesn't add much computation besides the above call to getAppId(), so it seems pretty unintrusive.
However, the API was built so that you call:

public ShuffleMapOutputWriter createMapOutputWriter(
    String appId,
    int shuffleId,
    int mapId,
    int numPartitions)

so I am a bit bound by that :)
Yea, I'm just wondering whether we need that in the API or not, since some implementations, like this refactored one that we're doing, don't necessarily need it, although all remote implementations might. @mccheah thoughts?
Yeah we can change the API, it should be passed through ShuffleDataIO - maybe ShuffleExecutorComponents#initialize?
| "Failed to create shuffle file directory at %s.", | ||
| outputFile.getParentFile().getAbsolutePath())); | ||
| } | ||
| if (!outputFile.isFile() && !outputFile.createNewFile()) { |
Wait, I'm confused by this block. You're checking whether the output file is there, and then trying to create it? Shouldn't you be throwing if it's not there, regardless of whether you can create it or not, because when blockResolver.writeIndexFileAndCommit() is done, the file should be there? But let me know if I'm misunderstanding this.
If the committer didn't create the file, we want to try to create an empty one by default, I think. But it's strange to me that shuffles can write empty files. Was this the case before our plugin refactor?
Oh I see... well, looking at this code, I don't think it actually writes an empty file if there are no records: https://github.com/bloomberg/apache-spark-on-k8s/pull/6/files#diff-8b6b7a5dadc0d8e97307d0f8e8378d8fL126
}
}

private void cleanUp() throws IOException {
You're closing the streams, but I noticed that Channel also has a close() method. Do you need to call close() on the outputFileChannel too?
Heh I am wrong... closing the fileOutputStream closes the channel :P
}

@Override
public void close() {
I wonder if we need some equivalent of flush() here, especially since the method we use to write to the WritableByteChannel is called Utils.copyFileStreamNIO, which is non-blocking, so I'm not sure we're guaranteed to have written everything by the time getCount() is called. I was trying to do some research on byteChannels and the closest thing I found was force()
@mccheah any ideas here, I am a bit worried about this being non-blocking as well
I looked at Utils.copyFileStreamNIO which doesn't have any indication that it's non-blocking, any idea why we think it is?
I assumed NIO stood for non-blocking IO. The javadoc for the underlying method that it's using, FileChannel.transferTo() also implies that the underlying channel could be non-blocking. https://docs.oracle.com/javase/7/docs/api/java/nio/channels/FileChannel.html#transferTo(long,%20long,%20java.nio.channels.WritableByteChannel)
@mccheah and I just discussed some of this offline.
The FileChannel.transferTo() implementation checks whether the WritableByteChannel is an instance of FileChannelImpl and behaves differently if it isn't. Because we're not passing in a FileChannelImpl, this might actually behave in ways that we don't expect and deviate from the current code path. Therefore, we shouldn't be wrapping the FileChannel with another WritableByteChannel as we're currently doing.
This limits our choices in how we can get the lengths to return in DefaultShufflePartitionWriter.getLengths(). The simplest way seems to be to use FileChannel.position() calls to get the positions when instantiating the byte channel and when closing the byte channel in DefaultShufflePartitionWriter.getLengths().
Also, looking at the DefaultShufflePartitionWriter.getLengths() method, since we want to ensure that the shuffle partition writers are all closed after getting the lengths, perhaps we should rename this method to close() so that it's clear what should be happening (i.e. you can only get the lengths once everything is written).
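A minimal sketch of that position-based approach; field names like outputFileChannel and startPosition are illustrative:

private long startPosition;

FileChannel openChannel() throws IOException {
  // Remember where this partition begins in the shared map output file.
  startPosition = outputFileChannel.position();
  return outputFileChannel;
}

long closeAndGetLength() throws IOException {
  // The bytes this partition wrote are just the delta in the file position.
  return outputFileChannel.position() - startPosition;
}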
|after transferTo, please check your kernel version to see if it is 2.6.32,
|this is a kernel bug which will lead to unexpected behavior when using transferTo.
|You can set spark.file.transferTo = false to disable this NIO feature.
""".stripMargin)
Read the links, and it seems to be a kernel bug? Is this safe to delete?
There is no position in writeableByteChannels :/ soo... idk ... meep
If we keep this check it should be in the implementation of ShuffleMapOutputWriter I would guess. Maybe in close / abort / commit etc?
Perhaps I'm missing something, but how do you get the expected position in the ShuffleMapOutputWriter for the assert? If that's not easy to get, I would say it's better to keep this here with an if-statement that checks whether it's an instance of FileChannel, to maintain the same behavior as before.
public void intitializeExecutor(String appId, String execId) {
  blockManager = SparkEnv.get().blockManager();
  blockResolver = new IndexShuffleBlockResolver(sparkConf, blockManager);
  metrics = TaskContext.get().taskMetrics();
Wait, I don't think you want this here actually. The TaskContext is associated with each individual shuffle task, but you're only calling initializeExecutor once per executor. You want to get the metrics from TaskContext after you have set the taskContext here (https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/Task.scala#L102) for each task.
@mccheah if so, can we pass metrics into the ShuffleMapOutputWriter so that it can be properly mocked (can't mock static methods)?
There's a few ways you can do it. You can pass it in as you suggested. In this particular case, you can also call TaskContext.set(mockTaskContext) to initialize the static variable before the test. I prefer the latter way, but you could make a case for either.
Actually you might want to pass it through so that concurrent tests don't collide...
The problem is that TaskMetrics is currently marked as DeveloperApi, which means it's questionable to pass it in to a public API. We could propose some alternative metrics API that delegates to the Spark default TaskMetrics API. But I think for now we can use TaskContext#get from inside the writer and then in tests call TaskContext#setTaskContext. I took a closer look and a lot of tests use TaskContext#setTaskContext, indicating they don't anticipate tests to be run in parallel in the same JVM.
Just make sure we call TaskContext#unset appropriately after each test.
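A sketch of that test pattern, assuming a Mockito-mocked TaskContext and a taskMetrics instance prepared by the suite:

// Setup: point the static thread-local at a mock so the writer's internal
// TaskContext.get() call resolves to it.
TaskContext mockContext = Mockito.mock(TaskContext.class);
Mockito.when(mockContext.taskMetrics()).thenReturn(taskMetrics);
TaskContext.setTaskContext(mockContext);
try {
  // ... exercise the shuffle writer under test ...
} finally {
  // Teardown: always unset so other tests in the same JVM aren't affected.
  TaskContext.unset();
}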
core/src/main/java/org/apache/spark/api/shuffle/ShufflePartitionWriter.java
default WritableByteChannel openChannel() throws IOException {
  return Channels.newChannel(openStream());
}
FileChannel openChannel() throws IOException;
This should still return WritableByteChannel, but the implementation returns FileChannel in DefaultShufflePartitionWriter.
The code in Utils#transferTo will eventually check instanceOf this WritableByteChannel and optimize on FileChannel accordingly.
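A small sketch of that shape - the interface stays general while the default implementation narrows the return type (Java allows covariant returns), so downstream instanceof FileChannel checks still get the fast path; outputFileChannel is illustrative:

// In the ShufflePartitionWriter interface: keep the general type so plugins
// can return any writable channel.
WritableByteChannel openChannel() throws IOException;

// In DefaultShufflePartitionWriter: narrow the return to the concrete FileChannel.
@Override
public FileChannel openChannel() throws IOException {
  return outputFileChannel;
}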
Proposes the following changes to the API:
- closeAndGetLength() is split into separate close() and getNumBytesWritten() operations.
- openChannel and openStream renamed to toChannel and toStream

Proposes the following changes to the implementation:
- close() in the default implementation now persists the length in the partitionLengths array
- getNumBytesWritten() doesn't necessitate the writer's resources to be closed ahead of it
- Don't close the stream in BypassMergeSortShuffleWriter - only close it in DefaultShufflePartitionWriter#close (for consistency with how we treat channels)
@yifeih for +1 :)
yifeih left a comment
Looks great! Just had a few small thoughts that might or might not be valid.
core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
| "Failed to create shuffle file directory at %s.", | ||
| outputFile.getParentFile().getAbsolutePath())); | ||
| } | ||
| if (!outputFile.isFile() && !outputFile.createNewFile()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh i see... well, looking at this code, I don't think it actually writes an empty file if there are no records: https://github.com/bloomberg/apache-spark-on-k8s/pull/6/files#diff-8b6b7a5dadc0d8e97307d0f8e8378d8fL126
* <p>
* Note that the default version of {@link #toChannel()} returns a {@link WritableByteChannel}
* that does not itself need to be closed up front; only the underlying output stream given by
* {@link #toStream()} must be closed.
I'm looking at the Channels.newChannel() implementation, and it uses WritableByteChannelImpl, which seems to do nothing more than keep an in-memory buffer; it'll get garbage collected when the WritableByteChannel falls out of scope, so I think it's fine not to close that channel for now. However, if the Channels.newChannel() implementation changes to require additional resources, then we're out of luck - but that might not even be a valid concern?
We can cross that bridge when we get there - might be something to check for future versions of Java.
@ifilonenko I dug into the making of the temp file. If you look at https://github.com/apache/spark/blob/master/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java#L126, it's actually calling in with the [...]. To preserve functionality as closely as possible (in case there are other side effects down that code path), it's probably better to set [...].
} catch (Exception e) {
  log.error("Unable to close appropriate underlying file stream", e);
}
if (!outputTempFile.delete() && outputTempFile.exists()) {
Check for null
+1 from me after #6 (comment)
if (!outputTempFile.delete() && outputTempFile.exists()) {
  log.warn("Failed to delete temporary shuffle file at {}", outputTempFile.getAbsolutePath());
}
if (!outputFile.delete() && outputFile.exists()) {
Actually I don't think we want to delete here - a concurrent attempt shouldn't have its results removed if this attempt fails. I think this lines up with the behavior from the original shuffle writer.
Looks good from my end! Thanks for working on this!
| log.error("Unable to close appropriate underlying file stream", e); | ||
| } | ||
| if (outputTempFile != null) { | ||
| if (!outputTempFile.delete() && outputTempFile.exists()) { |
Switch the ordering of the statements in this boolean condition.
…#6) Implement default version of the API that would be used across all shuffle writers, writes to local disk. Shuffle Writes [2/6] [3/6] [5/6]
…524) Implements the shuffle writer API by writing shuffle files to local disk and using the index block resolver to commit data and write index files. The logic in `BypassMergeSortShuffleWriter` has been refactored to use the base implementation of the plugin instead. APIs have been slightly renamed to clarify semantics after considering nuances in how these are to be implemented by other developers. Follow-up commits are to come for `SortShuffleWriter` and `UnsafeShuffleWriter`. Ported from bloomberg#6, credits to @ifilonenko.
What changes were proposed in this pull request?
Implement the default version of the API that would be used across all shuffle writers; it writes to local disk.
Shuffle Writes [2/6] [3/6] [5/6]
How was this patch tested?
Compiled and ran unit tests.