Reflect review comments
HeartSaVioR committed Oct 16, 2019
commit 2ff349bcba539c5d1e0ea40ac52fd4b4bf75c3b8
@@ -293,7 +293,7 @@ class RollingEventLogFilesWriter(

import RollingEventLogFilesWriter._

private val eventFileMaxLength = sparkConf.get(EVENT_LOG_ROLLED_LOG_MAX_FILE_SIZE)
private val eventFileMaxLength = sparkConf.get(EVENT_LOG_ROLLING_MAX_FILE_SIZE)

private val logDirForAppPath = getAppEventLogDirPath(logBaseDir, appId, appAttemptId)

@@ -180,16 +180,18 @@ package object config {
ConfigBuilder("spark.eventLog.longForm.enabled").booleanConf.createWithDefault(false)

private[spark] val EVENT_LOG_ENABLE_ROLLING =
ConfigBuilder("spark.eventLog.logRolling.enabled")
ConfigBuilder("spark.eventLog.rolling.enabled")
.doc("Whether rolling over event log files is enabled. If set to true, it cuts down " +
"each event log file to the configured size.")
.booleanConf
.createWithDefault(false)

private[spark] val EVENT_LOG_ROLLED_LOG_MAX_FILE_SIZE =
ConfigBuilder("spark.eventLog.logRolling.maxFileSize")
private[spark] val EVENT_LOG_ROLLING_MAX_FILE_SIZE =
ConfigBuilder("spark.eventLog.rolling.maxFileSize")
.doc("The max size of event log file to be rolled over.")
Member

Sorry for leaving a comment this late, but it would have been better to say that this configuration is only effective when spark.eventLog.rolling.enabled is enabled.

Contributor Author

I think there are counterexamples among Spark configurations which rely on the convention that, once there is a .enabled configuration, the others are effective only when that one is enabled.

Even if we only look at the SHS configurations, spark.history.fs.cleaner.*, spark.history.kerberos.*, and spark.history.ui.acls.* fall into this case.

Member

Hm, I tend to disagree with omitting such dependent configurations from their documentation. Can we add and link the related configurations in the documentation?

Contributor Author

Sorry, it looks like we'll have to agree to disagree then. No one has the privilege to make someone do work under their authorship that they disagree with - it would end up putting the wrong authorship on the commit.

Member

@HeartSaVioR, it had to be reviewed; I just happened to review it and leave some comments late. Logically, if that's not documented, how do users know which configuration is effective when? At least I had to read the code to confirm.

Also, I am trying to make sure we're on the same page so that I don't have to leave this comment again, since you are a regular contributor. I don't think it is a good pattern not to document the relationship between configurations. I am going to send an email to the dev list.

Contributor Author

@HyukjinKwon
Thanks for initiating the thread on the dev mailing list. I'm following the thread and will come back once we reach some sort of consensus.

.bytesConf(ByteUnit.BYTE)
.checkValue(_ >= (1024 * 1024 * 10), "Max file size of event log should be configured to" +
Contributor

ByteUnit.MiB.toBytes(10)
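
For reference, a minimal sketch of what that suggestion might look like applied to the definition above (an illustration, not the code in this commit):

    private[spark] val EVENT_LOG_ROLLING_MAX_FILE_SIZE =
      ConfigBuilder("spark.eventLog.rolling.maxFileSize")
        .doc("The max size of event log file to be rolled over.")
        .bytesConf(ByteUnit.BYTE)
        // sketch: express the 10 MiB lower bound via ByteUnit instead of 1024 * 1024 * 10
        .checkValue(_ >= ByteUnit.MiB.toBytes(10),
          "Max file size of event log should be configured to be at least 10 MiB.")
        .createWithDefaultString("128m")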

" be at least 10 MiB.")
.createWithDefaultString("128m")
Member

This is a side note for reviewers. This seems to aim to match the HDFS configuration, but it might be too small for other cloud storage like S3. I have actually observed big log files of over 60GB (in plain text format) quite frequently. Although this is configurable, the default can produce over 600 log files for a single Spark job.
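
(As a rough illustration of that estimate: splitting a 60 GiB plain-text log at the 128 MiB default gives about 60 * 1024 / 128 ≈ 480 files, so logs only somewhat larger than that would exceed 600 files.)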

Contributor Author

Thanks for sharing the info!

I'll comment on a possible case for wanting to set the max file size smaller: assuming we go with the plan, we would want to set a maximum number of event log files to retain, to address the case where the storage quota is limited. (I believe I saw this requirement - a limited storage quota - somewhere earlier, but I can't remember where.) In that case, "max file size" and "max number of event log files" work together to forecast the size of the event log directory for the app (snapshot files should be considered too, so in reality it will be bigger than that), and to control that value finely we may want a smaller max file size.
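
A minimal sketch of that forecast, assuming a hypothetical retention setting that does not exist in this PR:

    // Hypothetical illustration only; maxFilesToRetain is an assumed future setting, not a Spark config.
    val maxFilesToRetain = 10
    val maxFileSizeBytes = 128L * 1024 * 1024   // spark.eventLog.rolling.maxFileSize = 128m (default)
    // Rough upper bound on the app's event log directory, before counting snapshot files.
    val approxDirSizeBound = maxFilesToRetain * maxFileSizeBytes   // 1,342,177,280 bytes, i.e. 1.25 GiB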


private[spark] val EXECUTOR_ID =
@@ -234,14 +234,16 @@ class RollingEventLogFilesReaderSuite extends EventLogFileReadersSuite {

val conf = getLoggingConf(testDirPath, codecShortName)
conf.set(EVENT_LOG_ENABLE_ROLLING, true)
conf.set(EVENT_LOG_ROLLED_LOG_MAX_FILE_SIZE.key, "1k")
conf.set(EVENT_LOG_ROLLING_MAX_FILE_SIZE.key, "10m")

val writer = createWriter(appId, attemptId, testDirPath.toUri, conf,
SparkHadoopUtil.get.newConfiguration(conf))

writer.start()

writeTestEvents(writer, "dummy", 1024 * 2)
// write log more than 20m (intended to roll over to 3 files)
val dummyStr = "dummy" * 1024
writeTestEvents(writer, dummyStr, 1024 * 1024 * 20)

val logPathIncompleted = getCurrentLogPath(writer.logPath, isCompleted = false)
val readerOpt = EventLogFileReader(fileSystem,
@@ -19,7 +19,6 @@ package org.apache.spark.deploy.history

import java.io.{File, FileOutputStream, IOException}
import java.net.URI
import java.nio.charset.StandardCharsets

import scala.collection.mutable
import scala.io.Source
@@ -300,31 +299,47 @@ class RollingEventLogFilesWriterSuite extends EventLogFileWritersSuite {

val conf = getLoggingConf(testDirPath, codecShortName)
conf.set(EVENT_LOG_ENABLE_ROLLING, true)
conf.set(EVENT_LOG_ROLLED_LOG_MAX_FILE_SIZE.key, "1k")
conf.set(EVENT_LOG_ROLLING_MAX_FILE_SIZE.key, "10m")

val writer = createWriter(appId, attemptId, testDirPath.toUri, conf,
SparkHadoopUtil.get.newConfiguration(conf))

writer.start()

// write log more than 2k (intended to roll over to 3 files)
val expectedLines = writeTestEvents(writer, "dummy", 1024 * 2)
// write log more than 20m (intended to roll over to 3 files)
val dummyStr = "dummy" * 1024
val expectedLines = writeTestEvents(writer, dummyStr, 1024 * 1024 * 21)

val logDirPath = getAppEventLogDirPath(testDirPath.toUri, appId, attemptId)

val eventLogFiles = listEventLogFiles(logDirPath)
assertEventLogFilesIndex(eventLogFiles, 3, 1024 * 1024)
assertEventLogFilesIndex(eventLogFiles, 3, 1024 * 1024 * 10)

writer.stop()

val eventLogFiles2 = listEventLogFiles(logDirPath)
assertEventLogFilesIndex(eventLogFiles2, 3, 1024 * 1024)
assertEventLogFilesIndex(eventLogFiles2, 3, 1024 * 1024 * 10)

verifyWriteEventLogFile(appId, attemptId, testDirPath.toUri,
codecShortName, expectedLines)
}
}

test(s"rolling event log files - the max size of event log file size less than lower limit") {
val appId = getUniqueApplicationId
val attemptId = None

val conf = getLoggingConf(testDirPath, None)
conf.set(EVENT_LOG_ENABLE_ROLLING, true)
conf.set(EVENT_LOG_ROLLING_MAX_FILE_SIZE.key, "9m")

val e = intercept[IllegalArgumentException] {
createWriter(appId, attemptId, testDirPath.toUri, conf,
SparkHadoopUtil.get.newConfiguration(conf))
}
assert(e.getMessage.contains("should be configured to be at least"))
}

override protected def createWriter(
appId: String,
appAttemptId: Option[String],
4 changes: 2 additions & 2 deletions docs/configuration.md
@@ -1009,15 +1009,15 @@ Apart from these, the following properties are also available, and may be useful
</td>
</tr>
<tr>
<td><code>spark.eventLog.logRolling.enabled</code></td>
<td><code>spark.eventLog.rolling.enabled</code></td>
<td>false</td>
<td>
Whether rolling over event log files is enabled. If set to true, it cuts down each event
log file to the configured size.
</td>
</tr>
<tr>
<td><code>spark.eventLog.logRolling.maxFileSize</code></td>
<td><code>spark.eventLog.rolling.maxFileSize</code></td>
<td>128m</td>
<td>
The max size of event log file before it's rolled over.
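
For reference, a minimal sketch of setting the renamed properties together (an illustration, not part of this diff; the event log directory is an assumed example path). As discussed in the review comments above, spark.eventLog.rolling.maxFileSize only takes effect when spark.eventLog.rolling.enabled is true, and both assume event logging itself is enabled:

    import org.apache.spark.SparkConf

    // Sketch: enabling rolling event logs with the renamed configuration keys.
    val conf = new SparkConf()
      .set("spark.eventLog.enabled", "true")               // event logging must be on
      .set("spark.eventLog.dir", "hdfs:///spark-events")   // assumed example path
      .set("spark.eventLog.rolling.enabled", "true")       // turn rolling on
      .set("spark.eventLog.rolling.maxFileSize", "128m")   // must be at least 10 MiB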