@gatorsmile
Member

What changes were proposed in this pull request?

  • The original comment about updateDriverMetrics is not right.
  • Refactor the code to ensure selectedPartitions has been set before sending the driver-side metrics.
  • Restore the original name, which is more general and extendable.

How was this patch tested?

The existing tests.

}

/**
* Send the updated metrics to driver, while this function calling, selectedPartitions has
Member Author

This is wrong.

Map("numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"),
"numFiles" -> SQLMetrics.createMetric(sparkContext, "number of files"),
"fileListingTime" -> SQLMetrics.createMetric(sparkContext, "file listing time (ms)"),
"metadataTime" -> SQLMetrics.createMetric(sparkContext, "metadata time"),
Member Author

@gatorsmile commented Dec 17, 2018

The original name is more straightforward for end users who have no idea about file listing.

Contributor

I'm not sure we will add more metadata operations and reuse this metric. Anyway, it was not a good idea to do the renaming in a bug-fix PR. @xuanyuanking, can you create a ticket and send a new PR for the renaming?

Member

Copy that. My original thinking was that this bug fix is part of https://issues.apache.org/jira/browse/SPARK-26222.

case _ =>
createNonBucketedReadRDD(readFile, selectedPartitions, relation)
}
sendDriverMetrics()
Member Author

Lines 313 and 315 both call selectedPartitions. Thus, it is safe to say that selectedPartitions is initialized before we send the driver-side metrics.
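The ordering argument relies on how a Scala `lazy val` behaves: its initializer runs once, on first access. Below is a minimal sketch assuming, as the diff context suggests, that `selectedPartitions` is lazily initialized and that its initializer records the driver metrics as a side effect. `ScanSketch`, `postedMetrics`, `inputRDDSendFirst`, and the `bucketed` flag are hypothetical stand-ins, not Spark's `FileSourceScanExec` code. It contrasts sending metrics after the branches have touched `selectedPartitions` with sending them before the value is forced.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a file scan operator; not Spark's actual class.
class ScanSketch(files: Seq[Seq[String]]) {
  // Driver-side metrics, filled in as a side effect of "listing files".
  private val driverMetrics = mutable.HashMap.empty[String, Long]

  // What has been reported to the "driver"; exposed so it can be inspected.
  val postedMetrics = mutable.HashMap.empty[String, Long]

  // Initialized at most once, on first access, and records numFiles then.
  private lazy val selectedPartitions: Seq[Seq[String]] = {
    driverMetrics("numFiles") = files.map(_.size.toLong).sum
    files
  }

  // Copies whatever has been recorded so far.
  private def sendDriverMetrics(): Unit =
    postedMetrics ++= driverMetrics

  // Both branches (stand-ins for bucketed / non-bucketed reads) touch
  // selectedPartitions, so the lazy val is forced before metrics are sent.
  def inputRDD(bucketed: Boolean): Seq[String] = {
    val readRDD =
      if (bucketed) selectedPartitions.flatten.sorted
      else selectedPartitions.flatten
    sendDriverMetrics()
    readRDD
  }

  // Fragile ordering: metrics are sent before selectedPartitions is forced,
  // so nothing has been recorded yet.
  def inputRDDSendFirst(): Seq[String] = {
    sendDriverMetrics()
    selectedPartitions.flatten
  }
}
```

On a fresh instance, `inputRDD` reports `numFiles`, while `inputRDDSendFirst` reports nothing, because `driverMetrics` is still empty when it is copied.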

@gatorsmile
Member Author

cc @cloud-fan @xuanyuanking

@cloud-fan
Contributor

LGTM

@SparkQA

SparkQA commented Dec 17, 2018

Test build #100208 has finished for PR 23328 at commit ec1b30c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 17, 2018

Test build #100209 has finished for PR 23328 at commit e9f75b9.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val optimizerMetadataTimeNs = relation.location.metadataOpsTimeNs.getOrElse(0L)
val startTime = System.nanoTime()
val ret = relation.location.listFiles(partitionFilters, dataFilters)
driverMetrics("filesNum") = ret.map(_.files.size.toLong).sum
Member

Hi, @gatorsmile. This looks like a typo for numFiles.
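A misspelled key like `filesNum` is easy to miss because a mutable map accepts any string key on write. The hypothetical sketch below (`MetricKeyTypo`, `record`, and `flush` are invented for illustration and are not Spark APIs) assumes the recorded driver metrics are later resolved by name against the declared metrics map; the snippet under review does not show that code, so treat this as an assumption.

```scala
import scala.collection.mutable

// Hypothetical illustration; none of these names come from Spark.
object MetricKeyTypo {
  // Metric names the operator declares, mirroring the metrics Map above.
  val declaredMetrics: Map[String, String] = Map(
    "numFiles" -> "number of files",
    "metadataTime" -> "metadata time")

  // Writing any key, misspelled or not, into a mutable map succeeds.
  def record(key: String, value: Long): mutable.HashMap[String, Long] = {
    val driverMetrics = mutable.HashMap.empty[String, Long]
    driverMetrics(key) = value
    driverMetrics
  }

  // Resolving each recorded key against the declared metrics is where a typo
  // surfaces: Map.apply throws NoSuchElementException for an unknown key.
  def flush(driverMetrics: mutable.HashMap[String, Long]): Seq[(String, Long)] =
    driverMetrics.toSeq.map { case (k, v) => (declaredMetrics(k), v) }
}
```

Under this assumption, the mistake surfaces only when the metrics are flushed, not at the line that writes them.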

@SparkQA

SparkQA commented Dec 17, 2018

Test build #100217 has finished for PR 23328 at commit b90f47a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member Author

Thanks! Merged to master.

@asfgit closed this in 5960a82 on Dec 17, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…cs name

## What changes were proposed in this pull request?

- The original comment about `updateDriverMetrics` is not right.
- Refactor the code to ensure `selectedPartitions` has been set before sending the driver-side metrics.
- Restore the original name, which is more general and extendable.

## How was this patch tested?
The existing tests.

Closes apache#23328 from gatorsmile/followupSpark-26142.

Authored-by: gatorsmile <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019

5 participants