@MaxGekk
Member

@MaxGekk MaxGekk commented Apr 1, 2020

What changes were proposed in this pull request?

Why are the changes needed?

Before:

Save dates to ORC:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   8410           8410           0         11.9          84.1       1.0X
before 1582, noop                                  8408           8408           0         11.9          84.1       1.0X
after 1582                                        15507          15507           0          6.4         155.1       0.5X
before 1582                                       15099          15099           0          6.6         151.0       0.6X

Load dates from ORC:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               26858          27479         727          3.7         268.6       1.0X

After:

Save dates to ORC:                        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   7518           7518           0         13.3          75.2       1.0X
before 1582, noop                                  7292           7292           0         13.7          72.9       1.0X
after 1582                                        13286          13286           0          7.5         132.9       0.6X
before 1582                                       16213          16213           0          6.2         162.1       0.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Load dates from ORC:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off                               17867          17918          65          5.6         178.7       1.0X

Does this PR introduce any user-facing change?

How was this patch tested?

@MaxGekk
Member Author

MaxGekk commented Apr 1, 2020

@cloud-fan @HyukjinKwon This optimization requires restoring ugly code from Spark 2.4. I am not sure that it is worth it.

Maybe it makes sense for the read path but not for the write path: for dates before 1582, writing becomes slower.
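
The read/write asymmetry can be checked against the Best Time column of the Before/After tables in the PR description. A small sketch of that arithmetic follows; the object and method names are hypothetical, and the numbers are copied from those tables:

```scala
object RebaseSpeedups {
  // (before, after) Best Time in ms, copied from the PR description tables.
  val loadAfter1582VecOff: (Double, Double) = (26858.0, 17867.0)
  val saveAfter1582: (Double, Double) = (15507.0, 13286.0)
  val saveBefore1582: (Double, Double) = (15099.0, 16213.0)

  // Ratio > 1 means the patched code is faster; ratio < 1 means it is slower.
  def speedup(beforeAfter: (Double, Double)): Double =
    beforeAfter._1 / beforeAfter._2

  def main(args: Array[String]): Unit = {
    println(f"load, after 1582, vec off: ${speedup(loadAfter1582VecOff)}%.2fx")
    println(f"save, after 1582:          ${speedup(saveAfter1582)}%.2fx")
    println(f"save, before 1582:         ${speedup(saveBefore1582)}%.2fx")
  }
}
```

This gives roughly 1.50x for the read case and 1.17x for writing dates after 1582, but about 0.93x (a slowdown) for writing dates before 1582.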

tf.format(us)
}

private def millisToDays(millisUtc: Long, timeZone: TimeZone): SQLDate = {
Contributor

Shall we use microsToDays?
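
For context on the question above, here is a minimal sketch of how a time-zone-aware `millisToDays` and a microsecond counterpart could be written. The `millisToDays` signature follows the diff line under review; the bodies, the `MillisToDaysSketch` object, the `SQLDate` alias and the `MillisPerDay` constant are assumptions for illustration, not the PR's actual code.

```scala
import java.util.TimeZone

object MillisToDaysSketch {
  // Hypothetical stand-in for the SQLDate alias: days since 1970-01-01.
  type SQLDate = Int

  val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Shift the UTC instant by the zone's offset at that instant, then
  // floor-divide so that instants before the epoch map to earlier days
  // instead of being truncated toward zero.
  def millisToDays(millisUtc: Long, timeZone: TimeZone): SQLDate = {
    val millisLocal = millisUtc + timeZone.getOffset(millisUtc)
    Math.floorDiv(millisLocal, MillisPerDay).toInt
  }

  // Microsecond variant (assumed shape): reduce to milliseconds with floor
  // semantics, then reuse the millisecond conversion.
  def microsToDays(microsUtc: Long, timeZone: TimeZone): SQLDate =
    millisToDays(Math.floorDiv(microsUtc, 1000L), timeZone)
}
```

Floor division matters for negative inputs: one millisecond before the epoch in UTC belongs to day -1, whereas plain integer division would yield day 0.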

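                                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec on                                 3813           3844          28         26.2          38.1       5.5X
before 1582, vec off                              25912          25949          38          3.9         259.1       0.8X
before 1582, vec on                                4322           4343          19         23.1          43.2       4.8X
after 1582, vec off                               14060          14251         193          7.1         140.6       1.0X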
Member Author

@cloud-fan Read results look good.

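                                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
before 1582                                       17979          17979           0          5.6         179.8       0.5X
after 1582, noop                                   9250           9250           0         10.8          92.5       1.0X
before 1582, noop                                  9522           9522           0         10.5          95.2       1.0X
after 1582                                        16377          16377           0          6.1         163.8       0.6X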
Member Author

@MaxGekk MaxGekk Apr 1, 2020

@cloud-fan I don't think it's worth optimizing the write path. The code looks ugly, in my view.

@SparkQA

SparkQA commented Apr 1, 2020

Test build #120674 has finished for PR 28091 at commit ebb8d13.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 1, 2020

Test build #120676 has finished for PR 28091 at commit 0830158.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 1, 2020

Test build #120679 has finished for PR 28091 at commit ee78740.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…om-to-java-date

# Conflicts:
#	sql/core/benchmarks/DateTimeRebaseBenchmark-jdk11-results.txt
#	sql/core/benchmarks/DateTimeRebaseBenchmark-results.txt
@SparkQA

SparkQA commented Apr 9, 2020

Test build #121014 has finished for PR 28091 at commit 586517f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class SparkAvroKeyOutputFormat extends AvroKeyOutputFormat<GenericRecord>
  • static class SparkRecordWriterFactory extends RecordWriterFactory<GenericRecord>
  • class FMClassifierWrapperWriter(instance: FMClassifierWrapper) extends MLWriter
  • class FMClassifierWrapperReader extends MLReader[FMClassifierWrapper]
  • class LinearRegressionWrapperWriter(instance: LinearRegressionWrapper) extends MLWriter
  • class LinearRegressionWrapperReader extends MLReader[LinearRegressionWrapper]
  • case class LengthOfJsonArray(child: Expression) extends UnaryExpression
  • case class JsonObjectKeys(child: Expression) extends UnaryExpression with CodegenFallback
  • case class ShowViews(
  • case class ShowViewsCommand(

@MaxGekk
Member Author

MaxGekk commented Apr 12, 2020

I am closing this PR because the final results do not look good:

OpenJDK 64-Bit Server VM 1.8.0_242-8u242-b08-0ubuntu3~18.04-b08 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Load dates from ORC:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-after 1582, vec off                               39651          39686          31          2.5         396.5       1.0X
-after 1582, vec on                                 3647           3660          13         27.4          36.5      10.9X
-before 1582, vec off                              38155          38219          61          2.6         381.6       1.0X
-before 1582, vec on                                4041           4046           6         24.7          40.4       9.8X
+after 1582, vec off                               76436          77047         877          1.3         764.4       1.0X
+after 1582, vec on                                 3790           3797          10         26.4          37.9      20.2X
+before 1582, vec off                              52369          52460         105          1.9         523.7       1.5X
+before 1582, vec on                                4171           4182          10         24.0          41.7      18.3X
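
When reading the diff above, it helps to see how the derived columns appear to follow from Best Time. The sketch below reproduces them. The row count of 100 million is an assumption inferred from the table itself (39651 ms corresponds to 396.5 ns per row), and the object and method names are hypothetical.

```scala
object BenchmarkColumns {
  // Assumption: 100 million rows per case, inferred from 39651 ms <-> 396.5 ns/row.
  val NumRows: Long = 100000000L

  // Per Row(ns): total time in nanoseconds divided by the number of rows.
  def perRowNs(bestMs: Double): Double = bestMs * 1e6 / NumRows

  // Rate(M/s): millions of rows processed per second.
  def rateMPerSec(bestMs: Double): Double = (NumRows / 1e6) / (bestMs / 1e3)

  // Relative: Best Time of the run's first row divided by this row's Best Time.
  def relative(baselineMs: Double, bestMs: Double): Double = baselineMs / bestMs

  def main(args: Array[String]): Unit = {
    println(f"per row:  ${perRowNs(76436.0)}%.1f ns")
    println(f"rate:     ${rateMPerSec(76436.0)}%.1f M/s")
    println(f"relative: ${relative(76436.0, 3790.0)}%.1fX")
  }
}
```

Because Relative is computed against each run's own first row, the jump from 10.9X to 20.2X for "after 1582, vec on" reflects a slower "vec off" baseline (39651 ms versus 76436 ms), not a faster vectorized read (3647 ms versus 3790 ms).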

@MaxGekk MaxGekk closed this Apr 12, 2020
@MaxGekk MaxGekk deleted the optimize-from-to-java-date branch June 5, 2020 19:47