Conversation

@gaborgsomogyi
Contributor

@gaborgsomogyi gaborgsomogyi commented Jun 22, 2020

What changes were proposed in this pull request?

OracleIntegrationSuite is not using the latest Oracle JDBC driver. In this PR I've upgraded the driver to the latest version, which supports JDK8, JDK9, and JDK11.

Why are the changes needed?

Old JDBC driver.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.
Existing integration tests (especially OracleIntegrationSuite)

@gaborgsomogyi
Contributor Author

I've just re-executed OracleIntegrationSuite on my local Mac and it passed.

@gaborgsomogyi
Contributor Author

cc @gatorsmile @dongjoon-hyun @maropu @MaxGekk to pull in everybody who is helping with #28863

@SparkQA

SparkQA commented Jun 22, 2020

Test build #124364 has finished for PR 28893 at commit 1c5525d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

retest this please

@gaborgsomogyi
Contributor Author

Filed SPARK-32054

@SparkQA

SparkQA commented Jun 22, 2020

Test build #124367 has finished for PR 28893 at commit 1c5525d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32049][SQL]Upgrade oracle JDBC driver to the latest [SPARK-32049][SQL][TESTS] Upgrade oracle JDBC driver to ojdbc8 Jun 22, 2020
@dongjoon-hyun dongjoon-hyun changed the title [SPARK-32049][SQL][TESTS] Upgrade oracle JDBC driver to ojdbc8 [SPARK-32049][SQL][TESTS] Upgrade Oracle JDBC Driver 8 Jun 22, 2020
Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making a PR, @gaborgsomogyi .

Unfortunately, this doesn't pass the integration test, because this PR doesn't include the following change you described:

The timeout and interval parameter to be increased to a high value for oracle test in
DockerJDBCIntegrationSuite.scala (Locally tested with 20 min timeout and 1 sec interval

Concretely, it dies like the following.

$ docker logs 1380b6c16cff -f
ORACLE PASSWORD FOR SYS AND SYSTEM: oracle
Specify a password to be used for database accounts. Oracle recommends that the password entered should be at least 8 characters in length, contain at least 1 uppercase character, 1 lower case character and 1 digit [0-9]. Note that the same password will be used for SYS, SYSTEM and PDBADMIN accounts:
Confirm the password:
Configuring Oracle Listener.
Listener configuration succeeded.
Configuring Oracle Database XE.
Enter SYS user password:
******
Enter SYSTEM user password:
*****
Enter PDBADMIN User Password:
******
Prepare for db operation
7% complete
Copying database files
29% complete
Creating and starting Oracle instance
30% complete
31% complete
34% complete
---- die here ----

In this case, the documentation is insufficient because this manual change cannot be applied during an RC vote or automated test. During RC vote or automated test, we are testing on the untouched source code.

* 6. Run spark test - ./build/sbt "test-only org.apache.spark.sql.jdbc.OracleIntegrationSuite"
* 4. The timeout and interval parameter to be increased to a high value for oracle test in
* DockerJDBCIntegrationSuite.scala (Locally tested with 20 min timeout and 1 sec interval
* then executed successfully).
Member

@dongjoon-hyun dongjoon-hyun Jun 22, 2020

20 min looks like a magic number. After repeating the test a few times, 5 minutes seems sufficient in some cases.
Or, we can introduce a new val or def connectionTimeout.
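The val/def connectionTimeout idea could be sketched as follows. This is an illustrative standalone sketch, not Spark code: the trait and class names are hypothetical stand-ins for DockerJDBCIntegrationSuite and its Oracle subclass, and the timeout values are the ones discussed in this thread.

```scala
import scala.concurrent.duration._

// Hypothetical sketch: the base suite exposes an overridable timeout,
// and only the slow database suite overrides it.
trait DockerSuiteSketch {
  // Default startup timeout used by most database suites.
  def connectionTimeout: FiniteDuration = 2.minutes
}

class OracleSuiteSketch extends DockerSuiteSketch {
  // Oracle XE takes several minutes to start inside Docker.
  override def connectionTimeout: FiniteDuration = 7.minutes
}

object ConnectionTimeoutDemo extends App {
  println(new OracleSuiteSketch().connectionTimeout.toMinutes)   // prints 7
  println(new DockerSuiteSketch {}.connectionTimeout.toMinutes)  // prints 2
}
```

This keeps the generous value scoped to the one suite that needs it instead of slowing down failure detection for every database.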

Contributor Author

Yeah, it's magically high enough not to cause issues. I personally would not add further complexity but would just increase the timeout.

@dongjoon-hyun
Member

I made a PR against your branch. Could you review and consider it?

<artifactId>ojdbc8</artifactId>
<version>19.6.0.0</version>
<scope>test</scope>
</dependency>
Member

Do we need to update the root pom in this PR? It seems this change is related to the other PR?

Contributor Author

This change is debatable, but I think rolling it back here and re-adding it in the next commit ends up being more work overall. Apart from that, it's purely personal taste, but I like handling dependencies through dependency management (even if only one reference exists).
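For readers unfamiliar with the pattern being discussed, the dependency-management split looks roughly like this in Maven. This is a generic sketch, not the actual Spark pom diff: the groupId shown is an assumption (the coordinates Oracle publishes ojdbc8 under on Maven Central), and the module placement is illustrative.

```xml
<!-- Root pom: pin the version once under dependencyManagement. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.oracle.database.jdbc</groupId>
      <artifactId>ojdbc8</artifactId>
      <version>19.6.0.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>

<!-- Module pom (e.g. the integration-test module): reference the
     artifact without a version; the root pom supplies 19.6.0.0. -->
<dependencies>
  <dependency>
    <groupId>com.oracle.database.jdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <scope>test</scope>
  </dependency>
</dependencies>
```

The upside is a single place to bump the version; the downside, as raised below, is that the two PRs touching these files become coupled.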

Member

Hm, but if this PR is reverted for some reason after the two PRs have been merged, will the other PR break? I think each PR should be as independent as possible for backports/reverts.

Contributor Author

@gaborgsomogyi gaborgsomogyi Jun 23, 2020

Honestly, I hadn't considered independence, but I agree with the direction. I would like to understand your point but am not quite there yet. Here we're deciding on which side the dependency management should live. We can consider the dependency-management change as additional complexity on top of the actual code base. Moving it from this PR into the other one would make this PR more independent, but on the other hand it would make the other one less independent. I see this as a tradeoff.

Member

but on the other side will make the other one less independent.

You mean #28863 could avoid depending on this Oracle driver in the root pom? I thought you added the dependency in #28863 to add some secure connector tests in the core package:
https://github.com/apache/spark/pull/28863/files#diff-0b713a218f4e99665e339404f7b6c6c3R153-R157

Member

For this one, I believe it's okay because now we have all SQL drivers in one place. In this root pom, we already have MySQL/MariaDB/PostgreSQL/DB2/MsSqlServer.

Member

What do you think about that, @maropu?

Member

Ah, I see. Looks okay now. Thanks, @dongjoon-hyun and @gaborgsomogyi

Contributor Author

Thank you guys for the clarification.

@maropu
Member

maropu commented Jun 23, 2020

Thanks for the work, @gaborfeher !

@gaborgsomogyi
Contributor Author

@dongjoon-hyun

During RC vote or automated test, we are testing on the untouched source code.

This justifies the need to fix the hack properly.

I've just double-checked your suggestion, and it seems I have bad luck, because the test consumed ~5 minutes and 3 seconds.

20/06/23 00:46:45.940 ScalaTest-run INFO DefaultDockerClient: Starting container with Id: ffb8996e0471656089bdebae35b112e204ceb72487e0bc9099c9d96dda935917
20/06/23 00:51:48.089 ScalaTest-run INFO SharedState: loading hive config file: file:/Users/gaborsomogyi/spark/sql/core/target/scala-2.12/test-classes/hive-site.xml

Increasing it to 7 minutes to have some room and avoid flaky behaviour.

@gaborfeher

Thanks for the work, @gaborfeher !

Other Gabor :)

  jdbcUrl = db.getJdbcUrl(dockerIp, externalPort)
  var conn: Connection = null
- eventually(timeout(2.minutes), interval(1.second)) {
+ eventually(timeout(7.minutes), interval(1.second)) {
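For context, ScalaTest's eventually retries the block until it succeeds or the timeout elapses, sleeping the given interval between attempts. A minimal self-contained sketch of that behavior (this is not ScalaTest itself; the object and demo names are illustrative):

```scala
import scala.concurrent.duration._
import scala.annotation.tailrec

object EventuallySketch {
  // Retry `block` until it succeeds or `timeout` elapses, sleeping `interval`
  // between attempts -- roughly what eventually(timeout(...), interval(...)) does.
  def eventually[T](timeout: FiniteDuration, interval: FiniteDuration)(block: => T): T = {
    val deadline = timeout.fromNow
    @tailrec def attempt(): T = {
      val result: Either[Throwable, T] =
        try Right(block) catch { case e: Exception => Left(e) }
      result match {
        case Right(v) => v
        case Left(_) if deadline.hasTimeLeft() =>
          Thread.sleep(interval.toMillis)
          attempt()
        case Left(e) => throw e
      }
    }
    attempt()
  }
}

object EventuallyDemo extends App {
  var attempts = 0
  // Simulates a database connection that only comes up on the third try.
  val status = EventuallySketch.eventually(2.seconds, 10.millis) {
    attempts += 1
    if (attempts < 3) sys.error("database not ready")
    "connected"
  }
  println(s"$status after $attempts attempts")
}
```

This is why a 7-minute value only affects the worst case: once the Oracle container is up, the next 1-second poll succeeds and the test proceeds immediately.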
Member

@dongjoon-hyun dongjoon-hyun Jun 23, 2020

@gaborgsomogyi, 7 minutes is okay for the Oracle suite. But it seems you misunderstood my PR. You can use even higher values, but you should not increase the timeout of the other test suites. Please see my PR: I keep 2.minutes for all the other test suites and only allow a longer timeout for OracleIntegrationSuite.

Contributor Author

On the way...

Contributor Author

Hmmm, I've just re-read the situation. Here I read 5 minutes, which I thought was a bit small, so I changed the code manually. I've only now seen your patch, which contains 10 minutes :) Sorry...

Member

@dongjoon-hyun dongjoon-hyun Jun 23, 2020

Thanks. Ya, during review I didn't update my original comment. My bad. I should have been clearer on the PR.

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124412 has finished for PR 28893 at commit e56e02b.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

retest this please

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124413 has finished for PR 28893 at commit e56e02b.

  • This patch fails build dependency tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

Gosh, again this issue:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:3.0.0-M1:install (default-cli) on project spark-parent_2.12: ArtifactInstallerException: Failed to install metadata org.apache.spark:spark-parent_2.12/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/spark/spark-parent_2.12/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got > (position: END_TAG seen ...</metadata>\n>... @13:2) -> [Help 1]

Retrying after lunch...

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. This is an integration test and I verified manually.
Thank you, @gaborgsomogyi and @maropu .
Merged to master.

@SparkQA

SparkQA commented Jun 23, 2020

Test build #124405 has finished for PR 28893 at commit b546885.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gaborgsomogyi
Contributor Author

Just arrived back and saw the status.
Thank you @dongjoon-hyun and @maropu for all the efforts!
