@wayneguow
Contributor

@wayneguow wayneguow commented Jun 11, 2024

What changes were proposed in this pull request?

This PR replaces deprecated commons-io classes and methods used in Spark:

  • writeStringToFile(final File file, final String data) -> writeStringToFile(final File file, final String data, final Charset charset)
  • CountingInputStream -> BoundedInputStream
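
For reference, here is a minimal sketch of the two replacements listed above. It assumes a commons-io version that provides `BoundedInputStream.builder()` (2.16.0 or later, as far as I know). The object name, the helper names `writeUtf8` and `countBytes`, and the temp-file prefix are hypothetical and are not taken from the Spark code.

```scala
import java.io.{ByteArrayInputStream, File}
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.commons.io.FileUtils
import org.apache.commons.io.input.BoundedInputStream

// Hypothetical helper object, for illustration only.
object CommonsIoMigrationSketch {

  // Before (deprecated): FileUtils.writeStringToFile(file, data)
  // After: pass the charset explicitly instead of relying on the platform default.
  def writeUtf8(file: File, data: String): Unit =
    FileUtils.writeStringToFile(file, data, StandardCharsets.UTF_8)

  // Before (deprecated): new CountingInputStream(in), then getByteCount
  // After: a BoundedInputStream built without a max count, then getCount.
  def countBytes(bytes: Array[Byte]): Long = {
    val in = BoundedInputStream.builder()
      .setInputStream(new ByteArrayInputStream(bytes))
      .get()
    try {
      while (in.read() != -1) {} // drain the stream
      in.getCount
    } finally {
      in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    val file = Files.createTempFile("commons-io-sketch", ".txt").toFile
    file.deleteOnExit()
    writeUtf8(file, "hello")
    println(countBytes(Files.readAllBytes(file.toPath)))
  }
}
```

Which charset to pass is a judgment call for each call site; UTF-8 is used here only as an example.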

Why are the changes needed?

Clean up deprecated API usage.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Passed related test cases in UDFXPathUtilSuite and XmlSuite.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Jun 11, 2024
@LuciferYang
Contributor

Is this the only case related to commons-io?

@wayneguow
Contributor Author

> Is this the only case related to commons-io?

For FileUtils#writeStringToFile, this is the only one left. Do we need to fix all other deprecated methods of commons-io called in Spark?

@LuciferYang
Contributor

Yes, if there are other cases related to commons-io, let's fix them all together in one PR.

@LuciferYang
Contributor

@wayneguow seems there are other cases related to commons-io in org.apache.spark.sql.execution.datasources.xml.XmlRecordReader

@wayneguow
Contributor Author

> @wayneguow seems there are other cases related to commons-io in org.apache.spark.sql.execution.datasources.xml.XmlRecordReader

@LuciferYang Thank you for pointing that out. I'll fix this and the remaining cases in one batch.

@wayneguow changed the title from "[SPARK-48583][SQL][TESTS] Replace deprecated FileUtils#writeStringToFile" to "[SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of commons-io called in Spark" on Jun 12, 2024
@wayneguow
Contributor Author

@LuciferYang I double-checked; the following commons-io classes are used in Spark:

  • org.apache.commons.io.filefilter.TrueFileFilter
  • org.apache.commons.io.FilenameUtils
  • org.apache.commons.io.FileUtils
  • org.apache.commons.io.input.CountingInputStream
  • org.apache.commons.io.IOUtils
  • org.apache.commons.io.output.ByteArrayOutputStream
  • org.apache.commons.io.output.TeeOutputStream

In the build log, only FileUtils and CountingInputStream produce deprecation warnings.

@wayneguow changed the title from "[SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of commons-io called in Spark" to "[SPARK-48583][SQL][CORE][TESTS] Replace deprecated classes and methods of commons-io called in Spark" on Jun 12, 2024
@wayneguow changed the title from "[SPARK-48583][SQL][CORE][TESTS] Replace deprecated classes and methods of commons-io called in Spark" to "[SPARK-48583][SQL][TESTS] Replace deprecated classes and methods of commons-io called in Spark" on Jun 12, 2024
- countingIn = new CountingInputStream(fsin)
+ countingIn = BoundedInputStream.builder()
+   .setInputStream(fsin)
+   .setMaxCount(fileSplit.getLength)
Contributor

Is this really necessary in the current scenario? Perhaps the unbounded one is enough?

Contributor Author

Sounds good, changed to unbounded.
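
To illustrate the distinction raised in this review thread, the sketch below compares a `BoundedInputStream` built with `setMaxCount` against one built without it. It is a standalone example under stated assumptions: the object name, the helper names, and the ten-byte sample input are hypothetical, and the code does not reproduce `XmlRecordReader`.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import org.apache.commons.io.input.BoundedInputStream

// Hypothetical comparison helper, for illustration only.
object BoundedVsUnboundedSketch {
  // Ten bytes of sample data (an assumption made for this example).
  private val data = "0123456789".getBytes(StandardCharsets.UTF_8)

  // Reads the stream to EOF and returns the byte count it reports.
  private def drain(in: BoundedInputStream): Long = {
    try {
      while (in.read() != -1) {}
      in.getCount
    } finally {
      in.close()
    }
  }

  // Bounded: reads report EOF once `max` bytes have been consumed.
  def boundedCount(max: Long): Long =
    drain(BoundedInputStream.builder()
      .setInputStream(new ByteArrayInputStream(data))
      .setMaxCount(max)
      .get())

  // Unbounded (no max count set): acts as a plain counting wrapper.
  def unboundedCount(): Long =
    drain(BoundedInputStream.builder()
      .setInputStream(new ByteArrayInputStream(data))
      .get())
}
```

When only the running byte count is needed, as with the former `CountingInputStream`, the unbounded form is the closer drop-in replacement because it does not change how much of the underlying stream can be read.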

@LuciferYang
Contributor

Merged into master for Spark 4.0. Thanks @wayneguow

@wayneguow wayneguow deleted the deprecated branch February 11, 2025 04:25