Description
If you want to use a CSV file-based dataset that contains very long column values, you have to raise the maxCharsPerColumn property from its default of 4096. If, however, you don't know the length of the largest value in your dataset, or cannot commit to a hard limit because the data may grow, the logical choice is the largest possible value, i.e. Integer.MAX_VALUE.
To my surprise, this crashes the test execution with:
```
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.input.DefaultCharAppender.<init>(DefaultCharAppender.java:40)
at org.junit.jupiter.params.shadow.com.univocity.parsers.csv.CsvParserSettings.newCharAppender(CsvParserSettings.java:93)
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.ParserOutput.<init>(ParserOutput.java:111)
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.AbstractParser.<init>(AbstractParser.java:91)
at org.junit.jupiter.params.shadow.com.univocity.parsers.csv.CsvParser.<init>(CsvParser.java:70)
at org.junit.jupiter.params.provider.CsvParserFactory.createParser(CsvParserFactory.java:61)
at org.junit.jupiter.params.provider.CsvParserFactory.createParserFor(CsvParserFactory.java:40)
at org.junit.jupiter.params.provider.CsvFileArgumentsProvider.provideArguments(CsvFileArgumentsProvider.java:64)
at org.junit.jupiter.params.provider.CsvFileArgumentsProvider.provideArguments(CsvFileArgumentsProvider.java:44)
at org.junit.jupiter.params.provider.AnnotationBasedArgumentsProvider.provideArguments(AnnotationBasedArgumentsProvider.java:52)
at org.junit.jupiter.params.ParameterizedTestExtension.arguments(ParameterizedTestExtension.java:145)
at org.junit.jupiter.params.ParameterizedTestExtension.lambda$provideTestTemplateInvocationContexts$2(ParameterizedTestExtension.java:90)
at org.junit.jupiter.params.ParameterizedTestExtension$$Lambda/0x00007427a8142bb0.apply(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
```
I was absolutely shocked to see that the CsvParser implementation used by JUnit really pre-allocates a char array of maxCharsPerColumn length to store the column values. See 1.
This is completely unusable for CSV strings of unknown length. One could of course provide a value that would fit within the JVM limits (i.e. Integer.MAX_VALUE - 8), at which point this doesn't crash, but that just means your unit test is now allocating absolutely ridiculous amounts of heap memory to run.
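The scale of that pre-allocation is easy to quantify: a Java char is 2 bytes, so a single appender buffer of Integer.MAX_VALUE - 8 chars claims roughly 4 GiB of heap before a single row has been parsed. A quick sketch of the arithmetic:

```java
public class AppenderCost {
    public static void main(String[] args) {
        // Integer.MAX_VALUE - 8 is the typical JVM upper bound for array length
        long maxArrayLength = Integer.MAX_VALUE - 8L;
        long bytes = maxArrayLength * Character.BYTES; // 2 bytes per char
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("per-appender buffer: ~%.1f GiB%n", gib);
    }
}
```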
Digging further, I found that the shaded univocity parsers library does actually have another implementation of DefaultCharAppender called ExpandingCharAppender, which grows the char buffer at runtime, starting from a modest initial buffer length of 8192.
The library bases its decision about which appender to use on the CsvParserSettings, see 2. Apparently, all that is required to switch to the ExpandingCharAppender is to pass a value of -1 for maxCharsPerColumn.
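For contrast, the growth strategy that makes ExpandingCharAppender memory-friendly can be illustrated with a plain doubling buffer (this is my own sketch of the idea, not the library's actual code), starting from the same 8192 initial capacity:

```java
import java.util.Arrays;

// Illustration only: a buffer that starts small and doubles on demand,
// the general idea behind ExpandingCharAppender, as opposed to
// pre-allocating maxCharsPerColumn chars up front.
class GrowingBuffer {
    private char[] chars = new char[8192];
    private int length = 0;

    void append(CharSequence s) {
        int needed = length + s.length();
        if (needed > chars.length) {
            int newCapacity = chars.length;
            while (newCapacity < needed) {
                newCapacity *= 2; // double until the value fits
            }
            chars = Arrays.copyOf(chars, newCapacity);
        }
        for (int i = 0; i < s.length(); i++) {
            chars[length++] = s.charAt(i);
        }
    }

    int capacity() { return chars.length; }
    int length() { return length; }
}
```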
Unfortunately, the maxCharsPerColumn property of the @CsvFileSource annotation requires the value to be a positive integer:
```
org.junit.platform.commons.PreconditionViolationException: maxCharsPerColumn must be a positive number: -1
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Suppressed: org.junit.platform.commons.PreconditionViolationException: Configuration error: You must configure at least one set of arguments for this @ParameterizedTest
at java.base/java.util.stream.AbstractPipeline.close(AbstractPipeline.java:323)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
... 9 more
```
Steps to reproduce
- OutOfMemoryError when using large column length limits

```java
@ParameterizedTest
@CsvFileSource(resources = "/file.csv", numLinesToSkip = 1, maxCharsPerColumn = Integer.MAX_VALUE)
void dummy(String columnA, String columnB) {
}
```

- PreconditionViolationException for trying to use an unbounded ExpandingCharAppender with maxCharsPerColumn = -1

```java
@ParameterizedTest
@CsvFileSource(resources = "/file.csv", numLinesToSkip = 1, maxCharsPerColumn = -1)
void dummy(String columnA, String columnB) {
}
```

Context
- Used versions (Jupiter/Vintage/Platform): JUnit 5.10.3
- Build Tool/IDE: JDK 21
TLDR
Please switch to the ExpandingCharAppender by default when using @CsvFileSource, or at least allow its use by removing the positive-integer validation of the maxCharsPerColumn property, and document the valid range.
Alternatively, you may consider switching to a better CSV parser implementation altogether. This obscure "Univocity" library last saw a commit in 2021, and its website univocity.com returns an HTTP 404 error page.
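In the meantime, the workaround I settled on was to bypass @CsvFileSource entirely and feed rows to the test via @MethodSource, reading the file myself so that no per-column limit applies. A minimal sketch of the row-reading part (my own code, not a JUnit API; note that the naive split does not handle quoted fields containing commas):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: reads CSV rows with no per-column length limit.
// A no-argument factory method wrapping this call can then be referenced
// from @MethodSource on the @ParameterizedTest.
public class CsvRows {
    static Stream<String[]> rows(Path csv, int numLinesToSkip) throws IOException {
        return Files.readAllLines(csv).stream()
                .skip(numLinesToSkip)
                .map(line -> line.split(",", -1)); // -1 keeps trailing empty columns
    }
}
```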