Description
If you want to use a CSV file-based dataset that contains very long column values, you have to raise the maxCharsPerColumn property from its default of 4096. If, however, you don't know the length of the largest value in your dataset, or cannot commit to a hard limit because the data may grow, the logical choice is the largest possible value, i.e. Integer.MAX_VALUE.
To my surprise, this crashes the test execution with:
```
Exception in thread "main" java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.input.DefaultCharAppender.<init>(DefaultCharAppender.java:40)
at org.junit.jupiter.params.shadow.com.univocity.parsers.csv.CsvParserSettings.newCharAppender(CsvParserSettings.java:93)
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.ParserOutput.<init>(ParserOutput.java:111)
at org.junit.jupiter.params.shadow.com.univocity.parsers.common.AbstractParser.<init>(AbstractParser.java:91)
at org.junit.jupiter.params.shadow.com.univocity.parsers.csv.CsvParser.<init>(CsvParser.java:70)
at org.junit.jupiter.params.provider.CsvParserFactory.createParser(CsvParserFactory.java:61)
at org.junit.jupiter.params.provider.CsvParserFactory.createParserFor(CsvParserFactory.java:40)
at org.junit.jupiter.params.provider.CsvFileArgumentsProvider.provideArguments(CsvFileArgumentsProvider.java:64)
at org.junit.jupiter.params.provider.CsvFileArgumentsProvider.provideArguments(CsvFileArgumentsProvider.java:44)
at org.junit.jupiter.params.provider.AnnotationBasedArgumentsProvider.provideArguments(AnnotationBasedArgumentsProvider.java:52)
at org.junit.jupiter.params.ParameterizedTestExtension.arguments(ParameterizedTestExtension.java:145)
at org.junit.jupiter.params.ParameterizedTestExtension.lambda$provideTestTemplateInvocationContexts$2(ParameterizedTestExtension.java:90)
at org.junit.jupiter.params.ParameterizedTestExtension$$Lambda/0x00007427a8142bb0.apply(Unknown Source)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
```
I was absolutely shocked to see that the CsvParser implementation used by JUnit really pre-allocates a char array of maxCharsPerColumn length to store the column values. See 1.
This is completely unusable for CSV strings of unknown length. One could of course provide a value that would fit within the JVM limits (i.e. Integer.MAX_VALUE - 8), at which point this doesn't crash, but that just means your unit test is now allocating absolutely ridiculous amounts of heap memory to run.
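The scale of that pre-allocation is easy to quantify: a Java char is 2 bytes, so a single appender buffer of Integer.MAX_VALUE - 8 chars claims roughly 4 GiB of heap before a single row has been parsed. A quick sketch of the arithmetic:

```java
public class AppenderCost {
    public static void main(String[] args) {
        // Integer.MAX_VALUE - 8 is the typical JVM upper bound for array length
        long maxArrayLength = Integer.MAX_VALUE - 8L;
        long bytes = maxArrayLength * Character.BYTES; // 2 bytes per char
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("per-appender buffer: ~%.1f GiB%n", gib);
    }
}
```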
Digging further, I found that the shaded univocity parsers library does actually have another implementation of DefaultCharAppender called ExpandingCharAppender, which grows the char buffer at runtime, starting from a modest initial buffer length of 8192.
The library bases its decision about which appender to use on the CsvParserSettings, see 2. Apparently, all that is required to switch to the ExpandingCharAppender is to pass a value of -1 for maxCharsPerColumn.
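For contrast, the growth strategy that makes ExpandingCharAppender memory-friendly can be illustrated with a plain doubling buffer (this is my own sketch of the idea, not the library's actual code), starting from the same 8192 initial capacity:

```java
import java.util.Arrays;

// Illustration only: a buffer that starts small and doubles on demand,
// the general idea behind ExpandingCharAppender, as opposed to
// pre-allocating maxCharsPerColumn chars up front.
class GrowingBuffer {
    private char[] chars = new char[8192];
    private int length = 0;

    void append(CharSequence s) {
        int needed = length + s.length();
        if (needed > chars.length) {
            int newCapacity = chars.length;
            while (newCapacity < needed) {
                newCapacity *= 2; // double until the value fits
            }
            chars = Arrays.copyOf(chars, newCapacity);
        }
        for (int i = 0; i < s.length(); i++) {
            chars[length++] = s.charAt(i);
        }
    }

    int capacity() { return chars.length; }
    int length() { return length; }
}
```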
Unfortunately, the maxCharsPerColumn property of the @CsvFileSource annotation requires the value to be a positive integer:
```
org.junit.platform.commons.PreconditionViolationException: maxCharsPerColumn must be a positive number: -1
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:276)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1708)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
Suppressed: org.junit.platform.commons.PreconditionViolationException: Configuration error: You must configure at least one set of arguments for this @ParameterizedTest
at java.base/java.util.stream.AbstractPipeline.close(AbstractPipeline.java:323)
at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
... 9 more
```
Steps to reproduce
- OutOfMemoryError when using large column length limits

```java
@ParameterizedTest
@CsvFileSource(resources = "/file.csv", numLinesToSkip = 1, maxCharsPerColumn = Integer.MAX_VALUE)
void dummy(String columnA, String columnB) {
}
```

- PreconditionViolationException for trying to use an unbounded ExpandingCharAppender with maxCharsPerColumn = -1

```java
@ParameterizedTest
@CsvFileSource(resources = "/file.csv", numLinesToSkip = 1, maxCharsPerColumn = -1)
void dummy(String columnA, String columnB) {
}
```

Context
- Used versions (Jupiter/Vintage/Platform): JUnit 5.10.3
- Build Tool/IDE: JDK 21
TLDR
Please switch to the ExpandingCharAppender by default when using @CsvFileSource, or at least allow its use by removing the positive-integer validation of the maxCharsPerColumn property, and document the valid range.
Alternatively, you may consider switching to a better CSV parser implementation altogether. This obscure "Univocity" library last saw a commit in 2021, and its website univocity.com returns an HTTP 404 error page.
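In the meantime, the workaround I settled on was to bypass @CsvFileSource entirely and feed rows to the test via @MethodSource, reading the file myself so that no per-column limit applies. A minimal sketch of the row-reading part (my own code, not a JUnit API; note that the naive split does not handle quoted fields containing commas):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper: reads CSV rows with no per-column length limit.
// A no-argument factory method wrapping this call can then be referenced
// from @MethodSource on the @ParameterizedTest.
public class CsvRows {
    static Stream<String[]> rows(Path csv, int numLinesToSkip) throws IOException {
        return Files.readAllLines(csv).stream()
                .skip(numLinesToSkip)
                .map(line -> line.split(",", -1)); // -1 keeps trailing empty columns
    }
}
```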