Merged
[Improve] Improve parameters
chl-wxp committed Dec 12, 2025
commit 7f3fd9417d4fd1d73d4943a07382f3a39a77d2dc
10 changes: 5 additions & 5 deletions docs/en/connector-v2/source/LocalFile.md
Original file line number Diff line number Diff line change
@@ -80,8 +80,8 @@ If you use SeaTunnel Engine, it automatically integrates the hadoop jar when you
| tables_configs | list | no | used to define a multiple table task |
| file_filter_modified_start | string | no | - |
| file_filter_modified_end | string | no | - |
| enable_split_file | boolean | no | false |
| split_size | long | no | 134217728 |
| enable_file_split | boolean | no | false |
| file_split_size | long | no | 134217728 |

### path [string]

@@ -417,13 +417,13 @@ File modification time filter. The connector will filter some files based on the

File modification time filter. The connector filters files based on their last modification end time (the end time itself is excluded). The default time format is `yyyy-MM-dd HH:mm:ss`.

### enable_split_file [string]
### enable_file_split [boolean]

Enables the file splitting function; the default is false. It can be enabled when the file type is csv, text, or json and the files are not compressed.

### split_size
### file_split_size [long]

File split size, which takes effect when the enable_split_file parameter is true. The unit is bytes. The default value is 128 MB, i.e. 134217728 bytes.
File split size, which takes effect when the enable_file_split parameter is true. The unit is bytes. The default value is 128 MB, i.e. 134217728 bytes.
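As an illustration, a minimal LocalFile source block using these two options might look like the sketch below. The path and the 64 MB value are made-up placeholders, not taken from this change:

```hocon
source {
  LocalFile {
    # Hypothetical input path -- adjust to your environment.
    path = "/data/input/orders"
    # Splitting is only selectable for csv, text, and json files
    # in non-compressed format.
    file_format_type = "csv"
    enable_file_split = true
    # Split size in bytes; here 64 MB. Defaults to 134217728 (128 MB) when omitted.
    file_split_size = 67108864
  }
}
```

Per the option rule in this change, `file_split_size` is only honored when `enable_file_split` is true.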

### common options

10 changes: 5 additions & 5 deletions docs/zh/connector-v2/source/LocalFile.md
@@ -80,8 +80,8 @@ import ChangeLog from '../changelog/connector-file-local.md';
| tables_configs | list | no | used to define a multiple-table task |
| file_filter_modified_start | string | no | - |
| file_filter_modified_end | string | no | - |
| enable_split_file | boolean | no | false |
| split_size | long | no | 134217728 |
| enable_file_split | boolean | no | false |
| file_split_size | long | no | 134217728 |

### path [string]

@@ -417,13 +417,13 @@ null_format defines which strings can represent null.

Filters files by last modification time: the end time of the filter range (the end time itself is excluded). The time format is `yyyy-MM-dd HH:mm:ss`.

### enable_split_file [string]
### enable_file_split [boolean]

Enables the file splitting function; the default is false. It can be enabled when the file type is csv, text, or json and the files are not compressed.

### split_size
### file_split_size [long]

File split size, which takes effect when the enable_split_file parameter is true. The unit is bytes. The default value is 128 MB, i.e. 134217728 bytes.
File split size, which takes effect when the enable_file_split parameter is true. The unit is bytes. The default value is 128 MB, i.e. 134217728 bytes.

### common options

@@ -151,16 +151,16 @@ public class FileBaseOptions extends ConnectorCommonOptions {
.defaultValue(ArchiveCompressFormat.NONE)
.withDescription("Archive compression codec");

public static final Option<Boolean> ENABLE_SPLIT_FILE =
Options.key("enable_split_file")
public static final Option<Boolean> ENABLE_FILE_SPLIT =
Options.key("enable_file_split")
.booleanType()
.defaultValue(false)
.withDescription("Turn on the file splitting function, the default is false");

public static final Option<Long> SPLIT_SIZE =
Options.key("split_size")
public static final Option<Long> FILE_SPLIT_SIZE =
Options.key("file_split_size")
.longType()
.defaultValue(128 * 1024 * 1024L)
.withDescription(
"File split size, which can be filled in when the enable_split_file parameter is true. The unit is the number of bytes. The default value is the number of bytes of 128MB, which is 128*1024*1024.");
"File split size, which can be filled in when the enable_file_split parameter is true. The unit is the number of bytes. The default value is the number of bytes of 128MB, which is 128*1024*1024.");
}
@@ -247,9 +247,9 @@ public void setPluginConfig(Config pluginConfig) {
pluginConfig.getString(
FileBaseSourceOptions.FILE_FILTER_MODIFIED_END.key()));
}
if (pluginConfig.hasPath(FileBaseSourceOptions.ENABLE_SPLIT_FILE.key())) {
if (pluginConfig.hasPath(FileBaseSourceOptions.ENABLE_FILE_SPLIT.key())) {
enableSplitFile =
pluginConfig.getBoolean(FileBaseSourceOptions.ENABLE_SPLIT_FILE.key());
pluginConfig.getBoolean(FileBaseSourceOptions.ENABLE_FILE_SPLIT.key());
}
}

@@ -24,6 +24,7 @@
import org.apache.seatunnel.connectors.seatunnel.file.local.source.split.LocalFileAccordingToSplitSizeSplitStrategy;
import org.apache.seatunnel.connectors.seatunnel.file.source.BaseMultipleTableFileSource;
import org.apache.seatunnel.connectors.seatunnel.file.source.split.DefaultFileSplitStrategy;
import org.apache.seatunnel.connectors.seatunnel.file.source.split.FileSplitStrategy;

import static org.apache.seatunnel.connectors.seatunnel.file.config.FileBaseSourceOptions.DEFAULT_ROW_DELIMITER;

@@ -32,22 +33,29 @@ public class LocalFileSource extends BaseMultipleTableFileSource {
public LocalFileSource(ReadonlyConfig readonlyConfig) {
super(
new MultipleTableLocalFileSourceConfig(readonlyConfig),
readonlyConfig.get(FileBaseSourceOptions.ENABLE_SPLIT_FILE)
? new LocalFileAccordingToSplitSizeSplitStrategy(
readonlyConfig.get(FileBaseSourceOptions.ROW_DELIMITER) == null
? DEFAULT_ROW_DELIMITER
: readonlyConfig.get(FileBaseSourceOptions.ROW_DELIMITER),
readonlyConfig.get(FileBaseSourceOptions.CSV_USE_HEADER_LINE)
? 1L
: readonlyConfig.get(
FileBaseSourceOptions.SKIP_HEADER_ROW_NUMBER),
readonlyConfig.get(FileBaseSourceOptions.ENCODING),
readonlyConfig.get(FileBaseSourceOptions.SPLIT_SIZE))
: new DefaultFileSplitStrategy());
initFileSplitStrategy(readonlyConfig));
}

@Override
public String getPluginName() {
return FileSystemType.LOCAL.getFileSystemPluginName();
}

private static FileSplitStrategy initFileSplitStrategy(ReadonlyConfig readonlyConfig) {
if (!readonlyConfig.get(FileBaseSourceOptions.ENABLE_FILE_SPLIT)) {
return new DefaultFileSplitStrategy();
}
String rowDelimiter =
!readonlyConfig.getOptional(FileBaseSourceOptions.ROW_DELIMITER).isPresent()
? DEFAULT_ROW_DELIMITER
: readonlyConfig.get(FileBaseSourceOptions.ROW_DELIMITER);
long skipHeaderRowNumber =
readonlyConfig.get(FileBaseSourceOptions.CSV_USE_HEADER_LINE)
? 1L
: readonlyConfig.get(FileBaseSourceOptions.SKIP_HEADER_ROW_NUMBER);
String encodingName = readonlyConfig.get(FileBaseSourceOptions.ENCODING);
long splitSize = readonlyConfig.get(FileBaseSourceOptions.FILE_SPLIT_SIZE);
return new LocalFileAccordingToSplitSizeSplitStrategy(
rowDelimiter, skipHeaderRowNumber, encodingName, splitSize);
}
}
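To make the size-based splitting concrete, here is a self-contained sketch of the general idea -- an assumption for illustration, not the connector's actual `LocalFileAccordingToSplitSizeSplitStrategy` (which, per its constructor arguments above, also accounts for the row delimiter, skipped header rows, and encoding): a file of a given byte length is cut into `[start, end)` ranges of at most `file_split_size` bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch class, not part of SeaTunnel.
public class SplitSketch {

    // Plan [start, end) byte ranges of at most splitSize bytes each.
    static List<long[]> planSplits(long fileLength, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new long[] {start, Math.min(start + splitSize, fileLength)});
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 10-byte file with file_split_size = 3 yields four ranges:
        // [0,3), [3,6), [6,9), [9,10).
        for (long[] s : planSplits(10L, 3L)) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```

In practice a raw byte cut can land mid-row, which is presumably why the real strategy takes a row delimiter: a boundary can then be shifted to the next delimiter so each split starts at a record boundary.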
@@ -87,11 +87,11 @@ public OptionRule optionRule() {
.conditional(
FileBaseSourceOptions.FILE_FORMAT_TYPE,
Arrays.asList(FileFormat.TEXT, FileFormat.JSON, FileFormat.CSV),
FileBaseSourceOptions.ENABLE_SPLIT_FILE)
FileBaseSourceOptions.ENABLE_FILE_SPLIT)
.conditional(
FileBaseSourceOptions.ENABLE_SPLIT_FILE,
FileBaseSourceOptions.ENABLE_FILE_SPLIT,
Boolean.TRUE,
FileBaseSourceOptions.SPLIT_SIZE)
FileBaseSourceOptions.FILE_SPLIT_SIZE)
.optional(FileBaseSourceOptions.PARSE_PARTITION_FROM_PATH)
.optional(FileBaseSourceOptions.DATE_FORMAT_LEGACY)
.optional(FileBaseSourceOptions.DATETIME_FORMAT_LEGACY)
@@ -35,8 +35,8 @@ source {
field_delimiter = ","
row_delimiter = "\n"
skip_header_row_number = 1
enable_split_file = true
split_size = 3
enable_file_split = true
file_split_size = 3
schema = {
fields {
c_map = "map<string, string>"
@@ -31,8 +31,8 @@ source {
LocalFile {
path = "/seatunnel/read/json"
file_format_type = "json"
enable_split_file = true
split_size = 3
enable_file_split = true
file_split_size = 3
schema = {
fields {
c_map = "map<string, string>"
@@ -31,8 +31,8 @@ source {
LocalFile {
path = "/seatunnel/read/text"
file_format_type = "text"
enable_split_file = true
split_size = 3
enable_file_split = true
file_split_size = 3
schema = {
fields {
c_map = "map<string, string>"