[Feature][Connectors-v2] Refactor DateTime Utils and Enhance Time String Auto-Format #10486

Open
hawk9821 wants to merge 9 commits into apache:dev from hawk9821:csv_dateformat
Contributor

hawk9821 commented Feb 12, 2026

Purpose of this pull request

  1. Refactor the time utility classes, enhance automatic formatting of time strings, and optimize auto-parsing performance.

  2. Fix an error caused by inconsistent time string formats within a single column when the File connector parses a file, for example:

create_date
2026-01-01
2026/01/01
2026-1-1
2026-1-10

This throws the following exception:

org.apache.seatunnel.connectors.seatunnel.file.exception.FileConnectorException: ErrorCode:[FILE-08], ErrorDescription:[File read failed] - Read data from this file [default.default.default_file:/data/app/seatunnel/seatunnel-2.3.12/tmp.txt] failed
at org.apache.seatunnel.connectors.seatunnel.file.source.reader.MultipleTableFileSourceReader.pollNext(MultipleTableFileSourceReader.java:85) ~[?:?]
at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159) ~[seatunnel-starter.jar:2.3.9]
at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127) ~[seatunnel-starter.jar:2.3.9]
at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:169) ~[seatunnel-starter.jar:2.3.9]
at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132) ~[seatunnel-starter.jar:2.3.9]
at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:694) ~[seatunnel-starter.jar:2.3.9]
at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1019) ~[seatunnel-starter.jar:2.3.9]
at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43) ~[seatunnel-starter.jar:2.3.9]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_161]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.time.format.DateTimeParseException: Text '2026/01/01' could not be parsed at index 4
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949) ~[?:1.8.0_161]
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1777) ~[?:1.8.0_161]
at org.apache.seatunnel.format.text.TextDeserializationSchema.convert(TextDeserializationSchema.java:318) ~[?:?]
at org.apache.seatunnel.format.text.TextDeserializationSchema.deserialize(TextDeserializationSchema.java:188) ~[?:?]
at org.apache.seatunnel.format.text.TextDeserializationSchema.deserialize(TextDeserializationSchema.java:60) ~[?:?]
at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.lambda$readProcess$0(TextReadStrategy.java:108) ~[?:?]
at java.util.Iterator.forEachRemaining(Iterator.java:116) ~[?:1.8.0_161]
at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) ~[?:1.8.0_161]
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:580) ~[?:1.8.0_161]
at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.readProcess(TextReadStrategy.java:104) ~[?:?]
at org.apache.seatunnel.connectors.seatunnel.file.source.reader.AbstractReadStrategy.resolveArchiveCompressedInputStream(AbstractReadStrategy.java:268) ~[?:?]
at org.apache.seatunnel.connectors.seatunnel.file.source.reader.TextReadStrategy.read(TextReadStrategy.java:71) ~[?:?]
at org.apache.seatunnel.connectors.seatunnel.file.source.reader.MultipleTableFileSourceReader.pollNext(MultipleTableFileSourceReader.java:81) ~[?:?]
... 12 more
  3. Fix the user-defined options date_format, datetime_format, and time_format not taking effect.
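
The failure in item 2 can be reproduced outside SeaTunnel with plain java.time. The sketch below is not the PR's implementation: the class name, the helper names, and the two candidate patterns are assumptions chosen for illustration. It shows that a fixed yyyy-MM-dd formatter rejects 2026/01/01 at index 4, while trying a short list of candidates handles all four sample values.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Arrays;
import java.util.List;

final class MixedDateFormatSketch {
    // Hypothetical candidate list; single-letter M and d accept both "1" and "01".
    private static final List<DateTimeFormatter> CANDIDATES = Arrays.asList(
            DateTimeFormatter.ofPattern("yyyy-M-d"),
            DateTimeFormatter.ofPattern("yyyy/M/d"));

    /** Tries each candidate in order and returns the first successful parse. */
    static LocalDate parseFlexible(String text) {
        for (DateTimeFormatter candidate : CANDIDATES) {
            try {
                return LocalDate.parse(text, candidate);
            } catch (DateTimeParseException ignored) {
                // Fall through to the next candidate pattern.
            }
        }
        throw new IllegalArgumentException("No candidate pattern matches: " + text);
    }

    /** Returns the error index reported by a fixed yyyy-MM-dd formatter, or -1 on success. */
    static int strictErrorIndex(String text) {
        try {
            LocalDate.parse(text, DateTimeFormatter.ofPattern("yyyy-MM-dd"));
            return -1;
        } catch (DateTimeParseException e) {
            return e.getErrorIndex();
        }
    }

    public static void main(String[] args) {
        for (String value : Arrays.asList("2026-01-01", "2026/01/01", "2026-1-1", "2026-1-10")) {
            System.out.println(value + " -> " + parseFlexible(value));
        }
    }
}
```

Trying candidates per value is the simplest approach; caching the last successful formatter per field, as the PR discussion describes, avoids repeating the search for every row.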

Does this PR introduce any user-facing change?

no

How was this patch tested?

Added unit and e2e test cases.

Check list

github-actions bot added connectors-v2 format api labels Feb 12, 2026

DanielCarter-stack commented Feb 12, 2026

Issue 1: Error message not clear enough after re-matching fails

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/DateTimeParseHelper.java:82-95

Related context:

  • Interface definition: DateTimeParseHelper.java:31-96
  • Caller 1: TextDeserializationSchema.java:310-312 (DATE case)
  • Caller 2: CsvDeserializationSchema.java:297-299 (TIMESTAMP case)
  • Caller 3: JsonToRowConverters.java:134-136 (DATE case)

Problem description:
When the first auto-matched format fails to parse, and the second re-match also fails (returns null), a generic error is thrown without indicating it was caused by a format change.

Potential risks:

  • Risk 1: Users cannot determine whether they need to configure an explicit format
  • Risk 2: Difficult to debug, unclear what format was matched the first time

Impact scope:

  • Direct impact: All Format classes using DateTime Parse
  • Indirect impact: User data quality validation processes
  • Affected area: Multiple Formats (Text, CSV, JSON)

Severity: MINOR

Improvement suggestion:

// DateTimeParseHelper.java:88-94
if (isUserConfigured) {
    throw errorSupplier.get(fieldVal, fieldName);
}
// Record original format information before retry
DateTimeFormatter originalFormatter = fieldFormatterCache.get(fieldName);
formatter = autoFormatterSupplier.get(fieldVal);
if (formatter == null) {
    // Improve error message to indicate that automatic matching was attempted but failed.
    // Note: Map.of requires Java 9+; on a Java 8 build, populate a HashMap instead.
    throw new SeaTunnelRuntimeException(
            CommonErrorCode.FORMAT_DATETIME_ERROR,
            Map.of(
                    "datetime", fieldVal,
                    "field", fieldName,
                    "reason", "Auto-match failed after initial format mismatch: "
                            + (originalFormatter != null ? "cached format incompatible" : "no matching pattern")));
}

Rationale: Provide more context to help users understand whether they need to explicitly configure a format or clean the data.


Issue 2: FormatterConfig.getPatternStr() uses instanceof chain, violating Open-Closed Principle

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/DateTimeParseHelper.java:98-109

Related context:

  • Implementation: DateTimeParseHelper.java:98-109
  • Caller: DateTimeParseHelper.parseDateTimeValue() (line 69)
  • Related classes: DateUtils.Formatter, TimeUtils.Formatter, DateTimeUtils.Formatter

Problem description:
Uses instanceof chain to determine Formatter type; adding new Formatter types requires modifying this method.

Potential risks:

  • Risk 1: Maintenance burden - must modify every time a new Formatter is added
  • Risk 2: Easy to miss a branch, resulting in returning an empty string

Impact scope:

  • Direct impact: DateTimeParseHelper interface
  • Indirect impact: Future Formatter enum additions
  • Affected area: Core utility class

Severity: MINOR

Improvement suggestion:

// Add to Formatter interface
public interface Formatter<T> {
    String getValue();
    String getPattern(); // New addition
}

// In each Formatter implementation
public enum Formatter implements Formatter<Formatter> {
    YYYY_MM_DD("yyyy-MM-dd");

    @Override
    public String getPattern() {
        return getValue(); // Default implementation
    }
}

// Simplify getPatternStr
default String getPatternStr(FormatterConfig formatterConfig) {
    return formatterConfig.getFormatter().getPattern();
}

Rationale: Leverage polymorphism to avoid type checking, complying with the Open-Closed Principle.
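
A self-contained sketch of the polymorphic approach follows. All type names here (PatternSource, DateFormatSketch, TimeFormatSketch, PatternResolver) are hypothetical and stand in for the Formatter types mentioned above; the actual interfaces in the PR may look different.

```java
import java.time.format.DateTimeFormatter;

/** Hypothetical shared contract: every formatter enum exposes its own pattern. */
interface PatternSource {
    String getPattern();
}

enum DateFormatSketch implements PatternSource {
    YYYY_MM_DD("yyyy-MM-dd"),
    YYYY_MM_DD_SLASH("yyyy/MM/dd");

    private final String pattern;

    DateFormatSketch(String pattern) {
        this.pattern = pattern;
    }

    @Override
    public String getPattern() {
        return pattern;
    }
}

enum TimeFormatSketch implements PatternSource {
    HH_MM_SS("HH:mm:ss");

    private final String pattern;

    TimeFormatSketch(String pattern) {
        this.pattern = pattern;
    }

    @Override
    public String getPattern() {
        return pattern;
    }
}

final class PatternResolver {
    /** No instanceof chain: any new enum only has to implement PatternSource. */
    static DateTimeFormatter toFormatter(PatternSource source) {
        return DateTimeFormatter.ofPattern(source.getPattern());
    }
}
```

With this shape, adding a third enum does not require touching PatternResolver, which is the point of the suggestion.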


Issue 3: Missing documentation and CHANGELOG

Location:

  • PR description (claims to follow checklist but not executed)
  • Documentation files: No .md modifications found

Related context:

  • PR description: If necessary, please update the documentation to describe the new feature.
  • Configuration items: DATE_FORMAT, DATETIME_FORMAT, TIME_FORMAT (newly added)
  • Configuration items: DATE_FORMAT_LEGACY, DATETIME_FORMAT_LEGACY, TIME_FORMAT_LEGACY (retained)

Problem description:

  1. Users are unaware of the existence and purpose of new configuration items
  2. Unclear about the relationship between new and old configuration items and which one is recommended
  3. Possible breaking changes not explained in incompatible-changes.md

Potential risks:

  • Risk 1: Users continue to use old DATE_FORMAT_LEGACY, unaware of new DATE_FORMAT
  • Risk 2: After upgrading to a new version, configuration behavior changes causing confusion

Impact scope:

  • Direct impact: All File Connector users
  • Indirect impact: Documentation maintainers
  • Affected area: Configuration files and documentation

Severity: MAJOR (user experience issue)

Improvement suggestion:

  1. Update docs/en/connector-v2/source/FileSource.md to explain:

    • Newly added configuration items: date_format, datetime_format, time_format
    • Old configuration items retained but marked as deprecated
    • Configuration examples
  2. Update docs/en/about/incompatible-changes.md:

    ## [Version X.Y.Z]

    ### File Source Configuration

    - **Change**: Added new configuration options `date_format`, `datetime_format`, `time_format` for more explicit format specification
    - **Impact**: Old options `date_format_legacy`, `datetime_format_legacy`, `time_format_legacy` are still supported but deprecated
    - **Migration**: Update to use new options without `_legacy` suffix
  3. Add an entry to CHANGELOG.md

Rationale: Follow Apache project conventions to ensure smooth user upgrades.


Issue 4: Caching in concurrent scenarios may cause race conditions

Location: seatunnel-formats/seatunnel-format-text/src/main/java/org/apache/seatunnel/format/text/TextDeserializationSchema.java:70

Related context:

  • Definition: TextDeserializationSchema.java:70
  • Usage: DateTimeParseHelper.parseDateTimeValue() lines 63, 79, 89, 93
  • Implementing classes: TextDeserializationSchema, CsvDeserializationSchema, JsonToRowConverters

Problem description:
fieldFormatterCache is a ConcurrentHashMap, but DateTimeParseHelper.parseDateTimeValue() uses it as follows:

DateTimeFormatter formatter = fieldFormatterCache.get(fieldName); // read
if (formatter == null) {
    // ... compute new formatter ...
    fieldFormatterCache.put(fieldName, formatter); // write
}

Although get() and put() on a ConcurrentHashMap are each thread-safe, the sequence is a check-then-act race:

  • Thread A and Thread B parse the same field at the same time
  • Both threads see formatter == null
  • Both threads call autoFormatterSupplier.get(fieldVal) and then put()
  • Result: one thread's computed formatter overwrites the other's

Potential risks:

  • Risk 1: In extreme cases (many threads starting at once), the cache may be unstable
  • Risk 2: Slight performance loss (duplicate computation)
  • Risk 3: In theory, two threads could parse the same field with different formats

Impact scope:

  • Direct impact: TextDeserializationSchema, CsvDeserializationSchema, JsonToRowConverters
  • Indirect impact: Multi-threaded Source tasks
  • Affected area: Format layer

Severity: MINOR (in practice, the first parse of a field is usually single-threaded, but the design is not robust enough)

Improvement suggestion:

// Use computeIfAbsent to guarantee atomicity
DateTimeFormatter formatter = fieldFormatterCache.computeIfAbsent(fieldName, key -> {
    if (isUserConfigured) {
        String pattern = getPatternStr(formatterConfig);
        return DateTimeFormatter.ofPattern(pattern);
    } else {
        DateTimeFormatter matched = autoFormatterSupplier.get(fieldVal);
        if (matched == null) {
            throw errorSupplier.get(fieldVal, fieldName);
        }
        return matched;
    }
});

// Parse phase
try {
    return parser.parse(fieldVal, formatter);
} catch (Exception e) {
    if (isUserConfigured) {
        throw errorSupplier.get(fieldVal, fieldName);
    }
    // Re-match: replace cached formatter
    DateTimeFormatter newFormatter = autoFormatterSupplier.get(fieldVal);
    if (newFormatter == null) {
        throw errorSupplier.get(fieldVal, fieldName);
    }
    // Atomic replacement (note: there may be concurrency issues here, but rare in actual scenarios)
    fieldFormatterCache.replace(fieldName, formatter, newFormatter);
    return parser.parse(fieldVal, newFormatter);
}

Rationale: Use computeIfAbsent() to eliminate the check-then-act race, ensuring that each field's initial format is computed only once.


Issue 5: Time format parsing lacks unified automatic matching support

Location: seatunnel-common/src/main/java/org/apache/seatunnel/common/utils/TimeUtils.java:116-123

Related context:

  • TimeUtils.matchTimeFormatter(): lines 116-123
  • DateTimeUtils.matchDateTimeFormatter(): lines 348-367 (optimized by grouping patterns by length)
  • DateUtils.matchDateFormatter(): lines 124-131 (linear list traversal)

Problem description:
TimeUtils.matchTimeFormatter() traverses a list of 8 patterns linearly, whereas DateTimeUtils.matchDateTimeFormatter() groups patterns by string length (a 4-5x performance improvement).

Given the variety of time formats (HH:mm:ss vs H:mm:ss, etc.), time parsing deserves a similar optimization.

Potential risks:

  • Risk 1: Time field parsing performs comparatively poorly
  • Risk 2: Architectural inconsistency (why is DateTime optimized but Time is not?)

Impact scope:

  • Direct impact: Standalone callers of TimeUtils.parse(String)
  • Indirect impact: Calls that go through DateTimeParseHelper are unaffected (already unified)
  • Affected area: Core utility class

Severity: MINOR (Performance optimization opportunity)

Improvement suggestion:
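
The at-most-once guarantee can be checked in isolation. The following is a minimal sketch, not SeaTunnel code: FormatterCacheSketch, its method names, and the field name create_date used as a key are assumptions for illustration. It counts how many times the mapping function runs when several threads request the same key.

```java
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class FormatterCacheSketch {
    private final ConcurrentHashMap<String, DateTimeFormatter> cache = new ConcurrentHashMap<>();
    private final AtomicInteger computeCount = new AtomicInteger();

    /** ConcurrentHashMap.computeIfAbsent applies the function at most once per key. */
    DateTimeFormatter formatterFor(String fieldName, String pattern) {
        return cache.computeIfAbsent(fieldName, key -> {
            computeCount.incrementAndGet();
            return DateTimeFormatter.ofPattern(pattern);
        });
    }

    int computeCount() {
        return computeCount.get();
    }

    /** Starts the given number of threads on one key and returns how often the formatter was built. */
    static int runConcurrently(int threadCount) {
        FormatterCacheSketch sketch = new FormatterCacheSketch();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread thread = new Thread(() -> sketch.formatterFor("create_date", "yyyy-MM-dd"));
            threads.add(thread);
            thread.start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sketch.computeCount();
    }
}
```

The mapping function should stay short and must not update other mappings of the same map while it runs, as the ConcurrentHashMap documentation notes.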

// TimeUtils.java
private static final Map<Integer, List<TimePattern>> TIME_PATTERN_MAP = new HashMap<>();
private static final int TIME_LENGTH_THRESHOLD = 12; // e.g. "HH:mm:ss.SSS"

static {
    initTimePatternMap();
}

private static void initTimePatternMap() {
    // Group by string length: 7 (H:mm:ss), 8 (HH:mm:ss), 12 (HH:mm:ss.SSS), etc.
    List<TimePattern> length8Patterns = new ArrayList<>();
    length8Patterns.add(new TimePattern("\\d{2}:\\d{2}:\\d{2}", Formatter.HH_MM_SS));
    TIME_PATTERN_MAP.put(8, length8Patterns);

    // ... other lengths
}

public static DateTimeFormatter matchTimeFormatter(String timeStr) {
    if (timeStr == null || timeStr.isEmpty()) {
        throw new IllegalArgumentException("Time string cannot be null or empty");
    }
    int strLength = timeStr.length();
    List<TimePattern> timePatterns = TIME_PATTERN_MAP.getOrDefault(strLength, Collections.emptyList());

    for (TimePattern pattern : timePatterns) {
        if (pattern.getPattern().matcher(timeStr).matches()) {
            return pattern.getFormatter();
        }
    }
    return null; // or return match from extra-long group
}

Rationale: Maintain architectural consistency, improve time field parsing performance.
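
For reference, here is a compilable version of the length-grouping idea using only the JDK. It is a sketch under assumptions: the class TimeFormatMatcherSketch, the nested Candidate type, and the three registered patterns are hypothetical and do not reflect what TimeUtils actually registers.

```java
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class TimeFormatMatcherSketch {
    /** Pairs a regex describing the text shape with the formatter that parses it. */
    private static final class Candidate {
        final Pattern shape;
        final DateTimeFormatter formatter;

        Candidate(String regex, String pattern) {
            this.shape = Pattern.compile(regex);
            this.formatter = DateTimeFormatter.ofPattern(pattern);
        }
    }

    // Candidates grouped by input length, so only same-length patterns are tested.
    private static final Map<Integer, List<Candidate>> BY_LENGTH = new HashMap<>();

    static {
        register(7, "\\d:\\d{2}:\\d{2}", "H:mm:ss");
        register(8, "\\d{2}:\\d{2}:\\d{2}", "HH:mm:ss");
        register(12, "\\d{2}:\\d{2}:\\d{2}\\.\\d{3}", "HH:mm:ss.SSS");
    }

    private static void register(int length, String regex, String pattern) {
        BY_LENGTH.computeIfAbsent(length, key -> new ArrayList<>()).add(new Candidate(regex, pattern));
    }

    /** Returns the formatter for the first matching shape, or null when none matches. */
    static DateTimeFormatter match(String text) {
        if (text == null || text.isEmpty()) {
            throw new IllegalArgumentException("Time string cannot be null or empty");
        }
        List<Candidate> candidates =
                BY_LENGTH.getOrDefault(text.length(), Collections.<Candidate>emptyList());
        for (Candidate candidate : candidates) {
            if (candidate.shape.matcher(text).matches()) {
                return candidate.formatter;
            }
        }
        return null;
    }
}
```

The lookup cost depends on the number of patterns sharing a length rather than on the total number of patterns, which is where the speedup over a linear scan comes from.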


Member

wuchunfu commented Feb 13, 2026

@hawk9821 Are the following date and time formats supported?

1/2/2026 12:01:30
01/2/2026 12:01:30
1/02/2026 12:01:30
01/02/2026 12:01:30

1/2/2026 12:1:30
01/2/2026 12:1:30
1/02/2026 12:1:30
01/02/2026 12:1:30

1/2/2026
01/2/2026
1/02/2026
01/02/2026

12:1:30
12:01:30
12:01:03
12:01:3
12:01:30
01:02:03
1:2:3
12:1:03
1:01:30
01:1:30

......

hawk9821 force-pushed the csv_dateformat branch from 1c7a97a to d84f418 on February 27, 2026 00:43
Contributor Author

hawk9821 commented Feb 27, 2026

@hawk9821 Are the following date and time formats supported?

1/2/2026 12:01:30
01/2/2026 12:01:30
1/02/2026 12:01:30
01/02/2026 12:01:30

1/2/2026 12:1:30
01/2/2026 12:1:30
1/02/2026 12:1:30
01/02/2026 12:1:30

1/2/2026
01/2/2026
1/02/2026
01/02/2026

12:1:30
12:01:30
12:01:03
12:01:3
12:01:30
01:02:03
1:2:3
12:1:03
1:01:30
01:1:30

......

Supported
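
How the PR handles these inputs is not shown in this thread. As an illustration only, plain java.time already accepts both padded and unpadded fields when single-letter pattern symbols are used. The sketch below assumes the slash-separated dates are month-first (M/d/yyyy); the class name and that ordering are assumptions, and a day-first reading would need d/M/yyyy instead.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class SlashDateTimeSketch {
    // Single-letter symbols accept one or two digits, e.g. "1" and "01".
    private static final DateTimeFormatter DATE_TIME = DateTimeFormatter.ofPattern("M/d/yyyy H:m:s");
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("M/d/yyyy");
    private static final DateTimeFormatter TIME = DateTimeFormatter.ofPattern("H:m:s");

    static LocalDateTime parseDateTime(String text) {
        return LocalDateTime.parse(text, DATE_TIME);
    }

    static LocalDate parseDate(String text) {
        return LocalDate.parse(text, DATE);
    }

    static LocalTime parseTime(String text) {
        return LocalTime.parse(text, TIME);
    }

    public static void main(String[] args) {
        System.out.println(parseDateTime("1/2/2026 12:1:30"));
        System.out.println(parseDate("01/02/2026"));
        System.out.println(parseTime("1:2:3"));
    }
}
```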

hawk9821 added 9 commits March 2, 2026 09:52
[Feature][Connector-v2]Time Format Extension

[Feature][Connector-v2]Time Format Extension

# Conflicts:
# seatunnel-common/src/main/java/org/apache/seatunnel/common/exception/CommonErrorCode.java
# Conflicts:
# seatunnel-common/src/main/java/org/apache/seatunnel/common/exception/CommonErrorCode.java
hawk9821 force-pushed the csv_dateformat branch from 06fce66 to e1aecb5 on March 2, 2026 01:54
