feat(import): validate primary keys early #9349

codeaucafe · 2025-06-13T06:48:28Z

NOTE: currently this is just checking primary keys, but the related issue also mentioned in #1083 references issues with number of args. I'm wondering if I should split that into two separate PRs, or if it's fine to do both - currently I only have the PK validation in place? Personally I would argue to separate them since they are slightly different, albeit similar. But happy to take either path.

Add early validation to check if specified primary keys exist in the import file's schema before processing rows. This prevents users from waiting for large files to be processed only to discover that their primary key column names are invalid.

Changes:

Add validatePrimaryKeysAgainstSchema function to check primary key existence against file schema
Integrate validation into newImportDataReader for create operations
Provide helpful error messages listing available columns when primary keys are not found
Add unit tests covering various validation scenarios
Add BATS integration tests for CSV, PSV, and large file scenarios

The validation only runs for create operations when primary keys are explicitly specified and no schema file is provided. This ensures fast failure while maintaining backward compatibility.

Before: Users waited minutes for large files to process before seeing "provided primary key not found" errors

After: Users get immediate feedback with helpful column suggestions

Closes: #1083

Add early validation to check if specified primary keys exist in the import file's schema before processing rows. This prevents users from waiting for large files to be processed only to discover that their primary key column names are invalid. Changes: - Add validatePrimaryKeysAgainstSchema function to check primary key existence against file schema - Integrate validation into newImportDataReader for create operations - Provide helpful error messages listing available columns when primary keys are not found - Add unit tests covering various validation scenarios - Add BATS integration tests for CSV, PSV, and large file scenarios The validation only runs for create operations when primary keys are explicitly specified and no schema file is provided. This ensures fast failure while maintaining backward compatibility. Before: Users waited minutes for large files to process before seeing "provided primary key not found" errors After: Users get immediate feedback with helpful column suggestions Refs: dolthub#1083

timsehn · 2025-06-13T20:10:20Z

A quick once over of the

This change has broken four existing bats tests.

not ok 954 import-create-tables: try to table import with nonexistent --pk arg
# (in test file ./import-create-tables.bats, line 257)
#   `[[ "$output" =~ "column 'batmansparents' not found" ]] || false' failed
# Successfully initialized dolt data repository.
not ok 955 import-create-tables: try to table import with one valid and one nonexistent --pk arg
# (in test file ./import-create-tables.bats, line 264)
#   `[[ "$output" =~ "column 'batmansparents' not found" ]] || false' failed
# Successfully initialized dolt data repository.
ok 956 import-create-tables: create a table with two primary keys from csv import
ok 957 import-create-tables: import data from psv and create the table
ok 958 import-create-tables: import table using --delim
not ok 959 import-create-tables: create a table with a name map
# (in test file ./import-create-tables.bats, line 313)
#   `[ "$status" -eq 0 ]' failed
# Successfully initialized dolt data repository.
not ok 960 import-create-tables: use a name map with missing and extra entries
# (in test file ./import-create-tables.bats, line 331)
#   `[ "$status" -eq 0 ]' failed

And your two new tests fail:

not ok 1040 import-tables: validate primary keys exist in CSV file (issue #1083)
# (in test file ./import-tables.bats, line 69)
#   `[[ "$output" =~ "primary keys \[invalid1 invalid2\] not found" ]] || false' failed
# Successfully initialized dolt data repository.
not ok 1041 import-tables: primary key validation with schema file should skip validation
# (in test file ./import-tables.bats, line 152)
#   `[ "$status" -eq 0 ]' failed

These need to be fixed before merge.

timsehn

A comment on. test location. Also see PR comments. This does not pass test.

integration-tests/bats/import-tables.bats

codeaucafe · 2025-06-13T22:15:48Z

Thanks @timsehn, I'll fix those. Should I add more validations in this PR or just stick with primary key validation for now and break up other checks (like the related issue referenced in the issue regarding number of args) into another PR?

timsehn · 2025-06-13T22:19:05Z

I think this is fine as an isolated change. I would make new PRs for additional checks.

go/cmd/dolt/commands/tblcmds/import.go

Update validation function to support name mapping during primary key validation, allowing users to specify primary keys using either original CSV column names or mapped target column names when using the --map option. Ensure validation is properly skipped when schema files are provided since --pk and --schema parameters are mutually exclusive. Update test expectations in import-create-tables.bats to match the existing detailed error message format that includes available columns, rather than changing the error messages to match the old test expectations. Move new validation tests from import-tables.bats to import-create-tables.bats as requested in code review. The two failing tests "try to table import with nonexistent --pk arg" and "try to table import with one valid and one nonexistent --pk arg" now pass because our early validation catches these errors before InferSchema, producing more helpful error messages that include available columns. Updated the test expectations to match our validation's error format instead of the old generic "column not found" message. Refs: dolthub#1083

codeaucafe · 2025-06-23T22:24:24Z

Hey just bumping this up just in case y'all missed my replies/comments

macneale4

This looks great! Thanks for the contribution 🙏

Requested a couple trivial changes to the bats tests. I'm running tests which need special permissions, so when make updates I can usher the merge through.

integration-tests/bats/import-create-tables.bats

go/cmd/dolt/commands/tblcmds/import.go

Remove unnecessary cleanup commands and update test comment format as requested in code review. The test harness automatically handles file cleanup, making manual rm commands redundant. Update test comment to reference GitHub issue dolthub#1083 using the preferred format. Refs: dolthub#1083

codeaucafe · 2025-06-24T06:17:47Z

Thank you! I just pushed the commit resolving your suggestions. Let me know if I missed anything.

macneale4

LGTM! I'll merge when tests complete. Thanks again for your contribution!

    __|__ |___| |\
    |o__| |___| | \
    |___| |___| |o \
   _|___| |___| |__o\
  /...\_____|___|____\_/
  \   o * o * * o o  /
~~~~~~~~~~~~~~~~~~~~~~~~~~

codeaucafe · 2025-06-25T00:14:19Z

Thank you!

coffeegoddd added the contribution label Jun 13, 2025

timsehn reviewed Jun 13, 2025

View reviewed changes

integration-tests/bats/import-tables.bats Outdated Show resolved Hide resolved

codeaucafe commented Jun 15, 2025

View reviewed changes

go/cmd/dolt/commands/tblcmds/import.go Show resolved Hide resolved

macneale4 mentioned this pull request Jun 23, 2025

contribution test #9382

Merged

macneale4 requested changes Jun 24, 2025

View reviewed changes

macneale4 self-requested a review June 24, 2025 16:39

macneale4 approved these changes Jun 24, 2025

View reviewed changes

macneale4 merged commit 44d79e9 into dolthub:main Jun 24, 2025
38 of 42 checks passed

BrewTestBot mentioned this pull request Jun 26, 2025

dolt 1.55.3 Homebrew/homebrew-core#228192

Merged

Uh oh!

feat(import): validate primary keys early #9349

feat(import): validate primary keys early #9349

Uh oh!

Conversation

codeaucafe commented Jun 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

timsehn commented Jun 13, 2025

Uh oh!

timsehn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codeaucafe commented Jun 13, 2025

Uh oh!

timsehn commented Jun 13, 2025

Uh oh!

Uh oh!

codeaucafe commented Jun 23, 2025

Uh oh!

macneale4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codeaucafe commented Jun 24, 2025

Uh oh!

macneale4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codeaucafe commented Jun 25, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codeaucafe commented Jun 13, 2025 •

edited

Loading