[jdbc-v2] Unescape backtick-quoted INSERT column names before schema lookup#2899
[jdbc-v2] Unescape backtick-quoted INSERT column names before schema lookup#2899polyglotAI-bot wants to merge 2 commits into
Conversation
…lookup The ANTLR4 parser stored the raw parse-tree text of each INSERT column name, keeping backticks, so the canonical Nested sub-column wire form `directory`.`id` failed the by-name schema lookup in WriterStatementImpl and threw NoSuchColumnException. Unescape each identifier component and rejoin with '.', mirroring how the table/database identifiers are already handled in the same listener. Fixes: #2896
|
Repository collaborators can run the JMH benchmark suite against this PR by commenting: Optional regression threshold override (Δ% on Time or Alloc/op; defaults to 10%): Only one benchmark run per PR is active at a time — issuing a new |
There was a problem hiding this comment.
Pull request overview
Fixes a JDBC v2 ANTLR4 parsing mismatch where backtick/double-quoted INSERT column names (including dotted Nested sub-columns like `directory`.`id`) were stored with quotes intact and then failed schema lookup by name in the beta RowBinary writer path.
Changes:
- Unescape each
nestedIdentifiercomponent inenterInsertStmtand re-join with.before storinginsertColumns. - Add a regression test covering quoted/unquoted, mixed quoting, embedded dots, and escaped backticks in INSERT column lists.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| jdbc-v2/src/main/java/com/clickhouse/jdbc/internal/SqlParserFacade.java | Normalize INSERT column names by unescaping per-identifier component to match schema column names. |
| jdbc-v2/src/test/java/com/clickhouse/jdbc/internal/BaseSqlParserFacadeTest.java | Adds regression coverage for unescaped INSERT column name extraction across parser backends. |
Address review feedback on #2899: assertInsertColumns silently skipped the per-name assertions whenever getInsertColumns() was null, which is correct for the JavaCC backend (it does not extract INSERT column names) but would let an ANTLR4 regression — where column extraction stops working and returns null — pass unnoticed. Now null is allowed only for the JavaCC backend; the ANTLR4 backends must extract the columns, so a null fails the test loudly.
TriageCategory: Summary What this impacts
Concerns
Required reviewer action
|
|
Client V2 CoverageCoverage Report
Class Coverage
|
JDBC V2 CoverageCoverage Report
Class Coverage
|
JDBC V1 CoverageCoverage Report
Class Coverage
|
Client V1 CoverageCoverage Report
Class Coverage
|



Description
Fixes #2896.
When the beta RowBinary writer is enabled (
DriverProperties.BETA_ROW_BINARY_WRITER→WriterStatementImpl), the JDBC driver maps the INSERT column list parsed from the SQL to the server table schema by name. In the ANTLR4 parser,ParsedPreparedStatementListener.enterInsertStmtstored each INSERT column name as the raw parse-tree text viaNestedIdentifierContext.getText(), which keeps the backticks. So the canonical wire form of aNestedsub-column —`directory`.`id`— was stored verbatim and never matched the clean schema column namedirectory.id, andWriterStatementImpl'stableSchema.getColumnByName(...)lookup missed and threwNoSuchColumnException. Even a simple single backtick-quoted column (`id`) hit the same miss. Note the asymmetry the fix removes: the table and database identifiers in the very same method were already unescaped (ClickHouseSqlUtils.unescape(...)), only the INSERT column names were not.ClickHouseSqlUtils.unescapeonly strips the outer quote pair, so unescaping the whole compound`directory`.`id`as one string would leave the inner backticks. The fix instead unescapes eachidentifier()component and rejoins them with., which is a verbatim reuse of the pattern the same listener already uses for the database identifier inextractAndSetDatabaseAndTable(...).Changes
jdbc-v2/.../internal/SqlParserFacade.java— inParsedPreparedStatementListener.enterInsertStmt, unescape each component of every INSERT columnnestedIdentifier(identifier().stream().map(ClickHouseSqlUtils::unescape).collect(joining("."))) instead of taking the rawgetText(). This covers both ANTLR4 backends (ANTLR4ParserandANTLR4AndParamsParser, whose listener inherits this method). The JavaCC backend does not extract INSERT column names, so it is unaffected.jdbc-v2/.../internal/BaseSqlParserFacadeTest.java— newtestInsertColumnNamesAreUnescapedregression test (runs against all three parser backends).Test
testInsertColumnNamesAreUnescapedasserts the parsedgetInsertColumns()are the clean server names for:`directory`.`id`,`directory`.`name`→directory.id,directory.name`id`→iddirectory.id/idmust keep their already-clean values (proves the fix is byte-for-byte identical for the unquoted path)`directory`.id,directory.`name`)"directory"."id") and a mixed backtick/double-quote name`a.b`stays one column)and` `` formsThe per-name assertion is guarded on
getInsertColumns() != nullbecause only the ANTLR4 backends extract INSERT column names (JavaCC leaves them null). Verified the test fails onmainfor both ANTLR4 backends (directory.id != \directory`.`id`) and **passes** with the fix; the fullAntlr4ParserTest/Antlr4ParamsParserTest/JavaCCParserTest` suites stay green (1146 tests, 0 failures) with no changes to existing tests.changes_checklist.md — "String, SQL, or serialized output changed"
ClickHouseSqlUtils.unescape, applied per identifier component exactly as table/database identifiers already are — consistent with the documented identifier-quoting rules, not a new escaping scheme.NoSuchColumnExceptionfor legal backtick-quoted INSERTs). Nodocs/features.mdchange needed — this restores expected behavior for an already-documented feature (INSERT with optional column lists), it does not add or alter a documented feature.Pre-PR validation gate
main, passes on branch)AGENTS.md/docs/changes_checklist.md/docs/features.mdNote for reviewers
This is the Java sibling of the same identifier-unescaping defect filed against the Python client in ClickHouse/clickhouse-connect#820 (there
unescape_identifieronly peeled the outer backtick pair). No backport is required from my side; maintainers own backport decisions.