Add json_decode flags for duplicate JSON object keys #22473

Open
PurHur wants to merge 1 commit into php:master from PurHur:json-decode-duplicate-key-flags

PurHur commented Jun 26, 2026

Summary

RFC 8259 says the names within an object SHOULD be unique but does not forbid repeats, so a JSON object can legally contain the same key more than once. PHP has only ever given you one outcome: the last value wins. That is a fine default, but it falls short when you are parsing data that actually relies on duplicate keys and you need either every value or a controlled merge, rather than an implicit overwrite or an unconditional deep merge.

This PR adds two opt-in json_decode() flags:

  • JSON_DUPLICATE_KEY_ARRAY — collect duplicate values for the same key into a list
  • JSON_DUPLICATE_KEY_MERGE — recursively merge nested objects/arrays when a key repeats; scalars still overwrite (same as today)

If you do not pass either flag, nothing changes. The two flags cannot be combined.

The new behavior is wired through the existing JSON parser method table, so the default decode path keeps using the original object_update implementation.
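The behaviours and the one-time handler selection described above can be modeled outside PHP. The following is a minimal sketch in Python, not the PR's C code: it uses the standard `json` module's `object_pairs_hook`, which receives duplicate pairs intact. The names `DupFlag`, `Collected`, `Decoder`, and the `update_*` handlers are hypothetical, the flag bit values are invented for illustration, and the choice of `ValueError` for an invalid combination is an assumption. The PR text also does not spell out how JSON arrays are treated under merge, so the sketch assumes a later list overwrites an earlier one, like a scalar.

```python
import json
from enum import IntFlag


class DupFlag(IntFlag):
    # Hypothetical bit values; the PR text does not state the real ones.
    NONE = 0
    ARRAY = 1
    MERGE = 2


class Collected(list):
    """List created by duplicate collection, distinct from a plain JSON array."""


def update_default(obj, key, value):
    obj[key] = value  # last key wins, as json_decode() does today


def update_array(obj, key, value):
    if key not in obj:
        obj[key] = value  # first occurrence is stored unchanged
    elif isinstance(obj[key], Collected):
        obj[key].append(value)  # third and later occurrences
    else:
        obj[key] = Collected([obj[key], value])  # second occurrence


def update_merge(obj, key, value):
    old = obj.get(key)
    if isinstance(old, dict) and isinstance(value, dict):
        for k, v in value.items():
            update_merge(old, k, v)  # recurse into nested objects
    else:
        obj[key] = value  # scalars (and, by assumption, lists) overwrite


# The "method table": one handler per supported flag value.
METHOD_TABLE = {
    DupFlag.NONE: update_default,
    DupFlag.ARRAY: update_array,
    DupFlag.MERGE: update_merge,
}


class Decoder:
    def __init__(self, flags=DupFlag.NONE):
        # ARRAY | MERGE has no table entry, so the combination is rejected here.
        if flags not in METHOD_TABLE:
            raise ValueError("duplicate-key flags are mutually exclusive")
        # Selected once; the per-key loop below never inspects the flags again.
        self.object_update = METHOD_TABLE[flags]

    def _build(self, pairs):
        obj = {}
        for key, value in pairs:
            self.object_update(obj, key, value)
        return obj

    def decode(self, text):
        return json.loads(text, object_pairs_hook=self._build)


doc = '{"k": 1, "k": 2, "n": {"a": 1}, "n": {"b": 2}}'
print(Decoder().decode(doc))               # {'k': 2, 'n': {'b': 2}}
print(Decoder(DupFlag.ARRAY).decode(doc))  # {'k': [1, 2], 'n': [{'a': 1}, {'b': 2}]}
print(Decoder(DupFlag.MERGE).decode(doc))  # {'k': 2, 'n': {'a': 1, 'b': 2}}
```

Because the handler is bound in the constructor, a decoder built without flags only ever calls `update_default`, which mirrors the claim that the default decode path is untouched.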

Motivation

Mainly standards compatibility and predictability around duplicate keys. Right now callers have to pre-process JSON or work around last-key-wins if their format repeats names on purpose. Explicit flags make that choice visible instead of leaving callers to hope the parser does what they need.

Test plan

  • make test TESTS=ext/json/tests/ — all JSON tests pass
  • PHPT coverage for default last-key-wins BC, merge, array, and invalid flag combination
  • Verified default-path output matches master for a range of decode scenarios

Happy to move this through an RFC / internals discussion if that is required before merge.

Made with Cursor

RFC 8259 allows the same name to appear more than once in a JSON object,
but it does not say what a parser should do with the values. PHP has
always used last-key-wins, which is reasonable as a default, yet there are
real payloads where you want every value instead of silently dropping the
earlier ones or always deep-merging nested structures.

This adds two opt-in flags:

- JSON_DUPLICATE_KEY_ARRAY keeps each duplicate value under the same key
  as a list.
- JSON_DUPLICATE_KEY_MERGE recursively merges nested objects/arrays when
  the same key appears again; scalars still overwrite like today.

The flags are mutually exclusive. If neither is passed, behavior is
unchanged. The merge/array logic is hooked up through the existing parser
method table so the default decode path stays on the original
object_update implementation.

Co-authored-by: Cursor <cursoragent@cursor.com>

PurHur commented Jun 26, 2026

Benchmarked this against a clean master build (same ./configure --disable-all --enable-cli, -O2, 200k iterations per scenario). Wanted to make sure the default path does not regress before asking anyone to review the flag behavior.

Default / legacy path (flags=0) — no meaningful regression. Deltas vs baseline are mostly within a few percent (noise on the really small scalar cases is higher because they finish in ~0.04 µs/op):

| Scenario | Baseline (µs/op) | Feature (µs/op) | Delta |
| --- | --- | --- | --- |
| unique 50 keys, assoc | 2.792 | 2.776 | -0.6% |
| unique 200 keys, assoc | 10.961 | 10.949 | -0.1% |
| api response 25 items | 14.138 | 13.913 | -1.6% |
| dup scalar 50x, flags=0 | 2.704 | 2.713 | +0.3% |
| dup nested 20x, flags=0 | 3.580 | 3.557 | -0.6% |

Also ran a separate output parity check (20 default-path payloads, assoc/object/scalars/nested dupes) — baseline and feature output is byte-identical.

New flag paths (feature build only, as expected a bit slower on duplicate-heavy JSON because we actually do extra work):

| Scenario | Feature (µs/op) |
| --- | --- |
| dup scalar 50x, JSON_DUPLICATE_KEY_MERGE | 2.803 |
| dup scalar 50x, JSON_DUPLICATE_KEY_ARRAY | 2.793 |
| dup nested 20x, JSON_DUPLICATE_KEY_MERGE | 3.849 |
| dup nested 20x, JSON_DUPLICATE_KEY_ARRAY | 3.715 |


Those are within roughly 3% to 8% of the flags=0 duplicate-key cases above (2.713 and 3.557 µs/op on the feature build), so the extra cost is small and only shows up when you opt in.
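For anyone who wants to check how the flag paths compare with the default duplicate-key cases, the relative overhead can be recomputed from the figures in the two tables. This is a small arithmetic sketch, not part of the benchmark script: the timings are copied from the tables above, and the helper name `overhead_percent` is hypothetical.

```python
# (scenario, flags=0 time on the feature build, flag, flagged time), all in µs/op.
ROWS = [
    ("dup scalar 50x", 2.713, "JSON_DUPLICATE_KEY_MERGE", 2.803),
    ("dup scalar 50x", 2.713, "JSON_DUPLICATE_KEY_ARRAY", 2.793),
    ("dup nested 20x", 3.557, "JSON_DUPLICATE_KEY_MERGE", 3.849),
    ("dup nested 20x", 3.557, "JSON_DUPLICATE_KEY_ARRAY", 3.715),
]


def overhead_percent(default_us, flagged_us):
    """Relative cost of the opt-in path versus the default path, in percent."""
    return (flagged_us - default_us) / default_us * 100.0


for scenario, default_us, flag, flagged_us in ROWS:
    print(f"{scenario:15} {flag:25} {overhead_percent(default_us, flagged_us):+.1f}%")
```

With these inputs the overhead comes out at about +3.3% and +2.9% for the scalar case and +8.2% and +4.4% for the nested case.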

The merge/array handlers are selected once via the parser method table at init time, so the hot path for normal json_decode() stays on the original object_update code.

Build/env if anyone wants to reproduce locally:

./configure --disable-all --enable-cli && make -j$(nproc) sapi/cli/php
sapi/cli/php benchmarks/bench_json_decode.php 200000   # feature tree only

(Comparison script is in my fork branch if useful; not part of the PR diff.)
