fix: write and read figure JSON as UTF-8 #5633
Open
LukeTheoJohnson wants to merge 1 commit into
write_json wrote figure JSON with Path.write_text(json_str) and read_json read it back with Path.read_text(), both omitting the encoding. On platforms whose default text encoding is not UTF-8 (e.g. cp1252 on Windows), writing a figure containing non-ASCII text raised UnicodeEncodeError and reading produced mojibake. write_html already passes "utf-8" explicitly; apply the same to the JSON I/O path so figures round-trip everywhere. Update the existing pathlib mock tests to assert the UTF-8 encoding.
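The round trip described above can be sketched with plain pathlib, independent of plotly. This is an illustration under stated assumptions, not the patched source: the dictionary stands in for a figure, and `json.dumps(..., ensure_ascii=False)` stands in for orjson output. The cp1252 decode at the end simulates what an unpatched read would do on a platform with that default.

```python
import json
import tempfile
from pathlib import Path

title = "\u03bc 中文"  # Greek mu plus CJK, both outside cp1252
# ensure_ascii=False is a stand-in for orjson, which emits real UTF-8 text.
json_str = json.dumps({"layout": {"title": {"text": title}}}, ensure_ascii=False)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "fig.json"

    # Explicit encoding on both sides, mirroring the change in this PR.
    path.write_text(json_str, encoding="utf-8")
    round_tripped = json.loads(path.read_text(encoding="utf-8"))

    # Simulate the unpatched read on a cp1252 platform: either the text is
    # mangled or the decode fails, depending on the bytes involved.
    try:
        mangled = path.read_bytes().decode("cp1252")
    except UnicodeDecodeError:
        mangled = None
```

With the explicit encoding the title survives on any platform; decoding the same bytes as cp1252 does not give the original text back.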
b6cbd44 to 1c182c4
Problem
pio.write_json(fig, path) raises UnicodeEncodeError for a figure whose text contains a character outside the platform's default codec, on platforms whose default text encoding is not UTF-8. On Windows the default is cp1252.
read_json has the matching bug: it reads the file without specifying an encoding, so a UTF-8 JSON file is decoded with the platform codec (cp1252) and the text comes back mangled, or the read raises UnicodeDecodeError on byte sequences cp1252 leaves undefined. This works on macOS/Linux only because their default encoding is already UTF-8.
What actually triggers it
Two conditions both have to hold, which is why it is easy to miss:
- The text contains a character outside cp1252. Many common non-ASCII characters are in cp1252, including é, ö, °, the micro sign µ (U+00B5), and the em-dash (—), so they do not raise. It takes a character outside cp1252, such as Greek μ (U+03BC), CJK (中文), or an emoji, to trip it.
- The JSON engine is orjson (the default when orjson is installed). orjson emits real UTF-8, so the string handed to write_text still contains the non-ASCII characters. The pure-Python json engine uses ensure_ascii=True, escaping everything to \uXXXX, so its output is pure ASCII; it never trips the codec on write and reads back cleanly.
Root cause
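Both conditions above can be checked on any platform by encoding explicitly with cp1252 instead of relying on the platform default. The helper name `encodes_as_cp1252` is hypothetical, and `json.dumps(..., ensure_ascii=False)` is used only as a rough stand-in for orjson output; this is a sketch, not code from the PR.

```python
import json

# Characters from the list above that cp1252 can encode, so they never raise.
in_cp1252 = ["é", "ö", "°", "\u00b5", "\u2014"]  # last two: micro sign, em-dash
# Characters outside cp1252: Greek mu, CJK, an emoji.
outside_cp1252 = ["\u03bc", "中文", "\U0001F600"]


def encodes_as_cp1252(text):
    """Return True if text can be encoded with cp1252 (hypothetical helper)."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


inside_ok = [encodes_as_cp1252(c) for c in in_cp1252]
outside_ok = [encodes_as_cp1252(c) for c in outside_cp1252]

# The stdlib json module escapes non-ASCII by default (ensure_ascii=True),
# so its output is pure ASCII and encodes under cp1252 without error.
escaped = json.dumps({"title": "\u03bc 中文"})
# With ensure_ascii=False the characters survive in the output string,
# which is the situation that trips the platform codec on write.
raw = json.dumps({"title": "\u03bc 中文"}, ensure_ascii=False)
```

Here every entry in `in_cp1252` encodes, none in `outside_cp1252` does, and only the escaped JSON string is safe to write with a cp1252 default.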
plotly/io/_json.py opened the file without an encoding on both sides:
- write_json: path.write_text(json_str)
- read_json: path.read_text()
write_html already does this correctly (path.write_text(html_str, "utf-8")), and the same class of bug was previously fixed for HTML output (#3898). The JSON I/O path was simply missed.
Fix
Pass "utf-8" explicitly in both write_json and read_json, matching write_html. JSON is UTF-8 by default per RFC 8259.
Verification
On Windows (locale.getpreferredencoding(False) → cp1252, orjson installed):
- Without the fix, write_json of a figure titled μ 中文 raises UnicodeEncodeError: 'charmap' codec can't encode character 'μ'.
- With the fix, read_json returns the original title unchanged.
- test_write_json_pathlib / test_read_json_from_pathlib assert the "utf-8" argument is passed; both fail on the unpatched source and pass with the fix.
- tests/test_io/test_to_from_json.py otherwise shows only the pre-existing FigureWidget ImportErrors (anywidget not installed), with no new regressions.
Scope
Two-line source change plus test assertions. No behavior change on platforms that already default to UTF-8, or when the json engine is in use.
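For readers unfamiliar with pathlib mock tests, the kind of assertion involved might look roughly like the following. The functions `write_json_sketch` and `read_json_sketch` are hypothetical stand-ins, not the plotly functions or the actual tests in this PR; the point is only that a mocked Path records the encoding argument it was called with.

```python
from pathlib import Path
from unittest.mock import MagicMock


def write_json_sketch(json_str, path):
    """Hypothetical stand-in for the patched write path."""
    path.write_text(json_str, "utf-8")


def read_json_sketch(path):
    """Hypothetical stand-in for the patched read path."""
    return path.read_text("utf-8")


payload = '{"data": [], "layout": {}}'

# spec=Path makes the mock reject attributes that Path does not have.
mock_path = MagicMock(spec=Path)
mock_path.read_text.return_value = payload

write_json_sketch(payload, mock_path)
# Fails if the encoding argument is missing or different.
mock_path.write_text.assert_called_once_with(payload, "utf-8")

loaded = read_json_sketch(mock_path)
mock_path.read_text.assert_called_once_with("utf-8")
```

An assertion of this shape fails against a call that omits the encoding, which matches the report that the updated tests fail on the unpatched source.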