Skip to content

Fix escaping of names in IR#8880

Open
tlively wants to merge 5 commits into
mainfrom
no-escape-names-section
Open

Fix escaping of names in IR#8880
tlively wants to merge 5 commits into
mainfrom
no-escape-names-section

Conversation

@tlively

@tlively tlively commented Jul 1, 2026

Copy link
Copy Markdown
Member

The binary reader (but not the text parser) would previous escape names read from the names section and use those escaped names in the IR. But there is no reason to escape names in the IR; we can use the unmodified byte sequences instead. Remove this unnecessary escaping and always store names unescaped in the IR.

The binary writer was partially unescaping names when writing them back to the names section, but this is no longer necessary. After removing this, we simply preserve whatever names were present in the input, modulo any changes we might have made to avoid duplicate names.

Asyncify and wasm-split read function names as input from the command line and from manifest files and would previously escape these names on reading so they would match up with the escaped names in the IR. Now that the IR does not store escaped names, fix Asyncify and wasm-split to no longer escape these names. As a follow-on change we might consider unescaping these names instead, which would allow users to more easily pass names containing unprintable characters.

The binary reader (but not the text parser) would previous escape names read from the names section and use those escaped names in the IR. But there is no reason to escape names in the IR; we can use the unmodified byte sequences instead. Remove this unnecessary escaping and always store names unescaped in the IR.

The binary writer was further escaping names when writing them back to the names section. Remove this escaping so that we simply preserve whatever names were present in the input, modulo any deduplication we might have done.

Asyncify and wasm-split read function names as input from the command line and from manifest files and would previously escape these names on reading so they would match up with the escaped names in the IR. Now that the IR does not store escaped names, fix Asyncify and wasm-split to no longer escape these names. As a follow-on change we might consider _unescaping_ these names instead, which would allow users to more easily pass names containing unprintable characters.

As a drive-by, fix the printing in MinimizeImportsAndExports to properly use JSON escaping.
@tlively tlively requested a review from a team as a code owner July 1, 2026 01:13
@tlively tlively requested review from aheejin, kripken and stevenfontanella and removed request for a team July 1, 2026 01:13
@tlively

tlively commented Jul 1, 2026

Copy link
Copy Markdown
Member Author

@aheejin, it looks like you were right earlier about the IR storing escaped names. I was confused because that does not happen when the input is text. This will make everything more consistent and punts on the question of how and whether we should accept escaped names in the wasm-split manifests.

@aheejin aheejin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This makes things cleaner.

Can we also remove escape ane unescape in https://github.com/aheejin/binaryen/blob/main/src/ir/names.cpp and https://github.com/aheejin/binaryen/blob/main/src/ir/names.h I added in #8868?

Comment thread src/passes/MinifyImportsAndExports.cpp Outdated
Comment on lines +115 to +117
String::printEscapedJSON(std::cout, key.first.view()) << ", ";
String::printEscapedJSON(std::cout, key.second.view()) << ", ";
String::printEscapedJSON(std::cout, new_.view()) << "]";

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be printEscapedJSON and not printEscaped? And what are the differences?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pass is printing JSON, so it should be printing JSON escape sequences rather than WebAssembly text format escape sequences. An example of the difference is that JSON uses "\uABCD" for unicode character U+ABCD, whereas the WebAssembly text format would use "\u{ABCD}".

However, I've reverted this part of the change. printEscapedJSON expects the representation of its string argument to be WTF-16, but that is not the case here.

Comment thread src/wasm/wasm-binary.cpp
Comment on lines -1858 to -1878
void WasmBinaryWriter::writeEscapedName(std::string_view name) {
if (name.find('\\') == std::string_view::npos) {
writeInlineString(name);
return;
}
// decode escaped by escapeName (see below) function names
std::string unescaped;
for (size_t i = 0; i < name.size();) {
char ch = name[i++];
// support only `\xx` escapes; ignore invalid or unsupported escapes
if (ch != '\\' || i + 1 >= name.size() || !isHexDigit(name[i]) ||
!isHexDigit(name[i + 1])) {
unescaped.push_back(ch);
continue;
}
unescaped.push_back(
char((decodeHexNibble(name[i]) << 4) | decodeHexNibble(name[i + 1])));
i += 2;
}
writeInlineString({unescaped.data(), unescaped.size()});
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The binary writer was further escaping names when writing them back to the names section.

Isn't this unescaping names despite the name tells the opposite? Anyway, if the IR has unescaped names now and the name section should have the same, removing this looks correct.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, you're right. I'll update the description.

Comment thread src/ir/names.h
Name escape(Name name);
// Unescapes a WebAssembly identifier back into its original human-readable
// string.
std::string unescape(Name name);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we remove unescape?

It looks it's used in wasm-split --print-profile --unescape. Do we still need this given that IR has unescaped names already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants