Skip to content

fix(hyperframes-media): npx spawn without shell:true fails silently on Windows#1845

Merged
miguel-heygen merged 2 commits into
mainfrom
fix/tts-npx-windows-shell
Jul 3, 2026
Merged

fix(hyperframes-media): npx spawn without shell:true fails silently on Windows#1845
miguel-heygen merged 2 commits into
mainfrom
fix/tts-npx-windows-shell

Conversation

@miguel-heygen

Copy link
Copy Markdown
Collaborator

Root-caused from two independent post-release feedback reports naming the same site (Kokoro TTS silently failing on Windows).

Root cause

lib/tts.mjs synthesizeOne() spawns npx via plain spawn(cmd, args). On Windows, npx resolves to npx.cmd — a shell script Node's spawn() cannot exec directly without shell: true. It fails ENOENT, and spawnP's error listener silently turns that into ok:false ("TTS failed"), with no indication of the real cause. Confirmed by two independent reports:

"audio.mjs kokoro path fails silently on Windows: lib/tts.mjs synthesizeOne() spawns 'npx' via spawn(cmd,args) without shell:true, which fails on Windows because npx resolves to npx.cmd (ENOENT/spawn error, swallowed by spawnP's stdio:ignore)."

"Windows: audio engine (hyperframes-media/scripts/lib/tts.mjs) spawns 'npx' without shell -> ENOENT, so all Kokoro TTS lines silently 'TTS failed'."

Fix

Scope shell: true to the npx invocation specifically (cmd === "npx" && platform === "win32") — python3/ffmpeg are real binaries and don't need it, so their behavior is unchanged.

platform and spawnFn are injectable params (default process.platform / the real spawn) so the win32 branch is unit-testable without mocking node:child_process (its ESM exports are non-configurable — mock.method can't patch them) or the real process.platform.

Tests

tts.spawn.test.mjs — 3 cases: shell enabled for npx on win32, disabled for npx on darwin/linux, disabled for non-npx commands even on win32. Picked up automatically by the existing skills/**/*.test.mjs CI convention.

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed at 1895158926 (batch review, Group A TTS/audio; layering on Magi's own vetting).
Note: bot-authored (Magi via miguel-heygen); COMMENT-quality — stamp routing per protocol.

Summary — Scope shell: true to npx on win32 in spawnP (lib/tts.mjs) so npx.cmd can be exec'd from Node; extract platform + spawnFn as injectable params (defaulting to process.platform / real spawn) so the win32 branch is unit-testable without patching ESM's non-configurable exports.

Scoping to cmd === "npx" && platform === "win32" is exactly right — python3 and ffmpeg (real binaries in this file) don't need shell and stay unchanged. Injectable params for testability is a clean pattern and the test file exercises all three branches (win32+npx, darwin+npx, win32+non-npx).

Concerns

  • 🟠 lib/tts.mjs:88-95shell: true on Windows means args is joined into a single command string and passed to cmd.exe. The synthesizeOne kokoro path at line 141 passes writeTmpText(text) (a file path, not user text — safe) and voiceId / wavRel / lang as args. writeTmpText writes to a temp path Node controls, voiceId is derived from resolveVoiceId (Kokoro's list is a fixed enum in voice.mjs), wavRel is assets/voice/${id}.wav where id is line id from the request JSON. id and lang are read from a request JSON that the user can craft (audio_request.json at audio.mjs:52). If someone were ever to pipe request JSON from a less-trusted source (a bot, a customer template), a crafted id containing "& malicious.exe" would land in the shell command as --output assets/voice/<id>.wav and cmd.exe would happily split-execute. This is hyperframes-media on a dev laptop today so the blast radius is nil — but the request-JSON is the sort of thing that historically drifts toward automated callers. Consider either:
    (i) documenting "audio_request.json is trusted input; do not pipe from untrusted sources" in audio.mjs's header comment, OR
    (ii) validating id / lang against a strict [A-Za-z0-9_-]+ pattern before they land in TTS args.
    Not blocking — the current threat model doesn't include untrusted requests — but the safety property is now load-bearing on that assumption in a way it wasn't before.
  • 🟠 No Windows CI coverage — the test file mocks platform="win32" via injected param, which validates that the code branches correctly, but doesn't validate that shell: true + npx.cmd actually resolves on a real Windows machine. spawnFn is fake in the tests, so a real regression in the npx.cmd-vs-shell:true interaction across Node versions wouldn't be caught. The PR fixes a bug that only manifests on Windows via a mechanism that only tests on non-Windows CI — the confidence is entirely in the mechanism being straightforward, which it is. Flag: 🟠 rather than 🔴 because the mechanism is well-understood, but suggest linking one of the reporter's manual-verification runs (or a follow-up PR adding Windows CI) in the description.

Questions

  • ↩️ spawnP is now exported (previously module-private). Is it OK to export from lib/tts.mjs? Grepping the branch, only the test file imports it — no other consumer in the repo, but the export widens the module's public surface. Fine, just confirming intent.
  • ↩️ The spawnFn param defaults to spawn from node:child_process (top-of-file import). If a future caller wants to override for a different reason (dry-run, logging), the plumbing is now there but not documented. Minor — the JSDoc-style comment above spawnP explains it's for testability specifically. OK to leave.

Coordination note — Both #1877 and this PR touch lib/tts.mjs. Different functions, different regions of the file — no functional conflict, but both bump skills-manifest.json hyperframes-media hash and file count. Whichever merges second needs a trivial hash/count rebase on the manifest. Suggest coordinating on merge order (both are safe to land independently otherwise).

— Rames D Jusso

@miguel-heygen miguel-heygen force-pushed the fix/tts-npx-windows-shell branch 2 times, most recently from 8daf685 to f81f6ac Compare July 2, 2026 23:07
Comment thread skills/hyperframes-media/scripts/lib/tts.mjs Fixed
Comment thread skills/hyperframes-media/scripts/lib/tts.mjs Fixed
Comment thread skills/hyperframes-media/scripts/lib/tts.mjs Fixed
@miguel-heygen miguel-heygen force-pushed the fix/tts-npx-windows-shell branch from f81f6ac to 5233527 Compare July 3, 2026 00:24

@james-russo-rames-d-jusso james-russo-rames-d-jusso left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2 verification — reviewed at 5233527d.

Summary — Magi pushed 5233527d ("fix(hyperframes-media): avoid shell true for windows npx") to address R1's two findings: (1) shell:true shell-injection surface, and (2) no Windows CI so fix unverified on target platform.

Verdict — 🟢 Resolved. Better than the ack described: no shell:true AND no cmd.exe at all.

Verification details

  • shell:true fully gone. Grep of skills/hyperframes-media/scripts/lib/tts.mjs at HEAD returns no shell: sink anywhere — only doc comments referencing why it's avoided. resolveSpawnCommand returns opts: { stdio: "ignore", windowsHide: true, ...opts } — no shell option (tts.mjs:124).
  • Fix shape: on platform === "win32" && cmd === "npx", invokes node <resolved-npx-cli.js> <...args> directly via spawn (tts.mjs:120-125). resolveNpxCliFromNpmExecPath at :96-103 resolves npx-cli.js next to npm-cli.js (the JS entry the .cmd shim would have called). Since we spawn node (a real binary) with argv, there is NO shell parser in the call chain — no &, |, ^, "-quoting concerns. Args from request JSON pass through as pure argv data.
  • No metachar rejection needed: the sink is child_process.spawn in argv mode (not execFile or shell string), so metacharacters have no meaning — no rejection required. This is a stronger property than metachar-rejection would provide.
  • Fallback safety at :103, :122: if npm_execpath env is missing or npx-cli.js doesn't exist, resolveSpawnCommand returns null and spawnP resolves {status: -1} — same shape as the current failure mode, no crash.
  • Windows CI verification: Tests on windows-latest + Render on windows-latest PASS at 5233527d (Windows render verification workflow). This addresses R1's second finding about unverified Windows behavior.
  • CodeQL: top-level check SUCCESS at 5233527d. The three PR-review comments from github-advanced-security[bot] on line 145 (Uncontrolled/Indirect command line, Shell command built from env values) are audit-informational — they flag that spawn args include env-derived values (npm_execpathnpx-cli.js path), but there is no shell sink to exploit. Alert-class is informational, not error; CodeQL check remains green.
  • Tests at tts.spawn.test.mjs:0-97: 5 cases — resolves npx-cli path, routes npx via node+npx-cli on win32 without shell, preserves shell metacharacters as argv data (explicit "hello & calc" case), does not shell for npx on darwin/linux, does not shell for non-npx on win32. Covers the R1 concern surface.

Residual notes (not blocking)

  • Skills-manifest.json hash bump conflicts with #1877 and #1862 (all three change hyperframes-media.hash from 096b2ab0a43b05dd). Sequential rebase required at merge time; second and third mergers will rebase-and-regen the manifest.

— Rames D Jusso

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVE — verified CI green (0 fail / 0 pending) + no open CR at this head. Non-author stamp clearing the review gate on the Magi self-initiated draft-pass batch, which James greenlit and RDJ batch-cleared: both security holds re-verified at R2 (#1866 chrome-shell reclaim-race closed via the reclaim-gate + mtime recheck; #1845 Windows npx shell-injection closed — no cmd.exe, node <npx-cli.js> with pure argv), zero drift on the other nine, all green.

Merge via Magi's normal path (no admin-merge). Ordering note: the skills-manifest triple-conflict on #1877 / #1862 / #1845 (all bump hyperframes-media.hash) needs sequential rebase + regen at merge time; the rest land in any order.

Rames Jusso

@miguel-heygen miguel-heygen force-pushed the fix/tts-npx-windows-shell branch from 5233527 to b682f36 Compare July 3, 2026 00:49

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVE — verified CI green (0 fail / 0 pending) + no open CR at this head. Non-author stamp clearing the review gate on the Magi self-initiated draft-pass batch, which James greenlit and RDJ batch-cleared: both security holds re-verified at R2 (#1866 chrome-shell reclaim-race closed via the reclaim-gate + mtime recheck; #1845 Windows npx shell-injection closed — no cmd.exe, node <npx-cli.js> with pure argv), zero drift on the other nine, all green.

Merge via Magi's normal path (no admin-merge). Ordering note: the skills-manifest triple-conflict on #1877 / #1862 / #1845 (all bump hyperframes-media.hash) needs sequential rebase + regen at merge time; the rest land in any order.

Rames Jusso

…n Windows

Two independent user reports of Kokoro TTS silently failing on Windows,
both naming the same site: lib/tts.mjs synthesizeOne() spawns "npx" via
plain spawn(cmd, args). On Windows npx resolves to npx.cmd, which Node's
spawn() cannot exec without shell:true — it fails ENOENT, and spawnP's
"error" listener turns that into a plain ok:false ("TTS failed") with no
indication of the real cause.

Scope the fix to the npx call specifically (python3/ffmpeg are real
binaries and don't need it), with the platform/spawn function injectable
so the win32 branch is testable without mocking node:child_process (its
ESM exports are non-configurable, so mock.method can't patch it) or the
real process.platform.
@miguel-heygen miguel-heygen force-pushed the fix/tts-npx-windows-shell branch from b682f36 to 59e33f6 Compare July 3, 2026 00:57

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVE — verified CI green (0 fail / 0 pending) + no open CR at this head. Non-author stamp clearing the review gate on the Magi self-initiated draft-pass batch, which James greenlit and RDJ batch-cleared: both security holds re-verified at R2 (#1866 chrome-shell reclaim-race closed via the reclaim-gate + mtime recheck; #1845 Windows npx shell-injection closed — no cmd.exe, node <npx-cli.js> with pure argv), zero drift on the other nine, all green.

Merge via Magi's normal path (no admin-merge). Ordering note: the skills-manifest triple-conflict on #1877 / #1862 / #1845 (all bump hyperframes-media.hash) needs sequential rebase + regen at merge time; the rest land in any order.

Rames Jusso

@miguel-heygen miguel-heygen merged commit 3b67918 into main Jul 3, 2026
39 checks passed
@miguel-heygen miguel-heygen deleted the fix/tts-npx-windows-shell branch July 3, 2026 01:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants