fix: route avxvnni FMA ops through fma3<avx2> kernels #1368

Open
DiamonDinoia wants to merge 1 commit into xtensor-stack:master from DiamonDinoia:fix/avxvnni-fma3-dispatch

Conversation

@DiamonDinoia

Contributor

batch<T, avxvnni> derived from avx2, so fnma/fnms fell back to the
generic neg(x*y)+z form (vxorpd + vfmadd) instead of the hardware
vfnmadd/vfnmsub kernels registered for fma3<avx2>. This bites
-march=native on Alder/Meteor Lake and Zen 5, where default_arch
resolves to avxvnni.
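
For readers unfamiliar with the operations involved, here is a scalar C++ sketch of the semantics. It is not xsimd code: the function names are made up for illustration, and the definitions (fnma as -(x*y) + z, fnms as -(x*y) - z) are assumed from the description above and from xsimd's usual naming. The "negate then fma" functions mirror the vxorpd + vfmadd fallback; a hardware vfnmadd/vfnmsub computes the same value in one instruction.

```cpp
#include <cmath>

// Reference semantics assumed for fnma: -(x*y) + z.
// Written unfused here; a compiler may or may not contract it into an FMA.
double fnma_reference(double x, double y, double z) {
    return -(x * y) + z;
}

// Mirrors the fallback path: an explicit negation (vxorpd on vectors)
// followed by an ordinary fused multiply-add (vfmadd).
double fnma_negate_then_fma(double x, double y, double z) {
    return std::fma(-x, y, z);
}

// Same idea for fnms, assumed to mean -(x*y) - z.
double fnms_negate_then_fma(double x, double y, double z) {
    return std::fma(-x, y, -z);
}
```

For example, with x = 2, y = 3, z = 10 the fnma forms both give 4 and the fnms form gives -16. The fallback is correct; the cost is the extra negation instruction per call.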

Derive avxvnni from fma3<avx2> instead. fma3<avx2> always derives from
avx2 and its kernels are guarded by XSIMD_WITH_FMA3_AVX2, so when FMA
is disabled the base is transparent and dispatch falls through to avx2
unchanged.
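
The effect of changing the base class can be sketched with plain tag dispatch. This is a simplified model, not xsimd's actual code: the tag types, the `kernel` enum, and the `SKETCH_WITH_FMA3_AVX2` macro are hypothetical stand-ins for the real architecture types and for XSIMD_WITH_FMA3_AVX2. It assumes kernels are selected by overload resolution on an architecture tag passed by const reference, so the overload for the nearest base class wins.

```cpp
// Hypothetical stand-ins for the architecture tags (not xsimd's real types).
struct generic_tag {};
struct avx2_tag : generic_tag {};
struct fma3_avx2_tag : avx2_tag {};       // plays the role of fma3<avx2>
struct avxvnni_before : avx2_tag {};      // old hierarchy: derives from avx2
struct avxvnni_after : fma3_avx2_tag {};  // new hierarchy: derives from fma3<avx2>

// Stand-in for XSIMD_WITH_FMA3_AVX2; defaults to "FMA enabled" in this sketch.
#ifndef SKETCH_WITH_FMA3_AVX2
#define SKETCH_WITH_FMA3_AVX2 1
#endif

enum class kernel { generic, fma3 };

// Generic fallback, reachable from every tag through derived-to-base conversion.
constexpr kernel fnma_kernel(generic_tag const&) { return kernel::generic; }

#if SKETCH_WITH_FMA3_AVX2
// Hardware kernel, only registered when FMA support is compiled in.
constexpr kernel fnma_kernel(fma3_avx2_tag const&) { return kernel::fma3; }
#endif
```

With the macro set to 1, `fnma_kernel(avxvnni_before{})` selects the generic overload, while `fnma_kernel(avxvnni_after{})` selects the fma3 one because binding to the closer base is the better conversion. With the macro set to 0 the fma3 overload disappears and both calls resolve to the generic kernel, which is the "transparent base" behaviour described above.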

Validated under Intel SDE across avxvnni+FMA (-adl), fma3<avx2> (-hsw),
avx2-only (-hsw) and avxvnni-without-FMA (-adl): correct results, full
test_batch suite passes, vfnmadd emitted only where FMA is enabled.

DiamonDinoia force-pushed the fix/avxvnni-fma3-dispatch branch 2 times, most recently from c3bcc77 to 3a25348 on June 26, 2026 19:48