Fix generic im2row NHWC layout to match Python reference #20708
Summary:
The generic Cadence `im2row` kernel wrote its NHWC (`channel_last=True`) output in kernel-position-major order `[kp][c]`. The operator contract, defined by the Python reference in `ref_implementations.py` via `torch.nn.functional.unfold`, is channel-major `[c][kp]`, i.e. column index `c*(kH*kW) + kh*kW + kw`. The conv-lowering pass `ReplaceConvWithIm2RowAndLinear` packs the matmul weights in the same `[c][kp]` order (permute `[OC,kH,kW,IC]` -> `[OC,IC,kH,kW]` -> `[OC,K]`), so the generic kernel's `[kp][c]` output was transposed relative to the weights.

This rewrites the generic NHWC branch to write `[c][kp]` (per-channel scatter `data_col[i_col*channels_col + c*num_kp + kp]`).

Differential Revision: D110508326
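The ordering mismatch described in the summary can be illustrated with a small, dependency-free sketch. This is not the Cadence kernel or the reference implementation: the helper names `im2row_patch_nhwc` and `pack_filter` and the toy sizes are hypothetical, and only a single patch and a single filter are flattened. It only shows why a `[kp][c]` row no longer lines up with weights packed as `[c][kp]`.

```python
# Illustrative sketch only; helper names and sizes are assumptions,
# not code from the executorch repository.

def im2row_patch_nhwc(patch, kH, kW, C, channel_major=True):
    """Flatten one NHWC patch, indexed patch[kh][kw][c], into a row of length C*kH*kW."""
    num_kp = kH * kW
    row = [None] * (C * num_kp)
    for kh in range(kH):
        for kw in range(kW):
            kp = kh * kW + kw
            for c in range(C):
                # [c][kp]: c*(kH*kW) + kh*kW + kw   (contract stated in the summary)
                # [kp][c]: kp*C + c                 (old generic NHWC behaviour)
                idx = c * num_kp + kp if channel_major else kp * C + c
                row[idx] = patch[kh][kw][c]
    return row

def pack_filter(w, kH, kW, C):
    """Flatten one filter w[kh][kw][ic] in [ic][kh][kw] order, mirroring the permute."""
    return [w[kh][kw][c] for c in range(C) for kh in range(kH) for kw in range(kW)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

kH, kW, C = 1, 2, 2
patch = [[[1, 2], [3, 4]]]        # patch[kh][kw][c]
weight = [[[10, 20], [30, 40]]]   # weight[kh][kw][ic]

# Direct convolution at this position, for comparison.
expected = sum(patch[kh][kw][c] * weight[kh][kw][c]
               for kh in range(kH) for kw in range(kW) for c in range(C))

packed = pack_filter(weight, kH, kW, C)
row_c_kp = im2row_patch_nhwc(patch, kH, kW, C, channel_major=True)
row_kp_c = im2row_patch_nhwc(patch, kH, kW, C, channel_major=False)

print(row_c_kp, dot(row_c_kp, packed))  # matches the direct convolution
print(row_kp_c, dot(row_kp_c, packed))  # differs: row is permuted relative to the weights
```

With these toy values the channel-major row reproduces the direct convolution result, while the kernel-position-major row gives a different sum, which is the symptom the summary attributes to the old generic NHWC branch.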
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20708

❌ 1 New Failure: as of commit 2312458 with merge base 8965e51, one job has failed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@ethansfng has exported this pull request. If you are a Meta employee, you can view the originating Diff in D110508326.