Onboard ragged sorting kernels from tpu inference to maxtext by NuojCheng · Pull Request #3814 · AI-Hypercomputer/maxtext · GitHub

NuojCheng · 2026-05-05T16:23:12Z

Description

This PR onboards two critical kernels: ragged_gather and ragged_gather_reduce.

The key difference lies in their routing and output shapes:

ragged_gather: Performs simultaneous permutation and fan-out (num_tokens x emb_dim → num_tokens x top_k x emb_dim).
ragged_gather_reduce: Adds an accumulation step for fan-in (num_tokens x top_k x emb_dim → num_tokens x emb_dim).
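
As a rough illustration of the shapes above, here is a dense NumPy reference sketch, not the Pallas kernels added in this PR. The function names `ragged_gather_ref` and `ragged_gather_reduce_ref`, the `topk_experts` input, and the choice to flatten the output to `(num_tokens * top_k, emb_dim)` rows sorted by expert are all assumptions made for the example. Router weights and per-expert group sizes are left out.

```python
import numpy as np

def ragged_gather_ref(x, topk_experts):
    """Hypothetical reference: x is (num_tokens, emb_dim), topk_experts is (num_tokens, top_k)."""
    top_k = topk_experts.shape[1]
    # Permutation: stable-sort the flattened (token, choice) slots by expert id.
    order = np.argsort(topk_experts.reshape(-1), kind="stable")
    # Fan-out: every token is read top_k times, once per selected expert.
    src_token = order // top_k
    return x[src_token], order

def ragged_gather_reduce_ref(y, order, num_tokens, top_k):
    """Hypothetical reference: y is (num_tokens * top_k, emb_dim) in expert-sorted order."""
    # Inverse permutation: slot s = token * top_k + choice lives at row inv[s] of y.
    inv = np.empty_like(order)
    inv[order] = np.arange(order.size)
    rows = inv.reshape(num_tokens, top_k)
    # Fan-in: gather the top_k rows of each token and accumulate them.
    return y[rows].sum(axis=1)

x = np.arange(8, dtype=np.float32).reshape(4, 2)
topk_experts = np.array([[1, 0], [0, 2], [2, 1], [0, 1]])
dispatched, order = ragged_gather_ref(x, topk_experts)
combined = ragged_gather_reduce_ref(dispatched, order, num_tokens=4, top_k=2)
```

With identity experts, each token comes back summed `top_k` times, which makes a quick sanity check for the index bookkeeping.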

During the forward pass, we use ragged_gather for dispatch and ragged_gather_reduce for combine. In the backward pass, these roles are swapped.
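
The role swap follows from the two operations being transposes of each other: the gradient of a gather is a scatter-add, which can be written as a gather followed by a reduction. The NumPy sketch below checks this numerically under the same assumed index layout as a flattened, expert-sorted dispatch. The names `dispatch` and `combine` are illustrative stand-ins, not the PR's API.

```python
import numpy as np

rng = np.random.default_rng(0)
num_tokens, top_k, emb_dim = 5, 2, 3
x = rng.standard_normal((num_tokens, emb_dim))
experts = rng.integers(0, 4, size=(num_tokens, top_k))

# Assumed routing metadata: expert-sorted order of the flattened (token, choice) slots.
order = np.argsort(experts.reshape(-1), kind="stable")
src_token = order // top_k
inv = np.empty_like(order)
inv[order] = np.arange(order.size)
rows = inv.reshape(num_tokens, top_k)

def dispatch(tokens):
    # Gather with fan-out: (num_tokens, emb_dim) -> (num_tokens * top_k, emb_dim).
    return tokens[src_token]

def combine(sorted_rows):
    # Gather with fan-in: (num_tokens * top_k, emb_dim) -> (num_tokens, emb_dim).
    return sorted_rows[rows].sum(axis=1)

# Backward of dispatch: scatter-add the upstream gradient g back onto the tokens.
g = rng.standard_normal((num_tokens * top_k, emb_dim))
grad_scatter = np.zeros_like(x)
np.add.at(grad_scatter, src_token, g)
grad_via_combine = combine(g)  # same result without a scatter

# Backward of combine: each gathered row receives its token's gradient h.
h = rng.standard_normal((num_tokens, emb_dim))
grad_rows = np.zeros_like(g)
for t in range(num_tokens):
    for k in range(top_k):
        grad_rows[rows[t, k]] += h[t]
grad_via_dispatch = dispatch(h)
```

Expressing both gradients as gathers avoids an explicit scatter-add, which is consistent with the commits in this PR that replace the ragged scatter with ragged gather reduce.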

FIXES: b/496676734

Tests

We observe a performance advantage when ragged sort is enabled, but we also notice a regression between JAX==0.10.0 and 0.10.1.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

codecov · 2026-05-05T16:29:24Z

Codecov Report

❌ Patch coverage is 11.77945% with 352 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/maxtext/kernels/ragged/ragged_gather_reduce.py	9.09%	159 Missing and 1 partial ⚠️
src/maxtext/kernels/ragged/ragged_gather.py	9.70%	120 Missing and 1 partial ⚠️
src/maxtext/kernels/ragged/ragged_sort.py	8.45%	65 Missing ⚠️
src/maxtext/layers/moe.py	66.66%	4 Missing and 2 partials ⚠️


NuojCheng added the pull ready and draft labels and removed the pull ready label May 5, 2026

NuojCheng added 7 commits May 11, 2026 17:57

onboard ragged gather to moe

1768b38

integrate ragged gather in moe

1916d48

add correctness test

ef19b5d

use gather_reduce_sc

78153f0

remoave scatter add

9bafc9c

add ragged gather reduce supporting ragged gather backward pass

26878d9

replace ragged scatter with ragged gather reduce

6b14d68

NuojCheng changed the title ~~[Draft]~~ Onboard ragged sorting kernels from tpu inference to maxtext May 11, 2026

NuojCheng force-pushed the chengnuojin-ragged-gather-nightly branch from 803c1de to 4ac0bcf Compare May 11, 2026 18:25

update for jax nightly version

b1ce84c

NuojCheng force-pushed the chengnuojin-ragged-gather-nightly branch from 4ac0bcf to b1ce84c Compare May 11, 2026 18:40

NuojCheng wants to merge 8 commits into main from chengnuojin-ragged-gather-nightly
