In settings such as reinforcement learning with verifiable rewards (RLVR), training mixture-of-experts (MoE) models can be unstable. The instability arises because computational differences between the training and inference engines can lead to inconsistent router ...
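
As a rough illustration of how a small computational difference between two engines could change what a router does, the sketch below assumes a top-k router (the text does not specify the routing scheme) and uses made-up logits. The helper names `top_k_experts` and `routing_mismatch_rate` are hypothetical and do not come from any particular library.

```python
def top_k_experts(logits, k):
    """Return the set of indices of the k largest router logits."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return frozenset(ranked[:k])


def routing_mismatch_rate(train_logits, infer_logits, k):
    """Fraction of tokens whose selected expert set differs between engines."""
    mismatches = sum(
        top_k_experts(t, k) != top_k_experts(i, k)
        for t, i in zip(train_logits, infer_logits)
    )
    return mismatches / len(train_logits)


# Hypothetical router logits for one token over 4 experts.
# The two engines disagree by only 1e-4 on experts 1 and 2 (assumed values).
train_token = [2.0, 1.0001, 1.0000, -1.0]
infer_token = [2.0, 1.0000, 1.0001, -1.0]

train_choice = top_k_experts(train_token, k=2)  # experts {0, 1}
infer_choice = top_k_experts(infer_token, k=2)  # experts {0, 2}

# A second token with well-separated logits routes identically in both engines.
stable_token = [3.0, 1.0, 0.0, -1.0]
rate = routing_mismatch_rate(
    [train_token, stable_token],
    [infer_token, stable_token],
    k=2,
)
print(sorted(train_choice), sorted(infer_choice), rate)
```

In this toy case, a difference of 1e-4 in two nearly tied logits is enough to send the same token to a different expert in each engine, while tokens with well-separated logits are unaffected.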