Bases: FlashInferCutlassMoEPrepareAndFinalize
Source code in vllm/model_executor/layers/fused_moe/flashinfer_cutlass_prepare_finalize.py
  
    
 finalize(
    output: Tensor,
    fused_expert_output: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
    weight_and_reduce_impl: TopKWeightAndReduce,
) -> None
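As a rough mental model of what `finalize` produces, the sketch below mimics the kind of top-k weight-and-reduce step that a `TopKWeightAndReduce` object is expected to perform. It is not the vLLM or FlashInfer code path. It assumes `fused_expert_output` holds one row per (token, k) pair, laid out as `[num_tokens][top_k][hidden]`, and that the reduction is a weighted sum over `top_k` written into `output`. When `apply_router_weight_on_input` is true, the weights are assumed to have been applied during `prepare`, so the sketch only sums. The function name `weight_and_reduce` is hypothetical.

```python
def weight_and_reduce(output, fused_expert_output, topk_weights,
                      apply_router_weight_on_input):
    """Hypothetical pure-Python stand-in for a top-k weight-and-reduce step.

    output:              [num_tokens][hidden], overwritten in place
    fused_expert_output: [num_tokens][top_k][hidden]
    topk_weights:        [num_tokens][top_k]
    """
    for t, per_expert in enumerate(fused_expert_output):
        hidden = len(output[t])
        acc = [0.0] * hidden
        for k, row in enumerate(per_expert):
            # Assumption: if weights were applied in prepare(), only sum here.
            w = 1.0 if apply_router_weight_on_input else topk_weights[t][k]
            for h in range(hidden):
                acc[h] += w * row[h]
        output[t][:] = acc  # in-place update, mirroring the "-> None" return
    return None


out = [[0.0, 0.0]]
weight_and_reduce(out, [[[1.0, 2.0], [3.0, 4.0]]], [[0.25, 0.75]], False)
print(out)  # [[2.5, 3.5]]
```

Writing into a caller-provided `output` rather than returning a new buffer matches the `-> None` signature above.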
  
 prepare(
    a1: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    num_experts: int,
    expert_map: Optional[Tensor],
    apply_router_weight_on_input: bool,
    quant_config: FusedMoEQuantConfig,
) -> PrepareResultType
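The `expert_map` argument is optional. Under expert parallelism it is commonly a table of length `num_experts` that maps each global expert ID to a rank-local index, with `-1` marking experts hosted on another rank. This sketch assumes that convention rather than taking it from this page, and it also assumes experts are placed in equal contiguous blocks per rank. The helpers `build_expert_map` and `localize_topk_ids` are hypothetical and use nested lists in place of tensors.

```python
from typing import Optional


def build_expert_map(num_experts: int, ep_size: int, ep_rank: int) -> list[int]:
    """Hypothetical: map global expert IDs to local IDs, -1 if not on this rank.

    Assumes experts are split into contiguous, equal-sized blocks per rank.
    """
    assert num_experts % ep_size == 0
    per_rank = num_experts // ep_size
    start = ep_rank * per_rank
    return [e - start if start <= e < start + per_rank else -1
            for e in range(num_experts)]


def localize_topk_ids(topk_ids: list[list[int]],
                      expert_map: Optional[list[int]]) -> list[list[int]]:
    """Translate routed expert IDs; returns a copy unchanged if no map is given."""
    if expert_map is None:
        return [row[:] for row in topk_ids]
    return [[expert_map[e] for e in row] for row in topk_ids]


emap = build_expert_map(num_experts=8, ep_size=2, ep_rank=1)
print(emap)                                      # [-1, -1, -1, -1, 0, 1, 2, 3]
print(localize_topk_ids([[0, 5], [7, 2]], emap))  # [[-1, 1], [3, -1]]
```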
  
  Bases: FlashInferCutlassMoEPrepareAndFinalize
FlashInfer implementation using AllToAll communication.
  
 finalize(
    output: Tensor,
    fused_expert_output: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
    weight_and_reduce_impl: TopKWeightAndReduce,
) -> None
  
 prepare(
    a1: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    num_experts: int,
    expert_map: Optional[Tensor],
    apply_router_weight_on_input: bool,
    quant_config: FusedMoEQuantConfig,
) -> PrepareResultType
  
  Bases: FusedMoEPrepareAndFinalize
Base class for FlashInfer MoE prepare and finalize operations.
  
    
 _apply_router_weight_on_input(
    a1: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
) -> None
Apply router weight on input if needed.
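A minimal sketch of what applying the router weight on the input can mean: each token's activation row is scaled by its routing weight before the experts run. This is only unambiguous when each token is routed to a single expert, so the sketch assumes `top_k == 1` and raises otherwise; that restriction and the in-place update (consistent with the `-> None` return type above) are assumptions of this sketch. The function name is hypothetical and nested lists stand in for tensors.

```python
def apply_router_weight_on_input(a1, topk_weights, topk_ids,
                                 apply_router_weight_on_input):
    """Hypothetical list-based sketch; scales a1 in place and returns None.

    a1:           [num_tokens][hidden]
    topk_weights: [num_tokens][top_k]
    topk_ids:     [num_tokens][top_k]
    """
    if not apply_router_weight_on_input:
        return None
    top_k = len(topk_ids[0]) if topk_ids else 1
    # With top_k > 1 a single input row would need several different scales.
    if top_k != 1:
        raise ValueError("router weight on input assumes top_k == 1")
    for row, (w,) in zip(a1, topk_weights):
        for h in range(len(row)):
            row[h] *= w
    return None


a1 = [[1.0, 2.0], [4.0, 8.0]]
apply_router_weight_on_input(a1, [[0.5], [0.25]], [[3], [1]], True)
print(a1)  # [[0.5, 1.0], [1.0, 2.0]]
```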
  
    
 create_flashinfer_prepare_finalize(
    use_dp: bool,
    use_nvfp4: bool = False,
    enable_alltoallv: bool = False,
) -> FlashInferCutlassMoEPrepareAndFinalize
Factory function to create the appropriate FlashInfer implementation.
  
 flashinfer_alltoall_combine(
    all2all_manager: All2AllManagerBase,
    output: Tensor,
    top_k: int,
    token_count: int,
    alltoall_info,
)
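Conceptually, combine is the inverse of dispatch: every (token, k) copy that was sent to an expert comes back, and the `top_k` results for each of the `token_count` local tokens are reduced into the output. The list-based sketch below assumes returned results arrive tagged with their original (token, k) slot and already weighted. The real function instead relies on the opaque `alltoall_info` produced during dispatch, whose layout is not documented here. The name `combine_sketch` is hypothetical.

```python
def combine_sketch(returned, top_k: int, token_count: int, hidden: int):
    """Hypothetical: reduce returned expert outputs back into token order.

    returned: iterable of (token_index, k, row), with row already weighted.
    """
    output = [[0.0] * hidden for _ in range(token_count)]
    seen = [[False] * top_k for _ in range(token_count)]
    for t, k, row in returned:
        if seen[t][k]:
            raise ValueError(f"duplicate result for token {t}, slot {k}")
        seen[t][k] = True
        for h in range(hidden):
            output[t][h] += row[h]
    # Every token should receive exactly top_k contributions.
    if not all(all(s) for s in seen):
        raise ValueError("missing expert results")
    return output


# Results may come back out of order from different ranks.
returned = [(1, 0, [1.0, 1.0]), (0, 1, [0.5, 0.0]),
            (0, 0, [2.0, 3.0]), (1, 1, [0.0, 2.0])]
print(combine_sketch(returned, top_k=2, token_count=2, hidden=2))
# [[2.5, 3.0], [1.0, 3.0]]
```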
  
 flashinfer_alltoall_dispatch(
    all2all_manager: All2AllManagerBase,
    global_num_tokens_cpu: list[int],
    x: Tensor,
    gs: Tensor,
    topk_ids: Tensor,
    topk_weights: Tensor,
    top_k: int,
    num_experts: int,
    quant_config: FusedMoEQuantConfig,
)
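On the dispatch side, each token is copied once per routed expert and sent to the rank that hosts that expert. The sketch below covers only that routing and the per-rank send counts an all-to-all-v exchange would need. It assumes experts are split into equal contiguous blocks across a hypothetical `ep_size` ranks, and it ignores quantization (`gs`, `quant_config`), any padding driven by `global_num_tokens_cpu`, and the actual transport. The name `dispatch_sketch` is hypothetical.

```python
from collections import defaultdict


def dispatch_sketch(x, topk_ids, topk_weights, num_experts: int, ep_size: int):
    """Hypothetical: bucket (token, k) copies by destination EP rank.

    Returns {rank: [(token_index, k, expert_id, weight, row), ...]} and a
    list of per-rank send counts.
    """
    assert num_experts % ep_size == 0
    experts_per_rank = num_experts // ep_size
    buckets = defaultdict(list)
    for t, (ids, ws) in enumerate(zip(topk_ids, topk_weights)):
        for k, (e, w) in enumerate(zip(ids, ws)):
            dest = e // experts_per_rank  # rank that owns expert e
            buckets[dest].append((t, k, e, w, x[t]))
    send_counts = [len(buckets[r]) for r in range(ep_size)]
    return dict(buckets), send_counts


x = [[1.0], [2.0], [3.0]]
topk_ids = [[0, 1], [1, 0], [3, 2]]
topk_weights = [[0.5, 0.5]] * 3
buckets, counts = dispatch_sketch(x, topk_ids, topk_weights, num_experts=4, ep_size=2)
print(counts)  # [4, 2]
```

The (token, k) tags carried in each bucket entry are what a combine step needs to route results back to their original slots.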