GP1132: Scaling and Optimizing Large MoE Models

NVIDIA GTC PARIS - Pavillon 7

June 12

3:00 PM - 3:45 PM - CET

Room: S01

At Perplexity, we serve production traffic on NVIDIA Hopper and NVIDIA Blackwell GPUs. Our in-house runtime, built on CUTLASS, FlashInfer, NVLink™, and NVSHMEM, serves models ranging from embeddings to large language models.

Speakers

Nandor Licker, Perplexity AI
