GP1132: Scaling and Optimizing Large MoE Models
NVIDIA GTC PARIS - Pavillon 7
June 12
3:00 PM - 3:45 PM CET
Room: S01
At Perplexity, we serve production traffic on NVIDIA Hopper and NVIDIA Blackwell GPUs. Our in-house runtime, built on CUTLASS, FlashInfer, NVLink™, and NVSHMEM, serves models ranging from embeddings to large language models.