🚧 This post is under construction 🚧
This will be a short “trip report” of the talks I watched during the week of GTC 2025 (March 18 - 21). All talks are freely available online. You can read previous GTC trip reports here:
Summary
Speaker(s) | Talk
---|---
Huang | 🌟 ⭐ GTC 2025 Keynote with NVIDIA CEO Jensen Huang
Bhat et al. | Accelerated Python: The Community and Ecosystem
Lelbach | ⭐ The CUDA C++ Developer’s Toolbox
Armstrong | What’s CUDA All About Anyway?
Jones | ⭐ CUDA: New Features and Beyond
Goel et al. | Horizontal Scaling of LLM Training with JAX
Tillet & Miller | Blackwell Programming for the Masses With OpenAI Triton
Riehl | The CUDA Python Developer’s Toolbox
Fang | ⭐ 1,001 Ways to Write CUDA Kernels in Python
Evtushenko | ⭐ How You Should Write a CUDA C++ Kernel
Other talks to be added.
Legend: 🌟 - Keynote | ⭐ - Best talks
Highlights and Thoughts
Jensen’s Keynote
I tweeted this while recording ADSP Episodes 226-228.
> Jensen vs Bryce
>
> — Conor Hoekstra (@codereport.bsky.social) March 20, 2025 at 10:50 AM
Here is the standalone slide from Jensen’s deck.
Accelerated Python: The Community and Ecosystem
Speakers: Anshuman Bhat, Jeremy Tanner, Andy Terrel
This talk covers Python initiatives at NVIDIA. Pretty crazy to think that the first decade of GPU Python initiatives all started outside of NVIDIA.
The CUDA C++ Developer’s Toolbox
Speaker: Bryce Adelstein Lelbach
Screenshots from Bryce’s talk.
What’s CUDA All About Anyway?
Speaker: Rob Armstrong
This image is similar to the 2nd screenshot from Bryce’s talk above. Rob also notes that Stephen has a slide in his talk covering the same thing. So 3 different slides for the same ecosystem (4 if you include Jensen’s slide).
CUDA: New Features and Beyond
Speaker: Stephen Jones
This seems to be the slide that Rob (Armstrong) referred to.
And a Python version of this graphic:
And a slightly different graphic for kernel authoring:
Blackwell Programming for the Masses With OpenAI Triton
Speakers: Phil Tillet & dePaul Miller
An interesting graphic showing where OpenAI Triton sits between PyTorch and CUDA C++ for authoring kernels.
How You Should Write a CUDA C++ Kernel
Speaker: Georgii Evtushenko
This is a fantastic talk. It walks you through how to hand-roll your own `transform` kernel, benchmarking it with `nvbench` while comparing it against the speed-of-light `thrust::transform`. A must-watch.
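To give a flavor of what the talk builds up, here is a minimal sketch of a hand-rolled grid-stride `transform` kernel next to the `thrust::transform` call it would be benchmarked against. The kernel name, the element-wise operation, and the launch configuration are my own illustrative choices, not the talk's exact code, and the `__device__` lambda requires compiling with `nvcc --extended-lambda`:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Hand-rolled grid-stride transform kernel (a sketch; the talk's
// actual kernel and its tuning steps may differ).
__global__ void transform_kernel(const float* in, float* out, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    out[i] = in[i] * 2.0f + 1.0f;  // example element-wise op
  }
}

int main() {
  const int n = 1 << 20;
  thrust::device_vector<float> in(n, 1.0f), out(n);

  // Hand-rolled version.
  const int block = 256;
  const int grid = (n + block - 1) / block;
  transform_kernel<<<grid, block>>>(
      thrust::raw_pointer_cast(in.data()),
      thrust::raw_pointer_cast(out.data()), n);
  cudaDeviceSynchronize();

  // Library baseline to compare against.
  thrust::transform(in.begin(), in.end(), out.begin(),
                    [] __device__(float x) { return x * 2.0f + 1.0f; });
  return 0;
}
```

In the talk, `nvbench` is used to time variants like these against each other; the comparison shows how close (or far) a naive hand-rolled kernel is from the library's tuned implementation.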