NVIDIA GTC 2025 Trip Report

Conor Hoekstra · March 20, 2025

🚧 This post is under construction 🚧

This will be a short “trip report” of the talks I watched during the week of GTC 2025 (March 18–21). All talks are freely available online. You can read previous GTC trip reports here:

Summary

| Speaker(s)      | Talk                                                             |
|-----------------|------------------------------------------------------------------|
| Huang           | 🌟 ⭐ GTC 2025 Keynote with NVIDIA CEO Jensen Huang [YouTube]     |
| Bhat et al.     | Accelerated Python: The Community and Ecosystem                  |
| Lelbach         | ⭐ The CUDA C++ Developer’s Toolbox                               |
| Armstrong       | What’s CUDA All About Anyway?                                    |
| Jones           | ⭐ CUDA: New Features and Beyond                                  |
| Goel et al.     | Horizontal Scaling of LLM Training with JAX                      |
| Tillet & Miller | Blackwell Programming for the Masses With OpenAI Triton          |
| Riehl           | The CUDA Python Developer’s Toolbox                              |
| Fang            | ⭐ 1,001 Ways to Write CUDA Kernels in Python                     |
| Evtushenko      | ⭐ How You Should Write a CUDA C++ Kernel                         |

Other talks to be added.

Legend: 🌟 Keynote · ⭐ Best talks · [YouTube] Available on YouTube

Highlights and Thoughts

Jensen’s Keynote

I tweeted this while recording ADSP Episodes 226-228.

Jensen vs Bryce

— Conor Hoekstra (@codereport.bsky.social) March 20, 2025 at 10:50 AM

Here is the standalone slide from Jensen’s deck.

[Image: standalone slide from Jensen’s keynote deck]

Accelerated Python: The Community and Ecosystem

Speakers: Anshuman Bhat, Jeremy Tanner, Andy Terrel

This talk covers Python initiatives at NVIDIA. It’s pretty crazy to think that the first decade of GPU Python initiatives all started outside of NVIDIA.

[Screenshot from the talk]

The CUDA C++ Developer’s Toolbox

Speaker: Bryce Adelstein Lelbach

Screenshots from Bryce’s talk.

[Images: five screenshots]

What’s CUDA All About Anyway?

Speaker: Rob Armstrong

The slide below is similar to the second screenshot from Bryce’s talk above. Rob also notes that Stephen (Jones) has a slide in his talk covering the same thing. That makes three different slides for the same ecosystem (four if you include Jensen’s slide).

[Image: CUDA ecosystem slide from Rob’s talk]

CUDA: New Features and Beyond

Speaker: Stephen Jones

This seems to be the slide that Rob (Armstrong) referred to.

[Image: CUDA ecosystem slide from Stephen’s talk]

And a Python version of this graphic:

[Image: Python version of the ecosystem graphic]

And a slightly different graphic for kernel authoring:

[Image: kernel-authoring graphic]

Blackwell Programming for the Masses With OpenAI Triton

Speakers: Phil Tillet & dePaul Miller

An interesting graphic showing where OpenAI Triton sits between PyTorch and CUDA C++ for authoring kernels.

[Image: graphic placing OpenAI Triton between PyTorch and CUDA C++]

How You Should Write a CUDA C++ Kernel

Speaker: Georgii Evtushenko

This is a fantastic talk. It walks you through hand-rolling your own transform kernel and benchmarking it with NVBench against thrust::transform, which runs at speed of light. A must-watch.
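For anyone who hasn't watched it yet, here is a minimal sketch of the general idea. It is my own illustration and not the kernel from the talk. It puts a hand-rolled element-wise transform kernel next to the equivalent thrust::transform call and only checks that the two agree. The functor `scale_and_shift`, the helper names `hand_rolled_transform` and `thrust_transform`, and the block size of 256 are hypothetical choices made for illustration. The NVBench benchmarking from the talk is not reproduced here.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Hypothetical element-wise operation used for illustration: y = 2x + 1.
struct scale_and_shift {
  __host__ __device__ float operator()(float x) const { return 2.0f * x + 1.0f; }
};

// Hand-rolled transform kernel. The grid-stride loop keeps it correct
// for any grid size, not only one thread per element.
template <class T, class Op>
__global__ void transform_kernel(const T* in, T* out, std::size_t n, Op op) {
  const std::size_t stride = static_cast<std::size_t>(blockDim.x) * gridDim.x;
  for (std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n; i += stride) {
    out[i] = op(in[i]);
  }
}

// Copies the input to the device, launches the hand-rolled kernel,
// and copies the result back to the host.
std::vector<float> hand_rolled_transform(const std::vector<float>& h_in) {
  thrust::device_vector<float> d_in(h_in.begin(), h_in.end());
  thrust::device_vector<float> d_out(h_in.size());
  const int block_size = 256;  // assumed, not tuned
  const int grid_size = static_cast<int>((h_in.size() + block_size - 1) / block_size);
  if (grid_size > 0) {  // launching zero blocks is an error, so skip empty inputs
    transform_kernel<<<grid_size, block_size>>>(
        thrust::raw_pointer_cast(d_in.data()),
        thrust::raw_pointer_cast(d_out.data()),
        h_in.size(), scale_and_shift{});
    cudaDeviceSynchronize();
  }
  std::vector<float> h_out(h_in.size());
  thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
  return h_out;
}

// Reference version: the same operation expressed with thrust::transform.
std::vector<float> thrust_transform(const std::vector<float>& h_in) {
  thrust::device_vector<float> d_in(h_in.begin(), h_in.end());
  thrust::device_vector<float> d_out(h_in.size());
  thrust::transform(d_in.begin(), d_in.end(), d_out.begin(), scale_and_shift{});
  std::vector<float> h_out(h_in.size());
  thrust::copy(d_out.begin(), d_out.end(), h_out.begin());
  return h_out;
}
```

Correctness is the easy part. The point of the talk is measuring how close a kernel like this gets to thrust::transform's throughput, which is where NVBench comes in.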
