🚧 This post is under construction 🚧
This will be a short “trip report” of the talks I watched during the week of GTC 2025 (March 18 - 21). All talks are freely available online. You can read previous GTC trip reports here:
Summary
Speaker(s) | Talk
---|---
Huang | 🌟 ⭐ GTC 2025 Keynote with NVIDIA CEO Jensen Huang
Bhat et al. | Accelerated Python: The Community and Ecosystem
Lelbach | ⭐ The CUDA C++ Developer’s Toolbox
Armstrong | What’s CUDA All About Anyway?
Jones | ⭐ CUDA: New Features and Beyond
Goel et al. | Horizontal Scaling of LLM Training with JAX
Tillet & Miller | Blackwell Programming for the Masses With OpenAI Triton
Riehl | The CUDA Python Developer’s Toolbox
Fang | ⭐ 1,001 Ways to Write CUDA Kernels in Python
Evtushenko | ⭐ How You Should Write a CUDA C++ Kernel
Other talks to be added.
Legend: 🌟 - Keynote | ⭐ - Best talks
Highlights and Thoughts
Jensen’s Keynote
I tweeted this while recording ADSP Episodes 226-228.
> Jensen vs Bryce
>
> — Conor Hoekstra (@codereport.bsky.social) March 20, 2025 at 10:50 AM
Here is the standalone slide from Jensen’s deck.
Accelerated Python: The Community and Ecosystem
Speakers: Anshuman Bhat, Jeremy Tanner, Andy Terrel
This talk covers Python initiatives at NVIDIA. Pretty crazy to think that the first decade of GPU Python initiatives all started outside of NVIDIA.
The CUDA C++ Developer’s Toolbox
Speaker: Bryce Adelstein Lelbach
Screenshots from Bryce’s talk.
What’s CUDA All About Anyway?
Speaker: Rob Armstrong
This image is similar to the 2nd screenshot from Bryce’s talk above. Rob also notes that Stephen has a slide in his talk covering the same thing. So 3 different slides for the same ecosystem (4 if you include Jensen’s slide).
CUDA: New Features and Beyond
Speaker: Stephen Jones
This seems to be the slide that Rob (Armstrong) referred to.
And a Python version of this graphic:
And a slightly different graphic for kernel authoring:
Blackwell Programming for the Masses With OpenAI Triton
Speakers: Phil Tillet & dePaul Miller
An interesting graphic showing where OpenAI Triton sits between PyTorch and CUDA C++ for authoring kernels.
How You Should Write a CUDA C++ Kernel
Speaker: Georgii Evtushenko
This is a fantastic talk. It walks you through how to hand-roll your own `transform` kernel, benchmarking it with `nvbench` while comparing it against the speed-of-light `thrust::transform`. A must-watch.
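To give a flavor of what the talk builds up, here is a minimal sketch of a hand-rolled grid-stride `transform` kernel next to the `thrust::transform` call it would be benchmarked against. The kernel name, the element-wise operation, and the launch configuration are my own illustrative choices, not the talk's exact code, and the `__device__` lambda requires compiling with `nvcc --extended-lambda`:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Hand-rolled grid-stride transform kernel (a sketch; the talk's
// actual kernel and its tuning steps may differ).
__global__ void transform_kernel(const float* in, float* out, int n) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x) {
    out[i] = in[i] * 2.0f + 1.0f;  // example element-wise op
  }
}

int main() {
  const int n = 1 << 20;
  thrust::device_vector<float> in(n, 1.0f), out(n);

  // Hand-rolled version.
  const int block = 256;
  const int grid = (n + block - 1) / block;
  transform_kernel<<<grid, block>>>(
      thrust::raw_pointer_cast(in.data()),
      thrust::raw_pointer_cast(out.data()), n);
  cudaDeviceSynchronize();

  // Library baseline to compare against.
  thrust::transform(in.begin(), in.end(), out.begin(),
                    [] __device__(float x) { return x * 2.0f + 1.0f; });
  return 0;
}
```

In the talk, `nvbench` is used to time variants like these against each other; the comparison shows how close (or far) a naive hand-rolled kernel is from the library's tuned implementation.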