NVIDIA's CEO handed Stanford students the playbook for the next 10 years
Jensen Huang on design philosophy, energy math, agent architecture, and why the people scaring you about AI are wrong
There is a number Jensen Huang keeps using, and most people hear it wrong. 1,000,000x. That is how much faster NVIDIA made computing over the last 10 years, against the 10x that Moore’s Law would have handed you.
That gap is the reason AI exists at all. It is why researchers trained on the entire internet instead of curating datasets, and why everything is getting disrupted now instead of in 2040.
I watched the full Stanford CS153 lecture so you do not have to. Here are the 10 things that matter for any founder, operator, or investor paying attention.
Together with Attio:
Jensen’s whole point is that agents stop demoing and start doing the work. Attio is where that already happens to your revenue. It is the AI CRM that runs around the clock, turning every signal into one living view of every account.
Here is what that looks like inside your pipeline:
▫️ Agents on every account that research, qualify, and move each deal forward while you sleep
▫️ One live picture of every account, built from emails, meetings, and agent activity as it happens
▫️ Ask Attio anything about your business and get instant answers and actions from one chat thread
It is the CRM for the new way of going to market. Join the 30,000+ teams already on it.
1. The 1,000,000x number holds, and the reason behind it is the only strategy lesson you need this decade
Moore’s Law is over. It has been for a decade.
“In the case of NVIDIA and co-design, we got 1,000,000x over 10 years. Somewhere between 100,000x and 1,000,000x. When you’re talking about numbers that big, it really doesn’t matter.”
Dennard scaling broke down in the mid-2000s, and without it Moore’s Law compounds to maybe 10x over a decade. Semiconductor physics ran out of road, and NVIDIA delivered the full million anyway.
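To make the gap concrete, here is a minimal sketch that converts each ten-year figure into the per-year multiplier it implies. The function name is made up for illustration, and the only inputs are the numbers quoted above (10x, 100,000x, 1,000,000x over 10 years).

```python
def annual_factor(total_speedup: float, years: int) -> float:
    """Per-year multiplier that compounds to total_speedup over `years`."""
    return total_speedup ** (1.0 / years)


# Figures quoted in the lecture, all over a 10-year window.
moores_law = annual_factor(10, 10)
co_design_low = annual_factor(100_000, 10)
co_design_high = annual_factor(1_000_000, 10)

print(f"10x over 10 years        -> {moores_law:.2f}x per year")
print(f"100,000x over 10 years   -> {co_design_low:.2f}x per year")
print(f"1,000,000x over 10 years -> {co_design_high:.2f}x per year")
```

Under these numbers, the difference is roughly 1.26x per year against 3x to 4x per year, which is why the two curves end up in different universes after a decade.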
The mechanism is co-design: optimizing CPUs, GPUs, networking, switches, storage, software, and compilers at the same time, against the same objective, instead of leaving each layer to its own team. It is the same vertical integration logic that turns a hardware company into a compounding machine.
Jensen made it concrete with RISC. John Hennessy at Stanford proved that a simpler instruction set co-designed with a compiler beats two separately optimized systems every time, because the whole outperforms the sum of the parts. Scale that across an entire compute stack and a million-x follows.
The downstream effect is not a faster computer, it is a different category of possibility. When compute gets a million times faster, researchers stop asking which data to use and reach for all of it, the entire internet. That abundance is what unlocked modern AI, not a breakthrough algorithm but a design philosophy.
Try this → before your next architecture decision, ask whether you are optimizing a layer or co-designing across layers. The answer predicts your ceiling.
2. NVIDIA built a multi-billion-dollar system with zero potential customers, and first principles is how you do that without going insane
The most expensive supercomputer ever sold cost $350 million.
“You would have precisely zero customers. The reason for that is because the most expensive thing that has ever been sold was $350 million. And you’re building something that’s multiple billions of dollars. So you’re building for a precisely marketplace of zero.”
Jensen did not survey the market, he reasoned through the problem. Pre-training was going to be enormous, the systems to run it at scale would cost more than anything ever built, and no customer existed yet, so they built it anyway. Hopper was designed for that bet, and the architecture landed exactly when the demand arrived.
The lesson is narrower than “be contrarian.” Market research tells you what customers want today. First-principles reasoning tells you what they will need to want for the world to work the way the evidence says it is heading. One builds the future, the other optimizes the present, and the best founders know which question they are answering.
Try this → take your most important product bet and ask whether it came from customer interviews or first-principles reasoning. If you cannot tell the difference, that is the problem to solve first.
3. Inference is where the money gets made, and the bottleneck is not what most engineering teams optimize for
Training builds intelligence, inference delivers it, and the entire business of AI runs on inference.
“The speed up over the previous generation: 50 times. In two years, we improved something by 50 times. Moore’s law would have improved it by 2x.”
Here is the constraint most engineers miss. Generating tokens is bandwidth-constrained, not compute-constrained. The pre-fill phase processes context, the decode phase generates every token, and decode dominates because it needs more memory bandwidth than a single chip can provide.
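A back-of-envelope sketch of why decode tends to be bandwidth-bound. It assumes a deliberately simplified model: at batch size 1, every generated token reads all model weights from memory once and costs about 2 FLOPs per parameter. The parameter count, bandwidth, and FLOP rate are hypothetical round numbers, not specs for any NVIDIA part, and the model ignores KV-cache traffic and batching.

```python
def decode_limits(params: float, bytes_per_param: float,
                  mem_bandwidth: float, peak_flops: float) -> tuple[float, float]:
    """Return (bandwidth_bound, compute_bound) in tokens per second.

    Simplified assumptions (not from the lecture):
      - each token reads every weight once: bytes moved = params * bytes_per_param
      - each token costs about 2 FLOPs per parameter
    """
    weight_bytes = params * bytes_per_param
    bandwidth_bound = mem_bandwidth / weight_bytes
    compute_bound = peak_flops / (2 * params)
    return bandwidth_bound, compute_bound


# Hypothetical chip and model: 70B parameters at 2 bytes each,
# 3 TB/s of memory bandwidth, 1 PFLOP/s of peak compute.
bw_bound, fl_bound = decode_limits(70e9, 2, 3e12, 1e15)

print(f"bandwidth-bound ceiling: {bw_bound:,.1f} tokens/s")
print(f"compute-bound ceiling:   {fl_bound:,.1f} tokens/s")
print("limiting factor:", "bandwidth" if bw_bound < fl_bound else "compute")
```

With these made-up numbers the memory system caps throughput hundreds of times below what the arithmetic units could sustain, which is the shape of the problem that pooling bandwidth across many chips is meant to address.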
NVIDIA’s answer was Grace Blackwell NVLink72: 72 chips ganged into the world’s first rack-scale computer, solving for the actual bottleneck and landing a 50x gain in two years. The co-design philosophy compounds, because a million-x over a decade is 50x here and 50x there, stacking.
Teams optimizing MFU (model FLOPs utilization) are measuring the wrong thing. Jensen would rather run low MFU and be over-provisioned than hit 100% and fight Amdahl’s Law on every workload. Optimize for the constraint that limits you, not the metric that is easy to track.
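For readers who have not met Amdahl’s Law: it says the part of a workload you do not accelerate caps the total speedup. A minimal sketch of the standard formula follows; the 95% fraction is an illustrative assumption, not a figure from the lecture.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the work runs `factor` times faster."""
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / factor)


# Assume 95% of a workload benefits from a 50x faster component.
overall = amdahl_speedup(0.95, 50)

# Even with an infinitely fast component, the untouched 5% caps the gain.
ceiling = 1.0 / (1.0 - 0.95)

print(f"overall speedup: {overall:.2f}x")
print(f"hard ceiling:    {ceiling:.1f}x")
```

A 50x improvement on 95% of the work yields about 14.5x overall, and no amount of extra tuning on that 95% gets past 20x. The remaining 5% is the constraint worth attacking.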
Try this → name one metric your team optimizes that does not map to the constraint actually limiting your output, and retire it.
4. Stanford has a $40 billion endowment and no supercomputer, and Jensen says that is Stanford's fault
There is no chip shortage, there is a budget structure problem.
“It is not true that people are giving me orders, placing orders, and we’re not delivering chips. It is just not true. You’ve got to place orders.”
Every research department at Stanford raises its own grants, no grant is large enough to justify shared compute infrastructure, nobody pools resources, and the result is a campus of laptops and zero supercomputers.
Jensen’s fix is blunt. Stanford has $40 billion, so cut $1 billion, contract a cloud provider, and give every student access to AI supercomputers this year. The chips exist, the money exists, the structure is the problem.
His accountability framing is the sharpest insight here. When you tell someone it is not their fault, you take away their ability to fix it, because fault and agency are the same thing. Assign the fault correctly and you assign the power to solve it. He is not criticizing Stanford, he is refusing to let it off the hook.
The same pattern sits inside every large company: fragmented budgets, siloed compute, everyone underpowered. The organizations that aggregate first will produce the work that defines the next decade.
Try this → map every compute and data resource your team uses, find what is fragmented that could be pooled, and treat the gap as a budget structure problem rather than a resource problem.
5. The last time computing changed this fundamentally, it was 1964, and that date tells you exactly what to audit
This is a reinvention, not a technology cycle.
“For the first time, the way you write the software, how you process the neural network versus the software, and what the applications can do has now dramatically changed. Everything is fundamentally different.”
The IBM System/360 defined the computing model in 1964, and everything since (PCs, internet, mobile, cloud) was built on that foundation of prerecorded software, compiled binaries, and explicit instructions. The model held for 60 years.
Neural networks run differently than compiled code, and that one fact breaks every assumption the old model made:
▫️ Software is no longer written and compiled, it is trained
▫️ Output is generated as it happens, rather than retrieved from storage
▫️ The computer responds to intention, not only to instruction
▫️ Applications that needed human-level perception are now buildable
Alpamayo, NVIDIA’s self-driving system, is the proof. Thirteen years of self-driving work, nothing good enough, then deep learning arrived and the entire application category unlocked. That is what a genuine reinvention looks like: a different category of possible, not a faster version of the old thing.
Try this → audit your product’s core assumptions, find the ones shaped by the 1964 computing model, and ask whether they still hold. The ones that do not are either your biggest risk or your biggest opportunity.
6. The moment GPT shipped, agentic systems were obvious, and the founders who saw it built the infrastructure
Generative AI did not make images. It made thinking visible.
“Thinking is generating tokens that you consume internally. Generating tokens that you consume externally would be called tool use. And so the idea that after GPT happened two years ago, that we would be at this moment was fairly easy to predict.”
When GPT shipped, Jensen did not see a chatbot, he saw that the mechanism generating text can generate thoughts. Internal token generation is reasoning, external token generation is tool use, and the agentic trajectory was a direct mechanical consequence, a derivation rather than a guess.
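The internal versus external distinction maps onto a very small loop. The sketch below is a toy, not any vendor's agent framework: `scripted_model`, `run_agent`, the step tuples, and the `multiply` tool are all hypothetical names, and the "model" is a fixed script standing in for a real one.

```python
TOOLS = {"multiply": lambda a, b: a * b}  # hypothetical tool registry


def scripted_model(history):
    """Stand-in for a real model: picks the next step from how far along we are."""
    if len(history) == 0:
        return ("think", "I need 6 times 7; a tool can compute that.")
    if len(history) == 1:
        return ("tool", "multiply", (6, 7))
    return ("answer", f"The result is {history[-1][1]}.")


def run_agent(model, tools, max_steps=8):
    history = []
    for _ in range(max_steps):
        kind, *payload = model(history)
        if kind == "think":
            # Tokens consumed internally: they only feed the next step.
            history.append(("thought", payload[0]))
        elif kind == "tool":
            # Tokens consumed externally: they trigger work outside the model.
            name, args = payload
            history.append(("tool_result", tools[name](*args)))
        else:
            return payload[0], history
    raise RuntimeError("agent did not finish within max_steps")


answer, trace = run_agent(scripted_model, TOOLS)
print(answer)
```

The only difference between a thought and a tool call in this loop is who consumes the output, which is the point of the quote.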
The engineering between GPT and today was hard work by brilliant people: training models to reason step by step at scale, fine-tuning for reliability, building the tooling. The destination was legible in 2023 to anyone reasoning carefully about the mechanism.
The current signals read just as plainly. Agentic systems are here, continuous compute is replacing on-demand compute, and everything built for on-demand will get rebuilt. The founders who saw it early are building the infrastructure everyone else now rents.
Try this → name the next step in the trajectory from where AI is today, write it down, and build against it. If you cannot name it, that is the gap to close first.
7. Open models are not about the PR war with OpenAI, and three reasons matter more than the headline
NVIDIA burns more Anthropic and OpenAI tokens than almost any company, and that is not a contradiction.
“There are too many societies where the scale of their language is not big enough for somebody else to decide to make it a high priority. Unless you deeply care, it’s never going to be great.”
Three reasons, ordered by how underappreciated they are.
First, 230 languages where no major lab will prioritize fine-tuning. Nemotron, NVIDIA’s near-frontier open model, exists so any researcher anywhere can fine-tune for any language without starting from scratch. That is infrastructure for human intelligence at global scale.
Second, human priors. Alpamayo is a language model fused with a world model, and that fusion means the self-driving system needs a few million training miles rather than billions. The data requirement collapses when the model reasons from human experience instead of learning every edge case from zero.
Third, you cannot secure a black box. A closed-model arms race in cybersecurity is just version numbers climbing while everyone stays exposed in between. Transparency is how you defend, and Nemotron Nano, deployed in swarms, already runs this way.
Try this → pick one domain-specific model you build or use, and ask whether fusing it with a language model that carries human priors would cut your training data requirement. If yes, that is a compounding advantage sitting unused.
8. Vera Rubin is not a faster training chip, it is the first computer designed around how agents actually work
Agents do not compute the way training clusters compute.
“The AI is this multi-billion dollar system and it sends off an instruction to use a tool and that tool is gonna run on the CPU. Meanwhile, this GPU supercomputer, this multi-billion dollar system is waiting for this one CPU.”
The agent compute pattern has three requirements current cloud infrastructure was not built for.
Long-term memory has to live in storage wired directly to the processor fabric, because copying data off network storage kills latency. Tool use runs on CPUs, and cloud CPUs were designed for parallel throughput, 200 cores each doing independent work. Agent workloads need the opposite: single-threaded operations with extreme low latency, because a billion-dollar GPU cluster stalls waiting on one CPU thread.
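A rough sketch of the stall, assuming the simplest possible model: the accelerator generates for a while, then sits idle for the full duration of each tool call. The function name and the timings are hypothetical, chosen only to show how tool latency eats utilization.

```python
def gpu_utilization(generate_seconds: float, tool_seconds: float) -> float:
    """Fraction of wall-clock time the GPU is busy in one generate-then-tool-call step.

    Assumes the GPU is fully idle while the CPU-side tool runs (no overlap).
    """
    return generate_seconds / (generate_seconds + tool_seconds)


# Hypothetical step: 2.0 s of token generation followed by one tool call.
slow_tool = gpu_utilization(2.0, 0.5)   # tool takes 500 ms
fast_tool = gpu_utilization(2.0, 0.05)  # tool takes 50 ms

print(f"500 ms tool call: GPU busy {slow_tool:.1%} of the time")
print(f" 50 ms tool call: GPU busy {fast_tool:.1%} of the time")
```

In this toy setup, cutting single-thread tool latency by 10x recovers most of the idle time without touching the GPU at all, which is why the CPU side gets redesigned.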
Storage and CPU architecture both get redesigned for this pattern, and Vera Rubin is that redesign. Feynman, the generation after, is likely built for swarms of agents running sub-agents running their own sub-agents. Jensen names the method: identify the compute pattern, understand how it differs from the past, and build the system to match the actual workload.
Try this → map the actual compute pattern of your most important agent workload, find where latency enters, and ask whether your infrastructure was designed for that pattern. The gap is your performance ceiling.
9. Energy for AI is probably 1,000x current levels, Jensen has done the math, and the market is starting to agree
The bottleneck after chips is energy. This has been true for five years. Most people are still learning it.
“The amount of energy that we need for computing is likely probably a thousand times more than we currently have. And so I think if you said we need a thousand times, I wouldn’t be surprised if we’re off by a couple of orders of magnitude.”
The reasoning is structural. Old computing is on-demand and retrieval-based: a server sits idle until a request arrives, responds, and goes back to idle, consuming energy per query. New computing is generative and continuous: a model runs at all times, contextually aware, generating outputs before you ask. The energy profile is a different category, not a 2x bump.
Two levers sit inside NVIDIA’s control:
▫️ Tokens per watt, already improved 50x, compounding through every generation
▫️ Co-design efficiency choices made from the chip level up
The lever outside its control but within reach of the market is energy infrastructure. Solar and nuclear needed subsidies to pencil out five years ago, and the compute demand is now strong enough that the market funds them without subsidies. Jensen calls this the best moment in history to invest in sustainable energy, which is a market-structure observation, because the demand curve justifies the capital.
Track this → tokens per watt across your inference stack. The teams improving it fastest will hold a structural cost advantage within three years that is hard to close from behind.
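One way to track it, sketched under stated assumptions: "tokens per watt" is read here as tokens per second per watt, which is the same thing as tokens per joule. The function names, the measured run, and the electricity price are all hypothetical.

```python
JOULES_PER_KWH = 3.6e6


def tokens_per_joule(tokens: float, seconds: float, avg_watts: float) -> float:
    """Tokens generated per joule of energy (equivalently, tokens/s per watt)."""
    return tokens / (seconds * avg_watts)


def energy_cost_per_million_tokens(tok_per_joule: float, usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    joules = 1_000_000 / tok_per_joule
    return joules / JOULES_PER_KWH * usd_per_kwh


# Hypothetical measurement: 1M tokens in 100 s at an average draw of 10 kW.
baseline = tokens_per_joule(1_000_000, 100, 10_000)
improved = baseline * 50  # the 50x efficiency gain cited above

for label, tpj in [("baseline", baseline), ("50x better", improved)]:
    cost = energy_cost_per_million_tokens(tpj, usd_per_kwh=0.10)
    print(f"{label:>10}: {tpj:6.1f} tokens/J, ${cost:.5f} per 1M tokens")
```

Measure average power at the wall or rack level if you can, because the number that compounds into a cost advantage is the one your energy bill is based on.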
10. The AI doom narrative is not just wrong, it is built to sound unfalsifiable, and Jensen is not being polite about it
Jensen does not hedge here.
“It is not true that we have no idea how these systems work. It is not true that the technology is going to somehow in some nanosecond become infinitely powerful and therefore it’s going to take over the world. It is not true. These things are all being made up.”
The singularity-flash claim runs like this: at some unknown Wednesday at 7pm, AI becomes infinitely powerful, the game ends, no way to know when, no way to defend, some chance it ends civilization.
His counter is specific rather than “trust us.” We understand how these systems work, capabilities scale predictably with investment and architecture decisions, defenses can be built and tested, and the trajectory is legible. None of that fits the singularity-flash story. The more grounded version of the safety conversation looks nothing like the doom one.
The harm is concrete. The students in that hall are deciding whether to enter computer science, and the narrative shapes that choice. A generation raised to believe AI is an unfathomable danger enters policy rather than engineering, advocates for restriction rather than building, and cedes the field to people who do not share that hesitation.
GPUs are general-purpose compute. A billion people own them, running medical imaging, logistics networks, climate models, and video games. The comparison to atomic bombs makes useful thinking impossible, because every policy conclusion that starts there is contaminated.
Try this → the next time you read an AI safety claim, ask whether it is falsifiable. If no evidence could disprove it, it is a narrative, not a safety argument, so treat it accordingly.
The stack changed, all of it. Here is what that means depending on where you sit.
Jensen’s thesis is not about chips, it is about co-design as a principle applied at every scale. Optimize a layer and you get linear improvement. Co-design across layers and you get compounding. That holds for hardware, for product architecture, and for organizations.
Founders
The compute pattern for agents differs from the pattern for training, so build for the actual one and know whether your infrastructure matches the workload. The founders who map this in the next 24 months will build the platforms everyone else runs on, and the startup ideas worth building right now are already public.
Investors
The energy constraint is structural, and the market is now large enough to fund the solution without subsidies. The compounding returns sit in efficient inference, agent-optimized infrastructure, and the energy layer underneath all of it. Incumbency does not predict who wins, first-principles reasoning does, so find the teams doing it before the consensus catches up. Start from the investor list of lists and where the venture money is moving.
Operators
The computing model that held for 60 years just changed. The teams who internalize that as a first-principles fact will rebuild the stack correctly, while the teams who treat it as an upgrade will spend the decade debugging assumptions that were never right for this environment. Start with how to use Claude like the top 1% of users and the complete guide to AI coding in 2026.
Anyone entering the field
The door is wider open than it has been since 1964, every layer of the stack is being rethought, and the people doing the rethinking are the ones reasoning most sharply about what changed. The 2026 AI engineer roadmap and the Claude Architect certification curriculum are where that reasoning starts.
Three principles to carry forward
▫️ Co-design wins. Individual layer optimization has a ceiling, and integration removes it.
▫️ The compute pattern determines the architecture. Know the pattern before you write a line of code, because everything else is guessing.
▫️ The trajectory is legible. GPT made agents obvious, agents will make swarms obvious, and the signals exist, so reason from them instead of waiting for the market to confirm them.
The stack changed, all of it. This is the best time in 60 years to be the person rebuilding it.
Full lecture: Stanford CS153 Frontier Systems, Jensen Huang from NVIDIA on the Compute Behind Intelligence.
If this breakdown saved you an hour, share it with one engineer, founder, or operator who needs to see it.