NVIDIA's Nemotron 3 AI Model Achieves Breakthrough Speed and Efficiency

NVIDIA's Nemotron 3 AI model represents a significant advancement in artificial intelligence technology, offering open-source access with detailed research documentation. The model achieves remarkable speed improvements through techniques like NVFP4 format compression, multi-token prediction, memory-efficient Mamba layers, and stochastic rounding, while maintaining accuracy comparable to proprietary systems.

Full English Transcript of: NVIDIA’s New AI Just Changed Everything

Remember how most AI systems are proprietary? We have to pay a subscription for them, and no one knows how they work or what data they were trained on. Well, now hold on to your papers, Fellow Scholars, and check out this incredible work. When I first saw it, my jaw hit the floor. They absolutely knocked it out of the park. They spilled all the secrets. This is an AI assistant that is free for all of us forever, and not just the model itself: they also gave us a 51-page research paper, which might be the holy bible of creating such a system for now. Why is that?

Well, they show us every step of how it was done, and the dataset it was trained on as well. That is extraordinary. Usually something is always missing. Not here. They call it Nemotron 3 Super, and we are going to find out whether it is indeed super or not. Okay, so in goes 25 trillion tokens of training data, and out comes a 120-billion-parameter AI assistant that is how smart, exactly? It roughly matches the best closed frontier models from about a year and a half ago. Note that those models cost billions of dollars to train, and every detail about them was kept secret. And now, we just get this kind of

stuff for free. That is mind-blowing. This is amazing for us, consumers and Fellow Scholars. So as you see, it is really smart. It is up there with some of the best open models in most tests, but note that it is still a bit behind in some areas. Here is something that surprised me: in this result, they showcase two versions of the new model, BF16 and NVFP4. They perform roughly the same in terms of accuracy, so why the big fuss? Well, look at this. Holy mother of papers. Wow. The NVFP4 version is about 3.5 times faster than their BF16 model, and it is up to 7 times faster than similarly smart open models. So the story is not just the similarly smart part,

the story is that it is 7 times faster while being similarly smart. Goodness. Okay, so how on Earth did they do that? Here are 4 secrets they gave us from the paper, in very simple words. Dear Fellow Scholars, this is Two Minute Papers with Dr. Károly Zsolnai-Fehér. Okay, NVFP4. What is that? It is a way of speeding up the AI a great deal by essentially compressing the mathematics it uses. Imagine seeing a long number and rounding off a few digits. You get a smaller format. Less work! What's wrong with that? Well, everything. Normally, if you do that, you lose too much accuracy and the system outputs nonsense. However, here, the scientists did it the smart way:

they left the most sensitive calculations alone and did this rounding for the rest, where it does not cause trouble. The result is that it runs up to 7 times faster than many other techniques. And as we saw, it gives us no meaningful loss in accuracy. Magic. But there is more magic. When other AI techniques write their answer, they write it token by token. Let's simplify by saying word by word. Writing one word at a time. But not this one. This one calculates several future words at once. A whole sentence! Almost. Specifically, 7 tokens. And then the system verifies those 7 tokens in one go. Another massive speedup. They call it multi-token prediction. But why stop there? Let's add even more magic!
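The draft-and-verify idea behind multi-token prediction can be sketched in a few lines. This is a toy illustration, not NVIDIA's implementation: `next_token` is a stand-in for a real model's forward pass, and the drafter is assumed to agree with it most of the time.

```python
import random

random.seed(0)

# Toy "model": deterministically maps a context of token ids to the next one.
# A stand-in for an expensive LLM forward pass; purely illustrative.
def next_token(context):
    return (sum(context) * 31 + len(context)) % 1000

def draft_tokens(context, k):
    # A cheap drafter proposes k future tokens in one go. In this sketch it
    # mimics the main model but is assumed to be wrong about 10% of the time.
    out, ctx = [], list(context)
    for _ in range(k):
        guess = next_token(ctx)
        if random.random() < 0.1:
            guess = (guess + 1) % 1000   # an occasional wrong draft
        out.append(guess)
        ctx.append(guess)
    return out

def generate(context, total, k=7):
    ctx = list(context)
    verification_passes = 0
    while len(ctx) - len(context) < total:
        proposal = draft_tokens(ctx, k)
        verification_passes += 1          # one pass checks all k drafts at once
        for tok in proposal:
            if tok == next_token(ctx):
                ctx.append(tok)           # draft accepted
            else:
                ctx.append(next_token(ctx))  # fix the mismatch, drop the rest
                break
    return ctx[len(context):][:total], verification_passes

tokens, passes = generate([1, 2, 3], total=50, k=7)
print(len(tokens), passes)  # far fewer verification passes than 50 one-by-one steps
```

Because most drafts are accepted, each verification pass advances the answer by several tokens instead of one, which is where the speedup comes from.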

They also showcased these weird things they call Mamba layers. What do these do? Well, traditional AI systems have a bit of a memory problem. They work like a student who constantly re-reads the textbook over and over again whenever they are given a question. Scientists at NVIDIA say that's not the way to go. Memory is precious. So instead, read the book only once and take highly compressed notes. This kind of memory remembers the important details about the conversation, but it is smart enough to throw away the filler words. Thus, this system can process massive amounts of data efficiently. All this sounds glorious, but it still does not give us a working system. Why is that?
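The "read once, keep compressed notes" idea can be sketched as a fixed-size recurrent state. This is a plain gated recurrence, not NVIDIA's actual Mamba layer, and the embedding function is a made-up toy; the point is only that the memory does not grow with the input.

```python
import math

STATE_SIZE = 8  # the "notes" are always this many numbers, however long the text

def embed(token_id):
    # Hypothetical toy embedding: a fixed-size feature vector per token.
    return [math.sin(token_id * (i + 1)) for i in range(STATE_SIZE)]

def update_state(state, features, decay=0.9):
    # Old information fades a little (decay), new information is folded in.
    return [decay * s + (1 - decay) * x for s, x in zip(state, features)]

def read_sequence(token_ids):
    state = [0.0] * STATE_SIZE
    for t in token_ids:          # one pass over the input, no re-reading
        state = update_state(state, embed(t))
    return state

short_notes = read_sequence(range(10))
long_notes = read_sequence(range(100_000))
# Memory cost is constant: 8 numbers whether we read 10 tokens or 100,000.
print(len(short_notes), len(long_notes))  # 8 8
```

Contrast this with attention-style memory, where the stored context grows with every token read; here only the compressed state survives.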

Well, this is why. You see that there is a lot of addition here? That is the problem. The AI generates your answer step by step, and because we rounded off the numbers, there is a little error at each step. That by itself is not a problem. Here's the problem: there are many steps, and the error is magnified through each one. Imagine trying to walk to your car, which is 100 steps away, but you feel a bit sluggish today, and every single one of your steps is a bit smaller than the one before. What's the result? Well, of course, after 100 steps, you are still really far away from your car!
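The walking analogy is easy to put in numbers. Assuming, just for illustration, that each step is 1% shorter than the previous one:

```python
# 100 steps toward a car 100 meters away, each step 1% shorter than the last.
step = 1.0        # first step: 1 meter
distance = 0.0
for _ in range(100):
    distance += step
    step *= 0.99  # the tiny per-step shrink compounds
print(round(distance, 1))  # well short of the 100 meters you were aiming for
```

A 1% error per step sounds harmless, but compounded over 100 steps it leaves you roughly a third of the way short. That is exactly how small rounding errors snowball over many generation steps.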

So what is the solution? Well, the scientists solved this by adding back some random noise into the system. But wait, this noise is carefully crafted so that it averages out to zero. Your new steps are sometimes smaller and sometimes bigger than they used to be, but if you average them out over 100 steps, you will end up exactly at your car. So good! They call this stochastic rounding, and it is a genius idea. Now, not even this technique is perfect. For instance, when I give it my favorite question about assembling robotic cows, with lots of math, I like this guy a lot, but it thinks for almost an hour to get me an answer for that one. That's a lot.
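Stochastic rounding itself fits in a few lines. A minimal sketch, assuming an exaggerated number format that can only store whole numbers: round-to-nearest silently loses every small addition, while rounding up or down with probability proportional to the fraction is correct on average.

```python
import random

random.seed(42)
GRID = 1.0  # our coarse "format" can only store multiples of 1.0

def round_nearest(x):
    return round(x / GRID) * GRID

def round_stochastic(x):
    lo = (x // GRID) * GRID
    frac = (x - lo) / GRID           # how far x sits between grid points
    return lo + GRID if random.random() < frac else lo

def accumulate(rounder, value=0.3, n=1000):
    acc = 0.0
    for _ in range(n):
        acc = rounder(acc + value)   # every intermediate result is re-rounded
    return acc

exact = 0.3 * 1000                   # the true sum: 300.0
print(accumulate(round_nearest))     # 0.0 — each 0.3 rounds away, error compounds
print(accumulate(round_stochastic))  # close to 300 — the noise averages to zero
```

The nearest-rounding accumulator never moves, because every small addition rounds back down; the stochastic one rounds up roughly 30% of the time, so over many steps it lands near the exact answer.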

So for workloads like that, I like to run it on a much faster Lambda instance. But still, I think the AI game has suddenly changed. Closed systems used to dominate. Not anymore. It seems to me that Jensen at NVIDIA is not playing games here. It's in the news that they are going to invest tens of billions of dollars into fully open systems like this. I am not a money person, I don't know exactly how that works, but if we get to own more amazing free AI systems, well, sign me up! What a time to be alive! And there is just so much more in the paper; I would definitely love to come back for at least another video on it. Let me know in the comments if you would like that,

and if you enjoyed this, subscribe, and hit the bell.
