AMD AGESA 1.3.0.0 Limits ECC Memory to DDR5-5200 on AM5 Motherboards

AMD's latest AGESA 1.3.0.0 firmware update appears to unintentionally limit ECC memory support on AM5 motherboards to DDR5-5200, causing boot failures and frustration among users. The change was not documented, and motherboard vendors have rolled out updates with inconsistent behavior. The issue may be related to Rowhammer mitigations, but AMD has not officially communicated a fix.

English Transcript:

Hey S. Oh, sorry. Force of habit. Hey AMD, what are you doing? What we have here is a failure to communicate changes in the AGESA that have to do with the way that ECC memory is handled. AMD, what are you doing? Come on. What are you doing? ECC memory on AM5. We're basically in mandatory ECC everywhere except desktop. And ECC, error-correcting memory, is still a thing that's available on desktop. And if you look at DDR5, some DDR5 memory says, oh, ECC is built in. That's on-die ECC. That is not the ECC that we're talking about. We're talking about ECC where there's an extra memory chip that stores extra integrity information. Up until now on

7000 series AM5 CPUs and 9000 series AM5 CPUs, as well as the new EPYC CPUs, AMD has been pretty good about supporting, well, they don't lock out ECC support. They allow motherboard makers to integrate ECC support, and our X870 Aorus Master X3D Ice, up until 5 minutes ago, was working perfectly with the ECC UDIMMs that I have installed in here: 72 bits wide instead of 64 bits, because the extra bits are for correction, and running at 5600, because 5600 is a JEDEC speed. See, there's 4800 and there's 5600. But something weird has happened. Some folks on my forum have noticed that the 9000 series CPUs, after the latest AGESA update, 1.3.0-point, uh, I'm on the fence as to whether

it's 0a or 1, are locked down to 5200, not 5600. Well, we've been running these kits of memory at 5600 since time immemorial on, like, the worst of the worst. I have a 9600X CPU that does not work if you turn on PBO, which is kind of an accomplishment. Like, PBO does not, like, it's bad. It's really bad for that class of CPU. That CPU is not even stable at DDR5-6000, and it works perfectly fine with DDR5-5600, at the JEDEC DDR5-5600, on this absolute potato-class set of error-correcting UDIMMs that I got a long time ago. There has been an AGESA update. ASUS was the first to roll it out. ASUS got the most attention, but the AGESA update has trickled down to ASRock, Gigabyte, and

MSI. MSI, for their part, they're not, uh, super put together in supporting ECC in the first place. Uh, I would say that ASRock is your first-class citizen for ECC support, followed by ASUS, and Gigabyte can be a little hit and miss. Gigabyte does expose the ECC option in the UEFI, but now I'm getting POST code C5. I've gone from Auto, which the help says is enabled if it can, which is actually a lie, it doesn't enable it if it can, to Enabled, which goes to POST code C5, and now it won't POST because it's trying to POST at 5600 and the AGESA is locking it at 5200. If you roll back to an older BIOS that doesn't have 1.3, everything works fine. This is not an announced change. It's not in the

changelog. There's no documentation. The behavior here seems weird because 5200 seems like a Zen 4 thing, not a Zen 5 thing. And right now, Zen 4 in this exact motherboard, in this exact scenario, is more functional than Zen 5. Why is ECC broken? It's an ongoing, developing story. I'm not really sure, but we did test other motherboards because, you know, at first we thought it was an ASUS issue. All right, I'm turning the computer off and I'm using my clear CMOS button. Turning it back on. Let's see if it POSTs. When you encounter problems like this and the platform feels like it's gaslighting you, would you say it's ex post facto? Because, you know, the POST is... Yeah, never

mind. Now we can all enjoy the tiny little POST code. Uh, secret engagement juicing so that everybody switches to 4K to read the tiny little POST code. You don't need to see the POST code to see the red emanating from the old crap LED. Now, on this motherboard, from just changing that one option, ECC Enabled instead of Auto, you get stuck at C5. But you get stuck at C5 immediately. And it's not because it's doing DDR5 training. It's because this particular kit of ECC, like I say, has only the one single JEDEC 5600 profile. This memory, it will work at 5200. It works at 5200 with EPYC CPUs, but this is not an EPYC CPU.

It simply will not POST. And this is a really common scenario. So, you know, if there is rationale from AMD that says, oh, we are solving some sort of problem on the EPYC side of the fence with CPUs, um, you've made the problem worse, because there's at least some demonstrable class of user that'll go from BIOS update, restore the settings that they had, to no longer POSTing. And this is it. This is example A. Now, ASUS and some other boards will behave a little differently, and some boards don't expose the ECC option. It's just always ECC Auto and it'll try to enable it. But this AGESA version seems to be, behind the scenes, sabotaging enabling that, which is very frustrating if you're a user that would

like to have ECC on your desktop without paying the Threadripper tax, because Threadripper and on up, everything is ECC these days. Just not desktop. Desktop is optional. And it's always been an amazing, good differentiating feature for AMD that they enable it up and down the stack, because Intel usually only enabled it at the very lowest end of the stack and sometimes at the very high end of their desktop-ish processors, but usually nothing in the middle. But also, like, communicate. Let us know what's the plan. Like, what was fixed? Can we talk about it? Why is it just like, oh, no POST? No POST for you. Like the soup.

All right. So, Gigabyte is no POST when ECC is forced on. But when it's not forced on, dmidecode does report 72 bits, but EDAC doesn't see anything. Now, the ASUS Crosshair Hero that we splurged on, and, you know, way too expensive a motherboard, it only has 5 Gb Ethernet. The Gigabyte has 10 Gb built right in. 5416 A031. It is at least trying. It doesn't immediately go to an error, but it will only boot at 5200. 96 gigs of DDR5-5200. No, sweetie. The JEDEC profile is 5600. The CPU should support 5600. It's supposed to support 5600. It's Zen 4 that maybe is like 5200, but 5600 is what we're supposed to have here, and what historically has worked fine. This is a desktop CPU. It's

not EPYC. Definitely seems like a bug. AM5 and ECC is disproportionately popular among Linux users because Linux users know what kind of sins the engineers commit under the covers. System76 is very successful making and selling Linux and Linux accessories. AMD has, um, perhaps unknowingly, screwed System76 with this kind of an update, because as recently as 5 days ago, for systems like our, the Mira here, the R4-N4 that I reviewed, they advertised ECC support. Now, System76 vets the BIOS updates and they deliver them through their firmware update tool in Linux, and they haven't released an updated BIOS for the board in here, but it is an ASRock board, and we know that

ASRock boards are affected. So, AMD has unknowingly put System76 in a difficult position if this is not, uh, a whoopsie-doodle that's going to get fixed. Drives me insane when it's like, oh no, this is intended behavior. And it's like, okay, one, communication. Two, can't rug pull. Certain other processor companies are famous for that. We don't want to go down that road. That's terrible. That's just terrible.

Whoopsie-doodles we can understand to a greater extent. Sorry, System76 customers. Does that mean ECC is off the table? I hope not, because this is a good system. And boom. Here's our ASRock Taichi. And bad news, it behaves exactly the same way as the ASUS does. It's locked to 5200, even though the kit is 5600, even though that's the only JEDEC profile on its memory kit. So, our Gigabyte motherboard looks at this and says, "Wow, this is a really dumb situation," and just hard blocks at C5. Meanwhile, ASRock and ASUS will come down to 5200, perhaps as a workaround to whatever insanity is going on under the hood. I did reach out to AMD, but I did not hear back. Um, I don't know if they're

interpreting this as a bug or a feature. As presented now, it is definitely a bug. And I assure you this is the absolute bottom-of-the-barrel quality of memory for ECC DDR5, and it has been running continuously since, uh, 7000 series CPUs, with corrected errors reported by rasdaemon in Linux, and even helping build and debug some of the EDAC support in the Linux kernel, and it's fine. Basically, I understand that hyperscalers maybe have had some problems with EPYC. It's probably more related to motherboard quality than IMC quality, but an insufficient amount of testing and validation has been done. I am more than happy to help with that, on or off the record. Don't care. Just want to get the bugs fixed. My whole MO is

just getting the bugs fixed and making human suffering less. That's my whole reason for being here. Let's get it done. Now, I went off on my own trying to figure out why this would be. There is another alarming thing in some of the release notes for some of these BIOSes. The board vendors say there is no way to roll back after you upgrade to this version, this 1-point, the problematic version for ECC enthusiasts. Why would that be? Well, it's probably not because they're trying to take features away and gaslight us that the features never existed. It probably, if I had to guess, has to do with SK Hynix. You see, uh, somebody figured out the secret formula, which is actually very easy to do now.

Like, there's, I don't know, LaurieWired did an amazing video that talks about how memory problems go back to, like, the 1960s, and the context of her talk was more with, like, DRAM refresh and when you get the penalty from DRAM refresh. Fantastic video, excellent taste in office chairs. You definitely should check out the content that she has done, because it's very good. In that video, she also talks a little bit about Rowhammer and Rowhammer-type vulnerabilities. There has recently been kind of an important, big-deal problem identified with SK Hynix memory. How it does its internal stuff was figured out with a similar technique to how she figured out where memory is going to be stored, to solve a different problem, but the

approach would be the same if you were worried about refresh and Rowhammer and that sort of thing. And so this may have to do with trying to help SK Hynix address their Rowhammer weakness, because 5200, like, running the memory slower overall would typically make it less susceptible to Rowhammer-type attacks. I don't know. That's just a guess on my part. But if it turns out that's a good guess, then that might explain why this particular whoopsie-doodle is kind of a whoopsie-doodle: they were in a hurry to rush that out the door, because that was reported late last year and they still hadn't gotten it updated in the platform until this version. And that might be one reason why you don't

want to roll back to the previous version, because if you're using that kind of memory, then, okay, yeah, maybe. The bad news is this is not SK Hynix memory, uh, or at least I've tested non-SK Hynix memory in an ECC configuration. So, uh, I don't know, but this would also be a really bad way to deal with it, you know, optically. But also, maybe AMD would think it would be a bad way to deal with it optically to say, hey, we've identified a fundamental flaw in DDR5 from a major manufacturer and we are rolling out mitigations for Rowhammer. People will take that as a weakness in the AMD platform and roll with it, even though it's not necessarily an AMD problem. It may not be an AMD problem. So, I'm really

hoping that this is a whoopsie-doodle problem that is going to get corrected because it is a bug (pinned comment below) and not a this-is-the-new-normal, because this attack is only going to get worse. Like, the tooling that we have for finding these kinds of problems is way better this year than it was last year. And so once you have people with this kind of tooling turning their attention to these kinds of problems, it is only going to get dramatically worse. And the fix here, if it can even be described as a fix, is not going to work universally for someone that is hammering and attacking, uh, pun intended, this type of problem, because we have really good tooling for

figuring out the hashing algorithms and everything else. Now, you just have to have somebody that's determined enough to do it. So yeah, I'm team whoopsie-doodle, not team let's-solve-the-engineering-problem-this-way, because, you know, you're chasing a unicorn trying to solve it. It's fundamentally part of the technology, is what I'm saying. And, well, this Level1 has been a quick look at the state of ECC on AM5. Not looking good. Don't upgrade your BIOS if you get AGESA 1.3.0.0, and possibly .1 as well, 1a. You want to be on something older than that, which also means you're kind of locked out of getting a 9950X3D2 for ECC support unless you're okay with DDR5-5200, which you shouldn't be, because that's

not even a JEDEC standard. And the 9000 series CPUs are now slower than the 7000 series CPUs. And if you're a company like System76, you probably should be, uh, making angry sounds behind closed doors. Oh well. This is Level1. Hopefully there'll be a pinned comment with a change of things as the future approaches and passes and becomes the past. I'm signing out. You can find me at the Level1 forums. Woo!
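
For anyone who wants to check their own box, the two Linux-side checks mentioned in the video (dmidecode reporting 72 bits, and whether EDAC sees anything) can be roughed out in a short script. This is a minimal sketch, not something shown in the video: the function names and the sample text are made up for illustration, and it assumes the usual `dmidecode -t memory` field labels ("Total Width", "Data Width", "Speed", "Configured Memory Speed") and the standard EDAC sysfs location.

```python
import re
from pathlib import Path

# Hand-written sample in the shape of one `dmidecode -t memory` record.
# Illustrative only: not captured from the systems in the video.
SAMPLE = """\
Handle 0x0017, DMI type 17, 92 bytes
Memory Device
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 48 GB
    Speed: 5600 MT/s
    Configured Memory Speed: 5200 MT/s
"""


def _leading_int(value):
    """Return the leading integer of a field like '72 bits', or None for 'Unknown'."""
    match = re.match(r"\d+", value or "")
    return int(match.group()) if match else None


def parse_memory_devices(dmidecode_text):
    """Split dmidecode output into one {field: value} dict per Memory Device record."""
    devices = []
    for block in dmidecode_text.split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if "Memory Device" not in lines:
            continue  # skip Physical Memory Array and other record types
        fields = {}
        for line in lines:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        devices.append(fields)
    return devices


def summarize(fields):
    """Report whether a DIMM has extra ECC bits and whether it runs below its rated speed."""
    total = _leading_int(fields.get("Total Width"))
    data = _leading_int(fields.get("Data Width"))
    rated = _leading_int(fields.get("Speed"))
    configured = _leading_int(fields.get("Configured Memory Speed"))
    return {
        "total_width": total,
        "data_width": data,
        "rated": rated,
        "configured": configured,
        # 72 > 64 means the module carries the extra correction bits.
        "ecc_width": total is not None and data is not None and total > data,
        "downclocked": rated is not None and configured is not None and configured < rated,
    }


def edac_controllers(root=Path("/sys/devices/system/edac/mc")):
    """List memory controllers registered with EDAC; an empty list means EDAC sees nothing."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("mc[0-9]*"))


if __name__ == "__main__":
    for dimm in parse_memory_devices(SAMPLE):
        print(summarize(dimm))
    print("EDAC controllers:", edac_controllers())
```

To run it against a real system, feed `parse_memory_devices` the text from `dmidecode -t memory` (which needs root) instead of the sample. A DIMM with `ecc_width` true and no EDAC controllers listed would match the "72 bits, but EDAC doesn't see anything" situation described for the Gigabyte board, and `downclocked` true would match the 5200 lock seen on ASUS and ASRock.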
