This is so interesting. AMD fabbed the big part of the chip that doesn't benefit from a shrink on the ultra-cheap 14nm node, and then fabbed the high-performance little chiplets at 7nm to get a chip that overall has a HUGE total die size, but at the same time has awesome yields and high performance.
AMD has done something very intriguing this year with Rome.
Is this the inevitable path for all modern chips for all devices? Is this going to be exclusive to high-performance clusters? What disadvantages does it come with? Can this be upgraded and improved to the point where it's as good as one-die designs? Discuss.
It's a nice design because it frees up more room on the dies by moving the I/O and memory controller off. That lets them use the same die size to add more cache, etc., or shrink the die down from the space savings and fit more chiplets on the package. Either way, they can go both routes if they choose to.
Not sure if Intel or ARM will follow this design. It would be embarrassing for Intel, since Intel called Infinity Fabric "gluing together dies." ARM would only do it if it showed improvements to power consumption, since power is all ARM cares about; hence why ARM never adopted SMT.
I'm more interested to see how this works out in the consumer space. Looking at that Rome engineering sample, it's a big boy. Too big to replace the current mainstream Ryzen segment (I'm not talking about the prosumer segment with Threadripper). If each die is still 2 CCXs with 4 cores each, they could stick one chiplet on the package along with a small I/O & memory controller die below it or something.
I wonder if anyone has tried to make a mockup of a mainstream Ryzen version of it.
Aiden Murphy
Bump
Anthony Gonzalez
So Zen 2 is rumored to have 88% yields; does anyone know what the yields are for Xeons?
William Phillips
2.6%
Leo Phillips
LOL no it's fucking not. Random bullshit asspulled from anons is not a source for anything. 7nm is not a mature process; it's absurdly fucking complex, has more mask layers than 14nm, and uses more quad patterning. It is never going to be a high-yielding process. EUV exists only to try to rectify *some* of the costs of the increasing complexity at smaller process nodes; it won't ever alleviate them fully.
The chiplet designs aren't that small for 7nm either; they're only around 20% smaller than Apple's 7nm cellphone SoC. The 60-80 mm^2 range is still quite large given the transistor count at this size. AMD would be lucky if TSMC yielded 30% from potential candidates per wafer.
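For anyone who wants to sanity-check yield claims like that, here's a minimal back-of-the-envelope sketch, assuming a simple Poisson defect model and a made-up defect density D0 (neither AMD nor TSMC publish the real number), so the exact percentages mean nothing; the point is how steeply zero-defect yield drops with die area:

```python
import math

def poisson_yield(die_area_mm2, defect_density_per_cm2):
    """Fraction of dies expected to come out defect-free under a simple Poisson model."""
    return math.exp(-(die_area_mm2 / 100.0) * defect_density_per_cm2)

D0 = 0.3  # assumed defects/cm^2 for an early 7nm process -- illustrative only

for area in (75, 350, 700):  # ~chiplet, mid-size die, hypothetical monolithic 64-core
    print(f"{area:>4} mm^2 die: {poisson_yield(area, D0):.0%} zero-defect yield")
```

Whatever the real D0 is, a ~75 mm^2 chiplet sits on the flat part of that curve while a hypothetical monolithic 64-core die would not, which is the whole argument for the design.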
Hudson Brooks
It's a good plan. AMD moved from traditional system architectures that mostly rely on shrinks and uarch upgrades to stay relevant to a system architecture that puts its entire weight on the interconnect (the glue).
The smart thing here is that the former is not scalable, as it depends on both fabs and SoC/package verification, which gets much, much harder when 60% of your die is analog circuitry (which AMD moved to the I/O die). Not only does that take a lot of time, a lot of bugs are also expected.
Secondly, by moving the entire system architecture to rest on the interconnect, they got rid of the scaling issue: they've made it trivial to add more cores, trivial to improve cores, and trivial to extend the I/O, all independently of each other, now that they don't have to worry about most of the verification and debugging Intel has to do. Most importantly, interconnects are scalable; they don't depend on fabs or wire improvements so much as on the protocol and its encoding, which are constantly improving.
Hunter Peterson
If my math is right, 12%. 88 + 12 = 100.
David Bell
>tfw 48c/96t Dickripper coming
Aiden Rodriguez
>32c Threadripper already has gimped memory access, with CCXs on some dies having to reach memory through another die
>wanting to add even more cores to this quad-channel platform
This wouldn't be a good idea. 32c is already pushing things.
Are you dumb? Rome (and TR3) will be UMA; all cores will have access to all memory controllers. It remains to be seen if 4 channels are enough for 32 cores, or even 48. But I have heard that AMD is planning a new HEDT platform (X599?) next year with TR3, and looking at the monster they have, I wouldn't doubt it. No point gimping such a fine chip with backwards compatibility.
Kayden Gutierrez
Right now only TSMC has 7nm, and it's completely booked. By keeping almost half the total die area on 14nm, they basically double the number of CPUs that can be put out from their allotted 7nm production. Once 7nm ramps up and there is enough production from multiple fabs it won't be necessary, but in the short term this is a genius idea.
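Rough numbers behind that "basically double" claim, assuming a 300mm wafer and guessed die areas (the 140 mm^2 "monolithic" figure is hypothetical, just the chiplet plus a rough share of the I/O pulled back on-die):

```python
import math

WAFER_DIAMETER_MM = 300

def dies_per_wafer(die_area_mm2):
    """Classic approximation: wafer area / die area, minus an edge-loss term."""
    d = WAFER_DIAMETER_MM
    return int(math.pi * (d / 2) ** 2 / die_area_mm2
               - math.pi * d / math.sqrt(2 * die_area_mm2))

chiplets_only = dies_per_wafer(75)             # ~75 mm^2: 8 cores + cache, I/O left on 14nm
hypothetical_monolithic = dies_per_wafer(140)  # guess: same cores with their I/O share on 7nm

print("7nm chiplet candidates per wafer:      ", chiplets_only)
print("hypothetical monolithic dies per wafer:", hypothetical_monolithic)
```

Roughly twice the candidate dies per 7nm wafer, before yield even enters the picture.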
Eli Hernandez
And honestly, anyone expecting to go from 16, to 32, and then to 48/64 cores on the SAME SOCKET while retaining the full performance and feature profile of the latest generation is out of their fucking mind and has zero fucking clue how any of this shit works.
Levi Morgan
>Right now only TSMC has 7nm
No. Samsung has 7nm lines, and they started theirs with EUV.
Robert Wright
Not really, I expect them to keep using 14nm I/O dies for at least 2 years.
1. WSA wafer agreement with GloFo
2. 7nm is way too expensive to spend on 95% fucking analog circuitry
3. Even if you shrunk it to 7nm you'd probably gain some 15-20% die area at best, nowhere close to 2x density.
Nicholas Clark
Good thing they're so small, then. That's gotta cut the potential price to a fourth of what it could be.
Ryan Morris
So basically we're talking they've gone from tick tock to tick tuck teck tock? They could improve it ~4+ times a year without much work?
Juan Stewart
Like another user said, though... this could potentially be fixed and upgraded within a single generation?
Benjamin Mitchell
I expect a bunch of "tocks" from AMD going forward, probably every 11 to 15 months; they have massively cut down verification, testing and debug times for the cores and SoCs by going this route. We might be looking at 10-15+% IPC improvements every year until 2022, when I expect Zen to retire.
Ayden Rivera
What would AMD have to do for a gen2 to actually achieve this, then? Or would they even bother? Or would it be like adding more bread slots to a toaster - not needed by anyone?
Elijah Diaz
It's not on AMD. They could have gone with a more traditional and far more inefficient architecture like Naples and kept compatibility, but they would be crazy to do so.
Everything is in the mobo vendors' hands; AMD can't really do anything about how they route DRAM traces or handle forward compatibility with PCIe4, though some boards might support it. It's more of an electrical than a physical issue.
Michael Clark
All chips will eventually do this. Even APUs, to an extent; probably only the upcoming X-series ones that are gunning for 1050/1060-tier performance. Those aren't going to be functional for a while yet, but that's the inevitable direction for low-end GPUs and mid-range CPUs. The GPU is going to creep further up the stack so long as software continues to add GPGPU needs.
I'd expect these APUs to be much, much cheaper than a separate card and CPU as well. Think of all the money saved on a PCB, a separate GPU board, memory, etc. Just stick a fabbed GPU and CPU under the same lid together and give them a specially engineered interconnect...
I'd bet this would shave over $50 off the cost of a separate CPU and GPU, and even more considering the reduced complexity of the motherboard, BIOS and such.
I can't figure it out either. I'm at a complete loss.
Caleb Stewart
>Not sure if Intel or ARM will follow this design. It would be embarrassing for Intel, since Intel called Infinity Fabric "gluing together dies." ARM would only do it if it showed improvements to power consumption, since power is all ARM cares about; hence why ARM never adopted SMT.
Incel is already trying to rush their own MCM designs out before AMD can lock down the majority of the enterprise market. AFAIK ARM wouldn't benefit as much from MCM, but I'm not an expert by any measure.
Jaxson Robinson
dammit
Kevin Fisher
does that frankensteined chip come with an imprint of a balding man with sunglasses too lol haha aymeds btfo
Jaxson Bell
Looks like 67mm^2 per chiplet. They can and will be able to disable any two cores per CCX, which gives them a much higher effective yield. Datacenter is also first, which is higher margin anyway. And Apple sells to consumers; if Apple can make a healthy profit on those compared to their previous designs, then surely AMD's balance sheet doesn't look much worse.
>Random bullshit asspulled from anons is not a source for anything
>AMD would be lucky if TSMC yielded 30%
Source?
Jack Perry
Intel announced a glued-together Xeon the day before the AMD event. Also, the fact that they hired Jim Keller for "SoC integration" suggests that they are working on their own version of Infinity Fabric.
>2 years
This is enterprise equipment; they'll be manufacturing it for 5+ years. Epyc 1 will be made for that long as well, so there's going to be something to put against the WSA, along with years and years of this I/O die and enterprise Polaris and Vega parts.
On another note, surely this I/O die has to include all the shit that Zeppelin has: eight 10GbE, 32 SATA over the 128 GPIO lanes (which also support IF), along with the extra 16 USB3 lanes as well. The USB3 thing boggles the mind with how much space must be wasted on those; they're not really for the enterprise segment, but they still have to technically support them for full socket compatibility.
Connor Thomas
>sunglasses
Try camera you donut.
Carter Garcia
USB is tiny, a few mm^2 at most even on 14nm. Everything else is needed, and I'm pretty sure Rome has 100GbE (or 5x20GbE muxed) included.
>The 60-80 mm^2 range is still quite large given the transistor count at this size.
It's not, not even with SAQP/LELELELE.
>AMD would be lucky if TSMC yielded 30% from potential candidates per wafer.
Are you retarded, user-kun? Like, genuinely so?
Jacob Powell
Not in HVM.
>Everything else is needed, and I'm pretty sure Rome has 100GbE (or 5x20GbE muxed) included.
More like 4x25GbE, muxed over the same 25GT/s PHYs as xGMI.
Ayden Ortiz
PCIe5 isn't far off; if AMD goes a bit crazy, Milan+2 could probably have 400GbE, more than those crazy expensive new switches.
Evan James
They don't really need it, since the hyperscalers want their own networking.
Jonathan Ward
You stupid ass hat, it's 4x CCX now. Get with the program, go look at Bits and Chips and Hardware Unboxed, and never be this stupid again.
Hudson Perez
On Zeppelin the USB3 lanes weren't that much smaller than the GPIO lanes, per lane. But it'll add up, especially if it needs edge space as well. Then again, I don't think I've seen an Epyc board with that many USB3 ports, only Threadripper stuff with 8 of them.
On XXGbE:
>4x25GbE
This might make more sense. It'll need 8 of them for compatibility, so all of them together would hit the 200Gb/s mark that this generation of servers will target. A sound idea, unless AMD doesn't want to step on Mellanox's toes and sticks with 10GbE. Has it been confirmed what each I/O lane supports? I know Zen 1 was 12-13Gb/s.
Jackson Johnson
xGMI runs at PCIe4 ESM aka 25GT/s, see Vega20.
Carter Young
Explain like I'm a potato. What does this mean? I'm not well versed on CPU technology.
Wyatt Miller
AMD kinda went full circle. They were the first to integrate the memory controller (for the actual high-volume market, anyway), and now they're the first to disintegrate it back onto its own die.
Juan Thomas
Brain damage
Angel Gomez
Who? Absolutely nothing about Rome is confirmed besides system design, core count and xGMI speeds.
Luke Bennett
So what are the implications of this? Is this to the benefit of their newest architectures and shit? And if so, how?
Kayden Phillips
I really wonder how this Epyc design will translate to Ryzen 2.
Games rarely benefit from more than 8 physical cores, and every Ryzen 2 die on its own contains 8 cores. Will they release single-die CPUs plus a huge I/O controller with 128 PCIe lanes? Unlikely. Do they have a separate Ryzen design where the I/O is back on the core die? Unlikely as well.
Justin Myers
>So what are the implications of this?
Fun things will happen.
>Is this to the benefit of their newest architectures and shit?
Yes. Maybe. No?
>And if so, how?
Getting rid of complex NUMA topologies is always good. Reaping the most benefit from node shrinks by keeping everything that isn't digital logic or SRAM off the bleeding-edge node is also good.
Jaxson Bailey
>Do they have a separate Ryzen design where the I/O is back on the core die?
They will have a simple, smaller northbridge die explicitly for Ryzen.
Robert Garcia
Studies show this dramatically improves yields and the binning of higher-end chips. Intel also has plans with their EMIB tech. The latency increase is mitigated by using active silicon on a larger node with direct wired connections instead of a simple interposer.
I wonder if eventually the core dies will be stacked over the I/O die for a smaller overall package. We'd basically have 3D CPUs at that point.
Gabriel Hernandez
Is there a source for that? I didn't actually know CCIX pushed 25GT/s through the PHY. It's that kind of stuff that's making me look forward to Gen-Z and similar technology.
I died inside a little when Papermaster said "front side bus" in the presentation.
Nathan Ramirez
>I wonder if eventually the core dies will be stacked over the I/O die for a smaller overall package.
It will just become an active Si interposer.
>Is there a source for that?
Look up the CCIX spec and announcement.
>I died inside a little when Papermaster said "front side bus" in the presentation.
Lel.
Lucas Jackson
I have something Jow Forums should read about Rome.
I'm still very curious as to what they're going to do for the mainstream parts. Threadripper is easy: just use the same parts as Epyc 2. I want 16 cores on AM4, but I can't fathom how they're going to do it without certain downsides. Mainstream will want a single SoC for cost and complexity reasons on low-end parts and laptops, and that'll include a GPU. Producing chiplets and I/O dies for any of those means extra designs that won't be cheap for the currently frugal AMD, and it will induce latency penalties, so the gamer market will be missed for that kind of thing.
>I'm still very curious as to what they're going to do for the mainstream parts.
Chiplets are the solution to literally everything. Their second-gen IF is some sorcery, since 64C and an incredibly complex interconnect setup still fit into ~250W.
>It's 1x CCX now, no longer 2x CCX
People like you caused this rumour. 8 cores PER DIE. 2 CCXs per die. 8 DIES per bridge. 8*8 = 64 cores.
All improvements listed are as follows:
Improved Execution Pipeline
Doubled Floating Point (256-bit) and Load/Store (Doubled Bandwidth)
Doubled Core Density
Half the Energy Per Operation
Improved Branch Prediction
Better Instruction Pre-Fetching
Re-Optimized Instruction Cache
Larger Op Cache
Increased Dispatch / Retire Bandwidth
Maintaining High Throughput for All Modes
They are all on the front end of the Zen CPU core. NOTHING even hints at topology changes, and there has never been evidence for them.
Yes. No. Yes.
>Silicon wafers could fit increasingly more transistors at lower cost.
>Multiple pieces of silicon were more expensive in final manufacturing.
>AMD reduced the number of silicon pieces and saved a lot of cost.
>14nm, 7nm and beyond are increasingly more expensive nodes, but the performance gain is also enormous.
>AMD decides to move the most power-hungry parts of the chip (the cores and their cache) to smaller 7nm chiplets; the rest remains on a larger 14nm die (mainly memory channels and PCIe lanes).
Adam Parker
>>AMD decides to move the most power-hungry parts of the chip (the cores and their cache) to smaller 7nm chiplets; the rest remains on a larger 14nm die (mainly memory channels and PCIe lanes).
Ah, I see now. This makes sense.
Eli Rogers
>NOTHING even hints at topology changes, and there has never been evidence for them.
They won't tell you everything at once. More info is coming between now and launch, according to STH, that is.
Logan Stewart
Now repeat with 67mm^2 and then assume AMD can effectively recover half the failures by disabling any 1 or 2 cores on a CCX.
Though isn't $3000 for a 7nm British scone a bit cheap?
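To put rough numbers behind the "recover half the failures" point, a minimal sketch, assuming a Poisson defect model, a made-up defect density and a guessed core-area fraction (none of these are published figures):

```python
import math

DIE_AREA_CM2 = 0.67    # ~67 mm^2 chiplet
D0 = 0.3               # assumed defects/cm^2 -- illustrative only
CORE_FRACTION = 0.5    # guess: share of chiplet area that is core logic and can be fused off

lam = DIE_AREA_CM2 * D0             # expected defects per die
p0 = math.exp(-lam)                 # zero defects -> full 8-core part
p1 = lam * math.exp(-lam)           # exactly one defect
p2 = lam ** 2 / 2 * math.exp(-lam)  # exactly two defects

# A defect inside a core is survivable (disable that core, sell a cut-down part);
# a defect in cache/fabric/uncore is assumed fatal in this sketch.
salvaged = p1 * CORE_FRACTION + p2 * CORE_FRACTION ** 2

print(f"perfect 8c dies:    {p0:.0%}")
print(f"salvaged 6c/4c:     {salvaged:.1%}")
print(f"effective yield:    {p0 + salvaged:.0%}")
print(f"failures recovered: {salvaged / (1 - p0):.0%}")
```

With those assumptions roughly half of the defective chiplets come back as cut-down salvage parts, which is exactly the knob a small die with fusable cores gives you.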
Joseph Hill
>Though isn't $3000 for a 7nm British scone a bit cheap?
Ye, it's gonna be like $10k per wafer.
Christopher Richardson
It can be, but in AMD's current state I don't think they can afford to make a fresh 7nm design for every situation. Throughout Zen 1 and second-gen Ryzen, AMD used two designs for everything, and 7nm is even more expensive, on top of them making even fewer GPU designs. The only thing that can be reused from Epyc 2 is one of the chiplets. They'd need GPUs and I/O dies to go with that, and the extra assembly will cut into the thin margins at the budget end of the market. There is also increased latency unless it's done on interposers, which are way too expensive. I'm not going to say they won't use chiplets, just that there are going to be some compromises somewhere, and I really don't want one of those compromises to be fewer than 16 cores on desktop.
Thomas Gutierrez
For one, they did not tell anyone about the 4MB of L3 per core, but we already knew for months that it's there.
Kayden Cox
>It can be, but in AMD's current state I don't think they can afford to make a fresh 7nm design for every situation.
Which is why they're gonna use that 8c chiplet wherever possible.
>They'd need GPUs and I/O dies to go with that, and the extra assembly will cut into the thin margins at the budget end of the market.
PCBs are cheap.
>There is also increased latency unless it's done on interposers, which are way too expensive.
A Si interposer never lowers latency. It allows power savings, though, either by using lower-power drivers or by going silly wide and slow.
William Hughes
>It allows power savings...
Which is another one of those things; laptops, for example, want those power savings.
Carson Wright
The difference is gonna be minuscule; they only ever make sense for FPGAs and anything with stacked memory, since the interface for HBM is veeeeeery wide.
Easton Thomas
Also, servers want power savings even more than laptops, since TCO is the #1 metric for pretty much everyone. Yet Rome does its interconnect magick without any interposers.
Aiden Hall
>[[[They]]] won't tell you everything at once.
Sorry, Occam's razor applies here. AMD has shown almost all their cards. Nothing has indicated that there is an actual 8-core CCX. The 8-core CCX meme is prevalent among illiterates who think AMD could create an 8-core cluster with interconnects that perform at the same latency as current designs. Why do you think Intel went down the ring-bus road? They thought they got the best of both worlds: cores next to each other had barely any latency, while the tail end of the bus was around 87-90ns.
>More info is coming between now and launch, according to STH, that is.
There is always more info coming. Just when does the 8-core meme die? I have no doubt AMD is working on several designs, one of which is a potential 8-core CCX. The question is whether intercore latencies and L3 cache latencies are worth the coherent 8-core CCX.
David Jackson
Even at $10k per wafer, it's still only ~$36 per die, and they can still recover failed dies at a fraction of the cost.
the point was to shove it in that user's face with how fucking retarded he was
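If anyone wants to check where a figure like $36 can come from, here's a sketch using the classic dies-per-wafer approximation, assuming a 300mm wafer, the $10k wafer price quoted above, and a few assumed yield values (the 30% one being the pessimistic claim from earlier in the thread):

```python
import math

WAFER_COST_USD = 10_000   # assumed 7nm wafer price
WAFER_DIAMETER_MM = 300
DIE_AREA_MM2 = 67         # ~67 mm^2 chiplet

d = WAFER_DIAMETER_MM
candidates = int(math.pi * (d / 2) ** 2 / DIE_AREA_MM2
                 - math.pi * d / math.sqrt(2 * DIE_AREA_MM2))

print(f"die candidates per wafer: {candidates}")
print(f"cost per candidate:       ${WAFER_COST_USD / candidates:.0f}")
for y in (0.9, 0.7, 0.3):  # assumed yields
    print(f"cost per good die at {y:.0%} yield: ${WAFER_COST_USD / (candidates * y):.0f}")
```

Even under the pessimistic 30% yield, a good chiplet lands in the low tens of dollars, and that's before salvage parts are counted.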
Sebastian Wright
>AMD has shown almost all their cards.
They showed literally nothing, not even the power figures.
>The question is whether intercore latencies and L3 cache latencies are worth the coherent 8-core CCX.
Dealing with fewer exclusive chunks of L3$ alone is worth it. The main question is how, and that we will learn soon enough.
Isaac Hernandez
>They showed literally nothing, not even the power figures.
We can assume it stays within the same power envelope as other Epyc chips on the market, given that it's supposed to slot right into current boards.
>Dealing with fewer exclusive chunks of L3$ alone is worth it.
>The main question is how, and that we will learn soon enough.
Dealing with the 28 interconnects that have to be completely functional and validated makes it completely worthless. CCX-to-CCX latency isn't even bad on Zen 1.
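For anyone wondering where the "28 interconnects" number comes from, it's just the link count for a fully connected cluster of 8 cores; quick sanity check:

```python
def fully_connected_links(cores):
    """Point-to-point links needed to fully connect n cores: n*(n-1)/2."""
    return cores * (cores - 1) // 2

for n in (4, 8):
    print(f"{n}-core CCX, fully connected: {fully_connected_links(n)} links")
```

Going from 4 to 8 cores takes you from 6 links to 28, which is why anything bigger than a small CCX tends to end up on a ring, mesh or partially connected crossbar instead.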
Landon Price
>88
This post is antisemitic
Jaxon Smith
>We can assume it stays within the same power envelope as other Epyc chips on the market, given that it's supposed to slot right into current boards.
Higher. It's 250W average for the top bin.
>Dealing with the 28 interconnects that have to be completely functional and validated makes it completely worthless.
You don't need to have it fully connected.
>CCX-to-CCX latency isn't even bad on Zen 1.
That's not the point, you silly gaymen manchild. The problem is separate exclusive chunks of L3. The fewer, the better.
Cameron Wood
>You don't need to have it fully connected.
Ah yeah, adored's mythical "butterdonuts". Shame there's been literally no talk about it.
Cooper Howard
>Ah yeah, adored's mythical "butterdonuts".
Who? What? Just a bunch of non-fully-connected xbars.
Nolan Diaz
>the point was to shove it in that user's face with how fucking retarded he was
Good.
>Even at $10k per wafer, it's still only ~$36 per die
>and they can still recover failed dies at a fraction of the cost
Imagine being Lisa Su right now.
>Muh 8-core CCX exists despite no evidence
AMD already showed all their arch improvements.
>Muh power and performance numbers
I'm sure power draw and performance numbers will tell us whether it's an 8-core CCX.
>Dealing with fewer exclusive chunks of L3$ alone is worth it.
You can just say you don't know how to optimise for L3 caches.
In any case, AMD's got you covered, because they doubled the L3 cache. Your 8-core CCX isn't even needed, according to you, because it now fits in L3 cache.
If you actually needed high thread coherency, you might have had a case for your meme 8-core CCX.
Levi Jones
>AMD already showed all their arch improvements.
They didn't talk about anything in any detail. Are you retarded?
>I'm sure power draw and performance numbers will tell us whether it's an 8-core CCX.
That's not how you do that.
>If you actually needed high thread coherency, you might have had a case for your meme 8-core CCX.
So, any large VM deployment, ever?
Isaiah Morgan
>That's not the point, you silly gaymen manchild.
>The problem is separate exclusive chunks of L3.
>The fewer, the better.
You doubled down on the magic "separate L3 cache".
>details
>That's not how you do that.
Yes, you sniff your own farts, and if it burns, it's an 8-core CCX.
>So, any large VM deployment
Virtual Memes do not by any definition 'need' high thread coherency. Quite the opposite, if anything.
Aiden Wright
The whole point of the I/O die is to give the compute dies uniform memory access.
15 years ago I would never have imagined that heatspreaders (aka thermal insulation caps to stop idiots from cracking their dies) would become the absolute standard. Even shims used to be basically seen as training wheels for the inexperienced and careless.