Zen2: Rampant Speculation without Speculative Execution Edition

>unknown IPC uplift over Zen1
>unknown transistor count increase
>unknown clockspeed increase
>will be 7nm HP from TSMC
>Process touted as offering up to 45% higher perf compared to 14/16 class nodes
>more than the low power 7nm process
>EPYC2 based Rome will be in the hands of some big OEMs by the end of December if they don't have test samples already


For an 8c/16t 95w part I'd conservatively bet on a 4ghz base clock at launch, with turbo around 4.5ghz. I expect OC headroom to be a lot higher than the 14nm LPP or 12nm parts.
And if Rome has a 64c/128t part as a halo SKU then I'd expect 8-core CCXs.

And before anyone posts stupid diagrams about a central control hub and 9 different dies on a package, they're fakes from some nobody on twitter:
twitter.com/chiakokhua/status/1041487772429705216
twitter.com/peresuslog/status/1041514114789597185

Attached: 1.jpg (720x540, 271K)

Other urls found in this thread:

cl.cam.ac.uk/~pes20/armv8-mca/armv8-mca-draft.pdf
warosu.org/g/thread/S61550845#p61572265

Seems like they could easily increase CCX size so long as the caches were taken care of.

Attached: 14-1080.3338864818.png (1920x1076, 71K)

If they hit a 10-15% IPC increase intel will be hurting in pretty much every market.
Higher IPC and a big bump to clocks will put them right at the top of gaymen benches in addition to dominating enterprise workloads.

how about AVX 256/512bit ?

On one hand AVX512 is pretty low on the list of things that matter to consumers; on the other hand, an FPU actually capable of processing these large vectors would also have massively higher throughput for smaller ops.
I don't know if AMD has any plans to increase FPU width, but I think they'll eventually be forced to.
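
To make the width point concrete, a minimal sketch (compile with -mavx on gcc/clang; the function is just my illustration, not anything from AMD):

[code]
#include <immintrin.h>

// One 256-bit add: eight packed floats per instruction (vaddps ymm).
// The ISA-level code is identical on every AVX CPU; Zen1 simply cracks
// the op into two 128-bit micro-ops internally, which is why a natively
// 256-bit FPU would double the throughput of existing AVX code with no
// recompile.
__m256 add8(__m256 a, __m256 b) {
    return _mm256_add_ps(a, b);
}
[/code]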

The trend we saw at HotChips this year was tons of new core arch all centered around huge vectorized workloads.

It's a very conscious, deliberate, and sensible choice for Zen to have been designed with a 128-wide AVX path.

Intel's wide AVX2/AVX512 paths use so much power they have to be powered down when not used, and the whole chip has to be clocked down when they are used. The unspoken result is that if you are doing anything at the same time that does not require AVX2 or AVX512, you get worse performance than if you weren't using it. The net result: whole-program optimised code actually (properly) _avoids_ using AVX2/AVX512 in some cases.
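
That tradeoff shows up directly in dispatch code. A sketch of the pattern (gcc/clang extensions only; the kernels are mine, not any particular library's), where the dispatcher deliberately tops out at AVX2 even on AVX-512 hardware:

[code]
#include <immintrin.h>

// 256-bit path, compiled for AVX2 via the target attribute.
__attribute__((target("avx2")))
static void scale_avx2(float* x, float s, int n) {
    __m256 vs = _mm256_set1_ps(s);
    int i = 0;
    for (; i + 8 <= n; i += 8)
        _mm256_storeu_ps(x + i, _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
    for (; i < n; ++i) x[i] *= s;
}

static void scale_scalar(float* x, float s, int n) {
    for (int i = 0; i < n; ++i) x[i] *= s;
}

// Deliberately no AVX-512 branch: the license-based downclocking can
// cost the rest of the program more than the wide vectors gain here.
void scale(float* x, float s, int n) {
    if (__builtin_cpu_supports("avx2"))
        scale_avx2(x, s, n);
    else
        scale_scalar(x, s, n);
}
[/code]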

The other unspoken result is that since unchoking and powering up the AVX arrays is done speculatively (because otherwise it would cause a huge latency hiccup), this causes an exploitable Spectre side-channel leak. Doing it non-speculatively would... cause an exploitable remote side-channel leak.
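
You can poke at the warm-up hiccup with a crude probe like this (purely a sketch, compile with -mavx; whether anything is visible depends on the part, and a real measurement needs serializing fences, core pinning, and thousands of samples):

[code]
#include <immintrin.h>
#include <x86intrin.h>  // __rdtsc
#include <cstdio>

int main() {
    __m256 acc = _mm256_set1_ps(1.000001f);
    // On parts that power-gate the upper AVX lanes, the first 256-bit
    // op after a long idle period should take noticeably longer than
    // the later ones.
    for (int i = 0; i < 8; ++i) {
        unsigned long long t0 = __rdtsc();
        acc = _mm256_mul_ps(acc, acc);
        unsigned long long t1 = __rdtsc();
        std::printf("sample %d: %llu cycles\n", i, t1 - t0);
    }
    float out[8];
    _mm256_storeu_ps(out, acc);
    std::printf("%f\n", out[0]);  // keep the compiler from eliding the loop
    return 0;
}
[/code]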

It remains to be seen if that kind of power gate delay can be exploited with any other coprocessors/subprocessors in modern designs, but almost none of those are as big or as unwieldy as Intel's gigantic AVX ALUs.

Intel's FPUs definitely do run hot when crunching 512bit AVX, the power spike is pretty significant as well.
I think they were banking on it being more viable at smaller process nodes. Obviously that didn't work out for intel and their 14+++++++ refreshes.

bump for the drama

Nope, making an 8-core CCX would require a lot of crossbars and the die will also be way bigger

>die will be bigger
>7nm vs 14nm
>what is scaling factor

Either CCXs grow in core count, or they add more CCXs per die. Either way complexity increases, there's no way around that.
Rome isn't going to have 8 or 9 dies on package.

chiplets are the future.

>7nm vs 14nm
It's TSMC's 7nm vs GloFo's 14nm; they will most likely go with more 4-core CCXes per die, no need for a CCX redesign

Attached: iedm-2017-intel-10-xtor-comparison.png (1054x919, 108K)

Sure, but there's still a limit. Just like with triple and quad patterning in lithography, you exponentially increase complexity with every pass. With every chip added to a package in an MCM you dramatically decrease yields.

Why do you think they call it infinity fabric? If on 7nm they can reach 64 core CPUs on 5-3nm they could probably go 128 cores

Protip: You have no idea what any of the words you're using actually mean
Having a data fabric capable of scaling to N dies doesn't mean that it's possible to manufacture it. Getting a package with 8 dies, or 8 core dies and 1 die that has external memory controllers, would have orders of magnitude lower yields than a package with 4 dies.
Infinity Fabric has literally nothing to do with this. Nothing.

bump for perfect cpu to encode my vietnamese cartoons with

i will convert everything to av1

>Having a data fabric capable of scaling to N dies doesn't mean that it's possible to manufacture it
Says you

Attached: never-saw-it-coming-the-worst-technology-predictions-of-all-time-12-638.jpg (638x479, 58K)

that's the idea. i read in the av1 thread that they only really support mp4 containers, though you can use mkv with av1 according to wikipedia. dunno how that will affect compatibility, though.
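
fwiw a libaom build of ffmpeg will already mux AV1 into mkv just fine; a sketch (filenames made up, and the -strict flag is what 2018-era builds require):

[code]
# constant-quality AV1 encode with libaom, muxed into mkv
ffmpeg -i input.mkv -c:v libaom-av1 -crf 30 -b:v 0 -strict experimental output.mkv
[/code]

Expect it to be glacially slow, which is the other poster's point.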

Nah intel shills will just overclock their housefires to disastrous 5.8GHz and burn their neighborhood to the ground just to prove they're still faster

Well they will be releasing yet another generation of 14nm parts, so trying to squeeze out a few more mhz is pretty much all they can do.

I'm hyped for Zen2. I've been an Intlet for my whole life but now I'm seriously considering getting AMD

I really don't think Intel will ever recover to be honest. Just the thought of Rome having 64 cores is enough to make me think they will never beat AMD, the gap is just too big.

Attached: 1537900855124.png (672x794, 44K)

why aren't these compared against xeons?

so you're buying a server farm? nothing can encode av1 now at acceptable speeds, aside from that.

because hedt chips are compared with hedt chips. xeons are way out of the price range of those.

Do you see any Epyc there?

Kool aid drinking fag.

>just wait

Reminder that intel has nothing to challenge Zen until at least 2021~
Reminder that intel continued the lifespan of the 14nm process in their Xeon lines which means 10nm is still broke city incapable of producing big parts.


Reminder that Zen3 with 8 channels of DDR5 will have 384GB/s of memory bandwidth with 6000mhz DIMMs and support multiple TB per socket.
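
Back-of-envelope, assuming the standard 64-bit data path per channel:

\[ \mathrm{BW} = N_{\mathrm{ch}} \times 8\,\mathrm{B} \times \mathrm{MT/s} = 8 \times 8\,\mathrm{B} \times 6000\times10^{6} = 384\,\mathrm{GB/s} \]

(the 307.2GB/s figure floating around is the same math at DDR5-4800)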

doesnt matter

ummm sweaty, intlel has netflix :)

Attached: 1514937277489.jpg (626x657, 81K)

>666 gorrilions of dollars
>can't make their 10nm work
Why don't they just, like, bribe the laws of physics to make it work?

Too expensive. Better use that money to bribe software developers to crash their programs when used with a cpu of the competition :)

Can't even tell if this post was ironic or unironic, it's Intel we're talking about here.

Attached: 1500377406924.jpg (2083x1405, 462K)

>unknown IPC uplift over Zen1
10% worst case, 30% best case, probably somewhere in between
>unknown transistor count increase
enough. if it's 16c dies it's going to be at least 2x higher, maybe an incremental increase if it's small 8c dies
>unknown clockspeed increase
if it's a 16c die then probably small; if it's an 8c die, 5ghz will be a realistic boost/all-core overclock speed

housefire meme shit

There's a nice paper for you on this topic; it's not as bad as you think.

Since the interposer doesn't need to be on a current low-yield tech, and can often just be a simple older high-yield node in the 96% yield range with very little active componentry in it, it would still increase the overall performance.

4/4.5Ghz would be such an abysmal improvement for this big node jump.

This, the jump from 14nm to 12nm was roughly +300mhz, 7nm should give at least a +400mhz clock bump over 12nm

>TSMC’s 5nm Fin Field-Effect Transistor (FinFET) process technology is optimized for both mobile and high performance computing applications. It is scheduled to start risk production in the second half of 2019.
So 7nm Zen 2019 and 5nm in 2020?

Since it's going through the HP process, there's likely going to be a frequency increase.

Zen 3 when? Is it 2020? 5nm basically 7nm+ right?

For TSMC their 5nm node is still FinFET as far as I know. Only IBM, GloFo, and Samsung were zealously pursuing GAA as early as possible, and GloFo dropped out of the race. IBM sold their foundry business to GloFo, so the only one left was Samsung. I think Samsung stated they're not transitioning to GAA until 3nm, though their 3D stacked NAND design actually is a GAA structure, so they have a lot of experience building them on larger nodes.
As of yet I haven't heard much about TSMC's 5nm plans, aside from the fact that they have them. It's unclear if they'll have a whole new BEOL, if it'll be a half shrink with some performance boosters, or what.

The 5nm GAA process that IBM/Samsung/GloFo worked on had 40% higher FMAX than their 7nm FinFET though. Shame it'll never materialize with Global Foundries.

Xnm is just a marketing meme, it doesn't measure anything inside the chip. And there's no reason to believe frequency scales linearly with feature size, because even tho the electrons travel shorter distances, you can't use voltages as high as with a bigger transistor.

Smaller gates don't need high voltage to hit high frequency, so half of your comment makes zero sense. Drive voltage is always lower with smaller devices.
Gate delay itself doesn't necessarily limit clocks in any way.

>Gate delay itself doesn't necessarily limit clocks in any way.
Then what limits the frequency at a certain voltage?

Nuances of the process itself. The transistors themselves, their channel width/length and physical surface area, their effective electrical control over the channel, the metals used, their insulation, etc. There are dozens of variables that contribute to what clocks a process can achieve.

But how can decreasing the size of the transistor while keeping everything else constant increase the maximum frequency if not by shortening the time the electrons take from one point to another?
And my first point wasn't just that you had to lower the voltage. It was that maybe as you shrink the chip there are diminishing returns in the frequency increases, for example if maximum voltage scales non-linearly with size while frequency at X voltage scales linearly or even non-linearly with opposite convexity.

Transistors are analog devices; smaller ones are more sensitive to changes in voltage, so they switch reliably at lower voltages. They also proportionately have more of an issue with the off state not actually being off, for what it's worth.
Shorter channel devices can actually have longer gate delay depending on the type of structure. It isn't an artifact of feature size alone.

>I really don't think Intel will ever recover to be honest. Just the thought of Rome having 64 cores is enough to make me think they will never beat AMD
>a billion dollar company will never recover
>when a smaller company like AMD recovered after getting destroyed by Core2 and Corei7 for decades
Literally delusional AMDrone

Even with the node advantage AMD hasn't taken significant market share, while Intel can't produce enough 14nm+++++++++++++++ because they are in such high demand

You never hear of Ryzen, Threadripper or Epyc shortages because nobody save for a few buys them; companies would rather buy discounted Xeons than Epyc

AMD does AVX256 by 2 x AVX 128 I believe. The Zen2 will probably enable AVX512 by doing 4 x AVX 128. Since AVX512 recently came out for Intel, AMD can be forgiven for not implementing it. But there is a good chance for it to be on Zen2.

UMA

There is no way the turbo is only 4.5ghz
It's 4.35Ghz on the 2700X and 4.45Ghz on the 2950X.

It'll almost surely be 5Ghz. It's confirmed to be on the HPC process.

Doesn't even need higher IPC. 2700X IPC is less than 3.5% behind the 8700k. Just needs higher clocks and lower memory latency.

>Implying that security issues on Intel CPUs haven't done massive damage to Intel's prestige and reputation on the Enterprise/SMB markets.

>Intel Marketing Shill in complete damage control

Major OEMs are already bitching about 14nm supply issues a.k.a Xeon SKUs.

There simply aren't enough units on the market and that's why Xeon SKUs carry such insanely high price points. (Protip: Intel's shareholders aren't going to allow massive discounts either, because they're hooked on the massive profit margins from nearly a decade of near-dominance)

F500 companies in an upgrade cycle aren't going to wait 6+ months on back orders. They are going to be getting Epyc servers and will not look back at Intel solutions.

>AMD is going to snag far more market share in the enterprise/SMB world than they ever did with Opterons within the next five years, and Intel is completely powerless to stop it.

Hmm ok I'll trust your word

I was just throwing out conservative estimates.
4ghz for base clock would only be 600mhz away from the 1800X at launch, and I have a ton of data on power per core and voltage scaling for launch Summit Ridge.
They should be able to hit 3.5ghz under 1v.

>Zen3 will most likely be on a new socket with DDR5 since the spec is final and early samples are already in the wild
>Low power DIMMs hit 5500mhz
>standard power DIMMs hit 6000mhz
>a dual channel CPU with early sample DDR5 could have twice the bandwidth compared to the average DDR4 kit today.
>wide IO memory will probably be standard as a L4 by then if EDRAM isn't ubiquitous

I for one look forward to the future of AMD computing

when is zen2 likely to drop? given nvidia shat the bed on pricing I have some money for computer upgrades in the next 18-24 months.

Word was that Epyc 2 enterprise chips would sample end of this year. They'd probably be available in full volume Q1 or early Q2 2019. I'd expect desktop Zen 2 chips in the same time frame.

As of Ryzen 2k OCing is basically non-existent. You can expect any future desktop processors to be pushing the limits of the process and then some if the motherboard permits it.

14nm LPP and 12nm had little OC headroom because of the process they were based on. GloFo had to make a special vt for AMD just to get clocks above 3ghz because of how the process scaled with voltage.
The Zen arch wasn't limiting clocks, the low power ARM SoC oriented process was the limiting factor.
TSMC's 7nm HP node doesn't have the same issue, or it's significantly lessened.

>Major OEMs are already bitching about 14nm supply issues a.k.a Xeon SKUs.

Let's be real. That's because Intel is in very high demand and companies everywhere are expanding out their servers. It's the same reason why RAM is so expensive. It's a demand issue, as evidenced by the fact that components are fairly inelastic right now, e.g. corporations still buying up RAM like candy at inflated prices, or Intel's increasingly inflated prices. AMD has been benefiting from the rise in demand too, but that's because intel is becoming too hard for systems OEMs like HP and Dell to get. If the trend continues AMD will suffer similar "problems", but then you won't be saying OEMs are bitching about supply issues, because you're a fanboy.

>14nm LPP and 12nm had little OC headroom because of the process they were based on
This is untrue. They had little OC headroom because the Zen designers spent a ton of time working on the boost and frequency scaling behavior of Ryzen, as evidenced by the fact that they let the CPU core voltage go as high as 1.5V in short bursts to hit the 4.35GHz boost. What we can expect is a safe base clock in the mid-high 3.x GHz, with a moderate all-core boost in the low to mid 4.x GHz and a very high single core XFR boost in short increments at or above 5GHz. I also wouldn't put much stock in the supposed 40-50% performance increase. What other companies found was that going from TSMC's 16/14FF to 7nm low power gave an almost negligible increase in frequency; even with HP I doubt they'll get a lot more, and those measurements are usually done on relatively small cores like an old ARM reference SoC. Around 5GHz is a safe bet. Difference is GF 7LP was based on IBM tech designed for their power guzzling mainframes; TSMC, despite their high performance claims, doesn't have that sort of pedigree.

No, Zen only had limited XFR headroom because of the process the chips were made on. XFR is effectively auto overclocking; it has sensing paths for voltage, so it can tell if the clock it's trying to achieve will be stable before the settings are finalized.
Limited clocks are an artifact of process, not core arch.

Flat out you're talking out of your ass.

My 1600X does 3.7ghz on 1.0875v.
These should do 4ghz on under 1v. Wtf. It's twice the density.

>unknown IPC uplift over Zen1
ipc stays the same (remember amd can do 6 instructions vs intel's 4) and amd jacks up the clocks to match intel
>unknown transistor count increase
doesn't really matter, since there's no way to understand or compare it: there isn't anything else on 7nm except Zen 2

what matters is
>increased cache on all aspects
>smaller IF breakthrough
>slap ISA support for every avx instruction up until 512 and offer full support, not half precision
>5.0ghz should be achievable for the majority of the chips, and not creating another market by artificially selling shitty chips
>we don't fucking know what kind of process amd sampled from tsmc but history taught us that amd never goes with either lpp or hp, it's always in between
>epyc 2 is already in the hands of certain organizations including cern. when the time comes (i assume in 3-4 weeks) i will release some photos of it

the problem is amd at 7nm is probably already close to having to remove 150watts/cm2
plus add to that x86 is becoming the main problem of the pc industry with its retarded limitations (up to 8 cores it can scale perfectly but after that the ISA essentially bottlenecks the entire pipeline)
there are also plans to jump to carbon nanotubes since you can literally just copy and paste the entire cpu PT into them and create an identical one, but currently it costs 75% more to produce..
then you have the biggest problem: all the uarch we see today cpu..gpu...arm bla bla is literally based on the von neumann uarch, a fucking 80 year old uarch that is still the base for everything that has logic in it..

>plus add to that x86 is becoming the main problem of the pc industry with its retarded limitations (up to 8 cores it can scale perfectly but after that the ISA essentially bottlenecks the entire pipeline)
That is interesting, you got any sources for that? I really wanna read more.

it's debatable to this day...the only source you can get is to find a serious developer and ask him how many manhours a company needs to make a program that perfectly scales above 8 cores...

tl;dr think of it like how async works on amd..but on sync....add to that the fact that cpu cores aren't really capable of flushing, flipping and switching on the fly and you get the idea...(but eventually we will end up having the kind of cores that work best in gpu's inside the cpu's in the future)

>comparing marketing and economy predictions to physics limitations
Cool story bro

Intel can't compete, and can't keep up with yields even as AMD starts dominating the new PC market (AMD has actually been picking up a fuck load of market share, both in PC and enterprise, not that you'll believe that) because intel's still, STILL suffering from low yields per wafer, while AMD is enjoying something stupidly high like 80-85% yields on chips they can use in the TOP END 1800x, 2700x, or TR level chips.

>it's debatable to this day...the only source you can get is to find a serious developer and ask him how many manhours a company needs to make a program that perfectly scales above 8 cores...
But that's not about the ISA. It doesn't matter what ISA you use in this case, because what's difficult about scaling above 8 cores isn't the instruction set or the CPU. What's hard is to sufficiently decouple your program's interdependencies to the point where it can all run in 8+ parallel threads. So many tasks *inherently* depend on the result of what has already happened, and the trick is to find alternative solutions that achieve the same result, or to decouple the existing solution.
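
A toy example of that decoupling (plain C++11 threads, nothing ISA-specific about it; the function is mine): a sum looks serially dependent when written naively, but addition is associative, so it splits into per-thread partial sums with one serial combine at the end. Finding that kind of decomposition in a real program is the actual hard part, on x86 or anything else.

[code]
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

long long parallel_sum(const std::vector<int>& v, unsigned nthreads) {
    std::vector<long long> partial(nthreads, 0);
    std::vector<std::thread> pool;
    const size_t chunk = (v.size() + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&, t] {
            // each thread reduces a disjoint chunk: no sharing, no locks
            const size_t lo = t * chunk;
            const size_t hi = std::min(v.size(), lo + chunk);
            for (size_t i = lo; i < hi; ++i)
                partial[t] += v[i];
        });
    for (auto& th : pool)
        th.join();
    // the only inherently serial step: combining nthreads partial sums
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
[/code]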

I really thought you knew something about the x86 architecture that I did not here, because I thought about it and couldn't really think of anything about the instruction set that was problematic in this regard, but it turns out you did not.

As for what you said in your other post >then you have the biggest problem: all the uarch we see today cpu..gpu...arm bla bla is literally based on the von neumann uarch, a fucking 80 year old uarch that is still the base for everything that has logic in it..
I have no idea what you would suggest other than a Von Neumann architecture for general computation, but regardless of that, Von Neumann behaviour is just a facade on modern processors, kept because it's practical for the programmer.

x86 processors today run a frontend interpreter that translates x86 instructions into internal micro-ops, an entirely different set of instructions that Intel and AMD can rework much more freely, and those micro-op sets are most decidedly not Von Neumann.

Some of the things they do on top of micro-ops include reordering data fetches, instructions, etc. entirely according to what would be most optimal, so the CPU pipeline can be kept as full as possible (e.g. by already having all the data ready before a computation runs). Another behaviour implemented on top of these micro-ops is speculative execution, likewise intended to keep the pipelines full.

Both of these behaviours are absolutely not Von Neumann behaviour. A true Von Neumann architecture would run all instructions in-order and would never speculatively execute (would wait on branching).

These ellipses... poster's gotta be 50+

ofc its a problem of the ISA
x86 was made to conserve energy as much as possible
over the years with every hack they introduced it has become a clusterfuck

>ofc its a problem of the ISA
It literally isn't. Concurrency is just as hard on any other ISA too, whether it be DEC Alpha, SPARC or ARM. The problem isn't the ISA of either of these, but rather that parallel execution is fucking hard and requires decoupling of interdependence in your problem.

>x86 was made to conserve energy as much as possible
What

>over the years with every hack they introduced it has become a clusterfuck
Yes, x86 is absolutely a clusterfuck, but no-one can say it isn't high performance.

there is a paper
cl.cam.ac.uk/~pes20/armv8-mca/armv8-mca-draft.pdf
that talks about how arm simplified their parallelism in relation to x86

plus how would you know that it is high performance? x86 was an isa specifically made to save energy and nothing more
over the years, with the addition of countless extensions, you can say that it has increased its performance, but its energy saving features are essentially gone

I'm surprised no one has mentioned the chiplet+uncore rumor yet. Do you more knowledgeable people think that AMD could have gone with a 14nm central uncore and 7nm core chiplets as described by some sources? What do you think the advantages/disadvantages will be, if they did pull a surprise and went this route?

Of course not, everything he said is plain bullshit.

because removing the dies' direct tie to memory worked so well for the 2990wx under windows, right?
fact is unless windows fixes their shitty scheduler no amount of advancements will ever be enough, since for whatever reason it only helps intel chips

That tune is quickly changing in light of the loads of hardware-level security flaws.

The whole "nobody got fired for buying Intel" meme got completely BTFO. The supply issues are just icing on the cake.

You can easily tell by how much Intel marketing is working overtime on damage control campaigns that pretty much amount to "PLZ USE US! AMD HAS BAD CHIPS AND SUPPLY ISSUES!"

In light of recent developments it's deliciously ironic.

AMD's "chiplet" gambit is paying off massively in the SMB/Enterprise world. Intel was going the same route after having trouble making Broadwell-E and Broadwell-EP en masse, but AMD beat them to the punch.

insufficient registers
only partially transparent to devs
frontend decoding and sequencing is such a mess that it requires quite a lot of hardware pipestages
no defined sw/hw interactions
fp still needs to use a stack model because of the limitations of the x87
IA-64 was orders of magnitude better but the typical intel bullshit more or less made it irrelevant and the x64 extension got adopted
not to mention it was a vliw, and itanium had simd similar to an ati 4xxx but less advanced, so that it could clock higher..

Micro ops that the arch can process per clock and observed IPC in varied workloads, measured in benchmarks used by every reviewer ever, are not the same thing.

Your entire post is full of bullshit and is a prime example of Dunning-Kruger.

UMA DELID CIA

No one was ever able to write a decent compiler for ia64 while the arch was actually relevant. On top of that, the x86 compat/emulation was a shitshow.
If you had the resources to hand-code assembly, good for you, you were actually able to utilize the ia64. Otherwise, shit son, you were out of luck. The only good use for it was physics/weather simulation.

>my predictions are right because I say so
ok

Attached: 2wAe2CnbRRwUt0qHhMo_MikaVvHBy-nGjLa4iglIBV8.jpg (940x492, 74K)

AMD projects they'll have 5% of the server marketshare by end of year. It's big for them but overall it's next to nothing.

The compiler was more than decent - it was superb, excellent whatever superlatives one can find.
It's just that rewarmed VLIW is suitable for the same tasks as any VLIW - technical computing. No static compiler will change that fact.

Reminder that EPYC-based machines didn't actually start shipping to normal customers until 2Q this year. Unless you were a big player like Amazon/Google/MS/Facebook, you literally couldn't even backorder them.
Based on typical server lifespan, that could be between 1/5 and 1/3 of total sales this year. That's pretty significant.

If the electricity used to power the vacuum is produced by nuclear power, does that not count?

Name one compiler available in 2003-2004 (because that's when amd64 came out and everyone and their dogs sighed with relief that they don't have to deal with itanium insanity) that could actually handle the insane rules of ia64 instruction packing with general (non-physics/rendering/other insanely parallelizable) code. Protip: you can't.
The architecture was over-specialized, hence shit for general computing.

>Core2 and Corei7 for decades
Underage please go

warosu.org/g/thread/S61550845#p61572265

Attached: 1536694146733.png (1200x800, 164K)

>tfw skipping DDR4 completely

DDR3 sandy bridge here, going Zen 3 on DDR5, the jump in performance is going to be colossal

DDR3 Haswell here. Can't fucking wait.

DELID THIS

IT'S NOT FAIR INTELBROS

Attached: AMD-EPYC.jpg (494x345, 68K)

>In a market that changes servers every decade, getting 5% in a single year is next to nothing

brian plis you arent intel ceo anymore you can cut the bullshit

what kind of nuclear reactor are we talking about

Attached: Untitled.jpg (965x722, 121K)

>brian
pic related

Attached: 1520577207220.png (378x343, 280K)