SemiAccurate Forums  

 
  #51  
Old 07-17-2017, 08:04 AM
chithanh chithanh is offline
2^10
 
Join Date: Jun 2010
Location: Germany
Posts: 1,174
chithanh is on a distinguished road
Default

** OT removed **

Last edited by Grandma Guillotine; 07-17-2017 at 04:29 PM. Reason: ** OT removed **
  #52  
Old 07-17-2017, 08:32 AM
Melkhior Melkhior is offline
8-bit overflow
 
Join Date: Apr 2010
Location: France
Posts: 416
Melkhior is on a distinguished road
Default

Quote:
Originally Posted by Woolybully67 View Post
Unless you are a Baidu, Amazon, MS, or Facebook who have had months to run your own benchmarks you'd be hard pressed to know what's going on exactly
There are a lot of companies that have had access to Skylake for a while.

AVX-512 support in SKX is seriously ... funky. Everyone focuses on the second FMA missing in the HCC and LCC dies (missing, not fused off...), but that's not the only interesting point.

1) Port 0+1 becomes a single big AVX-512 port on all SKX. As a side effect, all AVX2 instructions formerly available only on port 0 also become available on port 1 (psllw, etc.). Nice for some workloads.

2) Latency for the port 5 FMA (only used for AVX-512) is higher than on port 0+1. But there is some possible bypass in some cases (presumably chained FMA, same as on the Cortex-A57) ...

3) With AVX512VL, you can have 256-bit AVX-512 instructions (or even 128-bit) ... which are similar, but not identical, to AVX2 instructions: you gain the masking. How do they behave frequency-wise? And on which port? Preliminary results indicate they behave "same as AVX2", at least if you don't mix them with 512-bit AVX-512 ...
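
To make the masking in point 3 concrete, here is a minimal pure-Python model of a masked lane-wise add. It is only a sketch of the semantics (merge-masking keeps the old lane, zero-masking clears it). `masked_add` and its parameters are made-up names for illustration, not an intrinsic or a library call, and 32-bit wraparound is ignored.

```python
def masked_add(a, b, mask, src=None):
    """Model a lane-wise add under a bit mask (hypothetical helper).

    Lane i is computed as a[i] + b[i] only when bit i of `mask` is set.
    Otherwise the lane keeps src[i] (merge-masking) or, when no `src`
    is given, becomes 0 (zero-masking).
    """
    out = []
    for i, (x, y) in enumerate(zip(a, b)):
        if (mask >> i) & 1:
            out.append(x + y)
        elif src is not None:
            out.append(src[i])
        else:
            out.append(0)
    return out


# Eight 32-bit lanes stand in for one 256-bit register.
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10] * 8

unmasked = masked_add(a, b, 0xFF)             # all lanes active, like a plain AVX2 add
merged = masked_add(a, b, 0b00001111, src=a)  # upper four lanes keep the values of a
zeroed = masked_add(a, b, 0b00001111)         # upper four lanes are cleared

print(unmasked)
print(merged)
print(zeroed)
```

With an all-ones mask the result matches what an unmasked AVX2-style add would give; the two partial-mask calls show the part that AVX2 does not offer.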

Optimizing performance on Skylake is not going to be easy :-)
__________________
Not speaking for my employer.
  #53  
Old 07-17-2017, 09:27 AM
London Dave London Dave is offline
640k who needs more?
 
Join Date: Jul 2010
Location: UK
Posts: 778
London Dave is on a distinguished road
Default

** OT removed **
__________________
IT middle management is just like Dungeon Keeper, but with technicians and purchase orders instead of imps and spells

Last edited by Grandma Guillotine; 07-17-2017 at 04:28 PM. Reason: ** OT removed **
  #54  
Old 07-17-2017, 11:55 AM
gruffi gruffi is offline
2^10
 
Join Date: Jun 2010
Location: Silicon Saxony
Posts: 1,083
gruffi is on a distinguished road
Default

** OT removed **

Last edited by Grandma Guillotine; 07-17-2017 at 04:28 PM. Reason: ** OT removed **
  #55  
Old 07-18-2017, 04:09 AM
aaronspink aaronspink is offline
2^10
 
Join Date: Feb 2010
Posts: 1,050
aaronspink will become famous soon enough
Default

Quote:
Originally Posted by Melkhior View Post
There are a lot of companies that have had access to Skylake for a while.

AVX-512 support in SKX is seriously ... funky. Everyone focuses on the second FMA missing in the HCC and LCC dies (missing, not fused off...), but that's not the only interesting point.
By which of course you actually mean not actually missing, actually fused off.
__________________
speaking for myself inc.
  #56  
Old 07-18-2017, 04:41 AM
chithanh chithanh is offline
2^10
 
Join Date: Jun 2010
Location: Germany
Posts: 1,174
chithanh is on a distinguished road
Default

Given that Intel's Skylake-SP presentation talked about the "glue" which AMD used in Epyc, it would be interesting to look at core/die scaling of price and performance.

In AnandTech's Skylake-X review, we saw the following Cinebench R15 scores:

[Image: Cinebench R15 score chart]
From AMD's Threadripper marketing video, we saw 3062 points in Cinebench R15 for the 1950X and 2431 for the 1920X. Intel's 7900X got 2167 so the setup is comparable to Anandtech's.

This means:
  • Intel: 7800X (6C/12T) at 1333 points, 7820X (8C/16T) at 1734 points and 7900X (10C/20T) at 2169 points scale almost linearly with core count. The caveat is that the single-core score of the 7800X is 3-4% lower than that of the other two, so the performance scaling is in fact slightly less than linear.
  • AMD: 1600X (6C/12T) at 1232 points to 1920X (12C/24T) at 2431 points is an almost linear increase going from a single die to two dies.
  • AMD: 1800X (8C/16T) at 1625 points to 1950X (16C/32T) at 3062 points loses around 6% performance per core when going from a single die to two dies.
We don't have single-core scores for Threadripper, but given that clocks are very similar (only the base clock is slightly higher for the 1600X/1800X) I guess these are the same.
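
The per-core arithmetic behind the bullets above can be reproduced with a short sketch. The scores and core counts are the ones quoted in this post; `per_core` and `scaling_efficiency` are illustrative helper names, and the calculation ignores clock differences between the parts.

```python
# (cores, Cinebench R15 multi-threaded score) as quoted in the post
SCORES = {
    "7800X": (6, 1333),
    "7820X": (8, 1734),
    "7900X": (10, 2169),
    "1600X": (6, 1232),
    "1800X": (8, 1625),
    "1920X": (12, 2431),
    "1950X": (16, 3062),
}


def per_core(name):
    """Multi-threaded score divided by core count."""
    cores, score = SCORES[name]
    return score / cores


def scaling_efficiency(small, large):
    """Per-core score of the larger part relative to the smaller one.

    1.0 means perfectly linear scaling with core count.
    """
    return per_core(large) / per_core(small)


for small, large in [("7800X", "7900X"), ("1600X", "1920X"), ("1800X", "1950X")]:
    loss = (1.0 - scaling_efficiency(small, large)) * 100
    print(f"{small} -> {large}: {loss:.1f}% lower score per core")
```

The two AMD pairs are where the "1-6%" range for going from one die to two comes from.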

List price per core
7800X: $65 7820X: $75 7900X: $100
1600X: $42 1800X: $62 1920X: $67 1950X: $62

This means that - with the exception of the R5 1600X which addresses a different market segment - AMD has actually managed to keep prices per core pretty constant, and loses only 1-6% of Cinebench R15 performance in the process. Infinity Fabric scaling is slightly worse than Intel's mesh, but the price that AMD is able to command thanks to IF more than makes up for it.
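
As a rough cross-check of the price argument, the sketch below turns the per-core list prices above into Cinebench points per dollar. Total price is approximated as the rounded per-core price times the core count, so the results are only approximate; `points_per_dollar` is an illustrative name.

```python
# name: (cores, Cinebench R15 score, rounded list price per core in USD)
CHIPS = {
    "7800X": (6, 1333, 65),
    "7820X": (8, 1734, 75),
    "7900X": (10, 2169, 100),
    "1600X": (6, 1232, 42),
    "1800X": (8, 1625, 62),
    "1920X": (12, 2431, 67),
    "1950X": (16, 3062, 62),
}


def points_per_dollar(name):
    """Score divided by approximate list price (per-core price * cores)."""
    cores, score, price_per_core = CHIPS[name]
    return score / (cores * price_per_core)


# Print the chips from best to worst value under this approximation.
for name in sorted(CHIPS, key=points_per_dollar, reverse=True):
    print(f"{name}: {points_per_dollar(name):.2f} points per dollar")
```

Cinebench R15 is a single workload, so this says nothing about value in other scenarios.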

Last edited by chithanh; 07-18-2017 at 04:42 AM.
  #57  
Old 07-18-2017, 06:14 AM
gruffi gruffi is offline
2^10
 
Join Date: Jun 2010
Location: Silicon Saxony
Posts: 1,083
gruffi is on a distinguished road
Default

Quote:
Originally Posted by chithanh View Post
This means that - with the exception of the R5 1600X which addresses a different market segment - AMD has actually managed to keep prices per core pretty constant, and loses only 1-6% of Cinebench R15 performance in the process. Infinity Fabric scaling is slightly worse than Intel's mesh, but the price that AMD is able to command thanks to IF more than makes up for it.
I haven't seen any proof of IF scaling worse than Intel's mesh. The former has advantages in some scenarios, the latter has advantages in some other scenarios. If you look at memory bandwidth then EPYC should be able to scale better across all cores.
  #58  
Old 07-18-2017, 06:27 AM
Melkhior Melkhior is offline
8-bit overflow
 
Join Date: Apr 2010
Location: France
Posts: 416
Melkhior is on a distinguished road
Default

Quote:
Originally Posted by aaronspink View Post
By which of course you actually mean not actually missing, actually fused off.
Nope, everything I heard from Intel lined up with "not architected". I might have misunderstood, but AFAICT absolutely no SKUs from HCC or LCC have the second FMA, so...

The fact that the second FMA feels "bolted-on" (with the added latency, the awkward bypasses, and the extra conflicts with port 5 instructions) tells me it's an afterthought that is only here so that HPC-oriented SKUs get better Linpack than BDW, and those are XCC dies (same as OPA). It's otherwise mostly useless, so no point in wasting silicon in the higher-volume HCC and LCC dies.
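
The Linpack point can be illustrated with peak per-cycle arithmetic. This is a sketch under simple assumptions: double-precision (64-bit) lanes, one FMA counted as two floating-point operations per lane, and clock frequency left out entirely. `peak_dp_flops_per_cycle` is an illustrative name.

```python
def peak_dp_flops_per_cycle(vector_bits, fma_units):
    """Peak double-precision FLOPs per cycle per core.

    Each FMA unit processes vector_bits / 64 lanes per cycle, and each
    fused multiply-add counts as two floating-point operations.
    """
    lanes = vector_bits // 64
    return lanes * 2 * fma_units


avx2_two_fma = peak_dp_flops_per_cycle(256, 2)    # two 256-bit FMA ports
avx512_one_fma = peak_dp_flops_per_cycle(512, 1)  # fused port 0+1 only
avx512_two_fma = peak_dp_flops_per_cycle(512, 2)  # fused port 0+1 plus port 5

print(avx2_two_fma, avx512_one_fma, avx512_two_fma)
```

Under these assumptions a core with a single 512-bit FMA has the same per-cycle peak as one running AVX2 on two 256-bit FMA ports, so only the second FMA raises the peak figure.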

Quote:
Originally Posted by chithanh View Post
Given that Intel's Skylake-SP presentation talked about the "glue" which AMD used in Epyc
That one got me ROFL. I have yet to try Epyc, but SKX inter-core latency isn't exactly awesome compared to BDW or KNC. Similar to KNL AFAICT, and that's not good news.
__________________
Not speaking for my employer.

Last edited by Melkhior; 07-18-2017 at 06:30 AM. Reason: 2nd quote/answer
  #59  
Old 07-18-2017, 06:32 AM
NTMBK NTMBK is offline
2^11
 
Join Date: Sep 2012
Posts: 3,163
NTMBK will become famous soon enough
Default

Quote:
Originally Posted by Melkhior View Post
Nope, everything I heard from Intel lined up with "not architected". I might have misunderstood, but AFAICT absolutely no SKUs from HCC or LCC have the second FMA, so...
Core i9 7900X has the second FMA, but is made from an LCC die.
  #60  
Old 07-18-2017, 07:09 AM
Melkhior Melkhior is offline
8-bit overflow
 
Join Date: Apr 2010
Location: France
Posts: 416
Melkhior is on a distinguished road
Default

Quote:
Originally Posted by NTMBK View Post
Core i9 7900X has the second FMA, but is made from an LCC die.
OK. I only looked at the Xeon SKUs, I never cared much for overpriced, under-featured desktop processors. My mistake.

If it is indeed an LCC die (couldn't it be a really stripped-down XCC like the 6126?), then it's really weird that nothing using LCC or HCC has the second FMA in the Xeon line-up if it's there architecturally...
__________________
Not speaking for my employer.
All times are GMT -5.