
Thread: Trinity Desktop Review

  1. #21
    640k who needs more?
    Join Date
    Apr 2011
    Posts
    762
    Quote Originally Posted by esrever View Post
    Trinity actually has much better IPC than Llano if you just ignore the floating-point benchmarks. In the floating-point benchmarks, it's kinda unfair with it being 2 FPUs vs 4 and all.
    I wouldn't call it unfair when a Bulldozer/Piledriver module takes up roughly the same amount of die space as two Greyhound/Husky cores.

    It is a trade-off that AMD has decided to make.

  2. #22
    8-bit overflow
    Join Date
    Feb 2011
    Posts
    419
    Quote Originally Posted by kalelovil View Post
    I wouldn't call it unfair when a Bulldozer/Piledriver module takes up roughly the same amount of die space as two Greyhound/Husky cores.

    It is a trade-off that AMD has decided to make.
    I would love to know exactly why that is the case. Wasn't the point of CMT that they would be able to save a bunch of die-space by sharing the FPUs?

  3. #23
    Quote Originally Posted by kalelovil View Post
    I wouldn't call it unfair when a Bulldozer/Piledriver module takes up roughly the same amount of die space as two Greyhound/Husky cores.

    It is a trade-off that AMD has decided to make.
    It's not so straightforward though, Trinity has a much wider array of instructions and these offer huge increases in some benchmarks. I mean Llano doesn't even have SSE4.2!

    If you cut Trinity down so it is feature identical with Llano, you will find that 1 Piledriver module will be smaller than 2 Stars cores, maybe even 3.

    But yes, the shared FP is a trade-off AMD decided to make, and it must reflect. However, it is a pretty good trade-off considering most of those FP operations can be done way, way better on the GPU. Now it is up to the developers to utilize it...

  4. #24
    640k who needs more?
    Join Date
    Apr 2011
    Posts
    762
    Quote Originally Posted by Guild View Post
    If you cut Trinity down so it is feature identical with Llano, you will find that 1 Piledriver module will be smaller than 2 Stars cores, maybe even 3.
    Is there any evidence of that? I would have thought many of the additions would take up hardly any space, especially considering how they're implemented (e.g. AVX).

    I thought the advantage was more that 4 Bulldozer modules would only require a 4-client crossbar, while 8 K10 cores would require a much more complicated 8-client crossbar.



    Quote Originally Posted by Guild View Post
    But yes, the shared FP is a trade-off AMD decided to make, and it must reflect. However, it is a pretty good trade-off considering most of those FP operations can be done way, way better on the GPU. Now it is up to the developers to utilize it...
    That has been my opinion as well, but here we are in 2012 with the APU's GPU component still acting like a glorified IGP with little processing synergy between it and the CPU cores.
    I realise that this depends on more than AMD alone, and that it is a highly difficult and innovative task for AMD's engineers, but that doesn't change the situation for consumers.

  5. #25
    8-bit overflow
    Join Date
    Apr 2012
    Posts
    483
    Quote Originally Posted by kalelovil View Post
    I wouldn't call it unfair when a Bulldozer/Piledriver module takes up roughly the same amount of die space as two Greyhound/Husky cores.

    It is a trade-off that AMD has decided to make.
    well modules are somewhat smaller than the 2 star cores if you look at the trinity vs llano die shots. Trinity had a lot more area dedicated to the gpu and the new front end.

  6. #26
    640k who needs more?
    Join Date
    Apr 2011
    Posts
    762
    Quote Originally Posted by esrever View Post
    well modules are somewhat smaller than the 2 star cores if you look at the trinity vs llano die shots. Trinity had a lot more area dedicated to the gpu and the new front end.
    If you're counting just the die area, they're not any smaller.

    Bulldozer/Piledriver module:
    ~19.3 mm²

    Husky core:
    ~9.7 mm²

    The crossbar needed for Trinity is smaller however, likely because it only has 2 CPU clients rather than 4.
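    As a quick sanity check on those two figures, here is a minimal sketch of the comparison. The areas are the approximate values quoted above; the function name and the choice of comparing one module against two cores are only illustrative.

```python
# Approximate per-unit areas as quoted in the post (mm^2).
MODULE_AREA_MM2 = 19.3      # one Bulldozer/Piledriver module (two integer cores)
HUSKY_CORE_AREA_MM2 = 9.7   # one Husky core

def area_ratio(module_area: float, core_area: float, cores: int = 2) -> float:
    """Return the module area divided by the combined area of `cores` discrete cores."""
    return module_area / (core_area * cores)

ratio = area_ratio(MODULE_AREA_MM2, HUSKY_CORE_AREA_MM2)
print(f"module / two Husky cores = {ratio:.3f}")
```

    With these rounded inputs the ratio comes out just under 1, so the module and a pair of Husky cores are within about 1% of each other, which is well inside the uncertainty of reading areas off die shots.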

  7. #27
    8-bit overflow
    Join Date
    Oct 2010
    Posts
    272
    Quote Originally Posted by kalelovil View Post
    Is there any evidence of that? I would have thought many of the additions would take up hardly any space, especially considering how they're implemented (e.g. AVX).
    There are two 128-bit integer SIMD units in the FPU that would take quite a bit of space. The FPU also supports FMA, which makes it bigger as well. The problem with Bulldozer from a single-thread FPU performance perspective is that the store bus from the FPU is only 128 bits per core.
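    To put the store-bus point in numbers, here is a rough sketch. It assumes the 128-bit per-core store width stated above, at most one transfer per cycle, and no other bottleneck; the function names are hypothetical and this is not a model of the real pipeline.

```python
STORE_BUS_BITS = 128  # per-core FPU store width, as stated in the post

def store_cycles(register_bits: int, bus_bits: int = STORE_BUS_BITS) -> int:
    """Minimum cycles to move one register of `register_bits` over the store bus.

    Assumes one bus-width transfer per cycle (a simplification).
    """
    return -(-register_bits // bus_bits)  # ceiling division

def peak_store_bytes_per_cycle(bus_bits: int = STORE_BUS_BITS) -> int:
    """Upper bound on bytes stored per cycle by one core under the same assumption."""
    return bus_bits // 8

sse_cycles = store_cycles(128)   # one 128-bit SSE register
avx_cycles = store_cycles(256)   # one 256-bit AVX register
print(sse_cycles, avx_cycles, peak_store_bytes_per_cycle())
```

    Under these assumptions a single thread tops out at 16 bytes stored per cycle, and a 256-bit AVX store needs at least two cycles, which is one reason wide vector code does not speed up as much as the execution width alone would suggest.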

  8. #28
    640k who needs more?
    Join Date
    Mar 2010
    Posts
    995
    Quote Originally Posted by itsmydamnation View Post
    the problem with Bulldozer from a single-thread FPU performance perspective is that the store bus from the FPU is only 128 bits per core.
    That is one of the problems indeed. Also, when looking strictly at single-threaded code that is dominated by SSE instructions, we can see that the integer cores do not sustain a constant 2x128-bit loads per cycle into the FP load buffer. The whole FPU's potential cannot be utilized by a single core, which goes against what AMD stated numerous times and against the idea of shared resources being available to each core (they are available, but apparently each core cannot use them efficiently). It is evident that for FP/SSE workloads one needs MT code in order to get the most out of the FlexFP unit. Agner Fog states that the shared front end is also a culprit here, among other bottlenecks that can hurt the utilization of the whole floating-point unit. Let's hope the SR core alleviates some of these bottlenecks.

  9. #29
    8-bit overflow
    Join Date
    Dec 2009
    Posts
    292
    It is a very interesting architecture that AMD has created.

    There are several design trade-offs that I find interesting:

    INT IPC

    The BD core (and now PD) employs a design of higher frequency at the expense of IPC. It looks like AMD has done quite well with this, since PD now has INT IPC somewhat higher than Stars while having clock speeds that appear to be capable of ~4.5 GHz on a quad-core design.

    FP vs APU
    Integration of the graphics core and optimization for compute performance within the GPU has allowed AMD to create a two-pronged strategy. First, the integrated graphics provides an ideal candidate for mobile computing. Since laptop sales are growing much faster than desktop, this seems like a very good idea. Second (and this one isn't as clear-cut), by relying on OpenCL and DirectCompute for good (or great) FP performance, AMD's new products are weak in the vast majority of FP-intensive workloads (reminds me of the good ole K6 days).

    While I believe that this strategy is good for the long run, it is painful today.

    I believe that new architectures (really new ones anyway) only pop up about every decade or so. Between times, the CPU companies do tweaks and call them new architectures. If we assume that the current AMD architecture is designed to scale for ~10 years, then the architectural design decisions seem quite good ..... despite their current weakness in FP.

    Where AMD seems the most lacking to me is in the process technology. Intel stays religiously an entire die shrink ahead of AMD. This gives Intel about 2 times the transistor budget within the same die space..... a pretty tough disadvantage to overcome IMHO.
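    The "about 2 times" figure can be checked with a back-of-the-envelope calculation. This sketch assumes ideal scaling, where transistor density grows with the inverse square of the node name; real processes deviate from this. The 32 nm and 22 nm values are not stated in the post: they are the nodes of Trinity and of Intel's contemporary Ivy Bridge parts, used here as example inputs.

```python
def ideal_density_gain(old_node_nm: float, new_node_nm: float) -> float:
    """Transistor density ratio under ideal (inverse-square) area scaling.

    This treats the node name as a linear feature size, which is an
    idealization and not a property of any specific process.
    """
    return (old_node_nm / new_node_nm) ** 2

# Example inputs (assumption): 32 nm vs 22 nm, one full shrink apart.
gain = ideal_density_gain(32, 22)
print(f"ideal density gain: {gain:.2f}x")
```

    Under ideal scaling the result is a little over 2x, which lines up with the rough estimate above. Actual density gains depend on the process and on what is being laid out (logic, cache, I/O).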

    I remember very well how everyone thought AMD should never have purchased ATI. This was a gutsy move and I believe a very good one. CPUs integrated with GPUs are the future, and AMD could very well have found itself obsolete without the ATI technology.

    Look at how Intel struggles to catch up to AMD now in the graphics department.

    If AMD has successfully interpreted the future, and OpenCL becomes more important than a strong FP unit, they will have guessed right on the design.

  10. #30
    Senior Member
    Join Date
    Sep 2011
    Posts
    866
    Nice scores, yet the author could have made more of them.
    BTW, to be more accurate, PD is 18.7% and 17% faster, not just 15% for each test.
    Kaveri, where are you? I can't wait any longer!
