
Thread: 3rd generation Bulldozer - Steamroller Architecture Discussion

  1. #401
    Senior Member
    Join Date
    Jul 2011
    Posts
    2,056
    Quote Originally Posted by del42sa View Post
    Sorry it was GordonBGood not Agner, but Agner had admitted himself he is not an expert in cache....



    http://www.agner.org/optimize/blog/read.php?i=192
    Humility..?

    =)

    [/OT]

  2. #402
    640k who needs more?
    Join Date
    Oct 2009
    Posts
    775
    As a thought experiment, let's make the WCC as big as you like. What happens on a read miss in L1? Oh, you have to look in both the WCC and the L2. It's not free, you know!

    So unless the WCC is flushed, or is smaller than the L1 (which forces flushing), it will need to be looked up for cache coherency or on an L1 read miss. You might as well just have a decent L2 and be done with it.

    Seriously, if all you are doing is taking writes to the L1 and combining them into one line, then this should be done in the Store Queue to some small extent; otherwise you've got multiple writes into L1.

    As Exophase said, having a WT L1->L2 allows simplifying cache coherency (no need to look up in L1). The question is whether this is really worth it compared to having a smaller, lower-latency cache stack as Intel does.
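
A rough way to picture the read-miss argument above is a toy model of a write-through L1 sitting in front of a write-combining cache (WCC) and an L2. Everything here is an assumption made for illustration: the 64-byte line size, the no-write-allocate L1, and the names `ToyHierarchy`, `store`, `load` and `flush_wcc` are hypothetical and do not describe AMD's actual implementation.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # assumed cache line size


def line_of(addr: int) -> int:
    """Map a byte address to its cache line number."""
    return addr // LINE_BYTES


@dataclass
class ToyHierarchy:
    """Toy write-through L1 backed by a WCC and an L2 (illustrative only)."""
    l1: set = field(default_factory=set)      # line numbers present in L1
    wcc: dict = field(default_factory=dict)   # line number -> dirty byte offsets
    l2: set = field(default_factory=set)      # line numbers present in L2
    l2_writes: int = 0                        # line writes that reached L2

    def store(self, addr: int) -> None:
        # Write-through, no-write-allocate (assumed): the store lands in the
        # WCC, where stores to the same line merge into a single entry.
        self.wcc.setdefault(line_of(addr), set()).add(addr % LINE_BYTES)

    def flush_wcc(self) -> None:
        # Each combined line costs one L2 write, however many stores hit it.
        for line in self.wcc:
            self.l2.add(line)
            self.l2_writes += 1
        self.wcc.clear()

    def load(self, addr: int):
        """Return (where the data came from, structures probed on a miss)."""
        line = line_of(addr)
        if line in self.l1:
            return "L1", []
        # L1 read miss: unless the WCC has been flushed, the newest copy of
        # the line may still be there, so it has to be probed as well as L2.
        probed = ["WCC"] if self.wcc else []
        probed.append("L2")
        if line in self.wcc:
            source = "WCC"
        elif line in self.l2:
            source = "L2"
        else:
            source = "memory"
        self.l1.add(line)
        return source, probed


h = ToyHierarchy()
for offset in range(0, LINE_BYTES, 8):
    h.store(0x1000 + offset)   # eight stores to one line
print(len(h.wcc))              # number of combined lines held in the WCC
print(h.load(0x1000))          # miss while the WCC still holds data
h.flush_wcc()
print(h.l2_writes)             # L2 writes needed for those eight stores
print(h.load(0x2000))          # miss after the flush
```

In this sketch a miss with a non-empty WCC has to check two structures, while a miss after a flush only checks L2, which is the trade-off the post is pointing at. Real hardware would probe in parallel and track byte validity per line, which the model ignores.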

    Quote Originally Posted by TESKATLIPOKA
    Where do you see the same write performance as L3 because I see 2x faster writes?
    I didn't. I meant that if I followed your logic, that would be the case, which it clearly isn't.
    Long live aceshardware!

  3. #403
    Does anyone know how many read and write transactions the L2 + L3 can do in Orochi?

  4. #404
    Senior Member
    Join Date
    Oct 2011
    Posts
    1,372
    Quote Originally Posted by Lightning View Post
    As Exophase said, having a WT L1->L2 allows simplifying cache coherency (no need to look up in L1). The question is whether this is really worth it compared to having a smaller, lower-latency cache stack as Intel does.
    Good Question

  5. #405
    Banned
    Join Date
    Aug 2011
    Posts
    424
    Quote Originally Posted by Lightning View Post
    As a thought experiment, let's make the WCC as big as you like. What happens on a read miss in L1? Oh, you have to look in both the WCC and the L2. It's not free, you know!

    So unless the WCC is flushed, or is smaller than the L1 (which forces flushing), it will need to be looked up for cache coherency or on an L1 read miss. You might as well just have a decent L2 and be done with it.

    Seriously, if all you are doing is taking writes to the L1 and combining them into one line, then this should be done in the Store Queue to some small extent; otherwise you've got multiple writes into L1.

    As Exophase said, having a WT L1->L2 allows simplifying cache coherency (no need to look up in L1). The question is whether this is really worth it compared to having a smaller, lower-latency cache stack as Intel does.
    What's best for Intel's process isn't necessarily best for AMD/GF's process. I'd hope AMD simulated different cache designs before choosing the current one.

  6. #406
    Senior Member
    Join Date
    Oct 2011
    Posts
    1,372
    Quote Originally Posted by Megol View Post
    What's best for Intel's process isn't necessarily best for AMD/GF's process. I'd hope AMD simulated different cache designs before choosing the current one.
    then their simulations fail miserably...

  7. #407
    640k who needs more?
    Join Date
    Mar 2010
    Posts
    995
    AMD's Achilles' heel has always been cache design. Let's hope they learn from the past.

  8. #408
    I'd be interested in whether Jaguar's L1D is write-through then.
    <crap>It had better not be WT, so that I can dream of a moderate-sized, mid-level private D-cache, sitting between the small L1 D-cache and the large shared L2 $, in the future evolution of the Bulldozer family.


  9. #409
    640k who needs more?
    Join Date
    Mar 2010
    Posts
    995
    Jaguar will be an interesting chip for sure. It will be stuck at low(ish) clocks, so it won't be directly comparable to an SR core/module, but if I had to guess, I'd say it will be darn close to SR, IPC-wise. But SR will be a high-clocking design on top of its IPC improvements (vs BD/PD), so this has to be counted as well.

  10. #410
    Quote Originally Posted by inf64 View Post
    Jaguar will be an interesting chip for sure. It will be stuck at low(ish) clocks, so it won't be directly comparable to an SR core/module, but if I had to guess, I'd say it will be darn close to SR, IPC-wise. But SR will be a high-clocking design on top of its IPC improvements (vs BD/PD), so this has to be counted as well.
    The ULV SR and 17W Jaguar will be interesting to compare, even more so if the rumors of Jaguar/SR in next-gen consoles are true.
    Andy "Krazy" Glew <3 I also like soup.
