SemiAccurate Forums

  #1  
02-12-2010, 05:25 AM
Lightning
Join Date: Oct 2009
Location: Canberra, Australia
Posts: 774
CPU-cache performance simulator

Years ago, in a far, far away galaxy called Aceshardware, David Berkgvist (sp?) wrote a little program called CPUCache which attempted to show relative performance for different MPU cores and their caches. The basic characteristics of each could be tweaked in a text file.

Does anyone know of something similar and freely available?

I'm really interested to know things like:

Effective latency - what impact do bottlenecks have, and where do they occur, if I monkey with any part of the system?
Active ports vs pseudo-porting (true parallel cache cell access vs increased way-count).
Cache line length - what effect does the line length have when we place different demands on the cache?
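
A minimal sketch of how the line-length question could be explored, assuming a toy set-associative cache with LRU replacement. The `Cache` class, the `run` helper, and the 32 KB / 8-way defaults are hypothetical choices for illustration; they are not taken from CPUCache or from any real core, and the model counts only hits and misses, not latency or ports.

```python
from collections import OrderedDict
import random


class Cache:
    """Toy set-associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used line
        entries[tag] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def run(pattern, line_bytes, size_bytes=32 * 1024, ways=8):
    """Replay a list of byte addresses and return the miss rate."""
    cache = Cache(size_bytes, line_bytes, ways)
    for addr in pattern:
        cache.access(addr)
    return cache.miss_rate()


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    footprint = 1 << 18     # 256 KB, assumed; larger than the cache
    sequential = [i * 4 for i in range(footprint // 4)]
    scattered = [rng.randrange(footprint) for _ in range(footprint // 4)]
    for line_bytes in (32, 64, 128):
        print(line_bytes, run(sequential, line_bytes), run(scattered, line_bytes))
```

With a purely sequential stream of 4-byte accesses, each new line costs one miss and then serves the remaining words in that line, so longer lines lower the miss rate. A scattered stream shows how much of that benefit depends on the access pattern.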

If there isn't anything I can get my hands on I'm going to attempt to write my own. At first I'll just try to focus on the basics - instruction throughput for the core given a percentage of branches, loads and stores, a rough access pattern for instructions and data, fixed IPC, pipeline depth, clock speeds for core and uncore, and number of cores.

At this stage I'm going to assume the following for two "sample codes":

1) AI/DB/Server code - Instruction: 1/6 branches, 75% predicted/locality, 2/6 loads, 1/6 stores, 1 MB footprint. Data: 300 MB footprint, 70% 128 byte records, 30% 32 byte records, 70% random location for memory access.

2) Desktop code - Instruction: 1/8 branches, 92% predicted/locality, 2/8 loads, 1/8 stores, 2 MB footprint. Data: 50 MB footprint, 20% 128 byte records, 80% 32 byte records, 15% random location for memory access.
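
As a rough sketch, the two sample codes above could be written down as data so a simulator can read them. The `Workload` class and its field names are hypothetical, and "predicted/locality" is assumed here to mean the fraction of branches predicted correctly, which may not be exactly what is intended.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """One sample code profile (field names are illustrative assumptions)."""
    name: str
    branch_frac: float      # fraction of instructions that are branches
    predicted_frac: float   # assumed: fraction of branches predicted correctly
    load_frac: float        # fraction of instructions that are loads
    store_frac: float       # fraction of instructions that are stores
    code_footprint_mb: int
    data_footprint_mb: int
    record_mix: tuple       # ((record_size_bytes, fraction_of_records), ...)
    random_frac: float      # fraction of data accesses at random locations

    def mem_ops_per_instr(self):
        return self.load_frac + self.store_frac

    def mispredicts_per_instr(self):
        return self.branch_frac * (1.0 - self.predicted_frac)

    def mean_record_bytes(self):
        return sum(size * frac for size, frac in self.record_mix)


SERVER = Workload(
    name="AI/DB/Server",
    branch_frac=1 / 6, predicted_frac=0.75,
    load_frac=2 / 6, store_frac=1 / 6,
    code_footprint_mb=1, data_footprint_mb=300,
    record_mix=((128, 0.7), (32, 0.3)),
    random_frac=0.70,
)

DESKTOP = Workload(
    name="Desktop",
    branch_frac=1 / 8, predicted_frac=0.92,
    load_frac=2 / 8, store_frac=1 / 8,
    code_footprint_mb=2, data_footprint_mb=50,
    record_mix=((128, 0.2), (32, 0.8)),
    random_frac=0.15,
)

if __name__ == "__main__":
    for w in (SERVER, DESKTOP):
        print(w.name, w.mem_ops_per_instr(),
              w.mispredicts_per_instr(), w.mean_record_bytes())
```

Keeping the percentages in one place like this should make it easy to swap in corrected figures if anyone suggests better ones.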

If anyone wants to chip in with suggestions or recommendations/corrections for my percentages above please drop a comment. Likewise if anyone thinks this would be useful drop me a line or comment, which should encourage me to work faster.
__________________
Long live aceshardware!
  #2  
02-12-2010, 04:22 PM
redpriest
Join Date: Jun 2009
Posts: 447

Hmmm part of the problem with some of those goals is that getting a *really* accurate level of performance is hard. Really hard. Especially given trade-offs you might not know about but are just as consequential; so I guess my question is, what level of accuracy are you looking for?
__________________
Speaking for myself.
  #3  
02-12-2010, 06:14 PM
Lightning

Quote:
Originally Posted by redpriest
Hmmm part of the problem with some of those goals is that getting a *really* accurate level of performance is hard. Really hard. Especially given trade-offs you might not know about but are just as consequential; so I guess my question is, what level of accuracy are you looking for?

Mostly I want to be able to get an "idea" of how several cache layouts vary in their performance. The old CPUCache program was a bit of "black magic" in how it did its comparisons, so I'd like to make something a bit more open and accessible than that. For example, I'd like to be able to see the effect that adding an extra core has on the loading of an L3 cache, so that I could then try to work out what size buffers were needed, and so on.
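
A back-of-envelope sketch of the "extra core" question, assuming every core runs the same code at a fixed IPC and that every miss in the private caches turns into one request to a shared L3. The function names and all the numbers (3 GHz cores, 2 GHz uncore, 5% private-cache miss rate) are placeholders for illustration, not measurements.

```python
def l3_requests_per_ns(cores, core_ghz, ipc, mem_ops_per_instr, l2_miss_rate):
    """Aggregate L3 request rate in requests per nanosecond.

    Assumes each private-cache miss becomes exactly one L3 request.
    """
    return cores * core_ghz * ipc * mem_ops_per_instr * l2_miss_rate


def l3_utilisation(requests_per_ns, uncore_ghz, ports=1):
    """Fraction of L3 capacity in use.

    Assumes the L3 accepts one request per port per uncore cycle.
    """
    return requests_per_ns / (uncore_ghz * ports)


if __name__ == "__main__":
    for cores in range(1, 9):
        rate = l3_requests_per_ns(cores, core_ghz=3.0, ipc=1.0,
                                  mem_ops_per_instr=0.5, l2_miss_rate=0.05)
        print(cores, rate, l3_utilisation(rate, uncore_ghz=2.0))
```

In this simple model the load grows linearly with core count. A real design would see contention and queueing well before utilisation reaches 1.0, which is where buffer sizing would come in.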
  #4  
02-12-2010, 06:26 PM
hyc
Join Date: Nov 2009
Location: Los Angeles, CA
Posts: 741

Have you tried looking at the code for valgrind's cachegrind already?

www.valgrind.org ...
  #5  
02-12-2010, 06:52 PM
Lightning

Found it and forgotten it. Thanks for reminding me. (My mum died in the intervening couple of weeks. I'll use that as my excuse anyway - thanks Mum!)

Have you ever used it yourself?
  #6  
02-12-2010, 08:36 PM
hyc

I used to use it quite a lot, but since it only simulates a single core, and I didn't have the time to extend it to multicore myself, I haven't used it recently.
  #7  
02-14-2010, 12:28 AM
rambaldi
Join Date: Jan 2010
Location: Auckland, New Zealand
Posts: 2,435

Sounds pretty interesting. I wrote a very, very simple version of this a couple of years ago for a uni paper (it was just looking at the performance of cache levels etc.) and would be pretty interested in seeing something that is fully tested. I'd be happy to help you out with some testing or something if you do go forward with writing your own thing.
  #8  
02-15-2010, 07:31 AM
Dresdenboy
Join Date: Oct 2009
Location: Berlin, Germany
Posts: 362

Some simulators I know are:
PTLSim
Zesto-x86

PTLSim has been used to simulate AMD's ASF:
http://www.amd64.org/research/multi-...e-systems.html
  #9  
02-15-2010, 08:39 AM
Rottis
Join Date: Jul 2009
Posts: 491

I remember seeing some great CPU cache analysis at Digit-Life.
Digit-Life is no more, but I think the tool they used was related to RightMark.

http://www.rightmark.org/

The memory analyzer there seems to be the kind of thing you might be looking for.

[Edit]
There's source code available for the project.
While it doesn't exactly generate the nice 3D charts Digit-Life had, it certainly seems to generate similar pieces of information.
If this is not what you are looking for, then it might provide a good reference or base to build on.
It supports threads and aggregate cache/memory performance.

Last edited by Rottis; 02-15-2010 at 08:46 AM. Reason: Add information after going through the source

All times are GMT -5.