
I have an embarrassingly parallel algorithm that runs within a Parallel.ForEach block on my 4-core machine in about 5 minutes. When I ran the same code on an 8-core machine, it ran for much longer (I gave up after 10 minutes, so I don't know exactly how long). Since .NET uses a shared-memory architecture for this kind of thing, I'm guessing that access to main memory is creating a bottleneck.

So my question is: is there a way of making n copies (where n is the number of available cores) of my data and assigning one copy to each core, thus removing the bottleneck?

Essentially, what I want is something like distributed memory, but within a single machine.
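For the per-core-copy part of the question, .NET already offers something close: the `Parallel.ForEach` overload taking `localInit`/`localFinally` delegates gives each worker thread its own private state, created once per thread and merged once at the end. A minimal sketch — the lookup table and the summing workload are hypothetical stand-ins for the real algorithm:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class PerThreadCopyDemo
{
    // Sums table[i % table.Length] for i in [0, n), giving each worker
    // thread a private clone of the table so reads never touch shared data.
    public static long SumWithLocalCopies(int[] table, int n)
    {
        long total = 0;
        Parallel.ForEach(
            Enumerable.Range(0, n),
            // localInit: runs once per worker thread; clone the shared data.
            () => (Table: (int[])table.Clone(), Sum: 0L),
            (i, _, local) =>
            {
                local.Sum += local.Table[i % local.Table.Length];
                return local;
            },
            // localFinally: merge each thread's partial sum exactly once.
            local => Interlocked.Add(ref total, local.Sum));
        return total;
    }

    static void Main()
    {
        int[] shared = Enumerable.Range(0, 1000).ToArray(); // table[i] == i
        Console.WriteLine(SumWithLocalCopies(shared, 10_000)); // prints 4995000
    }
}
```

Whether cloning actually helps depends on the access pattern: read-only shared data can sit in several cores' caches simultaneously, so the clone mainly pays off when threads would otherwise write to (or near) the shared structure.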

UPDATE

I re-ran the code on the 8-core machine. On my 4-core machine, CPU usage (via Task Manager) maxes out for the duration of the run, but on the 8-core machine it hovered at about 50-60% throughout. I wonder if this is indicative of something?

UPDATE 2

I implemented MPI.NET in my program and now get 100% CPU usage on all cores, plus I can access cores on other machines.
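For reference, the basic MPI.NET pattern looks roughly like this — a sketch, assuming the MPI.NET package and an installed MPI runtime (e.g. MS-MPI); the work-splitting loop is a hypothetical stand-in for the real algorithm:

```csharp
// Each MPI rank is a separate process with its own address space:
// "distributed memory within one machine" when all ranks run locally.
using System;
using MPI;

class MpiSketch
{
    static void Main(string[] args)
    {
        using (new MPI.Environment(ref args))
        {
            Intracommunicator comm = Communicator.world;

            // Each rank owns a private slice of the iteration space,
            // so there is no shared mutable state between workers.
            double localSum = 0;
            for (int i = comm.Rank; i < 1_000_000; i += comm.Size)
                localSum += i;

            // Combine the partial results on rank 0.
            double total = comm.Reduce(localSum, Operation<double>.Add, 0);
            if (comm.Rank == 0)
                Console.WriteLine(total); // sum of 0..999999
        }
    }
}
```

Launched with something like `mpiexec -n 8 MpiSketch.exe`, each of the 8 processes gets its own copy of everything by construction, which is why this sidesteps shared-memory contention.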


closed as too broad by svick, Philipp Wendler, Steve Czetty, Maverick, watcher Feb 28 '14 at 20:35


It is possible to set processor/thread affinity. I would argue, though, that the problem lies in the data structures: if the CPU cannot effectively keep the data in its on-chip caches, you will see a lot of time disappear into memory fetches and cache evictions. Does the 8-core machine (assuming physical cores) have less cache? If they are logical cores, they may contend with each other for cache. –  Adam Houldsworth Dec 10 '13 at 15:54
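One concrete form of the cache contention this comment describes is false sharing: two threads writing to variables that happen to share a cache line force that line to bounce between cores. A sketch of the standard mitigation, assuming a 64-byte cache line (common on x86) and hypothetical counters:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading.Tasks;

static class FalseSharingDemo
{
    // Explicit layout places each counter on its own 64-byte cache line,
    // so the two writer threads never invalidate each other's line.
    [StructLayout(LayoutKind.Explicit, Size = 128)]
    struct PaddedCounters
    {
        [FieldOffset(0)] public long A;   // first cache line
        [FieldOffset(64)] public long B;  // second cache line
    }

    public static (long, long) Run(int iterations)
    {
        var c = new PaddedCounters();
        Parallel.Invoke(
            () => { for (int i = 0; i < iterations; i++) c.A++; },
            () => { for (int i = 0; i < iterations; i++) c.B++; });
        return (c.A, c.B);
    }
}
```

Changing `FieldOffset(64)` to `FieldOffset(8)` puts both counters on one line; the results stay correct, but on typical hardware the contended version runs noticeably slower, which is the kind of effect that shows up as cores busy-but-not-progressing.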
I do not see evidence that memory bandwidth is the problem. Also, shared data that is read-only can be in multiple CPU caches at the same time. Duplicating it makes matters worse. IOW please post some code so that we can review it for problems. –  usr Dec 10 '13 at 16:12
When it comes to performance, don't guess. Have you tried to use profiling to find out what the problem actually is? And of course, without more details, it's very hard for us to help you. –  svick Dec 10 '13 at 16:35
@AdamHouldsworth Is there any way I can profile memory fetches by the CPU? –  mattyB Dec 13 '13 at 14:10