
I've run into a minor HPC problem after running some tests on an 80-core (160 HT) Nehalem architecture with 2 TB of DRAM:

A server with more than 2 sockets starts to stall a lot (delay) as each thread requests information about objects on the "wrong" socket, i.e. a thread that is working on some objects on one socket requests data that actually resides in the DRAM attached to another socket.

The cores appear 100% utilized, even though I know that they are waiting for the remote socket to return the request.

As most of the code runs asynchronously, it is a lot easier to rewrite the code so that I just pass messages from the threads on one socket to the threads on the other (no locked waiting). In addition, I want to lock each thread to a memory pool, so I can update objects in place instead of wasting time (~30%) on the garbage collector.

Hence the question:

How to pin threads to cores with predetermined memory pool objects in Python?

A little more context:

Python has no problem running multicore when you put ZeroMQ in the middle and make an art out of passing messages between the memory pools managed by each ZMQworker. At ZMQ's 8M msg/second, the internal update of the objects takes longer than filling the pipeline. This is all described here: http://zguide.zeromq.org/page:all#Chapter-Sockets-and-Patterns

So, with a little over-simplification, I spawn 80 ZMQworker processes and 1 ZMQrouter and load the context with a large swarm of objects (584 million objects, actually). From this starting point the objects need to interact to complete the computation.

This is the idea:

  • If "Object X" needs to interact with "Object Y" and Object Y is available in the local memory pool of the Python thread, then the interaction should be done directly.
  • If "Object Y" is NOT available in the same pool, then I want the thread to send a message through the ZMQrouter and let the router return a response at some later point in time. My architecture is non-blocking, so whatever goes on in the particular Python thread just continues without waiting for the ZMQrouter's response. Even for objects on the same socket but on a different core, I would prefer NOT to interact directly, as I prefer clean message exchanges over having 2 threads manipulate the same memory object.
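The rule in the two bullets above can be sketched roughly as follows. This is only a minimal illustration, not the actual ZeroMQ setup: a plain in-process `deque` stands in for the ZMQrouter, and the names `Worker`, `owner_of`, `interact`, `handle` and `route_all`, as well as the hash partition of integer object ids, are assumptions made for the example.

```python
from collections import deque


class Worker:
    """Hypothetical stand-in for one ZMQworker that owns a private object pool."""

    def __init__(self, worker_id, n_workers, outbox):
        self.worker_id = worker_id
        self.n_workers = n_workers
        self.pool = {}        # object_id -> state, touched only by this worker
        self.outbox = outbox  # stand-in for the connection to the ZMQrouter

    def owner_of(self, object_id):
        # Assumption: ownership is a static hash partition of integer ids.
        return object_id % self.n_workers

    def interact(self, y_id, delta):
        """Apply `delta` to object Y without ever blocking."""
        if y_id in self.pool:
            # Object Y lives in the local pool: update it directly.
            self.pool[y_id] += delta
            return "local"
        # Otherwise queue a message for the owner and carry on.
        self.outbox.append((self.owner_of(y_id), y_id, delta))
        return "routed"

    def handle(self, y_id, delta):
        """Called when a routed message reaches the owning worker."""
        self.pool[y_id] += delta


def route_all(outbox, workers):
    """Stand-in for the router: deliver every queued message to its owner."""
    while outbox:
        owner, y_id, delta = outbox.popleft()
        workers[owner].handle(y_id, delta)
```

With two workers and objects 0..3 partitioned by `object_id % 2`, a call such as `workers[0].interact(2, 5)` is applied locally, while `workers[0].interact(1, 7)` is queued and only takes effect once `route_all` has run. Only the owning worker ever writes to an object, which is the property I am after.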

To do this I need to know:

  1. how to figure out which socket a given Python process (thread) runs on.
  2. how to assign a memory pool on that particular socket to the Python process (some malloc limit or similar, so that the sum of the memory pools does not push a memory pool from one socket to another)
  3. Things I haven't thought of.
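For point 1, the following is a minimal sketch of what I have in mind, assuming Linux and Python 3.3+ (where `os.sched_getaffinity` and `os.sched_setaffinity` exist). It reads the socket id of each logical CPU from sysfs and pins the calling process to one CPU. The helper names `socket_of_cpu`, `cpus_by_socket` and `pin_to_cpu` are made up for the example.

```python
import os


def socket_of_cpu(cpu):
    """Return the physical package (socket) id of a logical CPU, or None if unknown."""
    path = "/sys/devices/system/cpu/cpu%d/topology/physical_package_id" % cpu
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        # sysfs topology is not available (e.g. some containers or VMs).
        return None


def cpus_by_socket():
    """Group the CPUs this process is allowed to run on by socket id."""
    groups = {}
    for cpu in sorted(os.sched_getaffinity(0)):
        groups.setdefault(socket_of_cpu(cpu), []).append(cpu)
    return groups


def pin_to_cpu(cpu, pid=0):
    """Restrict process `pid` (0 = the caller) to one CPU and return the new mask."""
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Each worker process could call `pin_to_cpu` right after it starts and before it allocates its pool. As far as I understand, the default Linux policy allocates pages on the NUMA node of the CPU that first touches them, so a pool built after pinning should tend to stay local, but that is not a hard guarantee. Strict binding (point 2) seems to need something outside the standard library, such as launching the process under `numactl --cpunodebind=N --membind=N`.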

But I cannot find references in the Python docs on how to do this, and on Google I must be searching for the wrong thing.

Can you explain your question with a bit more context? I would naively answer that a Python process cannot run multicore, so you must be talking about 80 (or 160) independent processes here. Pinning them to specific cores can be achieved e.g. with taskset, on Linux (see man taskset). – Armin Rigo Aug 5 at 14:24
ZeroMQ lets you spread the workload across all your hyperthreads, but that doesn't automatically mean that the memory objects that are created stay with each thread. – BHM Aug 5 at 18:11

This question has an open bounty worth +50 reputation from BHM ending tomorrow.

This question has not received enough attention.

20 views only? Come on - more people must have this problem...

1 Answer

Just wondering if this might be amenable to the use of Python Remote Objects (Pyro). This might be worth investigating, but unfortunately I do not have access to such hardware.

As explained in the documentation, while Pyro is often used to distribute work across multiple machines on a network, it can also be used to share processing between cores on a single machine.

On a lower level Pyro is just a form of inter-process communication. So everywhere you would otherwise have used a more primitive form of IPC (such as plain TCP/IP sockets) between Python components, you could consider to use Pyro instead.

While Pyro may add some overhead, it may well speed things up and should make things more maintainable.

Can you explain your idea with a little more detail? – BHM 2 days ago
