Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

I am running Ubuntu on a machine with a quad core cpu. I have written some test Java code that spawns a given number of processes that simply increment a volatile variable for a given number of iterations when run.

I would expect the running time to not increase significantly while the number of threads is less than or equal to the number of cores i.e. 4. In fact, these are the times I get (using "real time" from the UNIX time command:

1 thread: 1.005s

2 threads: 1.018s

3 threads: 1.528s

4 threads: 1.982s

5 threads: 2.479s

6 threads: 2.934s

7 threads: 3.356s

8 threads: 3.793s

This shows that adding one extra thread does not increase the time as expected, but then the time does increase with 3 and 4 threads.

At first I thought this could be because the OS was not allowing the JVM to use all the threads, but I ran top, and it clearly showed that with 3 thread, 3 cores were running at ~100%, and with 4 threads, 4 cores were maxed out.

My question is: why is the code running on 3/4 CPUs not roughly the same speed as when it runs on 1/2? Because it is running parallel on all the cores.

Here is my main method for reference:

class Example implements Runnable {

    // using this so the compiler does not optimise the computation away
    volatile int temp;

    void delay(int arg) {
        for (int i = 0; i < arg; i++) {
            for (int j = 0; j < 1000000; j++) {
                this.temp += i + j;
            }
        }
    }

    int arg;
    int result;

    Example(int arg) {
        this.arg = arg;
    }

    public void run() {
        delay(arg);
        result = 42;
    }

    public static void main(String args[]) {

    // Get the number of threads (the command line arg)

    int numThreads = 1;
    if (args.length > 0) {
        try {
            numThreads = Integer.parseInt(args[0]);
        } catch (NumberFormatException nfe) {
            System.out.println("First arg must be the number of threads!");
        }
    }

    // Start up the threads

    Thread[] threadList = new Thread[numThreads];
    Example[] exampleList = new Example[numThreads];
    for (int i = 0; i < numThreads; i++) {
        exampleList[i] = new Example(1000);
        threadList[i] = new Thread(exampleList[i]);
        threadList[i].start();
    }

    // wait for the threads to finish

    for (int i = 0; i < numThreads; i++) {
        try {
            threadList[i].join();
            System.out.println("Joined with thread, ret=" + exampleList[i].result);
        } catch (InterruptedException ie) {
            System.out.println("Caught " + ie);
        }
    }
}
share|improve this question
3  
This is an interesting question, please post your Example source. –  Andrey Chaschev 2 hours ago
 
I suppose one of those threads is running JVM as well and then there is the main thread that is spawned for running this code. –  ipinak 2 hours ago
 
Does your cpu have 4 physical cores or is it hyperthreaded for 2 physical and 2 logical cores? –  vandale 2 hours ago
 
@AndreyChaschev edited into question. –  Fractal 2 hours ago
 
@ipinak Well, the main thread must also be running, but compared to the work that the others are doing, it is pretty trivial. i.e. when I look at top when it is running just two threads, 2 cores are running at ~100% and the others are around 0-4%. –  Fractal 2 hours ago
show 6 more comments

4 Answers

Using multiple CPUs helps up to the point you saturate some underlying resource.

A common resource to over utilise is the L3 cache. This is shared across CPUs and while it allows a degree of concurrency, it doesn't scale well above to CPUs. I suggest you check what your Example code is doing and make sure they can run independently and not use any shared resources. e.g. Most chips have a limited number of FPUs.

share|improve this answer
 
Thanks for the answer. I have edited the Example code back into the question. It's not obvious to me why the issues you mention would be influencing this. Can you suggest a way of investigating further? –  Fractal 1 hour ago
 
You have received a lot of upvotes. It must be that ` for (int i = 0; i < arg; i++) { for (int j = 0; j < 1000000; j++) { this.temp += i + j; } }` uses a great deal of memory. Can you guess how many gigabytes roughly? See right answer, stackoverflow.com/a/20809194/1083704 –  Val 1 hour ago
add comment

The Core i5 in a Lenovo X1 Carbon is not a quad core processor. It's a two core processor with hyperthreading. When you're performing only trivial operations that do not result in frequent, long pipeline stalls, then the hyperthreading scheduler won't have much opportunity to weave other operations into the stalled pipeline and you won't see performance equivalent to four actual cores.

share|improve this answer
add comment

There are several things that can limit how effectively you can multi-thread an application.

  1. Saturation of a resource such as memory/bus/etc bandwidth.

  2. Locking/contention issues (for example if threads are constantly having to wait for each other to finish).

  3. Other processes running on the system.

In your case you are using a volatile integer being accessed by all of the threads, that means that the threads are constantly having to send the new value of that integer between themselves. This will cause some level of contention and memory/bandwidth usage.

Try switching each thread to be working on its own chunk of data with no volatile variable. That should reduce all forms of contention.

share|improve this answer
add comment

There are two good answer to you already, both are fine to explain what is happening.

Look to you processor, most part of the "quad core" from intel is in fact a dual core, that simulates a quad core do OS (yes, they tell you that you have 4 core, but you have just 2 in fact...). This is the better explanation to your problem, because the time increments as a dual core processor.

If you have a real 4 core, the another answer is that you code have some concurrency.

share
add comment

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.