Take the 2-minute tour ×
Stack Overflow is a question and answer site for professional and enthusiast programmers. It's 100% free, no registration required.

Java 8 provides java.util.Arrays.parallelSort, which sorts arrays in parallel using the fork-join framework. But there's no corresponding Collections.parallelSort for sorting lists.

I can use toArray, sort that array, and store the result back in my list, but that will temporarily increase memory usage, which if I'm using parallel sorting is already high because parallel sorting only pays off for huge lists. Instead of twice the memory (the list plus parallelSort's working memory), I'm using thrice (the list, the temporary array and parallelSort's working memory). (Arrays.parallelSort documentation says "The algorithm requires a working space no greater than the size of the original array".)

Memory usage aside, Collections.parallelSort would also be more convenient for what seems like a reasonably common operation. (I tend not to use arrays directly, so I'd certainly use it more often than Arrays.parallelSort.)

The library can test for RandomAccess to avoid trying to e.g. quicksort a linked list, so that can't a reason for a deliberate omission.

How can I sort a List in parallel without creating a temporary array?

share|improve this question
1  
All Java's sorting algorithms for List use stable sorting algorithms derived from mergesort, which do linear amounts of temporary allocation anyway. –  Louis Wasserman 55 mins ago
    
@LouisWasserman: Okay, but I'm still using thrice rather than twice the memory: the list, the toArray result and Arrays.parallelSort's working space (note that Arrays.parallelSort uses "a working space no greater than the size of the original array" -- linear temporary memory). When the lists are large (as required for parallel sorting to be a win), these constant factors begin to matter. (Also, Collections.parallelSort would be more convenient than using a temp array, too.) –  Jeffrey Bosboom 49 mins ago

1 Answer 1

Just speculating here, but I see several good reasons for generic sort algorithms preferring to work on arrays instead of List instances:

  • Element access is performed via method calls. Despite all the optimizations JIT can apply, even for a list that implements RandomAccess, this probably means a lot of overhead compared to plain array accesses which can be optimized very well.
  • Many algorithms require copying some fragments of the array to temporary structures. There are efficient methods for copying arrays or their fragments. An arbitrary List instance on the other hand, can't be easily copied. New lists would have to be allocated which poses two problems. First, this means allocating some new objects which is likely more costly than allocating arrays. Second, the algorithm would have to choose what implementation of List should be allocated for this temporary structure. There are two obvious solutions, both bad: either just choose some hard-coded implementation, e.g. ArrayList, but then it could just allocate simple arrays as well (and if we're generating arrays then it's much easier if the soiurce is also an array). Or, let the user provide some list factory object, which makes the code much more complicated.
  • Related to the previous issue: there is no obvious way of copying a list into another due to how the API is designed. The best the List interface offers is addAll() method, but this is probably not efficient for most cases (think of pre-allocating the new list to its target size vs adding elements one by one which many implementations do).
  • Most lists that need to be sorted will be small enough for another copy to not be an issue.

So probably the designers thought of CPU efficiency and code simplicity most of all, and this is easily achieved when the API accepts arrays. Some languages, e.g. Scala, have sort methods that work directly on lists, but this comes at a cost and probably is less efficient than sorting arrays in many cases (or sometimes there will probably just be a conversion to and from array performed behind the scenes).

share
    
Re point 2/3: if the algorithm needs temporary space, it can just allocate arrays. There'd be no advantage to allocating a list, as the result will be stored in the original list. If it needs an internal copy of part of the list, it can use subList(i, j).toArray(). Storing the result back into the list (if not sorted in-place) does need to set each element one-by-one though, I'll give you that. –  Jeffrey Bosboom 53 secs ago

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.