Parallelism is the study of programs that execute simultaneously on multiple processors.
0 votes · 0 answers · 19 views
SIMD instructions and synchronization
I read that nowadays implementations of algorithms (in various fields of application) make use of SIMD instructions as frequently as possible in order to get a performance improvement. Since we deal with ...
0 votes · 1 answer · 38 views
MapReduce Pseudocode
Consider the following pseudocode for MapReduce to find the frequency of words in a collection of documents:
...
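The question's pseudocode is elided above, but the standard word-count MapReduce it refers to can be sketched in a few lines. This is a minimal single-process simulation (the `map_fn`/`reduce_fn`/`run_mapreduce` names are illustrative, not from the question):

```python
from collections import defaultdict

def map_fn(doc_id, text):
    """Map phase: emit a (word, 1) pair for every word in one document."""
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    """Reduce phase: sum all partial counts for one word."""
    return (word, sum(counts))

def run_mapreduce(documents):
    # Shuffle phase: group intermediate (key, value) pairs by key,
    # which is what the framework does between map and reduce.
    grouped = defaultdict(list)
    for doc_id, text in documents.items():
        for word, count in map_fn(doc_id, text):
            grouped[word].append(count)
    return dict(reduce_fn(w, c) for w, c in grouped.items())

docs = {"d1": "the cat sat", "d2": "the cat ran"}
print(run_mapreduce(docs))  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 1}
```

In a real MapReduce deployment the map calls run on different machines and the shuffle moves data over the network; the dataflow, however, is exactly this.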
1 vote · 1 answer · 28 views
Why does the map function in MapReduce take two parameters?
I am playing around with MapReduce and wanted to ask why the map function takes two inputs (key, value). Why not just pass along ...
2 votes · 1 answer · 124 views
Which of the common sorting algorithms can be parallelized? [closed]
I want to know which of the following algorithms can be parallelized:
Bubble Sort,
Insertion Sort,
Selection Sort,
Shell Sort,
Quick Sort,
Merge Sort,
Radix Sort.
Those which can't be, ...
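Merge sort is the textbook example from this list: its two halves are independent, so they can be sorted concurrently and merged afterwards. A minimal sketch of that structure (threads only illustrate the decomposition; in CPython a real speedup would need processes or native code):

```python
from concurrent.futures import ThreadPoolExecutor

def merge(left, right):
    """Standard sequential merge of two sorted lists."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

def parallel_merge_sort(xs, executor):
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    # Sort one half in a worker thread while this thread sorts the other.
    # (Submitting the built-in `sorted` stands in for a recursive call and
    # avoids exhausting a bounded pool with nested submissions.)
    future = executor.submit(sorted, xs[:mid])
    right = parallel_merge_sort(xs[mid:], executor)
    return merge(future.result(), right)

with ThreadPoolExecutor(max_workers=4) as ex:
    print(parallel_merge_sort([5, 2, 9, 1, 7, 3], ex))  # [1, 2, 3, 5, 7, 9]
```

By contrast, insertion sort's loop carries a dependence from each step to the next, which is why it resists this kind of decomposition.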
5 votes · 2 answers · 90 views
Why do you have to worry about cache coherence if you are using shared memory?
Wikipedia says that shared memory comes with significant cache-coherence costs. But I thought the whole idea of shared memory is that all the CPUs access the same memory? So if one ...
-1 votes · 1 answer · 35 views
What would be the overall speed-up achieved?
Suppose we are using a dual-core processor and we run 2 applications (one with 80% resource requirements and the other with 20%). Now the first program is parallelizable for 40% and ...
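The question's full scenario is truncated, but speedup questions of this shape reduce to Amdahl's law: with parallelizable fraction p on n cores, speedup is at most 1 / ((1 − p) + p/n). A small worked example (the numbers below only cover the stated 40%-parallelizable program, not the elided rest of the scenario):

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup for a program whose fraction p is
    parallelizable, run on n processors (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# A program that is 40% parallelizable on a dual-core machine:
# 1 / (0.6 + 0.4/2) = 1 / 0.8 = 1.25x at best.
print(round(amdahl_speedup(0.40, 2), 3))  # 1.25
```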
2 votes · 1 answer · 50 views
Reference book for parallel computing and parallel algorithms
Please suggest some books on parallel algorithms which teach efficient programming techniques like the ones taught in the Udacity course (Introduction to Parallel Programming with CUDA). Also wanted to ...
0 votes · 2 answers · 72 views
What is the impact of synchronisation overhead on parallel speedup? [closed]
When implementing a parallel version of an algorithm, what is the impact of synchronisation delays on speedup efficiency? Does this depend on the platform used?
Is coarse-grained parallelism better ...
4 votes · 1 answer · 58 views
$\mathbf{NC}$ is closed under logspace reductions
I am trying to solve question 6.12 in Arora–Barak (Computational Complexity: A Modern Approach). The question asks you to show that the $\mathsf{PATH}$ problem (decide whether a graph $G$ has a ...
5 votes · 1 answer · 92 views
Largest reported speedup for a parallel computation?
A parallel implementation of a computation is usually compared to a sequential version of the same computation. The ratio of the time taken by the sequential version to the time taken by the parallel ...
2 votes · 1 answer · 55 views
Does PETSc really give speedup?
I searched for a linear solver library and found the PETSc library, which is considered a powerful and useful library. PETSc contains implementations of various iterative methods with preconditioners and ...
1 vote · 1 answer · 85 views
Parallelization of an iterative function application
What I have:
An iterative process based on the application of a very simple function to represent a rate, with total dependence between each iteration and its predecessor (one input parameter for the (n)th ...
2 votes · 1 answer · 135 views
Amdahl's Law and Computer Science
I would like to learn more about Amdahl's law and other similar topics. In what branch of computer science would one place Amdahl's law? Could someone point me to a textbook or further reading ...
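For reference, the law the question asks about is a one-line formula (it usually appears in texts on parallel computing and computer architecture):

```latex
% Amdahl's law: if a fraction p of the work is parallelizable and
% n processors are available, the overall speedup is bounded by
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}.
```

The limit shows the key point: the serial fraction $1 - p$ caps the achievable speedup no matter how many processors are added.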
3 votes · 1 answer · 87 views
Which theoretical parallel model is closest to CUDA?
Which theoretical parallel model is closest to the CUDA/OpenCL programming model?
For example, it fits to some degree the generic Parallel Random Access Machine (PRAM) model. However, that is too ...
3 votes · 2 answers · 60 views
Parallel Computing: Past vs Present
Parallel computing is not new, but it is becoming common nowadays. This is essentially driven by the needs of users (they need to process more data now) and also by physical limitations on ...
3 votes · 2 answers · 75 views
Best problems that are prone to parallelization?
What are some problems that are prone to parallelization? When I think about this, the first thing that comes to my mind is matrix multiplication, which leads to faster calculations, meaning you can ...
1 vote · 0 answers · 99 views
What is pointer jumping?
I am studying parallel algorithms from CLRS, old edition, Chapter 30. Can someone explain with a simple example what pointer jumping is and how exactly it works?
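Pointer jumping (as in that CLRS chapter) is the trick where every node of a linked list repeatedly replaces its successor pointer with its successor's successor, so each node's distance to the end is computed in O(log n) rounds rather than n. A sequential simulation of the synchronous PRAM rounds (all reads before all writes in each round):

```python
def list_ranking(nxt):
    """Pointer-jumping list ranking: nxt[i] is node i's successor and the
    tail points to itself. Returns dist[i] = number of links from i to
    the tail, computed in O(log n) synchronous rounds."""
    n = len(nxt)
    dist = [0 if nxt[i] == i else 1 for i in range(n)]
    nxt = list(nxt)
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One parallel round: every node reads the old arrays, then all
        # nodes "simultaneously" adopt their successor's successor.
        new_dist = [dist[i] + dist[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        dist, nxt = new_dist, new_nxt
    return dist

# List 0 -> 1 -> 2 -> 3 (tail 3 points to itself):
print(list_ranking([1, 2, 3, 3]))  # [3, 2, 1, 0]
```

On a PRAM each loop-body element would be updated by its own processor, which is why the round count, not the per-round work, determines the parallel time.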
4 votes · 1 answer · 114 views
Parallel merge sort using the hypercube connection template
I've been reading about the hypercube connection template for parallel algorithms. The general scheme is explained in Designing and Building Parallel Programs by Ian Foster and it's pretty clear.
What I ...
2 votes · 1 answer · 68 views
How to enumerate combinations in parallel
I have an $n\times k$ matrix with $k<n$ and I would like to find all its $\binom{n}{k}$ submatrices, the $k\times k$ matrices formed by concatenating all possible choices of $k$ rows. Actually I tried ...
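One common way to parallelize such an enumeration is to note that the $\binom{n}{k}$ row-index tuples come out of `itertools.combinations` in a fixed order, so each worker can independently take a round-robin slice of the sequence. A sketch under that assumption (`worker_slice` and the worker layout are illustrative; real code would hand each slice to a separate process):

```python
from itertools import combinations, islice

def worker_slice(n, k, worker_id, num_workers):
    """Yield this worker's round-robin share of the C(n, k) row-index
    combinations; together the workers cover every combination once."""
    return islice(combinations(range(n), k), worker_id, None, num_workers)

n, k = 5, 2  # C(5, 2) = 10 index tuples
chunks = [list(worker_slice(n, k, w, 3)) for w in range(3)]
print(sum(len(c) for c in chunks))  # 10, i.e. all combinations covered
```

Each worker would then assemble its $k\times k$ submatrices from the row indices it receives; no coordination is needed because the slices are disjoint by construction.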
4 votes · 1 answer · 128 views
Perfect shuffle in parallel processing
How is the perfect shuffle a better interconnect scheme for parallel processing? For example, if we consider the problem of sum reduction, I want to understand how this scheme is useful when implementing sum ...
5 votes · 1 answer · 128 views
Problem Solving in the MapReduce Framework
I am looking for good resources which have classified and solved typical large-scale data-processing problems in the MapReduce framework (like graph algorithms, statistics, numerical algorithms ...). Any help is ...
1 vote · 0 answers · 39 views
Vectorizing and Array Notation?
I'm really trying to wrap my head around vectorization, but I can't seem to understand it. I don't know if I don't understand how to vectorize, or if I don't understand the array notation that's being ...
1 vote · 1 answer · 43 views
Studies of file access patterns in embedded systems
I am considering what elements an operating system for Network-on-Chip systems might have, and the issue of file systems looms large. NoCs have an embedded heritage, and their domain of use (at least ...
5 votes · 1 answer · 103 views
An online parallel algorithm for finding the top K elements
Suppose we have a function on a GPU which calculates the elements of an array from which we need to select the top K elements. The number of elements can be quite big, so we can't store them in memory and our ...
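The streaming half of this problem has a standard O(K)-memory answer: keep a min-heap of the K largest values seen so far. A sketch of that building block (the GPU side is out of scope here; on a GPU, each block would typically compute a local top-K this way and the partial results would be merged):

```python
import heapq

def top_k(stream, k):
    """Return the k largest values of an arbitrarily long stream using
    only O(k) memory: a min-heap whose root is the smallest survivor."""
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # evict current minimum, insert x
    return sorted(heap, reverse=True)

print(top_k(iter([5, 1, 9, 3, 7, 8, 2]), 3))  # [9, 8, 7]
```

Each element costs O(log k), so the whole stream is processed in O(n log k) time regardless of n's size.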
0 votes · 1 answer · 118 views
A question about parallel algorithm complexity
When in a parallel algorithm we say:
"This algorithm is done in $O(1)$ time using $O(n\log n)$ work, with $n$-exponential probability, or alternatively, in $O(\log n)$ time using $O(n)$ work, with ...
0 votes · 0 answers · 31 views
A question about parallel algorithm complexity [duplicate]
Possible duplicate:
A question about parallel algorithm complexity
When in a parallel algorithm we say:
"This algorithm is done in $O(1)$ time using $O(n\log n)$ work, with $n$-exponential ...
4 votes · 1 answer · 90 views
How does partitioning in map-reduce work?
Assume a map-reduce program has $m$ mappers and $n$ reducers ($m > n$). The output of each mapper is partitioned according to the key value, and all records having the same key value go into the ...
0 votes · 0 answers · 125 views
Bitonic sort on a 2D hypercube with input of size 12
I'm not sure how I would perform a bitonic sort on a two-dimensional hypercube with 12 numbers.
Suppose I had the input 2 28 6 34 9 7 15 23 4 10 12 25.
Since it's ...
3 votes · 0 answers · 283 views
How does the parallel radix sort work?
I'm currently learning computer science, and a slide of notes briefly describes the parallel radix sort under data parallelism.
...
3 votes · 1 answer · 126 views
How do you go about designing a vector processor architecture for the sum of matrix products?
The following equation is a matrix expression where $B_i$ and $C_i^T$ are $n\times n$ matrices and $k$ is a positive integer:
$$P = \sum_{i=1}^k B_i C_i^T $$
So $P = B_1 C_1^T + B_2 C_2^T + \cdots ...
2 votes · 0 answers · 42 views
Determine interference factors in parallel computing
Parallel processes interfere with each other in many ways, competing for shared resources such as shared caches, memory, disks, etc.
Would it be possible to determine latent factors just with a ...
9 votes · 2 answers · 176 views
Some questions on parallel computing and the class NC
I have a number of related questions about these two topics.
First, most complexity texts only gloss over the class $\mathbf{NC}$. Is there a good resource that covers the research in more depth? ...
0 votes · 1 answer · 55 views
Using loop dependence analysis for vectorization
How exactly does loop dependence analysis help in vectorization? Are there any standard safety criteria for parallelizing such loops?
0 votes · 1 answer · 100 views
What are current approaches to auto-parallelisation?
I am looking for answers which provide a short overview of the models and the current state of research on the auto-parallelisation of sequential code.
3 votes · 1 answer · 164 views
Batch processing in the increase-key function using a binary heap
Is there an algorithm to perform batch processing in the increase-key operation? Let us say a binary heap (min-heap) is used. In the normal increase-key function, if we perform increase-key on one ...
3 votes · 0 answers · 38 views
How is amorphous computing different from spatial computing?
Surely spatial computing and amorphous computing share similarities and overlap. Is spatial computing a subset of amorphous computing? How are the two different?
26 votes · 4 answers · 1k views
What is the novelty in MapReduce?
A few years ago, MapReduce was hailed as a revolution in distributed programming. There have also been critics, but by and large there was an enthusiastic hype. It even got patented! [1]
The name is ...
4 votes · 1 answer · 186 views
Getting parallel items in dependency resolution
I have implemented a topological sort based on the Wikipedia article, which I'm using for dependency resolution, but it returns a linear list. What kind of algorithm can I use to find the independent ...
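The usual fix for this is a level-by-level variant of Kahn's algorithm: instead of emitting one linear order, repeatedly peel off every item with no remaining dependencies; each batch is internally independent and can be processed in parallel. A sketch (the `deps` representation and names are illustrative):

```python
def parallel_levels(deps):
    """deps maps item -> set of items it depends on.
    Returns batches of mutually independent items, in dependency order."""
    deps = {k: set(v) for k, v in deps.items()}  # defensive copy
    batches = []
    while deps:
        # Everything with no unresolved dependencies is ready now.
        ready = {k for k, v in deps.items() if not v}
        if not ready:
            raise ValueError("dependency cycle detected")
        batches.append(sorted(ready))
        # Remove the ready items and drop them from remaining dependency sets.
        deps = {k: v - ready for k, v in deps.items() if k not in ready}
    return batches

graph = {"app": {"lib1", "lib2"}, "lib1": {"base"}, "lib2": {"base"}, "base": set()}
print(parallel_levels(graph))  # [['base'], ['lib1', 'lib2'], ['app']]
```

A flat topological order is any concatenation of these batches; the batch boundaries are exactly the synchronization points a parallel resolver needs.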
2 votes · 1 answer · 239 views
Could an artificial neural network algorithm be expressed in terms of map-reduce operations?
Could an artificial neural network algorithm be expressed in terms of map-reduce operations? I am also interested more generally in methods of parallelization as applied to ANNs and their application ...
7 votes · 1 answer · 194 views
Can joins be parallelized?
Suppose we want to join two relations on a predicate. Is this in NC?
I realize that a proof of it not being in NC would amount to a proof that $P\not=NC$, so I'd accept evidence of it being an open ...
3 votes · 1 answer · 96 views
Explain the $\log_2(n)$-squared asymptotic run-time for naive nested parallel CREW PRAM mergesort
On page 1 of these lecture notes, it is stated in the final paragraph of the section titled CREW Mergesort:
Each such step (in a sequence of $\Theta(\log_2 n)$ steps) takes
time ...
11 votes · 2 answers · 224 views
How to scale down parallel complexity results to constantly many cores?
I have had problems accepting the complexity-theoretic view of "efficiently solved by a parallel algorithm", which is given by the class NC:
NC is the class of problems that can be solved by a ...
16 votes · 3 answers · 3k views
Distributed vs parallel computing
I often hear people talking about parallel computing and distributed computing, but I'm under the impression that there is no clear boundary between the two, and people tend to confuse them pretty ...
11 votes · 3 answers · 383 views
P-Completeness and Parallel Computation
I was recently reading about algorithms for checking bisimilarity and read that the problem is P-complete. Furthermore, a consequence of this is that this problem, or any P-complete problem, is ...
13 votes · 2 answers · 195 views
How does variance in task completion time affect makespan?
Let's say that we have a large collection of tasks $\tau_1, \tau_2, \ldots, \tau_n$ and a collection of identical (in terms of performance) processors $\rho_1, \rho_2, \ldots, \rho_m$ which operate ...
10 votes · 3 answers · 265 views
Are today's massively parallel processing units able to run cellular automata efficiently?
I wonder whether the massively parallel computation units provided in graphics cards nowadays (ones that are programmable in OpenCL, for example) are good enough to simulate 1D cellular automata (or ...
4 votes · 1 answer · 166 views
Analyzing load balancing schemes to minimize overall execution time
Suppose that a certain parallel application uses a master–slave design to process a large number of workloads. Each workload takes some number of cycles to complete; the number of cycles any given ...