Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.
6
votes
1answer
181 views
Average movie rankings
Given a list of tuples of the form, (a, b, c), is there a more direct or optimized for calculating the average of all the c's ...
0
votes
1answer
160 views
Class for finding the median of a two-dimensional space
I have a simple static class that it's purpose is given an RDD of Point to find the median of each dimension and return that as a new ...
6
votes
2answers
9k views
Producing a sorted wordcount with Spark
I'm currently learning how to use Apache Spark. In order to do so, I implemented a simple wordcount (not really original, I know). There already exists an example on the documentation providing the ...
5
votes
1answer
1k views
Why does the LR on spark run so slowly?
Because the MLlib does not support the sparse input, I ran the following code, which supports the sparse input format, on spark clusters. The settings are:
5 nodes, each node with 8 cores (all the ...