Apache Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

6 votes, 1 answer, 181 views

Average movie rankings

Given a list of tuples of the form (a, b, c), is there a more direct or optimized way of calculating the average of all the c's ...

0 votes, 1 answer, 160 views

Class for finding the median of a two-dimensional space

I have a simple static class whose purpose is, given an RDD of Point, to find the median of each dimension and return that as a new ...

6 votes, 2 answers, 9k views

Producing a sorted wordcount with Spark

I'm currently learning how to use Apache Spark. To do so, I implemented a simple wordcount (not really original, I know). There is already an example in the documentation providing the ...

5 votes, 1 answer, 1k views

Why does LR on Spark run so slowly?

Because MLlib does not support sparse input, I ran the following code, which supports the sparse input format, on Spark clusters. The settings are: 5 nodes, each node with 8 cores (all the ...