MapReduce is an algorithm for processing huge datasets on certain kinds of distributable problems using a large number of nodes.
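A minimal single-process sketch of the three MapReduce phases (map, shuffle, reduce), using word counting as an assumed example workload. A real framework would run the map and reduce tasks on many nodes; this sketch does not attempt that, and the function names are illustrative. It needs Python 3.9+ for the built-in generic annotations.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in one input record.
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all intermediate values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine every value emitted for one key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

For example, map_reduce(["the quick fox", "the lazy dog"]) counts "the" twice and every other word once.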
5 votes, 0 answers, 49 views
Pyspark Solver for Tiered Board Games
I've written a Pyspark program that completely solves a tiered board game (no loops, each game position is a member of only one tier) and writes each tier to a file. It also determines the ...
8 votes, 3 answers, 946 views
Find the top 10 IPs in more than 5 GB of data
I have a few files with a total size of more than 5 GB. Each line of the files is an IP address, and looks like:
127.0.0.1 reset success
...
127.0.0.2 reset success
How can I find ...
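One common way to approach this, sketched under two assumptions: the IP is the first whitespace-separated field of each line, and the number of distinct addresses fits in memory. Each chunk (for example, one file) is counted locally, which plays the role of a map step with a combiner. The partial counts are then merged before the k largest are picked. The names count_chunk and top_ips are hypothetical.

```python
import heapq
from collections import Counter
from typing import Iterable, List, Tuple


def count_chunk(lines: Iterable[str]) -> Counter:
    # Map + local combine: count IP occurrences within one chunk.
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def top_ips(chunks: Iterable[Iterable[str]], k: int = 10) -> List[Tuple[str, int]]:
    # Reduce: merge the partial counts, then keep the k most frequent IPs.
    total: Counter = Counter()
    for chunk in chunks:
        total.update(count_chunk(chunk))
    return heapq.nlargest(k, total.items(), key=lambda item: item[1])
```

Open file objects can be passed directly as chunks, since iterating a file yields its lines. If the distinct addresses do not fit in memory, one option is to first hash-partition the lines by IP into smaller files, so each address lands in exactly one partition, then take the top k of each partition and merge those short lists.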
3 votes, 1 answer, 48 views
Classifying and counting database entries using Scala map and flatMap
I am new to Spark and Scala, and I have solved the following problem. I have a table in a database with the following structure:
...
6 votes, 2 answers, 143 views
Accepting user defined functions for custom map reduce functionality in C++
I am implementing map- and reduce-style functions for processing geospatial raster datasets.
I would like the ...
4 votes, 1 answer, 209 views
Finding the similarity between two movies using the Pearson correlation coefficient
I am trying to find the similarity between two movies using the Pearson correlation coefficient. The program works well for small inputs, but for large inputs (like 100000 lines) it takes forever....
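A sketch of one standard way to make this scale, assuming each user's ratings of the two movies have already been joined into (rating_a, rating_b) pairs. The six sums n, Σx, Σy, Σxy, Σx², Σy² are accumulated in a single pass. Because they are additive, partial sums from different partitions can be merged in a reduce step before evaluating r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)). The function names are illustrative, and returning 0.0 when either rating column has no variation is an assumed convention.

```python
import math
from typing import Iterable, Tuple

# (n, sum_x, sum_y, sum_xy, sum_xx, sum_yy)
Sums = Tuple[int, float, float, float, float, float]


def partial_sums(pairs: Iterable[Tuple[float, float]]) -> Sums:
    # Map/combine step: one pass over the (x, y) rating pairs of a partition.
    n, sx, sy, sxy, sxx, syy = 0, 0.0, 0.0, 0.0, 0.0, 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
        syy += y * y
    return n, sx, sy, sxy, sxx, syy


def merge(a: Sums, b: Sums) -> Sums:
    # Reduce step: the sums are additive, so partitions combine element-wise.
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4], a[5] + b[5])


def pearson(s: Sums) -> float:
    n, sx, sy, sxy, sxx, syy = s
    # Each term is n**2 times a population variance; clamp tiny negative rounding errors.
    scaled_var_x = max(0.0, n * sxx - sx * sx)
    scaled_var_y = max(0.0, n * syy - sy * sy)
    denominator = math.sqrt(scaled_var_x * scaled_var_y)
    if denominator == 0.0:
        return 0.0  # assumed convention when either variable is constant
    return (n * sxy - sx * sy) / denominator
```

The sum-of-products form is compact but can lose precision on very large or high-magnitude data; a Welford-style running update is the numerically safer alternative in that case.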
10 votes, 1 answer, 14k views
Generic “reduceBy” or “groupBy + aggregate” functionality with Spark DataFrame
Maybe I totally reinvented the wheel, or maybe I've invented something new and useful. Can one of you tell me if there's a better way of doing this? Here's what I'm trying to do:
I want a generic <...
4 votes, 1 answer, 56 views
MongoDB - Find records in one collection that match string in 2nd collection
I have a MongoDB collection of product IDs/unique product attributes and a second collection of codes that relate to attributes common to products whose IDs are prefixed with those codes.
For ...
5 votes, 1 answer, 367 views
Weighted Probability Problem in Swift
I was asked a weighted probability question in a technical interview a few months ago that went something like this:
Given an input of colors and an integer "weight" value, randomly return a color ...
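A minimal sketch of the usual cumulative-weight approach, written in Python although the question itself was about Swift. It assumes non-negative integer weights keyed by color: it builds the running totals, draws a uniform integer below the grand total, and binary-searches for the bucket that integer falls in. weighted_choice is a hypothetical name; in Python, the standard library's random.choices(population, weights=...) already covers this case.

```python
import bisect
import itertools
import random
from typing import Mapping, Optional


def weighted_choice(weights: Mapping[str, int], rng: Optional[random.Random] = None) -> str:
    if rng is None:
        rng = random.Random()
    colors = list(weights)
    # Running totals, e.g. {"red": 1, "blue": 3} -> [1, 4].
    cumulative = list(itertools.accumulate(weights[c] for c in colors))
    if not cumulative or cumulative[-1] <= 0:
        raise ValueError("need at least one positive weight")
    # r is uniform in [0, total); the first bucket whose running total exceeds r wins,
    # so each color is chosen with probability weight / total.
    r = rng.randrange(cumulative[-1])
    return colors[bisect.bisect_right(cumulative, r)]
```

Using bisect_right means a color with weight 0 can never be returned, because its running total equals the previous one and no draw lands strictly between them.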
3 votes, 3 answers, 204 views
MapReduce in Erlang
I'm comparing Erlang, Haskell, Elixir and ES6. I'm less familiar with Erlang and Elixir, but I want to represent all of these languages fairly, so is this good Erlang code?
...
8 votes, 2 answers, 518 views
Average movie rankings
Given a list of tuples of the form (a, b, c), is there a more direct or optimized way of calculating the average of all the c's ...
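A minimal map/reduce-style sketch, assuming c is numeric and the a and b fields can be ignored. Each tuple is mapped to a (sum, count) pair and the pairs are reduced by addition. Carrying (sum, count) rather than averaging partial averages keeps the result correct when partitions have different sizes. average_c is a hypothetical name.

```python
from functools import reduce
from typing import Iterable, Tuple


def average_c(rows: Iterable[Tuple[object, object, float]]) -> float:
    # Map each row to (c, 1), then reduce by adding the pairs component-wise.
    total, count = reduce(
        lambda acc, pair: (acc[0] + pair[0], acc[1] + pair[1]),
        ((c, 1) for _, _, c in rows),
        (0.0, 0),
    )
    if count == 0:
        raise ValueError("cannot average an empty sequence")
    return total / count
```

For a plain in-memory list, statistics.fmean(c for _, _, c in rows) (Python 3.8+) does the same job in one call.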
2 votes, 1 answer, 160 views
Mergesort using map-reduce, multithreading, buffers and condition variables
I wrote a MapReduce program that uses multiple threads, bounded buffers and condition variables. It works perfectly for some types of inputs.
In the program there are N mappers, R reducers, 1 merger. ...
2 votes, 1 answer, 180 views
CSMR for large-scale text-processing
I'm working on a project for large-scale text-processing, which is a first implementation of the basic idea of CSMR. CSMR is an algorithm that measures the similarity between documents by calculating ...
5 votes, 2 answers, 267 views
Map reduce tester ported from bash to Python
My MapReduce tester is clearly ported from Shell. Short of args=None for line in args or read_input(), what's a better way of ...
1 vote, 1 answer, 135 views
MapReduce program for finding low value hashes
For a class, we had to write a MapReduce program in Python to find low-value hashes of our names. I have completed the assignment but want to try and speed it up. The program currently takes about 45s to ...
8 votes, 1 answer, 21k views
Calculate min, max, average, and variance on a large dataset
I have a piece of Java code that uses Hadoop to calculate the min, max, average and variance of a large dataset made of (index value) pairs separated by newlines:
...
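A Python sketch of the mergeable-summary idea often used for this kind of job (not a translation of the poster's Hadoop code). Each value becomes a small summary holding count, mean, the sum of squared deviations (M2), min and max. Summaries are merged pairwise with the parallel update of Chan et al., which avoids the cancellation error of the naive sum-of-squares formula. It assumes each line holds a whitespace-separated index and value; Stats and summarize are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Stats:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    @staticmethod
    def of(x: float) -> "Stats":
        # Map: a single value is a summary of size one.
        return Stats(1, float(x), 0.0, float(x), float(x))

    def merge(self, other: "Stats") -> "Stats":
        # Reduce: combine two summaries (associative, so order does not matter).
        if self.count == 0:
            return other
        if other.count == 0:
            return self
        n = self.count + other.count
        delta = other.mean - self.mean
        mean = self.mean + delta * other.count / n
        m2 = self.m2 + other.m2 + delta * delta * self.count * other.count / n
        return Stats(n, mean, m2, min(self.minimum, other.minimum), max(self.maximum, other.maximum))

    @property
    def variance(self) -> float:
        # Population variance; m2 / (count - 1) would give the sample variance.
        return self.m2 / self.count if self.count else math.nan


def summarize(lines: Iterable[str]) -> Stats:
    result = Stats()
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # assumed "index value" format; other lines are skipped
            result = result.merge(Stats.of(float(parts[1])))
    return result
```

In a Hadoop-style job, each mapper (or combiner) would emit one Stats per partition and a single reducer would merge them, so only five numbers per partition cross the network.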
8 votes, 1 answer, 906 views
Map-reduce implementation for splitting strings
I have been changing this code, but I can't manage to make it much better. I changed the structure a little, reimplemented the function for splitting Strings so that it is more efficient, etc. I have been ...