MapReduce is a programming model for processing huge datasets across a large number of nodes, suited to certain kinds of distributable problems.

5
votes
0 answers
49 views

Pyspark Solver for Tiered Board Games

I've written a Pyspark program that completely solves a tiered board game (no loops, each game position is a member of only one tier) and writes each tier to a file. It also determines the ...
8
votes
3 answers
946 views

Find Top 10 IP out of more than 5GB data

I have a few files whose total size is more than 5 GB. Each line of the files contains an IP address and looks like: 127.0.0.1 reset success ... 127.0.0.2 reset success How can I find ...
3
votes
1 answer
48 views

Classifying and counting database entries using Scala map and flatMap

I am new to Spark and Scala and I have solved the following problem. I have a table in a database with the following structure: ...
6
votes
2 answers
143 views

Accepting user defined functions for custom map reduce functionality in C++

I am implementing map and reduce - style functions for processing geospatial raster datasets. I would like the ...
4
votes
1 answer
209 views

Finding the similarity between the two movies using Pearson correlation coefficient

I am trying to find the similarity between the two movies using Pearson correlation coefficient. The program works well for small inputs, but for large inputs (like 100000 lines) it takes forever....
10
votes
1 answer
14k views

Generic “reduceBy” or “groupBy + aggregate” functionality with Spark DataFrame

Maybe I totally reinvented the wheel, or maybe I've invented something new and useful. Can one of you tell me if there's a better way of doing this? Here's what I'm trying to do: I want a generic <...
4
votes
1 answer
56 views

MongoDB - Find records in one collection that match string in 2nd collection

I have a MongoDB collection of product IDs/unique product attributes and a second collection of codes that relate to attributes common to products whose IDs are prefixed with those codes. For ...
5
votes
1 answer
367 views

Weighted Probability Problem in Swift

I was asked a weighted probability question in a technical interview a few months ago that went something like this: Given an input of colors and an integer "weight" value, randomly return a color ...
3
votes
3 answers
204 views

MapReduce in Erlang

I'm doing a comparison of Erlang, Haskell, Elixir and ES6. I'm less familiar with Erlang and Elixir, but I want to represent all of these languages fairly, so is this good Erlang code? ...
8
votes
2 answers
518 views

Average movie rankings

Given a list of tuples of the form (a, b, c), is there a more direct or optimized way of calculating the average of all the c's ...
2
votes
1 answer
160 views

Mergesort using map-reduce, multithreads, buffers and condition variables

I wrote a map reduce program which uses multiple threads, bounded buffers, and condition variables. It works perfectly for some types of inputs. In the program there are N mappers, R reducers, 1 merger. ...
2
votes
1 answer
180 views

CSMR for large-scale text-processing

I'm working on a project for large-scale text-processing, which is a first implementation of the basic idea of CSMR. CSMR is an algorithm that measures the similarity between documents by calculating ...
5
votes
2 answers
267 views

Map reduce tester ported from bash to Python

My MapReduce tester is clearly ported from Shell, short of args=None for line in args or read_input(), what's a better way of ...
1
vote
1 answer
135 views

MapReduce program for finding low value hashes

For class we were to make a MapReduce program in Python to find low value hashes of our name. I have completed the assignment but want to try and speed it up. The program currently takes about 45s to ...
8
votes
1 answer
21k views

Calculate min, max, average, and variance on a large dataset

I got a piece of Java code using Hadoop to calculate min, max, average and variance on a large dataset made of (index value) couples separated by a newline: ...
3
votes
1 answer
78 views

Customer MapReduce implementation

I would love a second opinion / another pair of eyes on this. ...
8
votes
1 answer
906 views

Map-reduce implementation for splitting strings

I have been changing this code and I can't make it much better. I changed the structure a little, reimplemented a function for splitting Strings which is more efficient, etc. I have been ...