Newest 'hadoop elastic-map-reduce' Questions

0

votes

0 answers

9 views

Elastic MapReduce hangs on step because of upload to S3; is this CombineFileInputFormat's fault?

Often when a step (job) has long been complete, Elastic MapReduce with S3 intermediates will hang on 1 or 2 tasks, presumably because data are being uploaded to S3. This hanging can take considerable ...

asked Aug 12 at 16:37

verve
47110

3

votes

2 answers

71 views

Spark/Hadoop throws exception for large LZO files

I'm running an EMR Spark job on some LZO-compressed log-files stored in S3. There are several logfiles stored in the same folder, e.g.: ... s3://mylogfiles/2014-08-11-00111.lzo ...

hadoop apache-spark elastic-map-reduce lzo

asked Aug 11 at 16:37

Pimin Konstantin Kefaloukos
4861617

1

vote

1 answer

27 views

Number of concurrently running mappers per node drops precipitously on Elastic MapReduce w/ AMI 3.1.0 and Hadoop 2.4.0 as cluster size increases

In a related question (How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce), I ask for formulas relating the number of concurrently running ...

hadoop amazon-web-services amazon-ec2 elastic-map-reduce yarn

asked Aug 10 at 13:31

verve
47110

0

votes

1 answer

19 views

Process entire files using Hadoop streaming on Amazon EMR

I have a directory full of gzipped text files on Amazon S3, and I'm trying to use Hadoop streaming on Amazon Elastic MapReduce to apply a function to each file individually (specifically, parse a ...

hadoop amazon-web-services amazon-s3 hadoop-streaming elastic-map-reduce

asked Aug 8 at 20:39

user3923714
1

0

votes

1 answer

28 views

Running Simple Hadoop Command using Java code

I would like to list files using the hadoop command "hadoop fs -ls filepath", and I want to write Java code to achieve this. Can I write a small piece of Java code, make a jar of it and supply it to Map ...

hadoop mapreduce elastic-map-reduce amazon-emr

asked Aug 4 at 15:22

user1879956
1

0

votes

0 answers

17 views

Amazon EMR: How to copy logs from S3 and store them in two different locations inside HDFS

I want to copy logs from S3 into HDFS and store them in two different locations. I am doing this. $EMR_BIN/elastic-mapreduce --jobflow $JOBFLOW --jar /home/hadoop/AmazonDistCp-1.1.jar \ --main-class ...

hadoop amazon-web-services amazon-s3 elastic-map-reduce amazon-emr

asked Jul 31 at 7:21

user2890683
547

2

votes

0 answers

27 views

Does Hadoop Streaming's performance decrease if I use -mapper cat rather than -mapper org.apache.hadoop.mapred.lib.IdentityMapper?

I have had problems trying to use org.apache.hadoop.mapred.lib.IdentityMapper as the argument of -mapper in Hadoop Streaming 1.0.3. "cat" works though; does using cat affect performance -- especially ...

hadoop hadoop-streaming elastic-map-reduce

asked Jul 24 at 17:31

verve
47110

0

votes

1 answer

31 views

Unable to load Hive-JDBC driver when accessed through MapReduce program on Amazon's Elastic MapReduce

I have written a MapReduce program in which I am storing some part of output data into Hive table. I have used Hive-JDBC driver to access Hive table via MapReduce code. This program has compiled ...

hadoop jdbc mapreduce hive elastic-map-reduce

asked Jul 16 at 11:33

user3523860
62

0

votes

0 answers

13 views

Issue with using files in distributed cache in Elastic MapReduce

I'm trying to make use of an external library in my Python mapper script in an AWS Elastic MapReduce job. However, my script doesn't seem to be able to find the modules in the cache. I archived the ...

python hadoop amazon-web-services elastic-map-reduce

asked Jul 10 at 5:09

user296554
11

0

votes

0 answers

18 views

ChainMappers Hadoop Parallelism

I am trying to use ChainedMappers and have some doubts regarding the usage: Mapper1 (M1) -> M1(K1, V1) Now M1 does some processing & emits 3 key value pairs (K2, V2),(K3, V3),(K4, V4) Mapper2 ...

hadoop mapreduce elastic-map-reduce mappers

asked Jul 8 at 15:49

Water
457

0

votes

1 answer

37 views

R Reducer is not working properly in Amazon EMR

I have written MapReduce code in R to run on Amazon EMR. My input file format: URL1 word1 word2 word3 URL2 word4 word2 word3 URL3 word1 word7 word2 I'm expecting the output as: URLs are concat ...

r hadoop mapreduce elastic-map-reduce emr

asked Jun 26 at 3:26

Nadaraj
486

0

votes

1 answer

54 views

Map Error - Attempt_xxxx_ Timed out after 600 seconds

I'm using Hadoop 2.2.0, and when I run my map tasks I get the following error: attempt_xxx Timed out after 1800000 seconds (it's 1800000 because I have changed the config for ...

hadoop map mapreduce timeout elastic-map-reduce

asked Jun 17 at 8:03

user3690321
165

0

votes

0 answers

18 views

Cannot Start & Process MapReduce Job

I created a custom EMR cluster with 1 master node and 3 core nodes (with 0 task nodes), all of them of m1.large configuration. I made a sample MapReduce program to analyze TCPDump data on my Eclipse ...

hadoop mapreduce cluster-computing elastic-map-reduce

asked Jun 15 at 16:05

Anup Saumithri
183114

0

votes

2 answers

34 views

How to read a file from s3 in EMR?

I would like to read a file from S3 in my EMR Hadoop job. I am using the Custom JAR option. I have tried two solutions: org.apache.hadoop.fs.S3FileSystem: throws a NullPointerException. ...

java hadoop amazon-s3 elastic-map-reduce

asked Jun 12 at 12:43

David Nemeskey
183

0

votes

1 answer

60 views

Hadoop on EMR - Map Tasks Not Parallel

I've set up an EMR job through Data Pipeline in AWS. This job is to transfer CSV data from S3 to DynamoDB. My data size is 400 MB. I set mapred.max.split.size = 134217728 (i.e. 128 MB). With that, ...

hadoop elastic-map-reduce

asked Jun 10 at 16:40

Mouli
33918
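
A rough sanity check for the split count in the question above (400 MB of input, mapred.max.split.size = 134217728) can be sketched in a few lines of Python. This is only a ceiling estimate under stated assumptions: it treats 400 MB as 400 * 1024 * 1024 bytes, assumes a single splittable input file, and ignores block boundaries and any slack Hadoop applies internally. The function name `estimate_splits` is made up for illustration.

```python
import math

MB = 1024 * 1024


def estimate_splits(input_bytes: int, max_split_bytes: int) -> int:
    """Ceiling estimate of the number of input splits.

    Assumption: one splittable file, with every split except the last
    exactly max_split_bytes long.
    """
    if max_split_bytes <= 0:
        raise ValueError("max_split_bytes must be positive")
    return math.ceil(input_bytes / max_split_bytes)


if __name__ == "__main__":
    max_split = 134217728  # value quoted in the question
    print(max_split == 128 * MB)                 # True: 128 MB expressed in bytes
    print(estimate_splits(400 * MB, max_split))  # 4 (400 / 128 = 3.125, rounded up)
```

Under these assumptions the job would get roughly four map tasks, so the estimate is mainly useful for comparing against the number of mappers actually observed.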

0

votes

0 answers

14 views

How do I use FileOutputCommitter from Java in hadoop

This is the beginning of my code: public class LogParserMapReduce extends Configured implements Tool { @Override public int run(String[] args) throws Exception { Configuration conf = ...

hadoop elastic-map-reduce

asked Jun 1 at 11:18

Gavriel
3,45631731

1

vote

1 answer

124 views

“Unable to verify integrity of data” while running MR job

I'm running a relatively big MR job using Amazon Elastic Map Reduce. I ran the job plenty of times on small data sets with no problem. But when trying to run it on a large dataset I'm getting the ...

hadoop amazon-web-services amazon-s3 mapreduce elastic-map-reduce

asked May 24 at 18:51

itzhaki
198110

1

vote

1 answer

24 views

How is data distributed among datanodes in MapReduce?

I'm new to MapReduce and have the task of processing large data (lines of records). One thing I need to use is the line number of a specific record in my mapper, and then have the reducer process the line number ...

hadoop mapreduce elastic-map-reduce

asked May 22 at 16:20

i3wangyi
5416

0

votes

3 answers

69 views

How is data partitioned and distributed among datanodes in MapReduce?

I'm new to MapReduce and have the task of processing large data (lines of records). One thing I need to use is the line number of a specific record in my mapper, and then have the reducer process the line number ...

python hadoop mapreduce elastic-map-reduce

asked May 22 at 2:24

i3wangyi
5416

1

vote

1 answer

53 views

Copying a large file (~6 GB) from S3 to every node of an Elastic MapReduce cluster

Turns out that copying a large file (~6 GB) from S3 to every node in an Elastic MapReduce cluster in a bootstrap action doesn't scale well; the pipe is only so big, and downloads to the nodes get ...

caching hadoop amazon-web-services amazon-s3 elastic-map-reduce

asked May 21 at 18:19

verve
47110

1

vote

0 answers

84 views

EMR hadoop tasks agonize for hours when losing task nodes

I've set up an Amazon EMR jobflow with 1 on-demand core node and 4 task nodes with bidding. When I run my task on only the core node each step finishes within 1 hour. When I'm lucky and have 1 core + ...

hadoop elastic-map-reduce emr

asked May 21 at 14:51

Gavriel
3,45631731

0

votes

1 answer

32 views

How to bid for a spot instance with price: 0.0164

I looked at the charts of last week's EC2 prices for m1.large in us-east-1c, and I saw prices like: 0.0160, 0.0161, 0.0162, 0.0163 so clearly there must be a way to bid for prices like this, but when ...

hadoop amazon-ec2 amazon elastic-map-reduce

asked Apr 29 at 13:41

Gavriel
3,45631731

1

vote

1 answer

53 views

How to find the right proportion between hadoop instance types

I am trying to find out how many MASTER, CORE, TASK instances are optimal for my jobs. I couldn't find any tutorial that explains how to figure it out. How do I know if I need more than 1 core ...

hadoop elastic-map-reduce instancetype

asked Apr 29 at 9:29

Gavriel
3,45631731

1

vote

1 answer

107 views

How can I turn off hadoop speculative execution from Java

After reading Hadoop speculative task execution I am trying to turn off speculative execution using the new Java api, but it has no effect. This is my Main class: public class Main { public ...

java hadoop elastic-map-reduce speculative-execution

asked Apr 24 at 10:21

Gavriel
3,45631731

2

votes

1 answer

294 views

Hadoop failure copying input bz2 file from s3

I have a map-only hadoop job, running on Amazon's EMR, running on the latest ami-version: 3.0.4. Once in a while I get exceptions like this: Error: com.amazonaws.AmazonClientException: Unable to ...

hadoop amazon elastic-map-reduce bzip2

asked Apr 23 at 23:30

Gavriel
3,45631731

2

votes

2 answers

224 views

Oozie on EMR - tasks hang forever in PREP state

I am running Oozie 4.0.1 on Elastic Mapreduce using the 3.0.4 AMI (Hadoop 2.2.0). I've built Oozie from source, and everything installs and seems to work correctly, up to the point of scheduling a ...

java hadoop hive elastic-map-reduce oozie

asked Apr 18 at 23:24

mindcrime
259113

0

votes

1 answer

326 views

FAILED: NullPointerException null in HIVE QUERY

The following is the Hive query I am using; I am also using a ranking function. I am running this on my local machine. SELECT numeric_id, location, Rank(location), followers_count FROM ( SELECT ...

hadoop mapreduce hive elastic-map-reduce hiveql

asked Apr 17 at 4:18

pratikgala
75113

0

votes

0 answers

13 views

Is there a way to get information from the jobflow and steps inside a step

Is there a way to know from a shell script that is running in elastic mapreduce (from script-runner.jar) whether there are following steps or it is the last step?

hadoop elastic-map-reduce

asked Apr 12 at 20:32

Gavriel
3,45631731

0

votes

1 answer

77 views

Running Mappers and Reducers on different Groups of machines

We have a nice, big, complicated elastic-mapreduce job that has wildly different constraints on hardware for the Mapper vs Collector vs Reducer. The issue is: for the Mappers, we need tonnes of ...

hadoop amazon-web-services elastic-map-reduce mapper reducers

asked Apr 12 at 2:08

David Beveridge
747

2

votes

1 answer

51 views

How to know job flow id, other cluster parameters in script running via script-runner.jar

I'm starting an elastic mapreduce cluster with the following command-line: $ elastic-mapreduce \ --create \ --num-instances "${INSTANCES}" \ --instance-type m1.medium \ --ami-version 3.0.4 \ --name ...

hadoop elastic-map-reduce

asked Apr 8 at 10:39

Gavriel
3,45631731

0

votes

1 answer

119 views

BZip2 Native Splitting on Amazon/EMR

We have a question in specific regard to compressed input on an Amazon EMR Hadoop job. According to AWS: "Hadoop checks the file extension to detect compressed files. The compression types ...

hadoop amazon-s3 elastic-map-reduce bzip2

asked Apr 2 at 20:30

David Beveridge
747

0

votes

0 answers

14 views

Custom Grouping and Partitioning in Job Conf

AWS Job not accepting the configuration parameters for Custom Grouping and Custom Sorting. conf3.setOutputValueGroupingComparator(StockKeyGroupingComparator.class); ...

hadoop amazon-web-services amazon-s3 mapreduce elastic-map-reduce

asked Apr 2 at 14:56

Mahalakshmi Lakshminarayanan
304220

0

votes

0 answers

88 views

Is there an open source version of s3distcp?

I would love to use s3distcp for copying data from S3 buckets to S3 buckets but I have the need to use an external proprietary encryption mechanism to ensure the data is encrypted at rest (keeping the ...

hadoop amazon-web-services amazon-s3 elastic-map-reduce

asked Mar 31 at 17:09

kellyfj
4052925

0

votes

1 answer

40 views

setting ssh permission in hadoop installation

I'm trying to install hadoop for the first time and I'm following this tutorial http://www.youtube.com/watch?v=xrxQXfE7t9A & https://sites.google.com/site/howtohadoop/how-to-install-hdp#bmec2 ...

hadoop ssh elastic-map-reduce

asked Mar 16 at 8:31

Dhoha
458

1

vote

2 answers

99 views

How to implement the combiner in Hadoop MapReduce?

I understand that to include a combiner in Hadoop MapReduce the following line is added (which I have done already): conf.setCombinerClass(MyReducer.class); What I don't understand is that ...

java hadoop mapreduce elastic-map-reduce

asked Mar 13 at 12:58

ali
83
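
The line quoted in the question above, conf.setCombinerClass(MyReducer.class), reuses the reducer as the combiner. As a conceptual illustration only, the sketch below simulates that idea in plain Python for a word count: each input split is pre-aggregated locally before the final reduce. It is not the Hadoop API; every function name here is hypothetical, and it assumes the reduce operation is a sum, which is associative and commutative and therefore safe to apply in two stages.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def map_words(line: str) -> List[Tuple[str, int]]:
    """Map step: emit (word, 1) for every whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Combiner step: sum counts per key within one split, before the shuffle."""
    totals: Counter = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def reduce_all(partials: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the per-split partial counts into final totals."""
    totals: Counter = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts rather than replacing
    return dict(totals)


if __name__ == "__main__":
    splits = [["a b a"], ["b c"]]  # two hypothetical input splits
    partials = [
        combine(pair for line in split for pair in map_words(line))
        for split in splits
    ]
    print(sorted(reduce_all(partials).items()))  # [('a', 2), ('b', 2), ('c', 1)]
```

Because summing can be applied in stages, the totals match what a reduce over the raw (word, 1) pairs would produce; the combiner only shrinks the data passed between the map and reduce phases.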

0

votes

0 answers

72 views

Unable to parse credentials.json

I have been trying to run Amazon's Elastic MapReduce command line interface, and I have gotten to the point of validating the install. I created my .json file per the instructions, but for some ...

ruby json hadoop amazon-web-services elastic-map-reduce

asked Mar 6 at 1:50

Raesu
1

0

votes

0 answers

106 views

How do I convert my Java Hadoop code to run on EC2?

I wrote a Driver, Mapper, and Reducer class in Java that runs the k-nearest neighbor algorithm on test data, and pulls in the training set using Distributed Cache. I used a Cloudera virtual machine ...

hadoop amazon-web-services amazon-ec2 elastic-map-reduce

asked Mar 3 at 23:20

user1956609
30019

0

votes

1 answer

283 views

Trouble using hbase from java on Amazon EMR

I'm trying to query my HBase cluster on Amazon EC2 using a custom jar I launch as a MapReduce step. In my jar (inside the map function) I call HBase like so: public void map( Text key, BytesWritable ...

hadoop amazon-web-services hbase zookeeper elastic-map-reduce

asked Feb 28 at 20:22

frankie liuzzi
39518

0

votes

1 answer

507 views

Class not found exception in eclipse wordcount program

I am running a word count program from Eclipse, and it says class not found. I exported the same program as a jar file and executed it from the command line, where it works fine. Here is the error stack trace ...

hadoop mapreduce elastic-map-reduce

asked Feb 15 at 4:53

venu
375

0

votes

1 answer

118 views

Writing to a file in S3 from jar on EMR on AWS

Is there any way I can write to a file from my Java jar to an S3 folder where my reduce files would be written? I have tried something like: FileSystem fs = FileSystem.get(conf); ...

hadoop amazon-web-services amazon-s3 mapreduce elastic-map-reduce

asked Feb 13 at 17:28

hitrix
158

0

votes

1 answer

86 views

Outputting a custom CSV header in the reducer of a MapReduce job

I am creating my own reducer as follows: public class MyReducer implements Reducer<K1,V1,K2,V2>{ @Override public void configure(JobConf conf){ } @Override public void close(JobConf ...

java hadoop elastic-map-reduce

asked Jan 31 at 11:58

user93796
1,221103470

3

votes

0 answers

200 views

elastic map reduce timing out java.io.IOException: Unexpected end of stream

I am running a MapReduce job on the Elastic MapReduce (EMR) service. The job works fine for a small data set but gives the following exceptions for a large data set (file size 400 MB). Running another job with same ...

java hadoop elastic-map-reduce

asked Jan 30 at 11:59

user93796
1,221103470

0

votes

2 answers

330 views

cannot ssh into Elastic MapReduce

I'm using elastic-mapreduce to spin up new clusters from the command line. After reading this tutorial, I have: elastic-mapreduce --create --alive \ --instance-type m1.xlarge\ --num-instances 5 \ ...

hadoop amazon-web-services ssh amazon-ec2 elastic-map-reduce

asked Jan 30 at 2:01

philippe
1,96921657

3

votes

1 answer

426 views

Error: java.io.IOException: wrong value class: class org.apache.hadoop.io.Text is not class Myclass

I have my mapper and reducers as follows, but I am getting a strange exception. I can't figure out why it is throwing this kind of exception. public static class MyMapper implements ...

java hadoop elastic-map-reduce

asked Jan 27 at 9:20

user93796
1,221103470

0

votes

0 answers

26 views

How to set a custom file name for the leaf files in an Elastic MapReduce job?

I am running Elastic MapReduce jobs. The output files generated by the reducer have names like part-0000. I would rather have these names as "mykey-001". Is this possible with EMR?

hadoop mapreduce elastic-map-reduce

asked Jan 7 at 23:45

Akshar Prabhu Desai
1,10631739

1

vote

0 answers

93 views

Error: user not authorized to perform: iam:GetInstanceProfile

When trying to create an "Interactive Cluster" using ruby elastic-mapreduce --create --alive --name "Interactive Cluster" --num-instances=1 --master-instance-type=m1.large --hive-interactive I get ...

hadoop elastic-map-reduce

asked Dec 21 '13 at 18:08

user1523292
435

0

votes

0 answers

41 views

output format in AWS EMR

I'm running a mapreduce program in AWS EMR, which is similar to the word count example of AWS. The output of this is not well formatted, meaning it is not one item per line, nor is there proper ...

hadoop amazon-web-services elastic-map-reduce

asked Dec 18 '13 at 21:46

user3116978
1

1

vote

2 answers

409 views

Combine output files of MapReduce job

I have written a Mapper and Reducer in Python and have executed them successfully on Amazon's Elastic MapReduce (EMR) using Hadoop Streaming. The final result folder contains the output in three ...

python hadoop mapreduce hadoop-streaming elastic-map-reduce

asked Dec 14 '13 at 8:21

Arun Kumar
386
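
One way to combine the part files from the question above, sketched in Python to match the asker's mapper and reducer: the function below concatenates every part-* file in a directory, in name order, into a single file. It assumes the job output has already been copied to a local directory; the name `merge_parts` and the paths are hypothetical. On a cluster, the `hadoop fs -getmerge` command does a similar job without custom code.

```python
import glob
import os
import shutil


def merge_parts(output_dir: str, merged_path: str) -> int:
    """Concatenate part-* files from output_dir into merged_path.

    Files are appended in sorted name order (part-00000, part-00001, ...).
    Returns the number of part files merged. Assumes merged_path does not
    itself match the part-* pattern inside output_dir.
    """
    parts = sorted(glob.glob(os.path.join(output_dir, "part-*")))
    with open(merged_path, "wb") as merged:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, merged)  # stream, so large files are fine
    return len(parts)


if __name__ == "__main__":
    # Hypothetical local paths; adjust to wherever the output was downloaded.
    count = merge_parts("job_output", "merged_output.txt")
    print(f"merged {count} part files")
```

Note that this only concatenates the files; if the combined output must be globally sorted or aggregated across reducers, that still has to happen in the job itself or in a later step.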

3

votes

1 answer

114 views

MapR client not executing hadoop - Windows

I have an Amazon Windows VM where I installed MapR-Client 2.1.2, and another MapR cluster waiting for the jobs to be executed. I set up MAPR_HOME in C:\opt\mapr, and when I execute hadoop fs -ls / ...

hadoop elastic-map-reduce mapr

asked Dec 13 '13 at 1:38

philippe
1,96921657

0

votes

2 answers

253 views

Mapper and Reducer in Hadoop

I am confused about the implementation of Hadoop. I notice that when I run my Hadoop MapReduce job with multiple mappers and reducers, I get many part-xxxxx files. Meanwhile, it is true ...

hadoop cloud elastic-map-reduce

asked Nov 22 '13 at 4:25

Bill Liu
283
