Tagged Questions
Hadoop is an open-source Apache project that provides software for reliable and scalable distributed computing. The project also includes a variety of complementary subprojects.
9
votes
0 answers
869 views
Distributed local clustering coefficient algorithm (MapReduce/Hadoop)
I have implemented a local clustering coefficient algorithm based on the MapReduce paradigm. However, I have run into serious trouble with larger datasets or with specific datasets (high average node degree). I ...
7
votes
0 answers
147 views
Hadoop streaming jobs SUCCEEDED but killed by ApplicationMaster
I just finished setting up a small Hadoop cluster (using 3 Ubuntu machines and Apache Hadoop 2.2.0) and am now trying to run Python streaming jobs.
Running a test job, I encounter the following ...
6
votes
0 answers
171 views
HiveServer2 cannot fetch the result of a query over a remote connection
I am facing a problem while trying to fetch data from a remote Hadoop cluster using HiveServer2.
The JDBC connection is working in the sense that metadata queries such as SHOW TABLES work ...
5
votes
0 answers
192 views
Cascalog Hadoop version support
I notice that the Cascalog getting-started guide specifies a version of Hadoop:
:profiles { :dev {:dependencies [[org.apache.hadoop/hadoop-core "1.0.3"]]}}
If my group uses a different version of ...
5
votes
0 answers
618 views
Write data that can be read by ProtobufPigLoader from Elephant Bird
For a project of mine, I want to analyse around 2 TB of Protobuf objects. I want to consume these objects in a Pig script via the Elephant Bird library. However, it is not totally clear to me how to ...
4
votes
0 answers
196 views
Garbage Collection duration in Hadoop CDH5
We have a four-datanode cluster running CDH 5.0.2, installed through Cloudera Manager parcels.
In order to import 13M users' rows into HBase, we wrote a simple Python script and used hadoop-streaming ...
4
votes
0 answers
146 views
Programmatically determine Field names of Scalding/Cascading Pipe
I'm using Scalding to process records with many (> 22) fields. At the end of the process, I'd like to write out the final Pipe's field names to a file. I know this is possible as Mapper and Reducer ...
4
votes
0 answers
424 views
How to bundle many files in S3 using Spark
I have 20 million files in S3 spanning roughly 8000 days.
The files are organized by timestamps in UTC, like this: s3://mybucket/path/txt/YYYY/MM/DD/filename.txt.gz. Each file is UTF-8 text ...
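One approach that fits this layout is to let Spark glob the whole tree and rewrite it as fewer, larger files. A minimal Java sketch, reusing the bucket path from the question; the partition count, output path, and path scheme are assumptions:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class BundleS3Files {
    public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("BundleS3Files"));
        // Glob the YYYY/MM/DD layout; textFile handles .gz files transparently.
        JavaRDD<String> lines = sc.textFile("s3://mybucket/path/txt/*/*/*/*.txt.gz");
        // Collapse the many small inputs into a manageable number of output files;
        // 500 is an arbitrary target, not a recommendation.
        lines.coalesce(500).saveAsTextFile("s3://mybucket/path/bundled/");
        sc.stop();
    }
}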
4
votes
0 answers
233 views
Incremental MapReduce implementations (other than CouchDB, preferably)
I work on a project that sits on a large-ish pile of raw data, aggregates from which are used to power a public-facing informational site (some simple aggregates like various totals and top-tens of ...
4
votes
0 answers
3k views
Hadoop Streaming: Chaining Jobs
This is documentation on how to chain two or more streaming jobs using Hadoop
Streaming (currently 1.0.3) only and nothing more.
In order to understand the final code that will do the chaining and ...
4
votes
0 answers
650 views
Error running Hadoop Pipes program: “Server failed to authenticate”
While trying to run a C++ program following this (link) on my Hadoop cluster, I got the error mentioned below.
I consulted related posts (this) regarding the error and tried tweaking my Makefile, ...
3
votes
0 answers
57 views
What is an efficient way of running a logistic regression for large data sets (200 million by 2 variables)?
I am currently trying to run a logistic regression model. My data has two variables, one response variable and one predictor variable. The catch is that I have 200 million observations. I am trying to ...
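One option, if the stack is flexible, is an online stochastic-gradient-descent logistic regression such as Mahout's, which streams observations one at a time instead of materializing a 200-million-row design matrix. A hedged sketch; the feature layout (intercept plus one predictor) and the toy rows are illustrative only:

import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class StreamingLogit {
    public static void main(String[] args) {
        // 2 categories, 2 features (index 0 = intercept, index 1 = predictor).
        OnlineLogisticRegression olr =
            new OnlineLogisticRegression(2, 2, new L1());
        // In practice you would stream the 200M rows from HDFS;
        // these two {label, predictor} rows are placeholders.
        double[][] rows = {{1.0, 0.3}, {0.0, -1.2}};
        for (double[] row : rows) {
            Vector x = new DenseVector(new double[]{1.0, row[1]});
            olr.train((int) row[0], x);
        }
        // For the binary case, classifyScalar returns P(y = 1 | x).
        System.out.println("P(y=1 | x=0.5) = "
            + olr.classifyScalar(new DenseVector(new double[]{1.0, 0.5})));
    }
}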
3
votes
0 answers
28 views
How to set the number of reducers dynamically based on my mapper output size?
I know that the number of mappers can be set based on my DFS split size by setting mapred.min.split.size to dfs.block.size.
Similarly, how can I set the number of reducers based on my mapper output ...
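The exact map output size is not known before the job runs, so one common workaround is to size the reducer count off the job's input in the driver instead. A sketch; the bytes-per-reducer target is an arbitrary assumption:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class ReducerSizing {
    // Target bytes per reducer; illustrative, not a Hadoop constant.
    private static final long BYTES_PER_REDUCER = 1024L * 1024 * 1024;

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        // Total bytes under the input path, as a proxy for map output size.
        long inputBytes = FileSystem.get(conf)
                .getContentSummary(input).getLength();
        int reducers = (int) Math.max(1, inputBytes / BYTES_PER_REDUCER);
        Job job = Job.getInstance(conf, "sized-job");
        job.setNumReduceTasks(reducers);
        // ... set mapper/reducer classes and input/output paths, then submit.
    }
}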
3
votes
0 answers
78 views
Orchestration of Apache Spark using Apache Oozie
We are considering integrating Apache Spark into our calculation process, where we at first wanted to use Apache Oozie and standard MR or map-only (MO) jobs.
After some research, several ...
3
votes
0 answers
413 views
ssh closes connection immediately after login
I was trying to set up Hadoop in pseudo-distributed mode on Fedora 20. I generated the required public keys and copied them to authorized_keys. Now ssh localhost logs in without a password, but it ...
3
votes
0 answers
77 views
How can I use Pig scripts to generate a nested Avro field?
I am new to Pig. My input data is in the following format:
Record 1:
{
label: int,
id: long
},
Record 2:
{
...
}
...
And what I want as output is:
Record 1:
{
data: {
label: int,
id: long
...
3
votes
0 answers
295 views
Hadoop Hive: How can I allow a regular user to continuously write data and create tables in the warehouse directory?
I am running Hadoop 2.2.0.2.0.6.0-101 on a single node.
I am trying to run a Java MRD program that writes data to an existing Hive table from Eclipse as a regular user. I get the exception:
...
3
votes
0 answers
150 views
Create a custom InputFormat from ColumnFamilyInputFormat for Cassandra
I am working on a project using Cassandra 1.2 and Hadoop 1.2.
I have created my normal Cassandra mapper and reducer, but I want to create my own InputFormat class, which will read the records from ...
3
votes
0 answers
187 views
Elastic MapReduce timing out: java.io.IOException: Unexpected end of stream
I am running a MapReduce job on the Elastic MapReduce (EMR) service. The job works fine for a small dataset but gives the following exceptions for a large dataset (file size 400 MB).
Running another job with the same ...
3
votes
0 answers
670 views
Logistic Regression/SVM implementation in Mahout
I am currently working on sentiment analysis of Twitter data for a telecom company. I am loading the data into HDFS and using Mahout's Naive Bayes classifier to predict the sentiments ...
3
votes
0 answers
316 views
Exact steps to kill Hadoop 2.2.0 Configuration deprecation info messages
This question is similar to Hadoop 2.2.0 Configuration deprecation, but the answers to that question did not resolve the issue, so I am asking for specific steps in this question, and providing a ...
3
votes
0 answers
861 views
R-rmr2 PipeMapRed.waitOutputThreads(): subprocess failed with code 2
I am running an rmr2 example from here; this is the code I tried:
Sys.setenv(HADOOP_HOME="/home/istvan/hadoop")
Sys.setenv(HADOOP_CMD="/home/istvan/hadoop/bin/hadoop")
library(rmr2)
library(rhdfs)
...
3
votes
0 answers
84 views
How to add aspects to Hadoop 2.2
I am on Linux, and I don't see a jar file for AspectJ, so I am curious how to add aspects to YARN. Ideally I would like to just use the Fault Injection Framework ...
3
votes
0 answers
230 views
HDInsight new Hive connection not working
I'm using HDInsight Hadoop locally, and after successfully running MapReduce jobs on HDFS I am trying Hive. Unfortunately, I am getting errors when running the Hive query to create a ...
3
votes
0 answers
64 views
How to ensure I do not run into LeaseExpiredException
Right after my job finishes running, I have a program that uploads files to S3 in chunks. I have to do some processing, which is why I didn't write directly to S3. I used ...
3
votes
0 answers
269 views
“Starting flush of map output” takes a very long time in a Hadoop map task
I execute a map task on a small file (3-4 MB), but the map output is relatively large (150 MB). After showing Map 100%, it takes a long time to finish the spill. Please suggest how I can reduce this period. ...
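Long spills of this kind are commonly tuned through the map-side sort buffer, so that 150 MB of map output spills fewer times. A sketch of setting the relevant knobs in the driver; the values are illustrative, not recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpillTuning {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enlarge the in-memory sort buffer (old-style key; on Hadoop 2.x
        // the same knob is named mapreduce.task.io.sort.mb).
        conf.setInt("io.sort.mb", 200);
        // Spill later, when the buffer is fuller.
        conf.setFloat("io.sort.spill.percent", 0.9f);
        Job job = Job.getInstance(conf, "spill-tuned-job");
        // ... configure mapper/reducer and paths, then submit as usual.
    }
}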
3
votes
0 answers
325 views
Scaling up Cassandra and Mahout with Hadoop
Is it possible to configure Mahout to retrieve input data from a Cassandra cluster while executing a Recommender Job over Hadoop?
I have found some resources on this topic - see ...
3
votes
0 answers
207 views
Child Error due to javax.security.auth.login.LoginException
I have a 20-node Hadoop cluster where each node has 8 GB of memory and an 8-core processor. I sometimes get the following error at random when I have a long-running job with 300-600 reducers:
...
3
votes
0 answers
125 views
Job-wide custom cleanup after all the map tasks are completed
While running a map-reduce job that has only a mapper, I have a counter that counts the number of failed documents. After all the mappers are done, I want the job to fail if the total number of ...
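A common pattern for this is to let the framework aggregate the custom counter across all map tasks and enforce the threshold in the driver once waitForCompletion returns. A sketch; the counter enum and the threshold are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FailOnCounter {
    // Incremented in the mapper via context.getCounter(Docs.FAILED).increment(1).
    public enum Docs { FAILED }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only-job");
        // ... configure the mapper, zero reducers, and input/output paths ...
        boolean ok = job.waitForCompletion(true);
        // Counters are aggregated across all tasks once the job finishes.
        long failed = job.getCounters().findCounter(Docs.FAILED).getValue();
        if (!ok || failed > 100) {   // 100 is an assumed threshold
            System.exit(1);          // surface the failure to the caller
        }
    }
}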
3
votes
0 answers
505 views
Exception in Using Hadoop for MapReduce
I am facing an exception using Hadoop on my local box.
Exception in thread "main" java.lang.NoSuchMethodError: ...
3
votes
0 answers
967 views
s3distcp: cannot create path from empty string
While running s3distcp from S3 to HDFS:
sudo -u hdfs hadoop jar /usr/lib/hadoop/lib/s3distcp.jar --src ...
3
votes
0 answers
197 views
Custom InputFormat with Hadoop C++ Pipes
I'd like to use Hadoop C++ Pipes to create my map/reduce code. The input data is binary, and I want to customize the InputFormat to control the getSplits logic, but I am unsure whether that's possible ...
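With Pipes, split logic stays on the Java side: a custom InputFormat written against the old mapred API (which Pipes uses) can be plugged in, for example via the pipes -inputformat option. A skeleton that forces one split per binary file, with the record reader left abstract:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public abstract class BinaryInputFormat
        extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        // One split per file: the simplest way to control split boundaries
        // for binary data that cannot be cut at arbitrary offsets.
        return false;
    }

    @Override
    public abstract RecordReader<NullWritable, BytesWritable> getRecordReader(
            InputSplit split, JobConf job, Reporter reporter) throws IOException;
}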
3
votes
0 answers
180 views
'Stream Closed' error when using s3distcp to copy files from HDFS to Amazon S3
I am using s3distcp to copy files from HDFS to Amazon S3. Recently, I started getting the 'Stream Closed' error for reducer tasks. I noticed that the error only happened when there were multiple ...
3
votes
0 answers
310 views
ClassCastException while using Avro and MRUnit mapDriver
I am using MRUnit 0.9.0, Avro 1.7.0 and Hadoop 0.20.205.0.
I have configured the mapDriver as follows:
@Before
public void setup()
{
AvroWordCount.Map mapper = new AvroWordCount.Map();
...
3
votes
0 answers
664 views
hadoop CompositeInputFormat not joining all data
I'm currently working with Hadoop 0.20.2 and the old API. What I want to do is a map-side join. I have a graph dataset which consists of two files, one with edges and the other with nodes. The edges are in ...
3
votes
0 answers
805 views
How to use Indexing in Hive?
I have written a custom index handler and wanted to test it. However, Hive is not using it. So I checked with the simple table (pokes (int foo, string bar)) that comes with the Hive distribution for testing ...
3
votes
0 answers
548 views
Whirr: Cannot connect to Hadoop cluster on EC2 after launch-cluster
I am new to Whirr and I'm trying to set up a Hadoop cluster on EC2 with Whirr. I have followed the Cloudera tutorial at https://ccp.cloudera.com/display/CDHDOC/Whirr+Installation
Before installing Whirr, ...
3
votes
0 answers
211 views
Deploying custom MBeans to Hadoop
I'm starting development of a Hadoop application and I'd like to manage it via a couple of MBeans. I've experimented with using MBeanUtils.register and MBeanServer's register method in jar files I'm ...
3
votes
0 answers
438 views
How to import the package org.apache.hadoop.mapreduce.lib.chain in a Hadoop 0.20.2 project?
I'm trying to chain map and reduce phases in one job. The problem is that I'm running under Hadoop 0.20.2, and the package org.apache.hadoop.mapred.lib.Chain seems to be deprecated and replaced by ...
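On 0.20.2 the chain classes are still available in the old API under org.apache.hadoop.mapred.lib. A sketch using the bundled IdentityMapper/IdentityReducer as stand-ins for real chain stages:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class ChainJob {
    public static void main(String[] args) {
        JobConf job = new JobConf(ChainJob.class);
        // First map stage: (LongWritable, Text) -> (LongWritable, Text).
        ChainMapper.addMapper(job, IdentityMapper.class,
                LongWritable.class, Text.class, LongWritable.class, Text.class,
                true, new JobConf(false));
        // Second map stage consumes the first stage's output types.
        ChainMapper.addMapper(job, IdentityMapper.class,
                LongWritable.class, Text.class, LongWritable.class, Text.class,
                true, new JobConf(false));
        // A single reducer closes the chain.
        ChainReducer.setReducer(job, IdentityReducer.class,
                LongWritable.class, Text.class, LongWritable.class, Text.class,
                true, new JobConf(false));
        // ... set input/output paths, then run with JobClient.runJob(job).
    }
}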
3
votes
0 answers
1k views
Hadoop MapReduce - Pig/Cassandra - Unable to create input splits
I'm trying to run a MapReduce job with Pig and Cassandra, and I always get the error:
ERROR 2118: Unable to create input splits for: cassandra://constellation/logs
[SOLVED]
There were some environment ...
2
votes
0 answers
20 views
Decompressing LZ4 compressed data in Spark
I have LZ4-compressed data in HDFS and I'm trying to decompress it in Apache Spark into an RDD. As far as I can tell, the only method in JavaSparkContext to read data from HDFS is textFile, which only ...
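Since textFile resolves compression codecs by file extension through the Hadoop configuration, registering the LZ4 codec may be enough, assuming the files carry a .lz4 extension and were written with Hadoop's Lz4Codec framing rather than raw block LZ4. A sketch with a hypothetical input path:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ReadLz4 {
    public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("ReadLz4"));
        // Register the LZ4 codec; in practice, append it to the cluster's
        // existing io.compression.codecs list rather than replacing it.
        sc.hadoopConfiguration().set("io.compression.codecs",
                "org.apache.hadoop.io.compress.Lz4Codec");
        // textFile now decompresses *.lz4 files transparently by extension.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/events/*.lz4");
        System.out.println("lines: " + lines.count());
        sc.stop();
    }
}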
2
votes
0 answers
33 views
Spring XD dynamic deployment manifest
I have been reading the Spring XD documentation fairly heavily and can't really get to grips with two things I'd like to achieve in relation to Hadoop YARN.
Maybe they aren't supported yet or won't ...
2
votes
0 answers
64 views
Loading data into HDFS in parallel
I have a Hadoop cluster consisting of 3 nodes. I want to load a 180 GB file into HDFS as fast as possible. I know that neither -put nor -copyFromLocal is going to help me here, as they are single ...
2
votes
0 answers
71 views
Manual fix of HBase table overlap (multiple regions have the same start key)
I was inserting data into HBase through the Java client, but suddenly the region server crashed. So I restarted HBase, after which the HMaster was not running. When I run the ...
2
votes
0 answers
62 views
Reading SequenceFile written by Spark
I have a bunch of sequence files that I want to read using Scalding, and I am having some trouble. This is my code:
class ReadSequenceFileApp(args:Args) extends ConfiguredJob(args) {
...
2
votes
0 answers
35 views
Why is my test Hadoop code that connects to libhdfs throwing a segmentation fault?
I'm using libhdfs to connect and write to an HDFS system. The program works fine; however, when I attach GDB to it, it segfaults in hdfsConnect, but the connection goes through and I'm able to write ...
2
votes
0 answers
41 views
How to execute the aggregatewordcount example in Hadoop, which uses the Hadoop aggregate framework?
I tried executing the aggregatewordcount example found in the Hadoop examples jar file. Even though the program ran successfully, the output was not what I expected. The output file just has a single line ...
2
votes
0 answers
67 views
Getting NameNode's fsimage size using Java
I'm trying to get metadata about a NameNode from a running Hadoop cluster using Java. Specifically, I would like to get the size of fsimage, the last checkpoint time, and number and size of the edit ...
2
votes
0 answers
55 views
NoSuchMethodError using Guava 15 on Hadoop (2.3.0)
I have a compiled jar for Hadoop including this library:
com.google.guava:guava:jar:15.0:compile
When I submit it to my Hadoop CDH 5.0.1 cluster, I get this error:
java.lang.NoSuchMethodError: ...
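One common mitigation is to ask the framework to put the user's jars first on the task classpath; whether that suffices depends on the specific conflict, and shading Guava into the job jar is the heavier but more reliable alternative. A sketch:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class UserClasspathFirst {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Prefer the job's bundled Guava 15 over the cluster's older copy.
        conf.setBoolean("mapreduce.job.user.classpath.first", true);
        Job job = Job.getInstance(conf, "guava-15-job");
        // ... configure and submit as usual.
    }
}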
2
votes
0 answers
37 views
File processing using AWS EMR
I need an architectural suggestion for this problem I'm working on. I have log files coming in every 15 minutes in a gzipped folder. Each of these has about 100,000 further files to process. I have a ...