Newest 'hadoop python' Questions

0

votes

0 answers

13 views

Invalid syntax hadoop streaming error

I am trying to run a Hadoop streaming Python job: /home/hduser/hadoop/bin/hadoop jar /home/hduser/hadoop/share/hadoop/tools/lib/hadoop-*streaming*.jar -file audio.py -cacheFile ...

asked yesterday

schoon
537

0

votes

0 answers

20 views

Error launching job using mrjob on Hadoop

I am new to Hadoop and mrjob, and this book really helped me learn a lot. I was trying to run mrSVM.py on Hadoop, as it works fine locally. But I ran the following command: python mrSVM.py -r hadoop ...

python python-2.7 hadoop mrjob

asked 2 days ago

Manvendra singh tomar
1

0

votes

0 answers

12 views

How to specify a combiner in Oozie when using the streaming jar

I have a streaming job that I am calling through Oozie. I am able to run this successfully with a mapper and reducer. But what I am failing to understand is how to pass the combiner. All my mapper, ...

python hadoop oozie combiners

asked Aug 16 at 14:12

Sayon Majumdar
1

0

votes

0 answers

16 views

Cannot put a file in HDFS using hadoopy

I have installed hadoopy based on this tutorial: http://www.hadoopy.com/en/latest/tutorial.html#putting-data-on-hdfs But when I try to run a simple example, for instance (example.py): import hadoopy ...

java python hadoop hadoopy

asked Aug 14 at 22:36

Nemanja91
32

1

vote

1 answer

25 views

How to run a C++ executable from hadoop python wrapper

I am new to the Hadoop streaming library using Python, so the question may look stupid, but I am badly stuck here. Any help is appreciated. I am trying to run a C++ executable (which takes a local ...

python c++ hadoop

asked Aug 14 at 14:18

user3606212
134

0

votes

0 answers

34 views

Where did the Luigi task go?

First time into the realm of Luigi (and Python!) and have some questions. Relevant code is: from Database import Database import luigi class bbSanityCheck(luigi.Task): conn = luigi.Parameter() ...

python hadoop

asked Aug 13 at 21:23

nick
2351315

0

votes

1 answer

50 views

Why won't python read stdin input as a dictionary?

I'm sure I'm doing something dumb here, but here goes. I'm working on a class assignment for my Udacity class "Intro to Map Reduce and Hadoop". Our assignment is to make a mapper/reducer that will ...

python hadoop dictionary mapreduce stdin

asked Aug 12 at 18:11

jrubins
162

0

votes

1 answer

20 views

Python script for Avro conversion using Hadoop Streaming

I have a 10 GB input file which I am trying to convert to Avro using Python Hadoop streaming. The job is successful, but I cannot read the output using the Avro reader. It is giving 'utf8' codec ...

python hadoop

asked Aug 12 at 16:12

user3012093
211

0

votes

0 answers

17 views

Hive client for Python 3.x

Is it possible to connect to Hadoop and run Hive queries using Python 3.x? I am using Python 3.4.1. I found out that it can be done as written here: ...

python hadoop python-3.x hive

asked Aug 12 at 10:06

adomasb
365

0

votes

0 answers

8 views

Hadoop streaming - wrapper executing binary application issues

I'm new to Hadoop and am attempting to use Hadoop streaming to parallelize a physics simulation that is compiled into a binary. The idea would be to run the binary in parallel using maps with one ...

python hadoop hadoop-streaming

asked Aug 11 at 20:24

user2666216
135

1

vote

1 answer

56 views

With Spark, how to connect to the master or solve the error: “WARN TaskSchedulerImpl: Initial job has not accepted any resources”

Please tell me how to solve the following problem. First, I confirmed that the following code runs when master is "local". Then I started two EC2 instances (m1.large). However, when master is ...

python hadoop amazon-ec2 spark apache-spark

asked Aug 7 at 7:07

prk2
427

1

vote

1 answer

12 views

Parsing json string generated from org.apache.avro.mapred.AvroAsTextInputFormat using python streaming

In Hadoop streaming using Python to read an Avro data file, I am using the input format whose documentation says the input key is a string representation in JSON. -inputformat ...

python json hadoop

asked Aug 6 at 17:26

user3012093
211

0

votes

0 answers

16 views

Hive data search & exploration tool

I have several Hive tables. I would like to create a web interface where users could search and explore a small sample dataset and also the schema of the tables. One option could be by exporting a ...

python search hadoop hive

asked Aug 2 at 3:46

user1484282
61

3

votes

0 answers

90 views

What is an efficient way of running a logistic regression for large data sets (200 million by 2 variables)?

I currently am trying to run a logistic regression model. My data has two variables, one response variable and one predictor variable. The catch is that I have 200 million observations. I am trying to ...

python r matlab hadoop stata

asked Jul 30 at 20:32

user1398057
524

1

vote

1 answer

25 views

mongodb_hadoop streaming with python: -inputURI not recognized

I'm trying to create a MapReduce application in Python using the mongodb_hadoop connector. I have a cluster with Hadoop 2.2.0 installed. I've installed the mongodb_hadoop connector v1.3.0. I've ...

python mongodb hadoop

asked Jul 30 at 14:10

FlyinPoulpus
186

0

votes

1 answer

21 views

How to call filebrowser in HUE

I'll start by saying that I'm very new to HUE and Python and have no prior experience with either. What I have to do now is make my own HUE application to upload files to HDFS, start an oozie work ...

python hadoop cloudera hue

asked Jul 28 at 10:21

Havnar
9016

-1

votes

0 answers

10 views

Pydoop IOError: Cannot connect to localhost

I installed Hadoop 2.2.0 and Pydoop on Fedora 20. On executing the command hadoop fs -ls hdfs://localhost:8020/ output: drwxr-xr-x - root supergroup 0 2014-07-09 17:03 ...

python hadoop

asked Jul 25 at 7:52

user3805623
11

0

votes

0 answers

23 views

Python to Pig. Loading Binary Delimited Text

I'm a little new to Pig/Hadoop. I'm trying to load server logs stored as a gzip; however, the logs are stored in binary delimited form. In Python, I would translate the file as below. Anybody know ...

python hadoop apache-pig

asked Jul 24 at 21:47

cloud36
92110

0

votes

1 answer

18 views

Amazon EMR job with many json files as input

I am writing a hadoop streaming application in python to run on EMR. The input for the EMR job is a directory of files in an S3 bucket, each of which is a json file containing a single json object. I ...

python json hadoop amazon-s3 amazon-emr

asked Jul 23 at 13:36

Jay Hack
404

0

votes

1 answer

44 views

Is MapReduce useful for processing big files, crawling a lot of pages for data, and inserting them into HBase?

I have some Python scripts that I run every day. These scripts do the following: parse 1000 text files (gzipped): ~100 GB, 30 million rows; crawl some data from many websites: 40 ...

python hadoop mapreduce hbase hadoop-streaming

asked Jul 23 at 11:43

Abdelali AHBIB
18219

0

votes

1 answer

26 views

Efficient way to intersect multiple large files containing geodata

Okay, deep breath, this may be a bit verbose, but better to err on the side of detail than lack thereof... So, in one sentence, my goal is to find the intersection of about 22 ~300-400mb files based ...

python mysql performance hadoop arcmap

asked Jul 23 at 3:21

user1028885
577

0

votes

0 answers

13 views

Issue with using files in distributed cache in Elastic MapReduce

I'm trying to make use of an external library in my Python mapper script in an AWS Elastic MapReduce job. However, my script doesn't seem to be able to find the modules in the cache. I archived the ...

python hadoop amazon-web-services elastic-map-reduce

asked Jul 10 at 5:09

user296554
11

1

vote

1 answer

40 views

Unable to run MapReduce using Python in Hadoop?

I have written a mapper and reducer in Python for a word count program that works fine. Here is a sample: echo "hello hello world here hello here world here hello" | wordmapper.py | sort -k1,1 | ...

python hadoop mapreduce hadoop2

asked Jul 9 at 18:53

eagertoLearn
872626
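
The shell pipeline in the excerpt above (a mapper, then `sort -k1,1`, then presumably a reducer) can be mimicked inside one Python process, which may be a useful check before submitting to Hadoop. This is only a sketch: `map_words`, `reduce_counts`, and `run_local` are hypothetical names, not the asker's `wordmapper.py`, and the (word, 1) pair format is an assumption based on the usual word-count convention.

```python
from itertools import groupby


def map_words(lines):
    """Mapper stage: emit one (word, 1) pair per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(sorted_pairs):
    """Reducer stage: sum the counts of consecutive pairs that share a key."""
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Chain map -> sort -> reduce, as `mapper | sort -k1,1 | reducer` does."""
    return list(reduce_counts(sorted(map_words(lines))))


if __name__ == "__main__":
    sample = ["hello hello world here hello here world here hello"]
    for word, count in run_local(sample):
        print(f"{word}\t{count}")
```

The sort step matters because the reducer only merges adjacent pairs with the same key, which mirrors how a streaming reducer sees its input grouped by key.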

0

votes

1 answer

30 views

Killing a program with except: pass

Is there any way to kill a program that ignores all exceptions? Stupid, I know. I was testing something (since I wasn't sure what error a failed, embedded pig script would throw), forgot to limit the ...

python hadoop

asked Jul 8 at 1:32

user2958714
254

1

vote

0 answers

19 views

Hadoop streaming accessing files in a directory

I wish to access a directory in Hadoop (via Python streaming) and loop through its image files, calculating hashes of each in my mapper. Does the following logic make sense (and instead of hard ...

python hadoop hadoop-streaming

asked Jul 4 at 14:03

schoon
537

0

votes

0 answers

21 views

MapReduce Task fails Python

I seem to be getting the following error: 14/07/02 23:29:14 INFO mapreduce.Job: Task Id : attempt_1395688818137_1239_r_000001_2, Status : FAILED Error: java.lang.RuntimeException: ...

python hadoop mapreduce

asked Jul 2 at 23:49

user3799652
1

0

votes

3 answers

61 views

Why is my Hadoop output split into many part files?

I tried to count the frequency of words and write the file: mapper.py: #!/usr/bin/env python import sys # input comes from STDIN (standard input) for line in sys.stdin: # remove leading and ...

python hadoop

asked Jul 2 at 13:12

ChaosCosmos
365

2

votes

1 answer

23 views

Error while executing Python MapReduce tasks in Hadoop?

I have written a mapper and reducer for the wordcount example in Python. The scripts work fine as standalone ones, but I get an error when they are run in Hadoop. I am using Hadoop 2.2. Here is my command: ...

python hadoop hadoop2

asked Jul 1 at 20:14

brain storm
2,23011033

0

votes

0 answers

66 views

Installing pyspark on hadoop and yarn

I have installed Spark on top of Hadoop and YARN. When I launch the pyspark shell and try to compute something, I get this error. Error from python worker: /usr/bin/python: No module named pyspark ...

python hadoop apache-spark yarn

asked Jun 28 at 0:04

Sanghamitra Deb
31

0

votes

0 answers

33 views

Shuffle and Sorting in Hadoop

I have been reading about Hadoop and have implemented sample MR programs in Hadoop using Python. I am confused about shuffle and sorting in Hadoop. My mapper code emits key-value pairs. Example ...

python sorting hadoop mapreduce hadoop-streaming

asked Jun 26 at 4:18

user3677949
83

0

votes

0 answers

62 views

Hive UDF with Python - Runtime error

I wrote a Hive query that calls a UDF written in Python, but I get this error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row ...

python hadoop hive user-defined-functions

asked Jun 25 at 18:10

Ganesh Sundar
474

0

votes

0 answers

41 views

Issues with Python packages on Hadoop distributed system nodes

I use Python to do Hadoop streaming. We use an AWS Hadoop streaming distributed system which has a master node and four slave nodes. If I need to install a Python package, I need to install the ...

python hadoop streaming packages

asked Jun 23 at 17:26

user3634601
12

0

votes

1 answer

32 views

Hadoop EMR using Python

I'm using Hadoop streaming to use my mapper and reducer code in python to run a Mapreduce job. I have input data in s3, and I'm trying to use that for the job. However, when I run the command like ...

python hadoop emr

asked Jun 20 at 0:12

aishpr
476

0

votes

1 answer

84 views

Why do these seemingly correct Hadoop streaming Python scripts not work?

I have a set of Hadoop streaming jobs, like below: bash file: hadoop fs -rmr /tmp/someone/sentiment/ hadoop jar ...

python hadoop streaming

asked Jun 18 at 22:29

user3634601
12

0

votes

1 answer

201 views

Hive UDF with Python

I'm new to python, pandas, and hive and would definitely appreciate some tips. I have the python code below, which I would like to turn into a UDF in hive. Only instead of taking a csv as the input, ...

python hadoop pandas hive

asked Jun 18 at 19:44

user3476463
313

0

votes

0 answers

25 views

How to work with dumbo

I have written a simple k-means clustering code for Hadoop (two separate programs - mapper and reducer). The code is working over a small dataset of 2d points on my local box. It's written in Python ...

python hadoop hadoop-streaming

asked Jun 16 at 3:35

user3616059
195

0

votes

1 answer

42 views

Iterative kmeans based on mapreduce and hadoop

I have written a simple k-means clustering code for Hadoop (two separate programs - mapper and reducer). The code is working over a small dataset of 2d points on my local box. It's written in Python ...

python hadoop mrjob

asked Jun 14 at 11:04

user3616059
195

0

votes

1 answer

256 views

Exporting a Scikit Learn Random Forest for use on Hadoop Platform

I've developed a spam classifier using pandas and scikit learn to the point where it's ready for integration into our hadoop-based system. To this end, I need to export my classifier to a more common ...

python hadoop machine-learning scikit-learn pmml

asked Jun 13 at 19:28

Axel Magnuson
422314

-1

votes

1 answer

53 views

Remove empty lines printed from Hive query output using Python

I am performing a Hive query and storing the output in a TSV file in the local FS. I am running a for loop for the Hive query and passing different parameters. If the Hive query returns no output once ...

python mysql hadoop hive

asked Jun 13 at 0:54

rond
91314
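
One way to keep blank lines out of the saved file is to filter the query output before writing it. The sketch below assumes the output is already available as an iterable of text lines; it does not show how the Hive query is invoked, and `drop_blank_lines` is a hypothetical helper name.

```python
def drop_blank_lines(lines):
    """Yield only the lines that contain something other than whitespace."""
    for line in lines:
        if line.strip():
            yield line


if __name__ == "__main__":
    # Made-up sample rows standing in for query output.
    raw_output = ["job1\tqueueA\t3\n", "\n", "   \n", "job2\tqueueB\t5\n"]
    print("".join(drop_blank_lines(raw_output)), end="")
```

Because the helper is a generator, it can wrap a file object or a subprocess pipe without loading the whole output into memory.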

0

votes

1 answer

77 views

Preserving column data types in Hadoop UDF output (Streaming)

I'm writing a UDF in Python for a Hive query on Hadoop. My table has several bigint fields, and several string fields. My UDF modifies the bigint fields, subtracts the modified versions into a new ...

python hadoop hive apache-pig hadoop-streaming

asked Jun 12 at 1:05

Maxim Zaslavsky
6,8881365122

0

votes

0 answers

30 views

Running an external Python lib (like NLTK) with Hadoop streaming

I tried using http://blog.cloudera.com/blog/2008/11/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/ zip -r nltkandyaml.zip nltk yaml mv ntlkandyaml.zip ...

python hadoop nltk hadoop-streaming

asked Jun 11 at 16:23

nit254
11

2

votes

1 answer

63 views

Hadoop streaming failed with java.io.FileNotFoundException

I have written a map-only Python MapReduce job which accepts data from standard input and processes it to produce some output. It works fine when executed locally. However, when I am trying to execute ...

python hadoop mapreduce hadoop-streaming

asked Jun 10 at 20:29

Anand Gupta
11911

-3

votes

2 answers

45 views

How to get output from part-r-0000 in Apache Pig

I am parsing a pcap file using Pig. I am getting output in a part-r-0000 file. It is showing me the following output. 1101 1646 503 679 556 480 80 471 How to get actual output from that file? What is the ...

python hadoop apache-pig snort

asked Jun 10 at 10:21

Laxdeep
1

1

vote

0 answers

57 views

K-means based on MapReduce in Python

I am going to write a mapper and reducer for the k-means algorithm. I think the best course of action is to put the distance calculator in the mapper and send to the reducer with the cluster id as ...

python hadoop mrjob

asked Jun 10 at 9:15

user3616059
195

0

votes

1 answer

19 views

Send output of Hadoop streaming job to STDOUT

For streaming jobs you have to specify an output directory. What if I wanted to output the results of the mapper to stdout instead of an HDFS directory? Is this possible? I want to do this so I can ...

python apache hadoop mapreduce

asked Jun 9 at 20:45

user3220769
173

0

votes

2 answers

49 views

Convert list elements into array

I have a TSV file which I am parsing, and I want to convert it into an array. Here is the file format - jobname1 queue maphours reducehours jobname2 queue maphours reducehours code with ...

python arrays list hadoop

asked Jun 9 at 0:32

rond
91314

7

votes

0 answers

202 views

Hadoop streaming jobs SUCCEEDED but killed by ApplicationMaster

I just finished setting up a small hadoop cluster (using 3 ubuntu machines and apache hadoop 2.2.0) and am now trying to run python streaming jobs. Running a test job I encounter the following ...

python hadoop

asked Jun 2 at 11:39

GebitsGerbils
362

-1

votes

1 answer

34 views

Change a Python script to Unix line-ending convention

What is the easiest way to change a python script to Unix line-ending convention? I am running a python script on Hadoop and seeing the following stderr log: /usr/bin/env: python : No such file or ...

python unix hadoop

asked May 28 at 20:33

user3681744
94
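
The stderr line quoted above, with a gap between `python` and the colon, is typical of a shebang line that ends in a carriage return, that is, a script saved with Windows (CRLF) line endings. A minimal sketch of the conversion in Python follows; `to_unix_line_endings` and `convert_file` are hypothetical helper names, and the script is assumed small enough to read into memory.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Replace Windows (CRLF) line endings with Unix (LF) ones."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with LF line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_line_endings(data))
```

Working in binary mode avoids any newline translation by Python itself. The `dos2unix` command-line tool performs the same conversion from a shell.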

0

votes

3 answers

54 views

Remove empty lines from hive query output I am saving on local filesystem

I am running a python script on my devbox to remotely ssh on a grid gateway box to launch another python script which runs the hive query and returns the output back and I save it on my devbox in the ...

python hadoop hive

asked May 28 at 19:38

rond
91314

1

vote

0 answers

27 views

How to determine locality of HDFS file for use in Python?

I have a system that runs Python tasks across a compute cluster using Celery to manage the queue. These tasks operate on data stored in MapR-FS (which exposes the Hadoop HDFS API, so things ...

python hadoop celery hdfs mapr

asked May 28 at 17:42

Michael Moore
567
