Unanswered 'bigdata' Questions

5

votes

0answers

135 views

Decision tree implementation issue in apache spark with java

I'm trying to implement a simple demo of a decision tree classifier using Java and Apache Spark 1.0.0. I'm basing it on http://spark.apache.org/docs/1.0.0/mllib-decision-tree.html. So far I've written ...

asked Jun 28 at 22:38

caruso
675

3

votes

0answers

62 views

How can Kafka limitations be avoided?

We're trying to build a BI system that will collect very large amounts of data that should be processed by other components. We decided that it will be a good idea to have an intermediate layer to ...

java bigdata business-intelligence apache-kafka

asked Jul 21 at 11:06

Stephan
5,3312818

3

votes

0answers

78 views

Orchestration of Apache Spark using Apache Oozie

We are considering integrating Apache Spark into our calculation process, where we originally wanted to use Apache Oozie and standard MR or MO (Map-Only) jobs. After some research several ...

hadoop bigdata apache-spark oozie

asked Jul 14 at 14:44

Matthias Kricke
685414

2

votes

0answers

44 views

Export large amount of data from Cassandra to CSV

I'm using Cassandra 2.0.9 to store a fairly large amount of data, say 100 GB, in one column family. I would like to export this data to CSV quickly. I tried: sstable2json - it produces quite ...

csv cassandra bigdata cassandra-2.0

asked Jul 22 at 19:38

KrzysztofZalasa
111

2

votes

0answers

71 views

Manual Fix of Hbase table Overlap (Multi region has same start key)

I was inserting data into HBase through the Java client when the RegionServer suddenly crashed. I restarted HBase, after which the HMaster was not running. When I run the ...

hadoop hbase hdfs bigdata

asked Jul 5 at 7:39

vivek_nk
1,091311

2

votes

0answers

71 views

bigrquery: Error with Google Big Query R interface

I'm using the bigrquery R package to fetch data, but I'm getting the following error. Let me know if anyone knows how to fix it. "Waiting for authentication in browser... Authentication ...

r google-api bigdata google-oauth google-bigquery

asked Apr 24 at 8:39

hari
172

2

votes

0answers

49 views

EMR bootstrap action to run Hue on Mapr M3

Is there a bootstrap script to get Hue running on EMR MapR, rather than setting it up using this guide: http://doc.mapr.com/display/MapR/Configuring+Hue

bigdata cloudera hue mapr

asked Feb 11 at 11:50

Praveen R
9316

2

votes

0answers

84 views

R instability with high numbers of ggplots

Background: I'm trying to generate a whopping ton of histogram plots (about 100) by using ggplot and this multiplot function. multiplot takes a list of plots as its main argument, so I generate a ...

r ggplot2 bigdata

asked Jan 12 at 23:52

Trevor Alexander
618519

2

votes

0answers

71 views

django unit test database is empty

We are working on a project that uses more than 10 tables for data flow, but when we unit test the database table query scheme, it returns an empty set. Is there any way we can run './manage.py ...

database django testing bigdata empty-list

asked Nov 12 '13 at 11:09

Marissa
133

2

votes

0answers

207 views

R bigmemory attach.big.matrix is very slow for very wide matrices

I am using the package bigmemory to interact with large matrices in R. This works well for large matrices except that the attach.big.matrix() function to reload a binary file created with ...

r matrix bigdata ehcache-bigmemory

asked Oct 1 '13 at 18:50

user2836087
113

2

votes

0answers

89 views

How to load rdf data to bigdata using nanosparqlserver

I have downloaded bigdata.war and deployed it using the Sesame HTTP API. Now I can't figure out how to load RDF triples/provenance with triples into bigdata using nanosparqlserver. I am using ...

mysql apache http rdf bigdata

asked Jul 21 '13 at 11:55

Jannat Arora
390327

2

votes

0answers

178 views

Scala or Java analogues of PyTables & numexpr

I am looking for Scala or Java analogues of numexpr and PyTables (particularly tables.Expr). This is for an analytics system on multicore machines which needs to perform matrix operations ...

java scala bigdata pytables numexpr

asked Nov 15 '12 at 1:34

Daniel Mahler
1,144518

2

votes

0answers

170 views

Store and visualize super large social graph both on desk application and online

Does anyone know the most efficient way to store and visualize a large graph with several million edges? I'm aware of Gephi, but it can't visualize such a big data set (at least on my laptop with ...

javascript graph nosql bigdata

asked Aug 31 '12 at 20:00

user1056824
635

1

vote

0answers

11 views

Curator framework for zookeeper - Interprocess mutex takes 50ms to acquire lock each time

I am using the Curator framework Interprocess mutex to create a distributed lock to reserve some resource. However, I can see that ZooKeeper takes 50-100 ms each time to acquire a lock and 20-40 ms for ...

redis bigdata zookeeper curator

asked Jul 28 at 17:32

Shiva
83

1

vote

0answers

15 views

Kafka Spout consumer and another consumer running inside Spring not able to run simultaneously

My application accepts a URL containing the data it needs to process through a REST service it exposes using Spring. Each time it receives a URL from which to accept data, the application sends the ...

bigdata storm apache-kafka

asked Jul 28 at 11:27

punitjajodia
6217

1

vote

0answers

35 views

Why is MapReduce processing of Avro files slower than processing flat files?

Why is MapReduce processing of Avro files slower than processing flat files? I expected that processing Avro files would be a lot faster than processing flat files, but my assumption was wrong. Avro ...

hadoop mapreduce bigdata avro

asked Jul 15 at 17:55

diplomaticguru
226

1

vote

0answers

32 views

Mysql: Multiple updates from a single select

I have a case where I need to treat a group of fields as unique on the Addresses table. To do that, I have to detect duplicates in the database, delete them, and update all associated foreign ...

mysql database sql-update bigdata

asked Jul 11 at 15:00

Gabriel Conceição
62

1

vote

0answers

27 views

hadoop2.4.0 namenode -format showing NoClassDefFoundError error

I have installed and configured Hadoop following the website http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os. When I format the ...

java apache hadoop bigdata

asked Jul 7 at 10:27

Mohasin Ali
248111

1

vote

0answers

25 views

Finding longest common sequences in big data

I have logs from a bunch (millions) of small experiments. Each log contains a list (tens to hundreds) of entries. Each entry is a timestamp and an event ID (there are several thousands of unique ...

linux bigdata data-analysis sequencing

asked Jul 1 at 13:19

Alexander Gladysh
9,5541357109

1

vote

0answers

31 views

Why does Storm performance get very slow after a few minutes?

I'm running a throughput topology to test performance. In the first two minutes I get a good average of 450k emitted/sec; after 10 minutes it drops to an average of 100k/sec. ...

cloud bigdata storm

asked Jun 26 at 15:51

15412s
133110

1

vote

0answers

17 views

Issue with running more than one topology on storm cluster

It is not possible to run more than one topology on the same cluster. All topologies are registered fine and I can see them in the UI, but only the first topology runs. No workers, executors, tasks are ...

cloud bigdata storm

asked Jun 26 at 11:34

15412s
133110

1

vote

0answers

99 views

Failing to write offset data to zookeeper in kafka-storm

I was setting up a storm cluster to calculate real time trending and other statistics, however I have some problems introducing the "recovery" feature into this project, by allowing the offset that ...

bigdata zookeeper storm apache-kafka

asked Jun 25 at 11:58

Juto
403515

1

vote

0answers

18 views

Slow performance in using storm local cluster

I'm trying to measure Storm's performance as a pipeline. I ran the following code in local cluster mode: http://kaviddiss.com/2013/05/17/how-to-get-started-with-storm-framework-in-5-minutes/ It ...

cloud bigdata storm

asked Jun 24 at 10:11

15412s
133110

1

vote

0answers

41 views

Need suggestions implementing recursive logic in Hive UDF

We have a Hive table that has around 500 million rows. Each row represents a "version" of the data, and I've been tasked with creating a table that contains only the final version of each row. ...

hadoop hive bigdata recursive-query

asked Jun 13 at 13:52

Nate
62

1

vote

0answers

38 views

GoogleCloudPlatform/solutions-automated-file-loader-for-bigquery In PHP

I am not familiar with Java development; I am a PHP developer. I want a solution for an automated file loader for BigQuery in PHP, like this one. Currently I am using BigQuery REST ...

php google-app-engine bigdata google-bigquery

asked Jun 6 at 12:02

Ravindra
677

1

vote

0answers

29 views

Java equivalent to R iPlots package (alternative to Mondrian)

Someone showed me the power of the R iPlots package: you plot the same individual data two different ways, and when you select data on one figure, it selects the matching data on the other figure. E.g.: ...

java r bigdata mondrian

asked Jun 4 at 21:49

ThePolyscope
62

1

vote

0answers

34 views

Is this Hadoop cluster configuration possible? What are the minimum disk space requirements?

My Hadoop cluster runs on virtual machines. The configuration is as follows: 1 master and 9 slaves. master: disk space: 20GB, memory: 16G, CPU cores: 8; slave1 ~ slave9: disk space: 5GB ...

java memory hadoop bigdata

asked May 23 at 2:11

Alexia Wang
418

1

vote

0answers

122 views

Upconversion/ Grouping using Map Reduce

I have 2 documents: a list of offerings with associated zip codes, and US postal code data. The first document is of the form: offer, location (currently only zips) 1, 84121 1, 84101 1, 58103 1, 58102 2, ...

hadoop mapreduce location bigdata

asked May 15 at 16:58

Anant
1613

1

vote

0answers

67 views

r - viterbi RHmm Error protection stack overflow

I was looking for an HMM implementation in R to analyze states in a string of characters. The HMM library seems to run slowly, so I am using the RHmm library. My data is a string of 1953138 symbols ...

r stackoverflow bigdata viterbi

asked May 14 at 19:21

Sierra
62

1

vote

0answers

32 views

Hadoop data nodes die very often

Our Hadoop cluster has 5 data nodes and 2 name nodes. The traffic is very high and a few nodes go down very often, but they come back after a while. Sometimes it takes a long ...

hadoop hdfs bigdata nfs

asked May 7 at 11:06

Bhargav Sarvepalli
466

1

vote

0answers

227 views

Apache Kafka consumer client connecting to Apache Zookeeper: EndOfStreamException

I get an error when trying to 'consume' messages from Kafka (2.9.2-0.8.1) with a stand-alone ZooKeeper (3.4.5). You can see the source code below as well as the error message and logfile from ZooKeeper. ...

maven-2 cluster-computing bigdata zookeeper apache-kafka

asked May 6 at 10:03

sema
143112

1

vote

0answers

76 views

matrix multiplication on hadoop

I'm looking for the best and easiest way to do matrix multiplication on Hadoop in Java. I looked at http://www.norstad.org/matrix-multiply/index.html but found it hard to understand. ...

java hadoop matrix bigdata

asked Apr 14 at 10:34

user3322698
367

1

vote

0answers

77 views

How to read large data set at hourly interval

For example, I have 30 million records stored in our datastore. I want to read a random fraction of them at 2-hour intervals: e.g., I want to read 1 million random records every 2 hours, and do ...

elasticsearch workflow bigdata apache-kafka

asked Apr 12 at 18:41

user648922
4318
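
A note on the question above: one way to draw a fixed-size random sample in a single pass, without knowing the record count in advance, is reservoir sampling. The sketch below is only an illustration under assumptions: `reservoir_sample` is a hypothetical helper, the `range(...)` call stands in for a stream of records from the datastore, and no Elasticsearch or Kafka API is used. Running it every 2 hours is left to whatever scheduler is available.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Return up to k items chosen uniformly at random from an iterable.

    Uses Algorithm R: the first k items fill the reservoir, and item i
    (0-based, i >= k) replaces a random slot with probability k / (i + 1).
    """
    rng = rng or random.Random()
    sample = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


# Hypothetical stand-in for the datastore: 100,000 record ids.
batch = reservoir_sample(range(100_000), 1_000)
```

Memory use is proportional to the sample size rather than to the full data set, which is the main reason to prefer this over loading everything and shuffling.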

1

vote

0answers

325 views

Hive query with where clause not working

I am querying an external HBase table from Hive. When I run a simple query, select * from Document_Table_Hive, the query works and I get the records stored in the table, but when I run a query with ...

hive hbase bigdata hortonworks-data-platform

asked Mar 28 at 11:05

Afaque
196

1

vote

0answers

37 views

Finding and debugging bad record using hive

Is there any way to pinpoint the bad record when we are loading data using Hive or while processing the data? The scenario goes like this. Suppose I have a file that needs to be loaded as a table using ...

mapreduce hive bigdata

asked Mar 17 at 11:28

Krish
62

1

vote

0answers

26 views

Best solution for hierarchical and non-typed data

I have a db schema like this: ELEMENT(uuid[string], name[string], status[integer], ...) APPLICATION(uuid[string], name[string], config[string], status[integer], parent[foreign key on self or foreign ...

database performance nosql persistence bigdata

asked Mar 16 at 1:03

cresg820
162

1

vote

0answers

47 views

Hive optimizer not performing well for joins involving partitioned tables

I am using Hive version 0.7.1-cdh3u2. I have two big tables (let's say) A and B, both partitioned by day. I am running the following query: select col1,col2 from A join B on (A.day=B.day and ...

sql database hadoop hive bigdata

asked Feb 27 at 11:00

Mukul Gupta
1549

1

vote

0answers

30 views

Trying to upgrade from CDH4.2 to CDH4.5, but can not Distribute it

I'm trying to upgrade from CDH4.2 to CDH4.5 using Cloudera Manager. I click 'download' for CDH 4.5.0-1.cdh4.5.0.p0.30 and it shows 100%, but the button still shows 'download', not 'distribute'. I click ...

hadoop mapreduce bigdata cloudera cloudera-manager

asked Feb 25 at 2:44

user1284984
507

1

vote

0answers

178 views

Cloudera Manager. Failed to detect Cloudera Manager Server

I have two PCs with CentOS 6.5: client86-101.aihs.net (80.94.86.101) and client86-103.aihs.net (80.94.86.103). cloudera-manager-server is installed on client86-101.aihs.net. I have a problem detecting ...

hadoop bigdata cloudera cloudera-manager

asked Feb 20 at 15:34

Peter Shipilo
408215

1

vote

0answers

56 views

How to use YARN schedulers and queues?

I need to access YARN schedulers and queues from Java programs to change the priority of submitted MR jobs. Is it possible? If so, please help with some code snippets. Similar code for ...

java hadoop bigdata yarn

asked Feb 10 at 14:13

blackSmith
64139

1

vote

0answers

11 views

How to process dynamo objects for querying

I have complex objects (4-level relationships) such as a match, a team, and a fixture which has players. I have millions of records, growing daily. How do I prepare them for reporting if I hold ...

database bigdata

asked Feb 2 at 23:35

Rıfat Erdem Sahin
948

1

vote

0answers

306 views

Reading labview binary files in Matlab?

I have large .bin files (10GB-60GB) created by LabVIEW software; they contain the output of two sensors used in experiments that I have done. The problem I have is importing the data ...

matlab bigdata fread

asked Jan 22 at 11:34

James Archer
9319

1

vote

0answers

141 views

Most Efficient Way of Chunking a Large Iterable in Python for Brute Forcing

I am trying to develop a way to address large parallel tasks for bruteforcing a keyspace. I'd like to be able to come up with a way to pass a worker a value in such a way that given a chunk size, that ...

python parallel-processing bigdata brute-force

asked Nov 20 '13 at 1:48

user3011243
61
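
A note on the question above: since the excerpt is truncated, the following is only one possible reading of it. The idea sketched here is to hand each worker nothing but a start index and a chunk size, and let the worker rebuild its own keys by treating the index as a number in base `len(alphabet)`. The names `nth_key` and `chunk_keys`, and the fixed-length lowercase keyspace, are hypothetical.

```python
import string


def nth_key(index, alphabet, length):
    """Return the index-th key of the given length in lexicographic order."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        index, remainder = divmod(index, base)
        chars.append(alphabet[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def chunk_keys(start, size, alphabet, length):
    """Yield the keys with indices start .. start + size - 1, clipped to the keyspace."""
    total = len(alphabet) ** length
    for i in range(start, min(start + size, total)):
        yield nth_key(i, alphabet, length)


# A worker given (start=26, size=3) over 2-letter lowercase keys.
work = list(chunk_keys(26, 3, string.ascii_lowercase, 2))
```

Because a chunk is described by two integers, nothing large has to be serialized to the workers, and the full keyspace is never materialized in one process.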

1

vote

0answers

15 views

Searching in a large dataset (Stays in boxes => Meetings)

I am working on a model of social interactions in mice. I have mice and boxes and a simulation that outputs which mouse stays in which box during which time period. The problem is how to obtain, in ...

mysql database data hadoop bigdata

asked Nov 17 '13 at 0:07

user3000568
112

1

vote

0answers

597 views

Split Large table of Terabytes using MYSQL Sharding

I know that horizontal partitioning...you can create many tables. I've seen that in application-based sharding, you will have the same database structure on multiple database servers. But it won't ...

mysql database bigdata sharding

asked Oct 11 '13 at 5:30

Imran
734316

1

vote

0answers

786 views

MongoDB running out of disk space

I have to store around 200GB of raw data in one MongoDB collection, which works fine. After inserting all objects, I have to iterate over the whole collection with a cursor and write some new fields ...

java mongodb bigdata

asked Sep 26 '13 at 12:36

Saskia Vola
112

1

vote

0answers

70 views

no sql read and write intensive bigdata table

I have 10 different queries and a total of 40 columns. I am looking for a solution among the available big data NoSQL databases that can handle read- and write-intensive jobs (multiple queries with SLA). ...

nosql bigdata

asked Sep 19 '13 at 10:19

bns
63

1

vote

0answers

231 views

How to plot a heatmap of a big matrix with matplotlib (45K * 446)

I am trying to plot a heatmap of a big microarray dataset (45K rows by 446 columns). Using pcolor from matplotlib, I am unable to do it because my PC easily runs out of memory (more than 8G). I'd ...

python matplotlib bigdata heatmap

asked Sep 13 '13 at 16:55

nsl
1,05341842

1

vote

0answers

104 views

Error while downloading file from jasper

I am trying to download a report from Oracle 11g using JasperServer. The URL is called from APEX. For small files it works correctly, but when the report reaches 15MB or so, the exported file ...

nullpointerexception jasper-reports bigdata jasperserver

asked Sep 4 '13 at 12:02

user2706726
61

1

vote

0answers

85 views

python shelve TypeError on large dictionary object

I have a large dictionary object dict_tmp that takes 40GB in RAM (system has a total of 64GB), which has string keys and float values. I use d = shelve.open(fname, protocol=2) and d['dict_tmp'] = ...

python database performance bigdata shelve

asked Jul 27 '13 at 0:15

Ananda Narayan
92
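
A note on the question above: the excerpt does not show the cause of the TypeError, so this is not a diagnosis. A common alternative to storing the whole dictionary under one shelf key is to write each entry under its own key, so that no single pickled value has to hold the full 40GB. The sketch below assumes Python 3 and string keys; `store_items` and `load_item` are hypothetical helpers, and the two-entry `dict_tmp` is a small stand-in for the real data.

```python
import os
import shelve
import tempfile


def store_items(fname, items, protocol=2):
    """Write each (key, value) pair under its own shelf key."""
    with shelve.open(fname, protocol=protocol) as shelf:
        for key, value in items:
            shelf[key] = value


def load_item(fname, key):
    """Read back a single value without loading the rest of the shelf."""
    with shelve.open(fname) as shelf:
        return shelf[key]


# Small stand-in for the large dictionary of string keys and float values.
dict_tmp = {"alpha": 1.5, "beta": 2.25}
path = os.path.join(tempfile.mkdtemp(), "dict_tmp_shelf")
store_items(path, dict_tmp.items())
```

With this layout, lookups read one entry at a time instead of unpickling the whole dictionary.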
