The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.


0
votes
0 answers
81 views

Non-HBase solution for huge data that has updates and deletes in sequential manner

I have to design an application where there are around 5K structured base text files (file.txt) with data and format as below: Primary key is OrgId + ItemId OgId|^|ItemId|^|segmentId|^|Sequence|^|...
-1
votes
1 answer
437 views

Is this Big Data architecture good enough to handle many requests per second?

I want to ask for a review of my big data app plan. I don't have much experience in this field, so any advice would be appreciated. Here is a link to a diagram of the architecture: My ...
3
votes
4 answers
361 views

Can someone explain the technicalities of MapReduce in layman's terms?

When people talk about MapReduce you think about Google and Hadoop. But what is MapReduce itself? How does it work? I came across this blog post that tries to explain just MapReduce without Hadoop, ...
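The core idea the question asks about can be sketched without Hadoop at all. The toy sketch below (plain Python, no framework) shows the three phases: map emits (key, value) pairs, shuffle groups values by key, and reduce combines each group into one result:

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word, e.g. ("fox", 1)
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result
    return (key, sum(values))

docs = ["the quick brown fox", "the lazy dog"]
pairs = [p for d in docs for p in map_phase(d)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
# counts["the"] == 2, counts["fox"] == 1
```

The point of the split is that map calls are independent (so they can run on different machines over different chunks of input) and each reduce call only ever sees the values for one key, so it too can run anywhere.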
0
votes
1 answer
76 views

#Apache-flink: Stream processing or Batch processing using Flink

I am tasked with redesigning an existing catalog processor, and the requirement goes as below: I have 5 to 10 vendors (each vendor can have multiple stores) who would provide me with 'XML' ...
2
votes
2 answers
100 views

SRP in the “big data” setting

We have a codebase at work that: Ingests (low) thousands of small files. Each of these input files contains about 50k “micro-items”. These “micro-items” are then clustered together to find “macro-...
0
votes
0 answers
487 views

How to use Hadoop HBase with Spring Boot without knowing the schema of the database ahead of time

I have created a basic application with spring boot and HSQL which connects an in-memory HSQL database with an angularjs front end using spring-boot and spring JPA with Hibernate. I am now trying to ...
-3
votes
1 answer
154 views

Should I use NoSQL or HDFS for storage?

I have millions of tweets currently stored in HDFS and I plan to analyze them from Spark (Data mining, text mining, Frequent Term-Based Text Clustering, Social Network Analysis); however, I do not know ...
2
votes
0 answers
417 views

Best practices for dashboard of near real-time analytics

I’m currently building a dashboard to view some analytics about the data generated by my company's product. We use MySQL as our database. The SQL queries to generate the analytics from the raw live ...
1
vote
0 answers
80 views

Improve communication between controller and trackers in a Twitter fetcher tool using RabbitMQ or Apache Flume

I've been working for some time with some researchers developing a tool to fetch tweets from Twitter and process them in some way. The first prototype "worked" but became a pain as we used sockets to ...
0
votes
1 answer
313 views

Is hadoop designed only for “simple” data processing jobs, where communications between the distributed nodes are sparse?

I am not a professional coder, but rather an engineer/mathematician that uses computer to solve numerical problems. So far most of my problems are math-related, such as solving large scale linear ...
3
votes
1 answer
830 views

Hadoop and Object Reuse, Why?

In Hadoop, objects passed to reducers are reused. This is extremely surprising and hard to track down if you're not expecting it. Furthermore, the original tracker for this "feature" doesn't offer any ...
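The recycled-object behavior the question describes can be mimicked in plain Python. This is only an analogue (no Hadoop here, just a generator that reuses one mutable `Record`), but it shows exactly why references taken during iteration go stale and why values must be copied before being stored:

```python
import copy

class Record:
    def __init__(self):
        self.value = None

def reusing_iterator(values):
    # Analogue of Hadoop's reducer value iterator: a single Record
    # instance is recycled for every element, so every reference
    # handed out points at the same (last-written) object.
    record = Record()
    for v in values:
        record.value = v
        yield record

# Buggy: storing the yielded objects keeps only the final value
seen = [r for r in reusing_iterator([1, 2, 3])]
assert [r.value for r in seen] == [3, 3, 3]

# Correct: copy (or extract) the data before the next iteration
# overwrites the shared instance
seen = [copy.copy(r) for r in reusing_iterator([1, 2, 3])]
assert [r.value for r in seen] == [1, 2, 3]
```

In Hadoop itself the usual rationale is allocation pressure: reusing one object per key/value avoids creating millions of short-lived objects per task, at the cost of this surprising aliasing behavior.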
2
votes
1 answer
7k views

How best to implement a Dashboard from data in HDFS/Hadoop [closed]

We have a bunch of data (several TB) in Hadoop HDFS and it's growing. We want to create a dashboard that reports on the contents in there, e.g. counts of different types of objects, trends over time, etc....
3
votes
2 answers
1k views

Text search - big data problem

I have a problem I was hoping I could get some advice on! I have a LOT of text as input (about 20GB worth, not MASSIVE but big enough). This is just free text, unstructured. I have a 'category list'...
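The excerpt is truncated, so the exact shape of the 'category list' is an assumption; but if it maps category names to keyword sets, the per-document matching step (the part that would run inside each map task over a chunk of the 20GB) can be sketched as a simple set intersection. Category names and keywords below are hypothetical:

```python
# Hypothetical category list: name -> keywords that signal the category
categories = {
    "sports": {"football", "goal", "league"},
    "finance": {"stock", "market", "dividend"},
}

def categorize(text):
    # Tokenize crudely and return every category whose keyword set
    # intersects the document's words
    words = set(text.lower().split())
    return sorted(name for name, kws in categories.items() if words & kws)

matches = categorize("The stock market rallied after the football league final")
assert matches == ["finance", "sports"]
```

For real inputs a trie or Aho-Corasick automaton over the keyword list would handle multi-word phrases and scale better than per-word set lookups, but the parallelization story is the same: documents are independent, so the corpus can be split freely across workers.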
5
votes
2 answers
9k views

Optimal way to store 18 billion key, value pairs [closed]

I have around 200 million new objects coming in, and a 90 day retention policy, so that leaves me with 18 billion records to be stored in the form of key-value pairs. Key and value both will be a ...
2
votes
1 answer
1k views

How best to merge/sort/page through tons of JSON arrays?

Here's the scenario: Say you have millions of JSON documents stored as text files. Each JSON document is an array of "activity" objects, each of which contain a "created_datetime" attribute. What is ...
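One common answer, assuming each document's activities are (or can first be) sorted by `created_datetime`: a lazy k-way merge, which keeps only one "head" element per file in memory rather than loading everything. The sketch below uses Python's `heapq.merge` on two hypothetical in-memory "files":

```python
import heapq
import json

# Hypothetical input: each "file" is a JSON array of activity objects
# with a "created_datetime" attribute (ISO 8601 strings sort lexically)
file_a = json.dumps([{"created_datetime": "2023-01-01", "id": 1},
                     {"created_datetime": "2023-01-03", "id": 3}])
file_b = json.dumps([{"created_datetime": "2023-01-02", "id": 2}])

def activities(raw):
    # In a real system this would stream from disk; each array must
    # already be sorted by created_datetime for the merge to be correct
    yield from json.loads(raw)

# heapq.merge lazily merges any number of sorted streams, so memory
# use is proportional to the number of files, not the total activities
merged = heapq.merge(activities(file_a), activities(file_b),
                     key=lambda a: a["created_datetime"])

page = [a["id"] for a in merged][:3]  # first "page" of results
assert page == [1, 2, 3]
```

Paging deep into the merged stream still means consuming everything before the requested offset, which is why at scale this is usually paired with pre-sorted, partitioned storage (sort by datetime at write time, then read only the relevant partitions).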
1
vote
1 answer
333 views

Is it smart to design a command and control server, that will monitor system resources and spin up/spin down servers at times of peak?

I am building an application that will be modular, in a way that it will be a set of separate systems communicating with each other. It uses Hadoop on all systems, and HBase on 3 of the 4. Scaling ...
4
votes
3 answers
2k views

Why do HDFS clusters have only a single NameNode?

I'm trying to understand better how Hadoop works, and I'm reading The NameNode is a Single Point of Failure for the HDFS Cluster. HDFS is not currently a High Availability system. When the NameNode ...
4
votes
2 answers
976 views

Asynchronous Java

I'm wondering, if I wanted to implement a web service based on Java that does web analytics, what sort of architecture should I use. The actual processing of the Big Data would be done by Hadoop. ...
3
votes
3 answers
33k views

Is Cloudera Hadoop certification worth the investment [duplicate]

I am considering investing time to learn Hadoop and its related technologies. The problem is that my current day job will not be using Hadoop any time soon, and even if I learn from books, blogs ...
1
vote
1 answer
122 views

How do you control nodes in a server farm?

I've been reading about hadoop and multi-node setups, and it says in the documentation that you must have a JVM and hadoop software already running on those nodes. My question is, do people install ...
5
votes
2 answers
1k views

Can map-reduce say “Hello World”?

Gathering that map-reduce is being used to process huge amounts of data, I set out to understand it. My queries were: What class of problems does it aim to solve? How does it help breaking down of ...
4
votes
2 answers
1k views

How to convince others we should move to Hadoop?

Everything I've read about Hadoop seems like exactly the technology we need to make our enterprise more scalable. We have terabytes of raw data that is in non-relational form (text files of some kind)....