I'm implementing a system that's basically a pipeline of XML documents: the documents are retrieved over the Internet, validated, and further processed until they are ingested into a relational (non-XML) database. Once ingested into the database, they can be discarded.

Since the various components of the pipeline are somewhat independent of each other, I want to use a number of separate applications, each performing one "step" in the pipeline. What reasoning should guide the choice between the file system and some NoSQL database for sharing data between these applications?

The data to be shared is mostly XML files, and the total volume of data going through the pipeline is maybe 10 gigabytes per day.


1 Answer


I'd recommend some kind of message queue system. Each XML document enters the queue and is processed asynchronously by a consumer. The consumer can then save the data into the database or publish the document to the next consumer's queue, depending on what rules you have set up in your system.

Each step may have a single consumer working on one document at a time, or you might have multiple consumers processing documents in parallel at that step.
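As a rough illustration of the idea, here is a minimal in-process sketch that uses `java.util.concurrent.BlockingQueue` as a stand-in for a real message broker. Everything named here is hypothetical: the `PipelineSketch` class, the `STOP` sentinel, the `looksLikeXml` check (a placeholder for real validation) and the in-memory list standing in for the relational database. With separate applications you would put a broker between the steps, but the shape of the flow would be similar.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

final class PipelineSketch {
    // Hypothetical sentinel that tells a consumer to stop (assumes no real document equals it).
    private static final String STOP = "__STOP__";

    // Placeholder for real validation, e.g. against a schema.
    static boolean looksLikeXml(String doc) {
        String t = doc.trim();
        return t.startsWith("<") && t.endsWith(">");
    }

    // A consumer takes documents from its queue until STOP arrives and applies one step to each.
    static Runnable consumer(BlockingQueue<String> in, Consumer<String> step) {
        return () -> {
            try {
                for (String doc = in.take(); !doc.equals(STOP); doc = in.take()) {
                    step.accept(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    // Two steps: several validators in parallel, then a single ingester.
    static List<String> run(List<String> docs, int validators) {
        BlockingQueue<String> fetched = new LinkedBlockingQueue<>();
        BlockingQueue<String> validated = new LinkedBlockingQueue<>();
        List<String> database = new CopyOnWriteArrayList<>(); // stand-in for the database

        ExecutorService validateStage = Executors.newFixedThreadPool(validators);
        ExecutorService ingestStage = Executors.newSingleThreadExecutor();
        for (int i = 0; i < validators; i++) {
            // Valid documents are published to the next step's queue; others are dropped.
            validateStage.execute(consumer(fetched, doc -> {
                if (looksLikeXml(doc)) validated.add(doc);
            }));
        }
        ingestStage.execute(consumer(validated, database::add));

        fetched.addAll(docs);
        for (int i = 0; i < validators; i++) fetched.add(STOP); // one STOP per validator

        try {
            validateStage.shutdown();
            validateStage.awaitTermination(10, TimeUnit.SECONDS);
            validated.add(STOP); // sent only after every validator has finished
            ingestStage.shutdown();
            ingestStage.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return database;
    }

    public static void main(String[] args) {
        List<String> ingested = run(Arrays.asList("<a/>", "not xml", "<b>1</b>"), 3);
        System.out.println("ingested: " + ingested);
    }
}
```

Because three validators run in parallel, the order in which documents reach the "database" is not guaranteed. If ordering matters in your pipeline, that is one of the rules you would have to design for.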

There are a number of Java libraries for implementing message queues:

In MQ systems there's often a distinction between a "topic" and a "queue": a topic has multiple subscribers, and each of them receives a copy of every message sent, whereas each message sent to a queue has only one recipient. It's worthwhile to understand these two concepts, as most modern MQ systems provide both queues for point-to-point messaging and topics for broadcasting to multiple processes concurrently. –  Jimmy Hoffa Nov 15 '13 at 23:25
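To make the queue/topic distinction in the comment above concrete, here is a small single-threaded sketch. It is not the API of any particular MQ product: `WorkQueue`, `Topic`, `subscribe` and `publish` are names made up for illustration, built on plain `java.util.concurrent` collections.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueVsTopicSketch {

    // Queue semantics: consumers share one queue, so each message goes to exactly one of them.
    static final class WorkQueue {
        private final BlockingQueue<String> messages = new LinkedBlockingQueue<>();

        void send(String message) {
            messages.add(message);
        }

        // Returns the next message, or null if none is waiting.
        String receive() {
            return messages.poll();
        }
    }

    // Topic semantics: every subscriber has its own inbox and gets its own copy.
    static final class Topic {
        private final List<BlockingQueue<String>> inboxes = new CopyOnWriteArrayList<>();

        BlockingQueue<String> subscribe() {
            BlockingQueue<String> inbox = new LinkedBlockingQueue<>();
            inboxes.add(inbox);
            return inbox;
        }

        void publish(String message) {
            for (BlockingQueue<String> inbox : inboxes) {
                inbox.add(message);
            }
        }
    }

    public static void main(String[] args) {
        WorkQueue queue = new WorkQueue();
        queue.send("doc-1");
        System.out.println("consumer A got: " + queue.receive());
        System.out.println("consumer B got: " + queue.receive()); // nothing left for B

        Topic topic = new Topic();
        BlockingQueue<String> subscriberA = topic.subscribe();
        BlockingQueue<String> subscriberB = topic.subscribe();
        topic.publish("doc-1");
        System.out.println("subscriber A got: " + subscriberA.poll());
        System.out.println("subscriber B got: " + subscriberB.poll());
    }
}
```

For a pipeline like the one in the question, a queue fits a step where each document should be processed once, while a topic fits a case where several independent steps all need to see every document.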
