
Currently, I'm working with a group on building a model. This model simulates interactions between many "agents" in a region. An agent can be any entity, such as a city, a farmer, or a business. Each agent is represented by its own object in Java. Each agent will need to be able to interact with all of the other agents, and therefore needs access to information about them.

So, for example, let's say we have 1 city object interacting with 99 farmer objects. The city needs to know, for instance, the total land that each of the 99 farmers owns in order to perform some calculations.

So, my question is: what is the most performance-efficient way of doing something like this? There may be 500-1000 or more agents in the future, and each agent may need to access many variables from the other agents. Would you set up a table in MySQL for each agent and have them access each other's information that way? Or can something be set up directly in Java that allows the agents to access each other's information?

I'm looking to possibly multithread this model in the future too. In any case, I want the most performance efficient way of completing this.


closed as too broad by Doval, MichaelT, GlenH7, MainMa, gnat Jul 27 '14 at 19:26

There are either too many possible answers, or good answers would be too long for this format. Please add details to narrow the answer set or to isolate an issue that can be answered in a few paragraphs. If this question can be reworded to fit the rules in the help center, please edit the question.

    
Sorry, this question is way too broad. –  Doval Jul 23 '14 at 17:17
1  
The most performant way depends on the exact specifications of your problem. Do you really have cities and farmers or something else? It may be better to ask for different approaches and then compare them yourself in terms of performance. Nevertheless this is a well formulated and understandable question. –  Trilarion Jul 23 '14 at 17:30
    
@Doval It's because OP places the main concern on architecture (methodology and scalability), whereas the more important concern should have been the details of the operations (queries) needed to be performed by the simulations. –  rwong Jul 24 '14 at 12:52

3 Answers

You have two options:

  1. Every agent knows about every other agent, and queries those agents as needed.
  2. Every agent publishes its state changes, and other agents listen to the changes they care about.

The second approach has several benefits over the first, most important being that changes only happen when necessary. For example, if only 2 farmers trade land, the city doesn't need to query the other 97 farmers.

There are many ways to go about implementing this. I would not put a database in the mix, but would instead use an in-memory event bus with publishers and subscribers. There are ready-made systems to do this, or you could roll your own.
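As a rough illustration of the "roll your own" route, here is a minimal single-threaded sketch of an in-memory event bus. Every name in it (`EventBus`, `LandChanged`, `Farmer`, `City`) is hypothetical and chosen only for this example; it makes no attempt at thread safety, which would need separate thought if the model is multithreaded later.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical event: a farmer's land holding changed by deltaAcres.
final class LandChanged {
    final String farmerId;
    final double deltaAcres;

    LandChanged(String farmerId, double deltaAcres) {
        this.farmerId = farmerId;
        this.deltaAcres = deltaAcres;
    }
}

// Minimal event bus: many publishers, many subscribers (not thread-safe).
final class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    // Register a handler for events of exactly the given class.
    <T> void subscribe(Class<T> type, Consumer<? super T> handler) {
        handlers.computeIfAbsent(type, k -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    // Deliver the event synchronously to every handler registered for its class.
    void publish(Object event) {
        List<Consumer<Object>> registered =
                handlers.getOrDefault(event.getClass(), Collections.emptyList());
        for (Consumer<Object> h : registered) {
            h.accept(event);
        }
    }
}

// A farmer publishes its state changes instead of being queried.
final class Farmer {
    private final String id;
    private final EventBus bus;
    private double acres;

    Farmer(String id, EventBus bus) {
        this.id = id;
        this.bus = bus;
    }

    void changeLand(double deltaAcres) {
        acres += deltaAcres;
        bus.publish(new LandChanged(id, deltaAcres));
    }

    double acres() {
        return acres;
    }
}

// The city keeps a running total and never has to poll the farmers.
final class City {
    private double totalFarmland;

    City(EventBus bus) {
        bus.subscribe(LandChanged.class, e -> totalFarmland += e.deltaAcres);
    }

    double totalFarmland() {
        return totalFarmland;
    }
}
```

With this shape, a trade between two farmers results in two `LandChanged` events, and the city's total is adjusted without touching any of the other farmers.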

    
So, by in-memory event bus, do you mean something like an observer type pattern? And I had a feeling that a database would not be the best way, as performance would be much slower than something built directly in Java. –  user2864336 Jul 23 '14 at 17:44
    
@user2864336 - I think of the Observer pattern as being 1:M between a source of events and its observers. An event bus, by comparison, is a single object where multiple publishers can publish events and multiple subscribers can listen for them (M:M). This could be a message queue, a "production-grade" container like Mule, or a simple object that accepts events and publishes them (in which case you could call it an Observable for multiple Observers -- and use Java's Observable/Observer classes to implement it). –  kdgregory Jul 23 '14 at 18:14
    
Ah, I like the sound of that, using a message queue of some sort. I'll probably implement my own to make it more customizable. I'll read up on this more online since there seem to be many articles written on it. Thanks for the suggestions and help! –  user2864336 Jul 23 '14 at 18:33

Get the data design done first. Understanding what data you need for each entity, and how the entities relate should simplify your design.

You may find you have many entities which can be modeled as sub-types. Different sub-types may have different behavior, but should contain the same state (data). Understanding the common and specific behaviors of similar agents will help you design your class hierarchy.

You will likely want a database for persistence, but keep the behavior in the application. I would expect most of the data to be at rest (persisted to the database) most of the time. Databases can be quite efficient at answering questions like "how many acres do the farmers in this city have?"

If by agent you mean objects, then you are looking at a rather small set of data. You may be able to keep the data in an in-memory database. Even if you don't, the data may well fit in the database's and/or the operating system's memory.

EDIT: The container should be pooling connections to the database, so you shouldn't have any agents connecting to MySQL directly. They should be asking model objects (database proxies) for data, and those may run a database query. Well-designed databases are very efficient at gathering data about related entities (agents).
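One way to picture the model-object idea is the sketch below, where the agent depends only on an interface and never sees a connection. The names `FarmerRepository`, `InMemoryFarmerRepository`, and `CityAgent` are made up for this example, and the in-memory class is only a stand-in: a database-backed implementation of the same interface could run an aggregate query through a pooled connection instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model-object interface: agents ask this for data
// and never open database connections themselves.
interface FarmerRepository {
    double totalAcres(String cityName);
}

// In-memory stand-in used here so the sketch runs without a database.
// A database-backed implementation could answer the same question
// with an aggregate query over a farmers table.
final class InMemoryFarmerRepository implements FarmerRepository {
    private final Map<String, Double> acresByCity = new HashMap<>();

    void addFarm(String cityName, double acres) {
        // Accumulate acreage per city as farms are registered.
        acresByCity.merge(cityName, acres, Double::sum);
    }

    @Override
    public double totalAcres(String cityName) {
        return acresByCity.getOrDefault(cityName, 0.0);
    }
}

// The agent only knows the interface, so the storage choice can change later.
final class CityAgent {
    private final String name;
    private final FarmerRepository farmers;

    CityAgent(String name, FarmerRepository farmers) {
        this.name = name;
        this.farmers = farmers;
    }

    double farmland() {
        return farmers.totalAcres(name);
    }
}
```

Because the agent code is written against the interface, the in-memory and database-backed versions can be swapped and timed against each other without changing the agents.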

Unless you are running simulations of long-term data, it is unlikely the changes will happen often enough to make an observer/mediator type structure work very well. This pattern is more appropriate when data is changing several times a minute, not several times a year. Even with hundreds of farmers, I would expect changes to be minutes apart during harvest season, and days apart the rest of the year.

If you are doing everything in memory, you risk losing all the data whenever the application goes down. Over a period of years, this will be a certainty. Being able to update and restart the application would appear to be essential to your design evolution.

    
Each agent is represented by its own object. In the future, we will have different types of farmers which could be represented as sub-classes of the main farmer class. I'm using MySQL currently to output summary info for each year about each farmer such as profit, corn harvested, etc. But if the city is querying information about 100s of farmers many times each year, do you think it'd make sense to use a structure like the user above mentioned? Something like an observer/mediator type structure? I'm afraid that the city constantly connecting to MySQL to access farmer information will be slow. –  user2864336 Jul 24 '14 at 17:12
    
@user2864336 I've addressed your issues in an edit. –  BillThor Jul 25 '14 at 0:07
1  
I was actually thinking of using connection pooling. But will this work during an actual "simulation"? By simulation, I mean something similar to a climate simulation or a weather forecasting simulation. We will be simulating a 20-30 year period of agent interactions that will take the computer a few hours (in actual time) to run. This is for research purposes. So in "actual" time, the agents will be accessing each other's data possibly several times a minute, but in "simulation" time, only several times a year. Hence why I question using MySQL over the observer/mediator structure. –  user2864336 Jul 25 '14 at 2:22

As pointed out in the other answers, whether to use a real database or a real geographic information system (GIS) depends on your specific circumstances, that is, on what the simulations need in order to run.


Regardless of your choice of tooling, one thing is certain: your simulations will involve application logic that is superficially similar to what is done by databases and GIS systems. Namely:

  • Queries over a large number of objects, some of which may be outside the object hierarchy.
  • Calculations e.g. sums, averages, minimums and maximums, or other statistical / mathematical / geometrical operations over objects within a certain spatial extent.

Recognize that you will need to learn new concepts no matter what.

  • Using existing tools (databases and GIS systems) does not save you from having to learn about relational and spatial queries.
  • Writing your own code does not save you from having to learn about relational and spatial queries (*).

(*) repeated for emphasis.


However, by using existing tools,

  • You could save the time you would otherwise spend implementing those algorithms yourself.
  • In exchange, you will spend more time learning how those tools work, and formulating your simulation requirements into queries with these tools.

On the other hand, the concern expressed in your question is too far from your immediate needs. Namely:

  • Should I favor a centralized (database) approach, or a decentralized (peer-to-peer, or object-oriented) approach when modeling these operations?
  • What would be the performance from each approach?
    • (See below.)
  • How do I partition the system for scalability?

You can always revisit architectural decisions next month; however you need to figure out how to meet the needs of your simulation application this week.


What would be the performance from each approach?

Practically speaking, you wouldn't know until you implement both approaches, optimize them, and then compare their performance.

Theoretically speaking, we need to know exactly what types of algorithms (queries) are performed. Such as:

  • Queries that can be solved by "keeping a tally (sum) up to date"
  • Queries that can be solved by a look-up of a handful of items
  • Queries that involve a 2D spatial search
  • Queries that involve evaluating a mathematical equation for every pair of objects. (This is going to be slow.)
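To make the difference between the first and last kinds of query concrete, here is a small sketch that contrasts a running tally with an all-pairs computation. The `LandLedger` class and its methods are invented for this illustration, and the pairwise distance test is just a placeholder for whatever per-pair equation a simulation might actually need.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ledger of land parcels, each stored as {x, y, acres}.
final class LandLedger {
    private final List<double[]> parcels = new ArrayList<>();
    private double totalAcres; // running tally, updated on every change

    void add(double x, double y, double acres) {
        parcels.add(new double[] {x, y, acres});
        totalAcres += acres; // keep the tally up to date as data changes
    }

    // "Keeping a tally": constant work per query, regardless of parcel count.
    double totalAcres() {
        return totalAcres;
    }

    // "Every pair of objects": work grows with the square of the parcel count.
    int countPairsWithin(double radius) {
        int count = 0;
        double r2 = radius * radius;
        for (int i = 0; i < parcels.size(); i++) {
            for (int j = i + 1; j < parcels.size(); j++) {
                double dx = parcels.get(i)[0] - parcels.get(j)[0];
                double dy = parcels.get(i)[1] - parcels.get(j)[1];
                if (dx * dx + dy * dy <= r2) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

With 1000 agents the tally is still a single field read, while the pairwise loop visits roughly half a million pairs per call, which is why knowing the query type matters more than the storage choice.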

If you only give "examples", the usefulness of the advice will depend on how well your examples reflect your application's needs.

