We have a Python application that performs a DFS over data in a database. Right now the data is small, so I can hold everything in a Python container, but the data is going to grow and there may come a time when it no longer fits in RAM. On the other hand, querying the database on every iteration is not a smart solution, is it?

But the question is more general. The solution most probably lies in some smart caching, but I don't have any experience with that.

What are the techniques for dealing with large data when you need to access it frequently (say, while running an algorithm)?
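Since the question mentions caching without a concrete starting point, here is a minimal sketch of memoizing database lookups with `functools.lru_cache`. The `GRAPH` dict and `fetch_children` function are hypothetical stand-ins for the real database query; the call counter only exists to make the caching effect visible.

```python
from functools import lru_cache

# Hypothetical stand-ins for the real database: in the actual
# application, fetch_children would run a query. The counter shows
# how many times the "database" is actually hit.
CALLS = {"n": 0}
GRAPH = {1: (2, 3), 2: (4,), 3: (4,), 4: ()}

@lru_cache(maxsize=10_000)  # bounds memory: least-recently-used entries are evicted
def fetch_children(node_id):
    CALLS["n"] += 1          # each increment represents one real query
    return GRAPH[node_id]

def dfs(start):
    """Iterative DFS that pulls neighbours through the cache."""
    seen, stack, order = set(), [start], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(fetch_children(node))
    return order

order = dfs(1)  # first run fills the cache; repeated runs hit it for free
```

A bounded `lru_cache` is a middle ground between "everything in RAM" and "query on every iteration": hot nodes stay resident while cold ones are evicted and re-fetched on demand.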

One technique for improving the performance of database operations (RDBMS or NoSQL) is running most of the algorithm server-side, in the form of a stored procedure or a database-specific batch script. There are usually also quite a few engine-specific optimization techniques available, such as indexes, data clustering, etc. Your question is too vague and abstract for any practically useful recommendations. –  xmojmr Mar 7 at 12:39

@xmojmr I assume that by server-side you mean running an SQL script instead of doing it in Python or another programming language, but that is nearly impossible in my case (I have some problem-specific optimizations that cannot be expressed sensibly in SQL), and the performance wasn't good either (I tried). –  khajvah Mar 7 at 14:13

Yes, by server-side I meant running the data-processing logic as close as possible to the physical data storage, eliminating network I/O; that is, inside the database server, using the server's scripting language. For this to be efficient, the problem needs to be translated into relational algebra using sets and set operators (that is what SQL servers are good at). You may need to generate a few problem-specific stored procedures and drive them from your Python code. Server-side performance should be better than client-side performance. Try again. –  xmojmr Mar 7 at 15:06

@xmojmr Oh, by the performance difference I meant keeping the data in RAM and running Python, as opposed to running an SQL script. Thanks for the suggestion; I will consider your solution. –  khajvah Mar 7 at 17:41

@xmojmr The reasons I didn't include more information about the specific database and the specific problem are: 1. the technology we currently use might change if we find something better, and 2. I wanted to keep this question general, answering the question: "How do algorithms deal with data that is impossible to keep in RAM?" –  khajvah Mar 7 at 17:57

1 Answer

There is an entire field called external memory algorithms, and there are many textbooks about it.

An example of an external memory algorithm is External Sorting.
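To make the idea concrete, here is a minimal sketch of external merge sort in Python: sort fixed-size chunks in memory, spill each sorted run to a temporary file, then stream-merge the runs with `heapq.merge`. The `chunk_size` here is tiny purely for illustration; a real implementation would size chunks to available RAM.

```python
import heapq
import tempfile

def _spill(sorted_chunk):
    """Write one sorted run to a temp file and rewind it for reading."""
    f = tempfile.TemporaryFile(mode="w+")
    f.writelines(f"{x}\n" for x in sorted_chunk)
    f.seek(0)
    return f

def external_sort(items, chunk_size=1000):
    """Sort an iterable that may not fit in RAM.

    Only chunk_size items are held in memory at once; the k-way merge
    keeps just one pending line per run file.
    """
    run_files, chunk = [], []
    for item in items:
        chunk.append(item)
        if len(chunk) == chunk_size:
            run_files.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:  # spill the final partial chunk
        run_files.append(_spill(sorted(chunk)))
    # int() tolerates the trailing newline on each line
    runs = (map(int, f) for f in run_files)
    return list(heapq.merge(*runs))

result = external_sort([5, 3, 8, 1, 9, 2, 7, 4, 6], chunk_size=3)
```

The same sort/spill/merge pattern underlies most external memory algorithms: do as much work as possible on an in-memory block, then combine blocks with sequential I/O.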

A relational database management system is not always the best solution in such cases. Some NoSQL ("Not Only SQL") database management systems have been designed to perform better in certain scenarios.

For in-memory depth-first search there are also several different algorithms and implementations to choose from.
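One implementation choice matters in particular for Python: a recursive DFS fails with `RecursionError` once the search path exceeds the interpreter's recursion limit (see `sys.getrecursionlimit()`, typically 1000), while an explicit-stack version has no such depth bound. A sketch, using a deliberately deep chain graph as test data:

```python
import sys

def dfs_iterative(graph, start):
    """Depth-first traversal with an explicit stack instead of recursion,
    so it handles paths far deeper than Python's recursion limit."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        yield node
        # reversed() so children are explored in their listed order
        stack.extend(reversed(graph.get(node, ())))

# A chain 0 -> 1 -> ... -> 5000, far deeper than the default recursion limit
deep = {i: [i + 1] for i in range(5000)}
visited = list(dfs_iterative(deep, 0))
```

The generator form also pairs naturally with external storage: the caller consumes nodes one at a time, so the traversal never needs the full result set in memory.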
