
This might be a bit of a dumb question, but why do we save text in the db instead of something smaller?

Couldn't there be some other way to store data in the db, like in a compressed form, and then have "our computer", the one that wants the data, uncompress it? What I mean is to put the load on the client computers instead of on the db and the bandwidth.

Like, let's say we store the compressed form in the db and have all the computing done by the user's computer. If we ran a select statement, it would be translated into something only the db understands, and we would get back compressed data that our computer uncompresses and displays the same way we see it now. Or is plain text better?

Wouldn't that be better, considering there's usually a lot of load on the db and it can grow quite large? If the data were already compressed, it would at least take up less space.


closed as unclear what you're asking by Ixrec, MetaFight, JacquesB, Snowman, Robert Harvey Aug 31 at 15:50


What is a "serialized form" if not "text"? If you mean a human-unreadable "binary" serialized form, that's not necessarily any more efficient than "text". Could you give any examples of the kind of data where you'd expect this to be more efficient? Also, good databases will probably use compression algorithms too, which means simple "inefficiencies" like using readable string values for enums should get trivially compressed away. –  Ixrec Aug 31 at 14:01
I don't think you understand what "serialized" means... –  Michael Borgwardt Aug 31 at 14:04
    
Most of the 'text' data we store is too small for efficient compression (a typical set of name, street, city, zip...). Also, we want fast searches, so the db has to index the data, and for that it needs the text anyway (otherwise we would have to send around whole tables so your local computer could find a single row, which is not efficient at all). For some use cases, NoSQL/document-store databases (and others like Postgres) allow you to store data in serialized form, though they often still index the individual elements. Depending on your exact use case, you could find a db that's optimized for large text. –  thorsten müller Aug 31 at 14:07
I'm voting to close as "unclear" until the OP clarifies what distinction he's trying to make between "text" data and "serialized" data and why he thinks it's a meaningful one. –  Ixrec Aug 31 at 14:17
@MichaelBlackburn The current version is still puzzling in a few places (e.g. it sounds like it's mixing up the db compressing the data when it saves to disk with the client compressing data when building its SQL queries) so I'm on the fence about it, but I've cast a reopen vote for now so we'll see what the high-rep users who aren't at work right now think of the edits. –  Ixrec Sep 3 at 15:21

3 Answers

You are mixing up serialization with compression. You can use XML serialization to store a form or a class.

You can compress text and store it as binary. You might get as much as 7:1 compression. But in exchange for that compression you lose the ability to search the text, which is the primary purpose of a database.
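As a rough illustration of that trade-off, here is a minimal sketch in Python (using the standard zlib and sqlite3 modules; the table, column, and sample text are invented for the example). The compressed blob takes far less space, but an ordinary LIKE query can no longer see the words, so the client ends up pulling rows back and decompressing them itself just to search:

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

text = "WKRP in Cincinnati " * 500            # repetitive text compresses very well
blob = zlib.compress(text.encode("utf-8"))
print(len(text), "->", len(blob), "bytes")     # large reduction in stored size

conn.execute("INSERT INTO docs (body) VALUES (?)", (blob,))

# The pattern cannot match inside the opaque compressed bytes, so this finds nothing.
hits = conn.execute("SELECT id FROM docs WHERE body LIKE '%Cincinnati%'").fetchall()
print(hits)   # expected: []

# To search, the client has to fetch every row and decompress it itself.
for doc_id, body in conn.execute("SELECT id, body FROM docs"):
    if "Cincinnati" in zlib.decompress(body).decode("utf-8"):
        print("found in row", doc_id)
```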


You are suffering from a common problem among engineers: over-optimization along one dimension. The two classical constraints of computation are time and space, and they are generally opposed; you cannot conserve one without "spending" the other. The Y2K bug was in fact an example of this: space constraints led programmers to "save" two digits of the year, and prepending "19" to the year later is a computation (i.e., time). Likewise, you are suggesting we spend computation time compressing and decompressing data in order to save space on disk.

Sometimes this is valid. In fact, a significant amount of traffic over TCP/IP is already transparently gzip-compressed in transit. The time spent compressing and decompressing it is negligible compared to the network resources required to transfer an uncompressed HTML document.
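To put a rough number on that (a toy sketch in Python, not the actual HTTP machinery; the markup below is invented for the example), compressing a page-sized chunk of HTML costs a few milliseconds of CPU and cuts the payload dramatically:

```python
import gzip
import time

# A stand-in for an HTML document; real pages compress similarly well,
# because markup is highly repetitive.
html = ("<div class='row'><span class='cell'>some text</span></div>\n" * 2000).encode("utf-8")

start = time.perf_counter()
compressed = gzip.compress(html)
elapsed = time.perf_counter() - start

print(f"{len(html)} -> {len(compressed)} bytes "
      f"({len(compressed) / len(html):.1%} of original) in {elapsed * 1000:.2f} ms")
```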

In your case it isn't, for the following reasons. First, data storage is cheap compared to computation. Think of your computer today vs. your first computer. My first computer (that I bought for myself) had a 120 MHz processor, 16 MB of RAM and a 1.2 GB drive. My current computer (somewhat aged) has a 3.6 GHz CPU, 32 GB of RAM and about 16 TB of storage. The CPU is 30x faster, but the RAM is 2000x larger (not to mention waaaay faster) and the storage is over 13,000x larger. Rainbow table attacks are an example of leveraging space to compensate for our computational limits. We've gained much more room than we have time.

Second, and more practically: if you compress the text in situ, you lose the ability to search it. Depending on the compression algorithm and the surrounding data, the string "WKRP in Cincinnati" gets compressed into a byte sequence that no longer resembles the original, and the similar string "WKRP/Cincinnati" would almost certainly come out as something entirely different again. To search your compressed database, the user would need to decompress (or, god forbid, download) the entire thing.
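Here is a small sketch of that point (Python's zlib, but any general-purpose compressor behaves the same way): the compressed bytes of the two nearly identical strings look nothing alike, and the plain substring does not survive into either of them, so a byte-level search over the stored data is hopeless:

```python
import zlib

first = zlib.compress(b"WKRP in Cincinnati")
second = zlib.compress(b"WKRP/Cincinnati")

print(first.hex())
print(second.hex())

# The literal bytes "Cincinnati" do not appear in either compressed stream.
print(b"Cincinnati" in first, b"Cincinnati" in second)   # expected: False False
```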

    
What if the software used to access this new kind of database interpreted everything for you, so it would be exactly the same as a normal database, except that what's stored isn't text but something that takes much less space? What I mean is that SQL Server Management Studio would already be interpreting everything before you access it, and it would appear as text to you, but in reality it's "something else"... –  Joe Sep 7 at 8:27
    
Well, it wouldn't be very useful software if it presented the values to you without decoding them. In fact, everything in the computer system is stored as 1s and 0s, so it's already doing a great deal of this. –  Michael Blackburn Sep 8 at 14:37
    
In point of fact, it's not even 1s and 0s, but different levels of electromagnetic charge, which when read either exceed a certain threshold or do not. In theory, the database could export the exact levels of charge stored, and it would be "an exercise for the reader" to interpret those levels as being above or below the threshold, marking those above as "1" and those below as "0" and then calculating "01000001" as "65," then looking that up in an ASCII table to mean "A." At what point does the "realness" of the data become useful or important to the consumer? –  Michael Blackburn Sep 8 at 14:39
    
Again, you're looking to save space at the expense of time, which is backward to the way Moore's law has progressed. Computing space has expanded significantly faster than computing time. –  Michael Blackburn Sep 8 at 14:41
    
Ok. Thanks for the input Michael! –  Joe Sep 11 at 10:25

Well, for one, it would make querying the db outside of your program quite a hassle. I don't think the slight gain is worth the time and effort, and that includes supporting the db later on.

If your goal is quick lookups, you are also adding overhead to searching.

Sure, if you have large blocks of text it might be more worthwhile, but a better approach might be to separate your databases or archive more regularly.

