Programmers Stack Exchange is a question and answer site for professional programmers interested in conceptual questions about software development.
Logging is necessary, but logs are (relatively) rarely read back. As such, they could be made much more compact in terms of storage.

For example, the most commonly logged data, such as IP addresses, dates, times, and other values that could be represented as integers, is stored as text.

If logs were stored as binary data, a lot of space could be saved, requiring less rotation and extending disk lifespan, especially with SSDs, where writes are limited.
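As a rough illustration of the claimed saving, here is a sketch comparing one access-log entry as text and as packed binary; the field layout is invented for the example:

```python
import socket
import struct
import time

# Hypothetical access-log entry: the same fields as text and as packed binary.
ip = "203.0.113.42"
ts = 1700000000          # Unix timestamp
status = 200
size = 5120

text_line = (
    f"{ip} - [{time.strftime('%d/%b/%Y:%H:%M:%S', time.gmtime(ts))}] "
    f"{status} {size}\n"
)

# IPv4 packs into 4 bytes, timestamp into 4, status into 2, size into 4.
# "!" means network byte order with no padding, so the record is 14 bytes.
binary_record = struct.pack("!4sIHI", socket.inet_aton(ip), ts, status, size)

print(len(text_line.encode()), "bytes as text")
print(len(binary_record), "bytes as binary")
```

The binary record is roughly a third the size of the text line here, which is the kind of saving the question has in mind.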

Some may say it is such a minor issue that it does not really matter, but considering the small effort needed to build such a mechanism, it makes no sense not to. Anyone could build this in a couple of days in their spare time, so why don't people do this?

I would challenge your assertion that people don't do this. Many do. Some don't, sure, but plenty do. – Servy 7 hours ago
@Servy I apologize for my ignorance, I am thinking of web-servers and access logs in particular, probably should mention that in the question. – php_nub_qq 7 hours ago
> If logging was stored as binary data, a lot of space could be preserved

Well, old logs are typically compressed. – leonbloy 5 hours ago
Reading a text log on a machine that's halfway broken might be a huge advantage over needing a binary to analyze it. – tofro 4 hours ago

systemd famously stores its log files in binary format. The main issues I have heard with it are:

  1. if the log gets corrupted it's hard to recover as it needs specialist tooling
  2. you can't use standard tools such as vi, grep etc to analyse them

The main reason for using a binary format (to my knowledge) was that it was deemed easier for creating indices and the like, i.e. for treating the log more like a database file.

I would argue that the disk space advantage is relatively small (and diminishing) in practice. If you want to store large amounts of logging then zipping rolled logs is really quite efficient.
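The compression point is easy to demonstrate: log lines share a lot of structure, so general-purpose compression recovers most of the space a binary format would save. The log line below is made up:

```python
import gzip

# Hypothetical repetitive access-log text. Real logs are not literally
# identical lines, but they share timestamps, paths, and status codes,
# so they still compress very well.
line = (
    '203.0.113.42 - - [14/Nov/2023:22:13:20 +0000] '
    '"GET /index.html HTTP/1.1" 200 5120\n'
)
log_text = (line * 10_000).encode("ascii")

compressed = gzip.compress(log_text)
ratio = len(compressed) / len(log_text)
print(f"{len(log_text)} bytes raw -> {len(compressed)} bytes gzipped ({ratio:.1%})")
```

This is essentially what zipping rolled logs does, with no custom tooling required to read the current (uncompressed) log.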

On balance, the advantages in tooling and familiarity mean that text logging is probably the safer choice in most cases.

Good point. I was immediately thinking of systemd too. The even more important part here is that your application doesn't have to know how the log data is stored. It can be provided as a system service. – 5gon12eder 7 hours ago
+1 but the answer could be more universal. Logs are not exclusive to Linux systems. – Tulains Córdova 7 hours ago
@TulainsCórdova Indeed. But then again I use vi and grep on my Sun, OSX and Windows boxes too. If it works... – Alex 7 hours ago
You don't even need to zip rolled logs to save space if the filesystem where you store your logs supports compression (either at the file level or for the whole disk). – alroc 4 hours ago
"famously", more like "infamously" – whatsisname 4 hours ago

There are a lot of debatable presumptions here.

Logging has been an integral part of (almost) every job I've had. It is essential if you want any sort of visibility into the health of your applications. I doubt that it is a "fringe" use; most organizations I've been involved with consider logs very important.

Storing logs as binary means you must decode them before you can read them. Text logs have the virtue of simplicity and ease of use. If you're contemplating the binary route, you might as well store logs in a database instead, where you can interrogate them and statistically analyze them.
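A minimal sketch of the database route, using SQLite in memory; the schema and field names are illustrative, not a recommendation:

```python
import sqlite3
import time

# Sketch of the "store logs in a database" idea: structured records you can
# interrogate and aggregate with SQL instead of a grep/awk pipeline.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE log (
    ts      REAL,
    level   TEXT,
    message TEXT
)""")

events = [
    (time.time(), "INFO",  "service started"),
    (time.time(), "ERROR", "connection refused"),
    (time.time(), "ERROR", "connection refused"),
]
conn.executemany("INSERT INTO log VALUES (?, ?, ?)", events)

# Statistical analysis becomes a one-line query.
for level, count in conn.execute(
        "SELECT level, COUNT(*) FROM log GROUP BY level ORDER BY level"):
    print(level, count)
```

Once logs are structured records, aggregation, filtering by time range, and alert thresholds all come for free from the query engine.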

SSDs are more reliable than HDDs nowadays, and the arguments against heavy write loads are largely moot. If you're really worried about it, store your logs on an ordinary HDD.

"you might as well store logs in a database, where you can interrogate them and statistically analyze them." At a previous job, we had a custom tool that imports our (text-based) logs into a database for exactly this purpose. – Mason Wheeler 7 hours ago
I think what the OP meant by "SSD where writes are limited" is the fact that SSDs have a limited number of write/erase cycles, and writing too much to a sector diminishes the service life of the device. She didn't mean that writes are lost. – Tulains Córdova 7 hours ago
@TulainsCórdova: Yes, I knew what she meant. – Robert Harvey 7 hours ago
@Robert Harvey, as a former DBA, I have to tell you that text stored in normal VARCHAR and related fields is not compressed. It lives within a block with binary attributes. But if you hexdumped it, you could read the text. – DocSalvager 1 hour ago
@DocSalvager: I didn't assert otherwise. – Robert Harvey 1 hour ago

Log files are a critical part of any serious application: if the logging in the app is any good, then they let you see which key events have happened and when; what errors have occurred; and general application health that goes beyond whatever monitoring has been designed in. It's common to hear about a problem, check the application's built-in diagnostics (pop open its web console or use a diagnostic tool like JMX), and then resort to checking the log files.

If you use a non-text format, then you are immediately faced with a hurdle: how do you read the binary logs? With the log-reading tool, which isn't on your production servers! Or it is, but oh dear, we've added a new field and this is the old reader. Didn't we test this? Yes, but nobody deployed it here. Meanwhile, your screen is starting to light up with users pinging you.

Or perhaps this isn't your app, but you are doing support and you think you know it's this other system, and WTF? the logs are in a binary format? Ok, start reading wiki pages, and where do you start? Now I've copied them across to my local machine, but - they're corrupted? Have I done some kind of non-binary transfer? Or is the log-reading tool messed up?

In short, text-reading tools are cross-platform and ubiquitous, and logs are often long-lived and sometimes need to be read in a hurry. If you invent a binary format, then you are cut off from a whole world of well-understood and easy-to-use tools. Serious loss of functionality just when you need it.

Most logging environments strike a compromise: keep the current logs readable and present, and compress the older ones. That means you get the benefit of the compression - more so, in fact, because a binary format wouldn't shrink the log messages. At the same time, you can use less and grep and so on.

So, what possible benefits might arise from using binary? A small amount of space efficiency - increasingly unimportant. Fewer (or smaller) writes? Well, maybe - actually, the number of writes will relate to the number of disk-commits, so if log-lines are significantly smaller than the disk blocksize, then an SSD would be assigning new blocks over and over anyway. So, binary is an appropriate choice if:

  • you are writing huge amounts of structured data
  • the logs have to be created particularly quickly
  • you are unlikely to need to analyze them under "support conditions"

but this is sounding less like application logging; these are output files or activity records. Putting them in a file is probably only one step away from writing them to a database.

EDIT

I think there's a general confusion here between "program logs" (as per logging frameworks) vs "records" (as in access logs, login records etc). I suspect the question relates most closely to the latter, and in that case the issue is far less well-defined. It's perfectly acceptable for a message-record or activity log to be in a compact format, especially as it's likely to be well-defined and used for analysis rather than troubleshooting. Tools that do this include tcpdump and the Unix system monitor sar. Program logs on the other hand tend to be much more ad hoc.


Why do most log files use plain text rather than a binary format?

Search for the word "text" in the Unix philosophy Wikipedia article, for example you'll find statements like:

McIlroy, then head of the Bell Labs CSRC (Computing Sciences Research Center), and inventor of the Unix pipe,[9] summarized the Unix philosophy as follows:[10]

This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.

Or for example, from Basics of the Unix Philosophy,

Rule of Composition: Design programs to be connected with other programs.

It's hard to avoid programming overcomplicated monoliths if none of your programs can talk to each other.

Unix tradition strongly encourages writing programs that read and write simple, textual, stream-oriented, device-independent formats. Under classic Unix, as many programs as possible are written as simple filters, which take a simple text stream on input and process it into another simple text stream on output.

Despite popular mythology, this practice is favored not because Unix programmers hate graphical user interfaces. It's because if you don't write programs that accept and emit simple text streams, it's much more difficult to hook the programs together.

Text streams are to Unix tools as messages are to objects in an object-oriented setting. The simplicity of the text-stream interface enforces the encapsulation of the tools. More elaborate forms of inter-process communication, such as remote procedure calls, show a tendency to involve programs with each others' internals too much.

Anyone can make this for like two days in his spare time, why don't people do this?

Storing the log file in binary is only the beginning (and trivial). You'd then need to write tools to:

  • Display the whole log file (edit)
  • Display the end of the log, without reading the beginning of it (tail -f)
  • Search for stuff in the file (grep)
  • Filter to only display selected/interesting stuff (using an arbitrarily complicated filter expression)
  • Email the log to someone else who doesn't have your log-file-decoder-software
  • Copy-and-paste a fragment of the log file
  • Read the log file while the program (which creates the log file) is still being developed and debugged
  • Read log files from old versions of the software (which are deployed on customer sites and running).

Obviously software can and does use binary file formats too (e.g. for relational databases), but for log files it's usually not worth doing, in a YAGNI sense.
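To make the tooling burden concrete, here is a sketch of a trivial length-prefixed binary log; the record layout (4-byte length plus UTF-8 payload) is made up for this example. Even at this scale you must hand-roll the reader before anything grep-like is possible:

```python
import io
import struct

def write_record(buf, message: str) -> None:
    """Append one record: 4-byte big-endian length, then UTF-8 payload."""
    data = message.encode("utf-8")
    buf.write(struct.pack("!I", len(data)) + data)

def read_records(buf):
    """Yield messages back out; this is the dedicated reader grep replaces."""
    while True:
        header = buf.read(4)
        if len(header) < 4:
            return
        (length,) = struct.unpack("!I", header)
        yield buf.read(length).decode("utf-8")

log = io.BytesIO()
for msg in ["boot ok", "disk error", "shutdown"]:
    write_record(log, msg)

log.seek(0)
# A hand-rolled equivalent of `grep error logfile`:
matches = [m for m in read_records(log) if "error" in m]
print(matches)  # ['disk error']
```

And this sketch still doesn't handle tailing a live file, corrupted headers, or format versioning, which are exactly the tools listed above.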

Don't forget documentation! I wrote a binary message recorder for a system a few years ago, which logged incoming requests for regression/replay. Now, the only way to understand these awful files is to look at the code that reads/writes them, and yet other teams use them and ask questions about them. Horrible things. – SusanW 32 mins ago

Log files are in text format because they can be easily read using any type of text editor or by displaying the contents via console command.

However, some log files are in binary format if there is a lot of data. For example, the product I am working on stores a maximum of 15000 records. In order to store the records in the least amount of room, they are stored in binary. However, a special application must be written to view the records or convert them to a format that can be used (e.g. spreadsheets).
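A sketch of the fixed-size-record approach described above: every record packs to the same width, so record N lives at byte N × record-size and can be read without scanning. The field layout is invented for illustration:

```python
import struct

# Hypothetical record: timestamp (4 bytes), sensor id (2), reading (2).
# "!" gives network byte order with no padding, so each record is 8 bytes.
RECORD = struct.Struct("!IHH")

records = [(1700000000 + i, 7, 100 + i) for i in range(5)]
blob = b"".join(RECORD.pack(*r) for r in records)

# Random access to record 3 without parsing records 0-2:
offset = 3 * RECORD.size
third = RECORD.unpack_from(blob, offset)
print(third)  # (1700000003, 7, 103)
```

The fixed width is what makes the "maximum of 15000 records" bound cheap to enforce, but it is also why a special viewer application is needed.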

In summary, not all log files are textual. Text has the advantage that no custom tools are needed to view the content. Where there is a lot of data, the file may be binary, in which case a (custom) application is needed to read the data and display it in human-readable form, but more data can be packed in. Whether to use a textual or a binary format is a decision based on the amount of data and the ease of viewing the contents.


In embedded systems, where I might not have an output channel available at run-time, where the application can't afford the speed hit imposed by logging, or where logging would alter or mask the effect I'm trying to record, I've often resorted to stuffing binary data into an array or a ring buffer and either printf()ing it at the end of the test run or dumping it raw and writing an interpreter to print it in readable form. Either way, I want to end up with readable data.
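A rough sketch of that ring-buffer approach (on a real target this would be C with a fixed array, but the idea is the same; the recorded fields are invented):

```python
from collections import deque

# A deque with maxlen behaves like a ring buffer: appends are cheap and
# only the last N samples are kept, so recording can't grow unbounded.
N = 4
buffer = deque(maxlen=N)

for tick in range(10):
    buffer.append((tick, tick * tick))   # cheap append, no formatting cost

# After the test run, render the buffer as human-readable text.
dump = "\n".join(f"t={t} value={v}" for t, v in buffer)
print(dump)
```

All formatting cost is deferred to after the run, so the act of recording doesn't perturb the timing being measured.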

In systems with more resources, why invent schemes to optimize what doesn't need optimizing?


The two main questions you would want to ask before choosing between text and binary are:

  • Who is my audience?
  • What content do I need to convey?

A common opinion is that the audience of a log message is a human being. This is obviously not a perfect assumption, because there are plenty of log crawling scripts out there, but it is a common one. In this case, it makes sense to convey the information in a medium which humans are comfortable with. Text has a long standing tradition of being this medium.

As for content, consider that a binary log must have a well defined format. The format must be well defined enough for other people to write software which operates on those logs. Some logs are quite well structured (your question lists several). Other logs need the ability to convey content in a less-well-defined natural language form. Such natural language cases are a poor match for binary formats.

For the logs that could be well described in binary, you have to make a choice. Because text works for everyone, it is often seen as the default; it has proven itself thousands of times, while binary files are trickier. As a result, developers may output text simply because everyone knows how text logs will behave.


Log files are intended to aid debugging of issues. Typically, hard drive space is much cheaper than engineering time. Log files use text because there are many tools for working with text (such as tail -f). Even HTTP uses plain-text (see also why don't we send binary around instead of text on http).

Additionally, it's cheaper to develop a plain-text logging system and verify that it works, easier to debug if it goes wrong, and easier to recover any useful information in case the system fails and corrupts part of the log.


We count on unit testing for attaining and maintaining the robustness of our software. (Most of our code runs in a server, headless; post-operation analysis of log files is a key strategy.) Nearly every class in our implementation does some logging. An important part of our unit testing is the use of 'mock' loggers: a unit test creates a mock logger, provides it to the item being tested, and then (when useful/appropriate) analyses what got logged (especially errors and warnings). Using a text-based log format makes this much easier, for much the same reasons it helps analyses performed on 'real' logs: there are more tools at your disposal that are quick to use and adapt.

