Logs and traces are an essential part in problem solving and performance measuring in software projects. However, while it may seem easy or even unimportant at first, it can be a difficult task to have the program generate the most important logging information.
Not all the logging messages have the same purpose or have the same audience.
A trace is a message whose target audience are developers. They help in solving bugs by allowing a developer to reconstruct the execution flow after a failure. Traces usually reveal information about the internal structure of the software. For instance, a tracing library may allow recording the thread that is running when the trace is generated. The trace may also include information about which component or module generates it.
A log is a message whose target audience may be the customer, the end user or an administrator. Logs register important information regarding events that have occurred in the system. User login and logout, establishing a new connection with a server or a user request are examples of events that deserve logging.
Information and Format
Finding the right message for a log or a trace is an iterative process, pretty much like refactoring the code. When starting the development of an application, it’s hard to know which traces and logs will be relevant. It is likely that the developer includes some information that is irrelevant, while missing some information that is actually essential in order to reconstruct the execution scenario.
A trace should inform about changes in the system state that will affect the output of future procedures or calculations. For instance, a user login or a failed connection attempt. Requests and responses are obvious events to trace, but also can be lower level events like connection establishments at the socket level. Sometimes, debugging requires to include more verbose traces with intermediate calculations.
A log should describe the performance of the application use cases. Log messages will be read by people other than the developers that built the system, so including information regarding the internal structure is pointless.
The message format is also fundamental for processing traces and logs. Having a set of formatting rules allows automated processing (typically filtering) of recorded events. In the case of logs, formatting allows a collecting application (for example logstash) to process it properly. In the case of traces, a developer may use different tools (like grep or sed) to process the messages and extract the most relevant information. Single line traces will be simpler to read and process.
I used to implement libraries that allowed sending traces to different sinks, namely the console, a socket or a file. However, over time, the standard output has become my favourite device of choice. Sending traces and logs to the standard output is the simplest and most flexible way of generating messages.
The standard output can be redirected to a socket or a file using other tools (for instance tee or netcat). Basically all languages (that I know of) have built-in facilities to send messages to the standard output so, usually, there’s no point in implementing unnecessarily complex tools for sending traces here or there.
As always, the standard output is no silver bullet. There may be situations where sending to standard output might not be the best solution (it may even be impossible). For example, when talking about an embedded system where the standard output is mapped to a slow serial device, log transmission may impair the performance of the application.
One important thing to note is that the tracing and logging facilities are a shared resource. If a developer decides to trace every single step that a module takes, it will pollute the trace buffer. This will affect the ability of other developers to read and process the traces. Traces must be generated keeping in mind that others will read them, and therefore, we should not include irrelevant or redundant information, nor information that might be meaningless to others.
The application architecture conditions the choice of the logging mechanism. In a distributed application, it is desirable to centralize logging information into a server. In an embedded system generating dozens of traces per millisecond, it may be impossible to record all the traces, so a circular buffer in RAM may be required.
Logging execution information is essential to any software application. Getting it right requires some effort comparable to software refactoring. The effort pays off in the long run, with faster problem solving thanks to improved diagnostics.