Learn how to design, implement, test, and configure canonical logging across microservices using Python and the Tornado web framework.
Nature is a meticulous logger, and its logs are beautiful. Calcium carbonate layers in a seashell are nature’s log of ocean temperature, water quality, and food supply. Annual rings in tree cambium are nature’s log of dry and rainy seasons and forest fires. Fossils in the layers of sedimentary rocks are nature’s log of the flora and fauna that existed at the time.
In software projects, logs, like tests, are often afterthoughts. But at Slang Labs, we take inspiration from nature’s elegance and meticulousness.
We are building a platform for programmers to make interaction with their mobile and web apps more natural by adding Voice Augmented eXperiences (VAX). The platform is powered by a collection of microservices. Each log entry a microservice emits is a fossil record of a request. Logs are designed for timely and effective use in raising alerts and swift diagnosis of issues.
This blog post captures the essence of our logging tactics:
Programmers love to write code and build tools. But we don’t suffer from Not-Invented-Here syndrome; there is no glory in reinventing the wheel.
We needed a structured logging format that remains reasonably human-readable for manual examination. We found that logfmt, which encodes a log entry as a single string of key=value pairs, offers a good balance.
While debugging issues, we realized that combing through logs is time-consuming, because the logs for a request are scattered and interleaved with the logs of other requests.
We liked the idea of canonical log lines: consolidating all log info in a single log entry for each request. For each incoming request, our microservices emit a single logline at the time of sending the response.
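For illustration only (the field names here are hypothetical, not our actual schema), a canonical log entry for one request might look like:

```
method=POST uri=/v1/translate status=200 req_id=8c2f1e9b latency_ms=42 cache=miss
```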
We need to diagnose the complete life of a request as it passes through many microservices.
We assign a request id at the beginning and pass it along to the other services, either as an attribute in the HTTP header, as a query parameter, or in the body payload.
The request-id is used to string together all calls made to internal microservices.
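As a minimal sketch of this hand-off (the header name and function names are hypothetical), a service can reuse an upstream request id when one was forwarded, mint one otherwise, and pass it on when calling internal services:

```python
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # hypothetical header name

def get_or_assign_request_id(headers: dict) -> str:
    """Reuse the caller's request id if present, else assign a new one."""
    req_id = headers.get(REQUEST_ID_HEADER)
    if not req_id:
        req_id = uuid.uuid4().hex
    return req_id

def outgoing_headers(req_id: str) -> dict:
    """Headers to attach when calling another internal microservice."""
    return {REQUEST_ID_HEADER: req_id}
```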
Logs are useful for debugging, but we also use them for analytics, dashboards, monitoring, and alerts.
We desired the ability to integrate with multiple tools. We prefer using logger configuration to channel log entries to multiple stores. Glue code, if any, must be localized.
Logging implementation has the following pieces:
A context is maintained throughout the life of a request. At any stage, more key-value pairs can be added to the context. All these pairs are logged in a single entry at the time of sending the response.
We use the aiotask-context package for managing the context of each async request. Other alternatives are aiocontext and tasklocals. There is also the contextvars standard package if you are using Python 3.7+.
The complexity of context management is encapsulated in easy-to-use utility functions to get and set key-value pairs in the context.
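A minimal sketch of such utilities, written here with the standard contextvars package (Python 3.7+) rather than aiotask-context; the function names are hypothetical:

```python
import contextvars

# A single context variable holding the per-request dict of log fields.
_log_context: contextvars.ContextVar[dict] = contextvars.ContextVar("log_context")

def context_set(key, value):
    """Add a key-value pair to the current request's log context."""
    try:
        ctx = _log_context.get()
    except LookupError:
        ctx = {}
        _log_context.set(ctx)
    ctx[key] = value

def context_all() -> dict:
    """Return all key-value pairs collected so far for this request."""
    try:
        return dict(_log_context.get())
    except LookupError:
        return {}
```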
The Python logfmt package does the heavy lifting of emitting the log. It takes a dictionary and converts it to a string of “key=value” pairs.
A log utility function encapsulates all this processing and adheres to the Python logger API. In case of an exception, it also stores the exc_info traceback in the logs.
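To make the encoding concrete, here is a hand-rolled equivalent of the logfmt transformation (a sketch for illustration, not the logfmt package’s actual API):

```python
def to_logfmt(fields: dict) -> str:
    """Encode a dict as a one-line logfmt string of key=value pairs.
    Values containing spaces or '=' are quoted."""
    parts = []
    for key, value in fields.items():
        value = str(value)
        if " " in value or "=" in value:
            value = '"{}"'.format(value.replace('"', '\\"'))
        parts.append(f"{key}={value}")
    return " ".join(parts)
```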
It is time to use the context and log format utility functions.
We use Tornado to develop microservices. It provides various hooks to shape logging and error reporting behavior.
Its log_function setting accepts a function that Tornado calls at the end of every request to log the result. We implement this function to emit a log entry for the request with all the key-value pairs in the context.
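A minimal sketch of such a function (the tornado import is omitted so the snippet stands alone; `context_all` is a placeholder for the context utility described earlier, and the logger name is hypothetical):

```python
import logging

logger = logging.getLogger("service")  # hypothetical logger name

def context_all() -> dict:
    """Placeholder for the utility that returns the key-value pairs
    collected in the per-request task context."""
    return {}

def log_function(handler) -> str:
    """Passed as the `log_function` application setting, e.g.
    tornado.web.Application(handlers, log_function=log_function).
    Tornado calls it once at the end of every request; `handler` is
    the RequestHandler that served it."""
    fields = {
        "method": handler.request.method,
        "uri": handler.request.uri,
        "status": handler.get_status(),
    }
    fields.update(context_all())  # fold in the request's log context
    line = " ".join(f"{k}={v}" for k, v in fields.items())
    logger.info(line)
    return line
```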
Other hooks are:
These functions are overridden in the base request handler. A req_id is assigned to the incoming request, and stored in the log context.
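A sketch of that wiring as a mixin (so the snippet runs without tornado installed; in practice it would be mixed into a tornado.web.RequestHandler subclass, and the header name is hypothetical):

```python
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # hypothetical header name

class RequestIdMixin:
    """Sketch of the base-handler behaviour: assign a req_id to every
    incoming request in prepare(), which Tornado calls before the
    handler method."""

    def prepare(self):
        # Reuse an upstream request id if one was forwarded; mint one otherwise.
        req_id = self.request.headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex
        self.req_id = req_id
        # In the real handler, req_id would also be stored in the log
        # context, e.g. context_set("req_id", req_id) (hypothetical helper).
```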
This completes the wiring in Tornado request handlers for logging canonical log entry in logfmt format for each request for all successful responses, errors, and uncaught exceptions.
At all locations, information is stored as key-value pairs in the task context. At the time of sending the response, Tornado calls log_function, which unpacks the context and logs it in logfmt.
Tornado and asyncio work on the principle of cooperative multitasking. Multiple requests (each with its associated context) can be in flight at any given time, so the context has to be preserved across the interleaving of these requests.
That’s why aiotask_context requires a different task factory to be set on the asyncio event loop. This must be done while starting the HTTP server.
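The per-task isolation that matters here can be illustrated with the standard contextvars package, which copies the context into each new asyncio task automatically (aiotask-context achieves the equivalent through its custom task factory). A sketch:

```python
import asyncio
import contextvars

req_id_var = contextvars.ContextVar("req_id")

async def handle(req_id: str, results: list):
    """Simulates one request: set its id, yield to other tasks, read it back."""
    req_id_var.set(req_id)
    await asyncio.sleep(0)              # yield, letting other requests interleave
    results.append(req_id_var.get())    # still sees its own request's id

async def main():
    results: list = []
    # Two "requests" running concurrently; each task gets its own context copy.
    await asyncio.gather(handle("r1", results), handle("r2", results))
    return results
```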
Python’s standard logging package documentation has an excellent HOWTO guide and Cookbook. These are rich sources of information and don’t leave much to add. The following are some best practices:
Each log entry has to be relayed to multiple data-store destinations. This can be achieved by setting the log configuration for the Python logging package. It has to be a YAML file using the format defined by the logging.config package.
In the example below, the config defines two log formats for two handlers (you may define as many formats, handlers, and loggers as you need):
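A config along these lines could be (the logger name addrservice matches this service; file paths, sizes, and format strings are illustrative):

```yaml
version: 1
formatters:
  brief:
    format: '%(levelname)s %(name)s: %(message)s'
  detailed:
    format: '%(asctime)s %(levelname)s %(name)s %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    formatter: brief
    level: INFO
    stream: ext://sys.stdout
  file:
    class: logging.handlers.RotatingFileHandler
    formatter: detailed
    level: INFO
    filename: /var/log/addrservice.log
    maxBytes: 10485760
    backupCount: 5
loggers:
  addrservice:
    level: INFO
    handlers: [console, file]
    propagate: false
  tornado.general:
    level: WARNING
    handlers: [console]
    propagate: false
```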
Notice that this configuration not only defines a logger addrservice for this service, but also modifies the behavior of Tornado's general logger. There are several pre-defined handlers; here StreamHandler and RotatingFileHandler are used to write to the console and to log files, respectively.
The first thing the server does is to read this config and set up the loggers.
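A sketch of that startup step, assuming the YAML file has already been parsed into a dict (in practice PyYAML’s yaml.safe_load would produce it; the inline dict below stands in for the file):

```python
import logging
import logging.config

def setup_logging(config: dict) -> None:
    """Install the log configuration before anything else logs."""
    logging.config.dictConfig(config)

# A minimal inline config standing in for the parsed YAML file:
LOG_CONFIG = {
    "version": 1,
    "formatters": {"brief": {"format": "%(levelname)s %(message)s"}},
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "brief"}
    },
    "loggers": {
        "addrservice": {"level": "INFO", "handlers": ["console"]}
    },
}
```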
In the previous section, we wired the logging to use logfmt format, maintain a context for each request, emit log only at the end, and multicast the logs to many destinations.
Logs are processed in real time. Entries are grouped based on the presence and value of keys (from the key-value pairs). We have a dashboard showing various kinds of errors across all services. When any threshold is breached, we get automated alerts.
That automation deserves a separate blog post in its own right. Here we will examine only the log output, which looks like a row in a NoSQL data store. Let’s see it in action.
Running the server will show brief log entries on the console:
The log file will show logfmt-style one-line detailed canonical log entries:
Tests are configured to be quiet:
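One way to keep tests quiet (a sketch; the exact mechanism depends on your test setup, and the function name is hypothetical) is to raise the logging threshold so that only errors get through:

```python
import logging

def configure_test_logging(level=logging.ERROR):
    """Silence log messages below `level` during tests.
    Pass logging.INFO instead when you want to examine the logs."""
    logging.getLogger().setLevel(level)
```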
If you want to examine log messages during tests, change the log level from ERROR to INFO:
With that change, if you run the tests, you can examine the logs:
For effective logging in microservices: