Python Microservices, Part 3: Effective Canonical Logging across Services
Nature is a meticulous logger, and its logs are beautiful. Calcium carbonate layers in a seashell are nature’s log of ocean temperature, water quality, and food supply. Annual rings in tree cambium are nature’s log of dry and rainy seasons and forest fires. Fossils in the layers of sedimentary rocks are nature’s log of the flora and fauna that existed at the time.
In software projects, logs, like tests, are often afterthoughts. But at Slang Labs, we take inspiration from nature’s elegance and meticulousness.
We are building a platform for programmers to make interaction with their mobile and web apps more natural by adding Voice Augmented eXperiences (VAX). The platform is powered by a collection of microservices. Each log entry a microservice emits is a fossil record of a request. Logs are designed for timely and effective use in raising alerts and swift diagnosis of issues.
This blog post captures the essence of our logging tactics:
- Requirements: what we needed from logging, and why.
- Implementation: how we implemented these requirements in Python and Tornado.
- Testing: how we examine logs manually, and, when needed, in unit and integration tests.
We want to use off-the-shelf log tools.
Programmers love to write code and build tools. But we don’t suffer from Not Invented Here syndrome; there is no glory in reinventing the wheel.
We needed a structured logging format that is reasonably human-readable for manual examination. We found that logfmt, which encodes a log entry as a string of key-value pairs, offers a good balance.
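To illustrate the format's shape, here is a hand-rolled encoder (a minimal sketch; real logfmt libraries also escape quotes and newlines, and the function name is ours, not a library's):

```python
def to_logfmt(fields: dict) -> str:
    """Encode key-value pairs as a single logfmt line.

    Minimal sketch: real logfmt encoders handle escaping more carefully.
    """
    parts = []
    for key, value in fields.items():
        value = str(value)
        if " " in value:
            value = '"{}"'.format(value)  # quote values containing spaces
        parts.append("{}={}".format(key, value))
    return " ".join(parts)

print(to_logfmt({"method": "GET", "path": "/health", "status": 200, "msg": "ok response"}))
# method=GET path=/health status=200 msg="ok response"
```

The result is machine-parseable yet still scans easily by eye, which is exactly the balance we were after.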
We want to diagnose an issue quickly.
While debugging issues, we realized that combing through logs is time-consuming, because the logs for a request are scattered and interleaved with the logs for other requests.
We liked the idea of canonical log lines: consolidating all log info in a single log entry for each request. For each incoming request, our microservices emit a single logline at the time of sending the response.
We want to debug across microservices.
We need to diagnose the complete life of a request as it passes through many microservices.
We assign a request id at the beginning and pass it along to the other services. It is sent as an HTTP header, a query parameter, or a field in the body payload.
The request-id is used to string together all calls made to internal microservices.
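A minimal sketch of that propagation (the header name X-Request-Id and the helper names here are assumptions for illustration, not our exact code):

```python
import uuid

REQUEST_ID_HEADER = "X-Request-Id"  # assumed header name

def get_or_create_request_id(headers: dict) -> str:
    """Reuse the caller's request id if present; otherwise mint a fresh one."""
    return headers.get(REQUEST_ID_HEADER) or uuid.uuid4().hex

def outgoing_headers(req_id: str) -> dict:
    """Forward the same id on every call to an internal microservice."""
    return {REQUEST_ID_HEADER: req_id}

req_id = get_or_create_request_id({})  # at the entry point, a new id is minted
assert outgoing_headers(req_id)[REQUEST_ID_HEADER] == req_id
```

Because every service logs this id in its canonical entry, grepping for one id reconstructs the request's entire path through the system.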
We want to multicast logs.
Logs are useful for debugging. But we also use them for analytics, dashboards, monitoring, and alerts.
We wanted the ability to integrate with multiple tools. We prefer using logger configuration to channel log entries to multiple stores. Glue code, if any, must be localized.
Logging implementation has the following pieces:
- Log Context: utility functions to maintain context (key-value map) during the life of a request.
- Log Format: utility function that examines the accumulated context and emits a canonical log entry in logfmt format.
- Tornado Request Handlers: wiring various Tornado hooks to use these utility functions.
- Tornado Server: replacing the default task factory with the one that maintains and switches context while processing concurrent requests.
- Log Configuration: setting log multicast to various destinations.
Get the source code:
Log Context utility functions:
A context is maintained throughout the life of a request. At any stage, more key-value pairs can be added to the context. All these pairs are logged in a single entry at the time of sending the response.
We use the aiotask-context package to manage context for each async request. Other alternatives are aiocontext and tasklocals. There is also the contextvars standard package if you are using Python 3.7+.
The complexity of context management is encapsulated in easy-to-use utility functions to get and set key-value pairs in the context.
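The shape of those utilities can be sketched with the stdlib contextvars variant (with aiotask-context the underlying get/set calls differ, but the wrappers look much the same; the function names here are assumptions):

```python
import contextvars

# One context variable holding the per-request key-value map.
_LOG_CONTEXT: contextvars.ContextVar = contextvars.ContextVar("log_context", default=None)

def add_to_context(**pairs) -> None:
    """Add key-value pairs to the current request's log context."""
    ctx = _LOG_CONTEXT.get()
    if ctx is None:
        ctx = {}
        _LOG_CONTEXT.set(ctx)
    ctx.update(pairs)

def get_context() -> dict:
    """Return a copy of all pairs accumulated so far for this request."""
    return dict(_LOG_CONTEXT.get() or {})

add_to_context(req_id="r-1", path="/addrs")
add_to_context(status=200)  # any stage of the request can add more pairs
print(get_context())  # {'req_id': 'r-1', 'path': '/addrs', 'status': 200}
```

Handlers and helpers sprinkle pairs into the context as the request progresses; nothing is logged until the response is sent.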
Log Formatting utility function:
The Python logfmt package does the heavy lifting of emitting the log. It takes a dictionary and converts it into a string of “key=value” pairs.
A log utility function encapsulates all this processing. It adheres to the Python logger API. In case of an exception, it includes the exc_info traceback in the log.
Tornado Request Handlers:
It is time to use the context and log format utility functions.
We use Tornado to develop microservices. It provides various hooks to shape logging and error reporting behavior.
It has a log_function setting to pass a function that Tornado calls at the end of every request to log the result. We implement this function to emit a log entry for the request with all the key-value pairs in the context.
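In outline, that function looks like this (a hedged sketch with a duck-typed handler; the logger name and the log_context attribute are illustrative stand-ins for the real task-context lookup):

```python
import logging

access_log = logging.getLogger("service.access")  # assumed logger name

def log_function(handler) -> None:
    """Passed via the log_function setting; Tornado calls it once per request."""
    fields = {
        "status": handler.get_status(),
        "method": handler.request.method,
        "path": handler.request.path,
    }
    # In the real code, these pairs come from the per-task log context.
    fields.update(getattr(handler, "log_context", {}))
    access_log.info(" ".join("{}={}".format(k, v) for k, v in fields.items()))

# Wired into the app, roughly:
# app = tornado.web.Application(routes, log_function=log_function)
```

Since this is the only place a log line is emitted, each request produces exactly one canonical entry, however many handlers and helpers it passed through.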
Other hooks are:
- write_error to customize the error response. Information about the error is added to the log context.
- log_exception to log uncaught exceptions. It can be overridden to log in logfmt format.
These functions are overridden in the base request handler. A req_id is assigned to the incoming request, and stored in the log context.
This completes the wiring in the Tornado request handlers for logging a canonical log entry in logfmt format for every request, covering successful responses, errors, and uncaught exceptions.
At every step, information is stuffed into the task context as key-value pairs. At the time of sending the response, Tornado calls log_function, which unpacks the context and logs it in logfmt.
Tornado and asyncio work on the principle of cooperative multitasking. Multiple requests (each with its associated context) can be in flight at any given time, so the context has to be preserved across the interleaving of these requests.
That’s why aiotask_context requires a different task factory to be set on the asyncio event loop. It must be done when starting the HTTP server.
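What that factory does can be approximated with the stdlib alone (a hedged sketch of the idea behind aiotask_context.task_factory, not its actual implementation: each task spawned during a request shares that request's context dict):

```python
import asyncio

_contexts = {}  # task -> context dict

def context_task_factory(loop, coro):
    """Give each new task its parent task's context (or a fresh one)."""
    parent = asyncio.current_task(loop=loop)  # None for a root task
    task = asyncio.Task(coro, loop=loop)
    _contexts[task] = _contexts[parent] if parent in _contexts else {}
    return task

def ctx() -> dict:
    """The context dict of the currently running task."""
    return _contexts[asyncio.current_task()]

async def child():
    return ctx()["req_id"]  # sees the pairs set by the request task

async def handle_request():
    ctx()["req_id"] = "r-42"
    return await asyncio.ensure_future(child())

loop = asyncio.new_event_loop()
loop.set_task_factory(context_task_factory)  # must happen before serving requests
result = loop.run_until_complete(handle_request())
loop.close()
print(result)  # r-42
```

Without the custom factory, a task spawned mid-request would start with no context, and pairs added there would be lost from the canonical entry.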
Log Configuration for multicasting:
Python’s standard logging package documentation has an excellent HOWTO guide and Cookbook. These are rich sources of information and don’t leave much to add. Following are some of the best practices:
- Do NOT use the ROOT logger directly through the logging.debug() and logging.error() methods, because it is easy to overlook their default behavior.
- Do NOT use module loggers of the variety logging.getLogger(__name__), because any complex project will require controlling logging through configuration (see the next point). These may cause surprises if you forget to set disable_existing_loggers to false or overlook how modules are loaded and initialized. If used at all, call logging.getLogger(__name__) inside a function, rather than at the top of a module.
- dictConfig (in yaml) offers the right balance of versatility and flexibility compared to ini-based fileConfig or configuring logging in code. Specifying loggers in config files lets you use different logging levels and infrastructure in production deployments, staging deployments, and local debugging (with progressively more logs).
Each log entry has to be relayed to multiple data-store destinations. This is achieved by setting the log configuration for the Python logging package. It is a yaml file using the format defined by the logging.config module.
In the example below, the config defines two log formats for two handlers (you may define as many formats, handlers, and loggers as you need):
- brief to display on the console; log prefix is kept simple.
- detailed to log into a file; it has all fields in key-value pairs.
Notice that this configuration not only defines a logger addrservice for this service, but also modifies the behavior of Tornado's general loggers. There are several predefined handlers; here StreamHandler and RotatingFileHandler are used to write to the console and to log files, respectively.
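For illustration, here is an equivalent configuration expressed as a Python dict passed to dictConfig (the project loads the same structure from a yaml file; the format strings, file name, levels, and rotation sizes below are placeholders, not our exact values):

```python
import logging
import logging.config

LOG_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "brief": {"format": "%(levelname)s %(name)s %(message)s"},
        "detailed": {
            "format": "time=%(asctime)s logger=%(name)s level=%(levelname)s %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "brief",
            "level": "INFO",
        },
        "file": {
            "class": "logging.handlers.RotatingFileHandler",
            "formatter": "detailed",
            "filename": "service.log",  # placeholder path
            "maxBytes": 1048576,
            "backupCount": 3,
        },
    },
    "loggers": {
        # This service's logger multicasts to both destinations ...
        "addrservice": {"handlers": ["console", "file"], "level": "INFO"},
        # ... and Tornado's access log is channeled into the same file.
        "tornado.access": {"handlers": ["file"], "level": "INFO"},
    },
}

logging.config.dictConfig(LOG_CONFIG)
logging.getLogger("addrservice").info("status=200 msg=started")
```

Adding another destination (say, a syslog or HTTP handler for an analytics pipeline) is then a config change, not a code change, which keeps any glue code localized.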
The first thing the server does is read this config and set up the loggers.
In the previous section, we wired the logging to use logfmt format, maintain a context for each request, emit log only at the end, and multicast the logs to many destinations.
Logs are processed in real time. Entries are grouped based on the presence and values of keys (from the key-value pairs). We have a dashboard showing the various kinds of errors across all services. When any threshold is breached, we get automated alerts.
That automation deserves a blog post in its own right. Here we will examine only the log output, which looks like a row in a NoSQL data store. Let’s see it in action.
Start the server:
It will show the console log:
Watch the log:
Send a request:
The console log will show brief log entries:
The log file will show logfmt-style one-line detailed canonical log entries:
Unit and Integration Tests
Tests are configured to be quiet:
If you want to see the log messages during tests, change the log level from ERROR to INFO:
With that change, if you run the tests, you can examine the logs:
For effective logging in microservices:
- Log Format: Pick a standard format suitable for off-the-shelf log processing tools and humans.
- Canonical Entry: Emit one log entry for each request at the response time consolidating all log info. It makes debugging easy.
- Request Id: Assign an id to each request to identify all calls made to internal services for serving a request.
- Configuration: Use logger configuration to channel log entries to multiple stores.