Instrumenting Applications for Enhanced Observability
Published: March 21, 2024
Observing applications is not just about debugging specific issues; it’s also about being able to trace calls through a system and determine whether something is going wrong with the application as a whole or with specific transactions. By tracking and tracing application behaviour, we can improve reliability and customer experiences by detecting problems early and reducing the time taken to resolve them. Overall, this lowers the cost of ownership for developing and operating customer applications.
Typical Logging Patterns
Most developers tend to see logging as ‘for them’: a way to see what’s happening while they develop an application and get it working, or at best to resolve issues as they crop up. Such logs tend to record fragments of information as a transaction is processed, spread across multiple lines. Often they include stack traces or data dumps that are hard to parse or read unless you understand the code.
While there’s a place for this kind of debug data in logs, logs can offer so much more and deliver real value for observability and understanding of how our applications are behaving.
Alternative Approach
Application developers should design logs for more than just debugging code issues. Consider writing log entries that are each curated to capture a specific event in the application.
For example, an API should ideally write out an access event when a request arrives, and an audit event when the API response is about to be returned. We’ll talk below about what to log, but these events let us track requests and responses for each API in the system as business transactions.
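A minimal Python sketch of the access/audit pattern, using only the standard library `logging` module. The wrapper `handle_request`, the helper `log_event`, and the field names (`request_id`, `api`, `action`, `status`, `duration_ms`) are hypothetical choices for illustration, not a prescribed schema. The generated request ID is shared by both events so they can be tied together later.

```python
import logging
import time
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("api.events")


def log_event(event, **fields):
    """Write one event as a single line of name="value" pairs."""
    timestamp = datetime.now(timezone.utc).isoformat()
    pairs = " ".join(f'{name}="{value}"' for name, value in fields.items())
    logger.info('timestamp="%s" event="%s" %s', timestamp, event, pairs)


def handle_request(api, action, handler):
    """Hypothetical wrapper around an API handler.

    Logs an access event when the request arrives and an audit event
    just before the response (or the error) leaves the wrapper.
    """
    request_id = str(uuid.uuid4())  # shared by both events for correlation
    start = time.monotonic()
    log_event("access", request_id=request_id, api=api, action=action)
    status = "success"
    try:
        return handler()
    except Exception:
        status = "error"
        raise
    finally:
        duration_ms = round((time.monotonic() - start) * 1000, 1)
        log_event("audit", request_id=request_id, api=api, action=action,
                  status=status, duration_ms=duration_ms)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    handle_request("orders", "create", lambda: {"order_id": 1})
```

Putting the audit event in a `finally` clause means it is written whether the handler returns normally or raises, so failed requests are recorded too.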
In a web application, log an event for each request for an operation (each button push, save, create, etc.). In a batch processor, log an event for each processed object or operation. By treating each of these as a transaction in the system and logging it, we can see how the application is behaving (or misbehaving).
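For the batch case, a sketch in Python might look like the following. The `process_batch` function, the `item_processed` event name and its fields are hypothetical, and the choice to log a failed item and carry on (rather than abort the batch) is an assumption about how the processor should behave.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("batch.events")


def process_batch(batch_id, items, process_item):
    """Process (item_id, item) pairs, logging one event line per item.

    A failing item is logged as an error event and the batch continues,
    so every object in the batch leaves a trace.
    """
    succeeded = failed = 0
    for item_id, item in items:
        outcome, error_type = "success", ""
        try:
            process_item(item)
            succeeded += 1
        except Exception as exc:
            outcome, error_type = "error", type(exc).__name__
            failed += 1
        logger.info(
            'timestamp="%s" event="item_processed" batch_id="%s" '
            'item_id="%s" outcome="%s" error_type="%s"',
            datetime.now(timezone.utc).isoformat(),
            batch_id, item_id, outcome, error_type,
        )
    return succeeded, failed
```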
What to Log
When logging an event, try to keep all of its information on a single log line. Consider including the following attributes in the log:
Be careful not to log any private or identifying information, such as credit card numbers, account numbers alongside customer names, or any kind of credential, without appropriate hashing or obfuscation.
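One way to follow this advice, sketched with Python’s standard `hmac` and `hashlib` modules: replace identifiers with a keyed hash (HMAC-SHA256), so the same customer or account still correlates across log lines without the raw value appearing, and mask card numbers down to their last four digits. A plain unkeyed hash is weaker here, because values such as card or account numbers can often be brute-forced. The key name `LOG_HASH_KEY`, the 16-character truncation and both function names are illustrative assumptions; in practice the key would come from a secrets store.

```python
import hashlib
import hmac

# Hypothetical placeholder: load the real key from a secrets store, never hard-code it.
LOG_HASH_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, key: bytes = LOG_HASH_KEY) -> str:
    """Keyed hash (HMAC-SHA256), truncated to 16 hex chars for readability.

    Equal inputs give equal outputs, so events can still be correlated,
    but the original value can't be recovered without the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def mask_card_number(card_number: str) -> str:
    """Replace all but the last four digits with asterisks."""
    digits = [c for c in card_number if c.isdigit()]
    return "*" * max(len(digits) - 4, 0) + "".join(digits[-4:])
```

A log call would then carry `card="************1234"` and `customer="<16 hex chars>"` instead of the raw values.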
In terms of formatting, name-value pairs (action="create") are easy for both humans and machines to read. Also, use ISO 8601 timestamps that include the time zone.
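A minimal Python sketch of that format, assuming a hypothetical `kv_line` helper: each event becomes one line of name="value" pairs, led by an ISO 8601 timestamp with a UTC offset. Quotes, backslashes and line breaks inside values are escaped so that free text (an error message, say) can’t split an event across lines or confuse a parser. The escaping scheme is an assumption; check what your log platform’s parser expects.

```python
from datetime import datetime, timezone
from typing import Optional


def _escape(value) -> str:
    """Escape backslashes and quotes and flatten line breaks so an event stays on one line."""
    text = str(value)
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def kv_line(fields: dict, when: Optional[datetime] = None) -> str:
    """Render an event as name="value" pairs, led by an ISO 8601 timestamp with a UTC offset."""
    if when is None:
        when = datetime.now(timezone.utc)
    if when.tzinfo is None:
        raise ValueError("timestamp must include a time zone")
    pairs = [("timestamp", when.isoformat(timespec="milliseconds"))]
    pairs += list(fields.items())
    return " ".join(f'{name}="{_escape(value)}"' for name, value in pairs)


print(kv_line({"event": "audit", "action": "create", "status": "success"},
              when=datetime(2024, 3, 21, 9, 30, tzinfo=timezone.utc)))
# timestamp="2024-03-21T09:30:00.000+00:00" event="audit" action="create" status="success"
```

Rejecting naive datetimes enforces the time-zone rule at the point of logging rather than leaving it to whoever reads the logs later.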
What Then?
Logging events like this lets us use platforms such as Splunk or ELK (Elasticsearch, Logstash and Kibana) to ingest all the log entries and get a clear picture of what’s going on in our applications. We’ll be able to spot errors as they occur, trace activity across our system, and spot performance slowdowns. Handily, this can also help with business reporting, and even with tracking security issues (see our post and eBook on the overlap between observability and security).
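Platforms like these handle parsing and correlation at scale, but the core idea fits in a few lines. This hypothetical Python sketch assumes event lines in the name="value" format described above, carrying `timestamp`, `event` (`access` or `audit`), `request_id`, `action` and `status` fields, with timestamps that use a numeric offset such as `+00:00`. It pairs each request’s access and audit events to get a duration, and marks requests with no audit event as incomplete. The field names and the `summarize` function are assumptions for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches name="value" pairs, allowing backslash-escaped characters inside the value.
PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse(line):
    """Turn one event line into a dict of its name/value pairs (values left escaped)."""
    return dict(PAIR.findall(line))


def summarize(lines):
    """Pair access and audit events by request_id and report duration and status."""
    by_request = defaultdict(dict)
    for line in lines:
        fields = parse(line)
        if "request_id" in fields and "event" in fields:
            by_request[fields["request_id"]][fields["event"]] = fields

    summary = {}
    for request_id, events in by_request.items():
        access, audit = events.get("access"), events.get("audit")
        if access is None:
            continue  # access event not among these lines; nothing to pair with
        if audit is None:
            summary[request_id] = {"action": access.get("action"),
                                   "status": "incomplete", "duration_ms": None}
            continue
        elapsed = (datetime.fromisoformat(audit["timestamp"])
                   - datetime.fromisoformat(access["timestamp"]))
        summary[request_id] = {"action": access.get("action"),
                               "status": audit.get("status"),
                               "duration_ms": elapsed.total_seconds() * 1000}
    return summary
```

Requests with a high `duration_ms`, an `error` status, or no audit event at all are the ones worth a closer look.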
Retaining logs over time lets you apply techniques such as machine learning to learn what’s normal for your particular application or system and automatically report on what might be a problem.
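Machine learning is beyond a short example, but the underlying idea of learning what’s normal from retained history can be illustrated with a much simpler statistical stand-in (not machine learning): compute the mean and standard deviation of past durations for an action, then flag new values more than k standard deviations away. The function names and the default k = 3 are illustrative assumptions; a real baseline would also need to account for patterns such as time of day or seasonal load.

```python
import statistics


def build_baseline(history_ms):
    """Mean and sample standard deviation of historical durations (needs >= 2 values)."""
    return statistics.mean(history_ms), statistics.stdev(history_ms)


def is_anomalous(value_ms, baseline, k=3.0):
    """Flag a value more than k standard deviations from the historical mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value_ms != mean  # history never varied, so any change stands out
    return abs(value_ms - mean) / stdev > k


baseline = build_baseline([100, 110, 90, 105, 95])  # past durations for one action
print(is_anomalous(108, baseline), is_anomalous(200, baseline))  # False True
```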
Solsys has instrumented systems, especially API gateways and API endpoints, to produce this kind of event logging for observability. We also have a lot of expertise in ingesting these logs into platforms where operators and business team members can use them to improve reliability and performance, reduce the time taken to resolve issues, and improve your customers’ experiences. Feel free to reach out to talk more about our experiences and whether we can help!