Sunday, September 9, 2018

Level up logs and ELK - Logging best practices with Logback

Articles index:

  1. Introduction (Everyone)
  2. JSON as logs format (Everyone)
  3. Logging best practices with Logback (Targeting Java DEVs)
  4. Logging cutting-edge practices (Targeting Java DEVs)
  5. Contract first log generator (Targeting Java DEVs)
  6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
  7. VRR Java + Logback configuration (Targeting OPS)
  8. VRR FileBeat configuration (Targeting OPS)
  9. VRR Logstash configuration and Index templates (Targeting OPS)
  10. VRR Curator configuration (Targeting OPS)
  11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

Logging best practices with Logback

 

In my experience, logs are the product of each developer's very personal view of how software must be built, and of how many nights on call that developer has suffered.
In this chapter I am going to recommend some ways of logging information in order to make it more useful when it comes to indexing, searching and filtering.

All examples expressed in Java + SLF4J.
Github repo: https://github.com/albertonavarro/leveluplogging/tree/master/leveluplogging

Simple Message:

Most of the free-form log information will come as plain messages, from "Application is ready to listen for messages" to "Unknown error trying to upload file". These messages carry little information on their own; in fact, I struggled to come up with examples of this kind that take no parameters at all.

Example code in GitHub

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        logger.info("Starting main - without arguments");
    }
}

This could be the output for a simple Message in plain text:
20:15:52.106 [main] INFO  Main1 - Starting main - without arguments

You would still need some help, like Grok or Splunk, to make that log line useful.
These tools expect your format to be constant and immutable; a Logstash pipeline using Grok could look like this:

input {
  beats {
    port => 5044
  }
}

filter {
    grok {
        match => [ "message" , "%{TIME:@timestamp} \[%{WORD:thread}\] %{LOGLEVEL:loglevel}\s+%{WORD:classname} - %{GREEDYDATA:message}" ]
        overwrite => ["message"]
    }
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "logstash-classic"
    }
}

Later on, each of these messages is translated into an identical message field in the ElasticSearch document, which you may search by exact or partial match.




Notice that, thanks to Grok, we have extracted "classname" and "loglevel", as long as that format never changes.
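
The indexed document would then look roughly like this (a sketch based on the fields captured by the Grok pattern above, not actual Kibana output):

{"@timestamp":"20:15:52.106","thread":"main","loglevel":"INFO","classname":"Main1","message":"Starting main - without arguments"}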

You might even count, compare, and dashboard the number of events of one type vs. the number of events of a different type, correlate by time, etc.

This is what the same message would look like in JSON:

{"@timestamp":"2018-09-06T20:15:52.106+02:00","message":"Starting main - without arguments",
"level":"INFO","level_value":20000}
 
Fields are already split and named; no extra parsing is needed to get the same benefits.

Arguments: 

Example code in GitHub

At some point we find it necessary to add dynamic information to our logs; flavors range from free text to a semi-structured format. This could be my own evolution:

Initially I wrote them like this:
logger.info("Market {} is ready", marketId)

Then I tried to give it some more structure:
logger.info("Market marketId={} is ready", marketId)

I ended up writing them like this, in order to simplify queries:
logger.info("New Market Status marketId={} newStatus={}", marketId, "ready")
The reason for accumulating free text on the left and data on the right is to simplify the queries. In Kibana, querying a single contiguous phrase like "New Market Status" is easier and performs better than "Market" AND "is ready".

Output for this last test would look like:
23:26:52.164 [main] INFO  Main2 - New Market Status marketId=1234 newStatus=ready 

Reusing the previous Grok expression, we could extract message="New Market Status marketId=1234 newStatus=ready", but we would need to run something else on top of it for ElasticSearch to understand that there is extra information it can index separately.

Logstash's KV plugin (Elastic KV plugin) will extract key/value pairs from a given input, so in this case the output would look like this (notice the new fields marketId and newStatus):
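
A rough sketch of the resulting document (the KV plugin extracts values as strings):

{"@timestamp":"23:26:52.164","thread":"main","loglevel":"INFO","classname":"Main2","message":"New Market Status marketId=1234 newStatus=ready","marketId":"1234","newStatus":"ready"}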
 


Now ElasticSearch can filter by "marketId" and by "newStatus", and go deeper and wider in data analysis.

The only weak point of KV is that it can produce "false positives" when detecting key/value pairs. Until recently, only single-character separators were supported; now character sequences defined by regex are supported as well (finally). By default, the separator is the "=" character, which you can easily find in Base64 strings or URLs. That would pollute your index and damage its performance rather quickly.
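
To mitigate this, a minimal kv filter sketch; whitelisting known keys with include_keys is one way to limit false positives (the field names here match the example above):

filter {
    kv {
        source => "message"
        include_keys => [ "marketId", "newStatus" ]
    }
}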

The JSON output would have the same content and the same problems; we would still need to separate the information from within the message text.

Structured Arguments

Example code in GitHub

The amazing logstash-logback-encoder library also provides a more structured way of expressing arguments. Nicely wrapped in the StructuredArguments class, it offers a number of convenient methods to parameterize key/values, lists and maps.

Syntax is quite similar, just a little bit more verbose:
logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));

It leaves the following console output, as it automatically formats the key/value pairs. As with the previous method, the information can be extracted using the KV plugin.
20:20:02.037 [main] INFO  Main3 - New Market Status marketId=1234 newStatus=ready

However, this is far more valuable when the output is JSON instead of plain text, as it not only formats the message the same way, but also introduces the new attributes, marketId and newStatus, as independent fields at the root level.

{"@timestamp":"2018-09-06T20:20:02.037+02:00","message":"New Market Status marketId=1234 
newStatus=ready","level":"INFO","level_value":20000,"marketId":1234,"newStatus":"ready"}

This is very important: this document, whether sent directly to ElasticSearch or passed through Logstash, will be indexed with "marketId" and "newStatus" as independently indexed information, with no extra effort at all (no need for Grok or KV).
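
For completeness, a self-contained sketch of the example above (the entries() helper for maps is also part of StructuredArguments; the class name Main3 matches the console output shown earlier):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.LinkedHashMap;
import java.util.Map;

import static net.logstash.logback.argument.StructuredArguments.entries;
import static net.logstash.logback.argument.StructuredArguments.kv;

public class Main3 {
    private static final Logger logger = LoggerFactory.getLogger(Main3.class);

    public static void main(String[] args) {
        // kv() renders "key=value" in the message and adds a root-level JSON field
        logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));

        // entries() adds every map entry as a root-level JSON field
        Map<String, Object> snapshot = new LinkedHashMap<>();
        snapshot.put("marketId", 1234);
        snapshot.put("open", true);
        logger.info("Market snapshot {}", entries(snapshot));
    }
}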

Compare the required Logstash configuration with the previous one:

input {
  beats {
    port => 5044
    codec => json
  }
  
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "logstash-classic"
    }
}

Finally, it results in this output in ElasticSearch:



Notice (again) that "marketId" and "newStatus" are recognized without any Grok or KV plugins configured in Logstash. Moreover, there won't be false positives (discussed in the KV section above). A few extra fields also pop up, coming from logstash-logback-encoder.

MDC

Example code in GitHub

One of the most important and useful tools in modern log frameworks is the Mapped Diagnostic Context (MDC).

The most common scenario where MDC shines is log correlation within a service and across services; MDC uses a ThreadLocal to store a map of key/values for each thread that uses the logger.

If you created an aspect or a filter in your service that put a well-known key in that map (e.g. "rid" for "request id") with a new UUID value per request, all logs made for that request would carry the same "rid". HTTP/JMS clients could also pick up this value to propagate the "rid" to other services that we own, keeping the same id across all of them.
This is a widespread industry pattern for tracing calls across distributed services.
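
A minimal sketch of such a filter, assuming a servlet container (the class name and the "rid" key are illustrative; Servlet 4.0+ provides default init/destroy methods):

import org.slf4j.MDC;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import java.io.IOException;
import java.util.UUID;

public class RequestIdFilter implements Filter {
    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // One fresh request id per incoming request; every log line on this thread gets it
        MDC.put("rid", UUID.randomUUID().toString());
        try {
            chain.doFilter(request, response);
        } finally {
            // Threads are pooled: always clean up, or the rid leaks into unrelated requests
            MDC.remove("rid");
        }
    }
}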

In a service that uses multiple threads to execute code in parallel, this is as good as using the thread name to distinguish the actions of one thread from another. However, the thread name won't help you if a task is started by one thread and finished by another, or if you want to know what happened to that request in the next service it invoked.
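
Because MDC is ThreadLocal-based, it does not follow tasks handed to another thread; the context must be carried over by hand. A sketch using the standard SLF4J MDC API (the executor is assumed to be some ExecutorService):

Map<String, String> context = MDC.getCopyOfContextMap();
executor.submit(() -> {
    // Restore the caller's MDC on the worker thread before logging
    if (context != null) {
        MDC.setContextMap(context);
    }
    try {
        logger.info("Continuing work on another thread");
    } finally {
        MDC.clear();
    }
});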

We can simulate the request-id behavior with the following code:

public static void main(String[] args) {
    MDC.put("rid", UUID.randomUUID().toString());
    logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));
    logger.info("Some more logging");
}

Console output for MDC fields requires some extra configuration per field, so don't get too excited about them just yet:

<encoder>
    <pattern>%d{HH:mm:ss.SSS} %X{rid} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>

Now we are able to see it in the console/file:

20:27:28.007 b5fab580-12f4-4266-8a97-927d03d1b01a [main] INFO  Main4 - New Market Status 
marketId=1234 newStatus=ready
20:27:28.043 b5fab580-12f4-4266-8a97-927d03d1b01a [main] INFO  Main4 - Some more logging

And, given the strict format, we can read it with Grok like any other field (notice the extra configuration needed per new MDC field):

grok {
        match => [ "message" , "%{TIME:@timestamp} %{UUID:rid} \[%{WORD:thread}\] %{LOGLEVEL:loglevel}\s+%{WORD:classname} - %{GREEDYDATA:message}" ]
        overwrite => ["message"]
}

Once again, with the right Logstash configuration

input {
  beats {
    port => 5044
    codec => json
  }
  
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "logstash-classic"
    }
}

and the right Logback configuration (shown below), we get the MDC fields mapped into the JSON with no per-field effort at all:

{"@timestamp":"2018-09-06T20:27:28.007+02:00","thread_name":"main",
"rid":"b5fab580-12f4-4266-8a97-927d03d1b01a","logger_name":"Main4",
"message":"New Market Status marketId=1234 newStatus=ready","level":"INFO",
"marketId":1234,"newStatus":"ready"}
 
{"@timestamp":"2018-09-06T20:27:28.043+02:00","thread_name":"main",
"rid":"b5fab580-12f4-4266-8a97-927d03d1b01a","logger_name":"Main4",
"message":"Some more logging","level":"INFO"}

Once again, "rid" appears outside the message as an independent field, so it is automatically accepted by Logstash and indexed by ElasticSearch. No extra Logstash configuration is needed either.


Logback Configuration for JSON using logstash-logback-encoder

It's not the purpose of this article to go deep into configuration details; this is just the configuration I'm using for these examples. The best guide to get this right is logstash-logback-encoder (GitHub).

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appender name="stash" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>logFile.json</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>file.log.%d{yyyy-MM-dd}</fileNamePattern>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
            <providers>
                <timestamp/>
                <threadName/>
                <mdc/>
                <loggerName/>
                <message/>
                <logLevel/>
                <arguments/>
                <stackTrace/>
                <stackHash/>
            </providers>
        </encoder>
    </appender>

    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    
    <root level="all">
        <appender-ref ref="stash" />
        <appender-ref ref="STDOUT" />
    </root>

</configuration>


Next: 4 - Logging cutting-edge practices (experimental)

