
Sunday, September 9, 2018

Level up logs and ELK - Logging best practices with Logback

Articles index:

  1. Introduction (Everyone)
  2. JSON as logs format (Everyone)
  3. Logging best practices with Logback (Targeting Java DEVs)
  4. Logging cutting-edge practices (Targeting Java DEVs)
  5. Contract first log generator (Targeting Java DEVs)
  6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
  7. VRR Java + Logback configuration (Targeting OPS)
  8. VRR FileBeat configuration (Targeting OPS)
  9. VRR Logstash configuration and Index templates (Targeting OPS)
  10. VRR Curator configuration (Targeting OPS)
  11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

Logging best practices with Logback

 

Logs are, in my experience, the product of each developer's very personal view of how software should be built and of how many nights on call they have suffered.
In this chapter I am going to recommend some ways of logging information that make it more useful when it comes to indexing, searching and filtering.

All examples expressed in Java + SLF4J.
Github repo: https://github.com/albertonavarro/leveluplogging/tree/master/leveluplogging

Simple Message:

Most of the free-form log information will come as plain messages, from "Application is ready to listen for messages" to "Unknown error trying to upload file". These messages carry little structured information; in fact, I struggled to come up with messages of this kind that take no parameter at all.

Example code in GitHub

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        logger.info("Starting main - without arguments");
    }
}

This could be the output for a simple message in plain text:
20:15:52.106 [main] INFO  Main1 - Starting main - without arguments

You would still need some help, such as Grok or Splunk, to make that log line useful. These tools expect your format to be constant and immutable, for instance:

input {
  beats {
    port => 5044
  }
}

filter {
    grok {
        match => [ "message" , "%{TIME:@timestamp} \[%{WORD:thread}\] 
%{LOGLEVEL:loglevel}\s+%{WORD:classname} - %{GREEDYDATA:message}" ]
        overwrite => ["message"]
    }
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "logstash-classic"
    }
}

Later on, these messages are copied verbatim into the message field of the ElasticSearch document, which you can search by exact or partial match.




Notice that, thanks to Grok, we have extracted "classname" and "loglevel", as long as that format never changes.

You might even count, compare, or build dashboards on the number of events of one type versus another, correlate them by time, etc.

This is what the same message looks like in JSON:

{"@timestamp":"2018-09-06T20:15:52.106+02:00","message":"Starting main - without arguments",
"level":"INFO","level_value":20000}
 
Fields are already split and named; no extra parsing is needed to get the same benefits.

Arguments: 

Example code in GitHub

At some point we find it necessary to add dynamic information to our logs, and the flavor can go from free text to a semi-structured format. This has been my own evolution:

Initially I wrote them like this:
logger.info("Market {} is ready", marketId)

Then I tried to give it some more structure:
logger.info("Market marketId={} is ready", marketId)

I ended up like this, in order to make queries easier:
logger.info("New Market Status marketId={} newStatus={}", marketId, "ready")
The reason for accumulating the free text on the left and the data on the right is to simplify queries: in Kibana it is easier and better performing to query a single contiguous phrase (e.g. "New Market Status") than two separate fragments like "Market" AND "is ready".

Output for this last test would look like:
23:26:52.164 [main] INFO  Main2 - New Market Status marketId=1234 newStatus=ready 

Reusing the previous Grok expression, we could extract message="New Market Status marketId=1234 newStatus=ready", but we would need to run something else on top of it for ElasticSearch to understand that there is extra information it can index separately.

Logstash's KV plugin (Elastic KV plugin) will extract key/value pairs from a given input, so in this case the output would look like this (notice the new fields marketId and newStatus):
 


Now ElasticSearch can filter by "marketId" and by "newStatus", and go deeper and wider in data analysis.

The only weak point of KV is that it can produce "false positives" when detecting key/value pairs. Until recently, only single-character separators were supported; now character sequences defined by a regex are supported as well (finally). By default the separator is the "=" character, which you can easily find in base64 strings or URLs. That would pollute your data and damage your index performance rather quickly.
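
For reference, the filter section that enables this could look like the sketch below. The kv options used (source, include_keys) are standard plugin options; whitelisting the expected keys is my own choice here, and it also limits the false-positive problem just described.

filter {
    # same grok as before, leaving the free text in "message"
    grok {
        match => [ "message" , "%{TIME:@timestamp} \[%{WORD:thread}\] %{LOGLEVEL:loglevel}\s+%{WORD:classname} - %{GREEDYDATA:message}" ]
        overwrite => ["message"]
    }
    # extract key=value pairs from the remaining message text
    kv {
        source => "message"
        include_keys => [ "marketId", "newStatus" ]
    }
}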

The JSON output would have the same content and the same problem: we would still need to extract the information embedded in the message text.

Structured Arguments

Example code in GitHub

The amazing library logstash-logback-encoder also provides another way of expressing arguments in a more structured way. Nicely wrapped in the StructuredArguments class, it offers a number of convenient methods to parameterize Key/Values, Lists and Maps.

Syntax is quite similar, just a little bit more verbose:
logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));

It leaves the following console output, as it automatically formats the key/value pairs. As with the previous method, the information can be extracted using the KV plugin.
20:20:02.037 [main] INFO  Main3 - New Market Status marketId=1234 newStatus=ready

However, this is way more valuable when the output is JSON instead of plain text, as it not only formats the message the same way, but also introduces the new pair of attributes, marketId and newStatus, as independent fields at the root level.

{"@timestamp":"2018-09-06T20:20:02.037+02:00","message":"New Market Status marketId=1234 
newStatus=ready","level":"INFO","level_value":20000,"marketId":1234,"newStatus":"ready"}

This is very important, because this document, whether sent directly to ElasticSearch or passed through Logstash, will be indexed by ElasticSearch with "marketId" and "newStatus" as independently indexed information, with no extra effort at all (no need for Grok or KV).
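
The StructuredArguments class has more helpers than kv(); the sketch below shows a few of them. The method names come from the library, while the messages and the map contents are made up for illustration.

import static net.logstash.logback.argument.StructuredArguments.entries;
import static net.logstash.logback.argument.StructuredArguments.kv;
import static net.logstash.logback.argument.StructuredArguments.value;

import java.util.HashMap;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class StructuredArgumentsSampler {
    private static final Logger logger = LoggerFactory.getLogger(StructuredArgumentsSampler.class);

    public static void main(String[] args) {
        // kv(): key=value in the text message, plus a root-level field in the JSON output
        logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));

        // value(): only the value shows up in the text message, the key still becomes a JSON field
        logger.info("Market {} is ready", value("marketId", 1234));

        // entries(): every map entry becomes its own root-level JSON field
        Map<String, Object> context = new HashMap<>();
        context.put("marketId", 1234);
        context.put("newStatus", "ready");
        logger.info("New Market Status {}", entries(context));
    }
}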

Compare the required Logstash configuration with the previous one:

input {
  beats {
    port => 5044
    codec => json
  }
  
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "logstash-classic"
    }
}

Finally, it results in this output in ElasticSearch:



Notice (again) that "marketId" and "newStatus" are recognized without any Grok or KV plugins configured in Logstash at all. Moreover, there won't be any false positives (discussed in the KV section above). Some extra fields also pop up, coming from logstash-logback-encoder.

MDC

Example code in GitHub

One of the most important and useful tools in modern log frameworks is the Mapped Diagnostic Context (MDC).

The most common scenario where MDC shines is log correlation within a service and across services; MDC uses a ThreadLocal to store a map of key/values for every thread that uses the logger.

If you create an aspect or a filter in your service that assigns to that map a well-known key (e.g. "rid" for "request id") with a new UUID value per request, all logs produced by that request will carry the same "rid". The value can also be picked up by your HTTP/JMS clients to propagate the "rid" to the other services you own, keeping the same id across all of them.
This is a well-established pattern in the industry to trace calls across distributed services.
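
A sketch of such a filter for a plain servlet stack; the header name X-Request-Id and the class name are my own choices, not part of the original example:

import java.io.IOException;
import java.util.UUID;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import org.slf4j.MDC;

public class RequestIdFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) {
    }

    @Override
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        // Reuse the id propagated by the caller, or create a fresh one for this request
        String incoming = ((HttpServletRequest) request).getHeader("X-Request-Id");
        String rid = incoming != null ? incoming : UUID.randomUUID().toString();
        MDC.put("rid", rid);
        try {
            chain.doFilter(request, response);
        } finally {
            // The thread goes back to the pool, so always clean the MDC up
            MDC.remove("rid");
        }
    }

    @Override
    public void destroy() {
    }
}

How the filter is registered (web.xml, @WebFilter or your framework's equivalent) is left out of the sketch.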

In a service that uses multiple threads to execute a given piece of code in parallel, this is as good as using the thread name to distinguish the actions of one thread from another. However, the thread name won't help you if a task is started by one thread and finished by another, or if you want to know what happened to that request in the next service it invoked.

Let's simulate such behavior with the following code:

public static void main(String[] args) {
    MDC.put("rid", UUID.randomUUID().toString());
    logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));
    logger.info("Some more logging");
}

Console output for MDC requires some extra configuration per field, so don't get too excited about these fields:

<encoder>
    <pattern>%d{HH:mm:ss.SSS} %X{rid} [%thread] %-5level %logger{36} - %msg%n</pattern>
</encoder>

Now we are able to see it in the console/file:

20:27:28.007 b5fab580-12f4-4266-8a97-927d03d1b01a [main] INFO  Main4 - New Market Status 
marketId=1234 newStatus=ready
20:27:28.043 b5fab580-12f4-4266-8a97-927d03d1b01a [main] INFO  Main4 - Some more logging

And, given the strict format, to read it with Grok like any other field (notice the extra configuration required per new MDC field):

grok {
        match => [ "message" , "%{TIME:@timestamp} %{UUID:rid} \[%{WORD:thread}\] 
%{LOGLEVEL:loglevel}\s+%{WORD:classname} - %{GREEDYDATA:message}" ]
        overwrite => ["message"]
}

Once again, with the right Logstash configuration

input {
  beats {
    port => 5044
    codec => json
  }
  
}

output {
    elasticsearch {
        hosts => "localhost:9200"
        index => "logstash-classic"
    }
}

and the right Logback configuration (below), we can get the MDC mapped into JSON with no per-field effort at all:

{"@timestamp":"2018-09-06T20:27:28.007+02:00","thread_name":"main",
"rid":"b5fab580-12f4-4266-8a97-927d03d1b01a","logger_name":"Main4",
"message":"New Market Status marketId=1234 newStatus=ready","level":"INFO",
"marketId":1234,"newStatus":"ready"}
 
{"@timestamp":"2018-09-06T20:27:28.043+02:00","thread_name":"main",
"rid":"b5fab580-12f4-4266-8a97-927d03d1b01a","logger_name":"Main4",
"message":"Some more logging","level":"INFO"}

Once again, "rid" appears outside the message and as independent field, so it is automatically accepted by Logstash and indexed by ElasticSearch. No extra Logstash configuration either.


Logback Configuration for JSON using logstash-logback-encoder

It's not the purpose of this article to go deep into configuration details; this is just the configuration I'm using for these examples. The best guide to get this right is logstash-logback-encoder (GitHub).

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <appender name="stash" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>logFile.json</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>file.log.%d{yyyy-MM-dd}</fileNamePattern>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder class="net.logstash.logback.encoder.LoggingEventCompositeJsonEncoder">
            <providers>
                <timestamp/>
                <threadName/>
                <mdc/>
                <loggerName/>
                <message/>
                <logLevel/>
                <arguments/>
                <stackTrace/>
                <stackHash/>
            </providers>
        </encoder>
    </appender>

    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>
    
    <root level="all">
        <appender-ref ref="stash" />
        <appender-ref ref="STDOUT" />
    </root>

</configuration>


Next: 4 - Logging cutting-edge practices (experimental)


Saturday, May 2, 2015

Some lessons learned: Gradle



After a year dealing with Gradle in different companies, I think it's high time to start collecting some patterns and suggestions.

1) Avoid redundant folder declarations.

Remember, applying the java plugin already sets up default folders; you don't need to declare them unless you plan to change them.

Recommended
apply plugin: 'java'

Redundant
apply plugin: 'java' 

sourceSets {    
    main {        
        java {            
            srcDir 'src/main/java'        
        }        
        resources {            
            srcDir 'src/main/resources'        
        }    
    }     

    test {        
        java {            
            srcDir 'src/test/java'        
        }        
        resources {            
            srcDir 'src/test/resources'        
        }    
    }
}

The Groovy and Scala plugins also declare other folders; just read some documentation before you start using them.


2) Choose your syntax for dependency management.

Gradle allows two syntaxes for dependency declaration:
  • Extended (group: 'ws.lazylogin', name: 'lazylogin-core', version: '1.0.0')
  • Compact ('ws.lazylogin:lazylogin-core:1.0.0').
Be consistent, your users and colleagues will be pleased.

I prefer the compact one :)
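
In context, the two forms would look like this inside a dependencies block (a quick sketch using the same artifact):

dependencies {
    // extended syntax
    compile group: 'ws.lazylogin', name: 'lazylogin-core', version: '1.0.0'

    // compact syntax
    compile 'ws.lazylogin:lazylogin-core:1.0.0'
}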


3) Your users don't need to know project details to compile it.

Any project should be able to clean + build with just the code and a JVM in place. Ideally nothing else should be required.

Although some projects will inevitably break this suggestion, try not to.

Some examples of build-breaking customs:
  1. Usage of variables declared in ${user.home}/.gradle/gradle.properties that have no default value in build.gradle and don't exist in the project's gradle.properties (see the sketch after this list).
    • The build would fail for anyone who doesn't own the file, doesn't know which variables to include, or doesn't want to contaminate their user gradle properties file. No no no no no.
  2. Assuming executables are on the PATH or in some predefined location.
Try your hardest to make your project easy to build.
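
For the first point, the build itself can provide a fallback so it still works without a user-level gradle.properties. A sketch in the Groovy DSL, where the property name deployRepoUrl and the default URL are made up:

// build.gradle: fall back to a default when the property is not defined anywhere
def deployRepoUrl = project.hasProperty('deployRepoUrl') ?
        project.property('deployRepoUrl') : 'http://localhost:8081/repo-snapshots'

task printDeployRepo {
    doLast {
        println "Deploying to ${deployRepoUrl}"
    }
}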


4) Prevent incorrect dependency versioning

Declare your dependencies and versions in a single place (that was a good idea in Maven's universe) and reuse those dependencies by their variable names.

The risk of not following this advice is that you will likely end up using different versions of the same library in different modules of the same project, and you don't want that; neither do I.

The solution would look like this:

Dependency declaration (they can be declared in a different file)
ext{   
    //VERSIONS
    v = [           
        spring: '3.2.0.RELEASE',
        jme3: '3.1.0-snapshot-github'
    ]   

    deps = [           
        //SPRING           
        spring_core: 'org.springframework:spring-core:'+v.spring,
        spring_beans: 'org.springframework:spring-beans:'+v.spring,
        spring_context: 'org.springframework:spring-context:'+v.spring,
        //JME3           
        jme3_core: 'com.jme3:jme3-core:' + v.jme3,           
        jme3_effects: 'com.jme3:jme3-effects:' + v.jme3
    ]
}

Dependency usage for any subproject
dependencies {   
    compile (           
        project(':mod-api'),      
        deps.jme3_core,           
        deps.jme3_effects,           
        deps.spring_beans,           
        deps.spring_core )
}

Instead of the more error-prone way:

Dependency usage for a given subproject A
dependencies {   
    compile (           
        project(':mod-api'),                        
        'com.jme3:jme3-core:3.1.0-snapshot-github',           
        'com.jme3:jme3-effects:3.1.0-snapshot-github',           
        'org.springframework:spring-beans:3.2.0.RELEASE',           
        'org.springframework:spring-core:3.2.0.RELEASE' )
}

Dependency usage for a given subproject B
dependencies {   
    compile (           
        project(':mod-api'),   
        'com.jme3:jme3-core:3.1.0-snapshot-github',           
        'com.jme3:jme3-effects:3.1.0-snapshot-github',           
        'org.springframework:spring-beans:3.1.0.RELEASE',           
        'org.springframework:spring-core:3.2.0.RELEASE' )
}


5) Group your dependencies by dependency configuration.

It's nicer :)

Dependencies grouped by configurations
compile (           
    project(':mod-api'),           
    libs.lazylogin_common_context,           
    libs.nifty,           
    libs.jme3_core,           
    libs.jme3_effects,           
    libs.spring_beans,           
    libs.spring_core,           
    libs.jackson_dataformat_yaml,           
    libs.jackson_databind
)
runtime (
    libs.nifty_default_controls,
    libs.eventbus,
    libs.auto_value
)

Dependencies ungrouped (and disorganized)
compile project(':mod-api')
compile libs.lazylogin_common_context
compile libs.nifty
compile libs.jme3_core
compile libs.jme3_effects
compile libs.spring_beans
compile libs.spring_core
runtime libs.nifty_default_controls
compile libs.jackson_dataformat_yaml
compile libs.jackson_databind
runtime libs.eventbus
runtime libs.auto_value

(Grouping is not possible when you need to exclude transitive dependencies from individual dependencies; see the sketch below.)
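
For reference, this is the kind of per-dependency exclude that forces you back to one declaration per line; the excluded group and module are just an example:

compile(libs.jackson_dataformat_yaml) {
    // a per-dependency closure cannot be expressed inside the grouped call above
    exclude group: 'org.yaml', module: 'snakeyaml'
}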