Wednesday, November 14, 2018

Level up logs and ELK - Logging Cutting-Edge Practices - Documenting and sharing your most important logged information

 

Articles index:

  1. Introduction (Everyone)
  2. JSON as logs format (Everyone)
  3. Logging best practices with Logback (Targeting Java DEVs)
  4. Logging cutting-edge practices (Targeting Java DEVs) 
  5. Contract first log generator (Targeting Java DEVs)
  6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
  7. VRR Java + Logback configuration (Targeting OPS)
  8. VRR FileBeat configuration (Targeting OPS)
  9. VRR Logstash configuration and Index templates (Targeting OPS)
  10. VRR Curator configuration (Targeting OPS)
  11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

Logging cutting-edge practices: Sharing your log strategy with other teams and departments



Once you have your ELK cluster in place and your team has an application (or several) logging into it, you still have a problem to solve.

Given any of the following scenarios:
  • A team newcomer is asked to investigate something that involves logs.
  • The CTO has granted Support access to ELK so they can have some insight into application activity or problems.
  • Another team is investigating a cross-application problem.

Assuming they know the service, host and other common OPS-dependent attributes...
  • What attributes can they query in ELK?
  • What do they mean?
  • What text comes with the data they need?
  • How do they know what data is correlated?

Bonus question
  • How do you enforce that the same attribute has the same name in all your logs? Moreover, how do you enforce that it's always the same type?

I'll be proposing a number of different solutions in an iterative evolution; pick the one you prefer.

Baseline:


You are using the latest log format and best practices from Level up logs and ELK - Logging best practices with Logback:
logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));

Cons:
  • The same concept could end up with different names or cases in different log lines.
  • Type mismatch is possible: same name, different type.
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

 

Iteration #1: Constant names


The first problem to solve is ensuring we are not using different keys for the same concept.
Initially there's nothing preventing this from happening:
logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));
[...] 
logger.info("Editing Market Status {} {}", kv("id", 1234), kv("newstatus", "ready"));

On one hand, no developer is going to manually check all the existing names for every variable to keep them consistent. On the other, even a lower/upper case mismatch could ruin your queries, as values are stored in case-sensitive fields.

A first iteration makes us think of a set of constant Strings or enums to help us deal with the variety of names. Documentation about meaning, usage, required type and description can be made explicit in comments.
public class LoggerUtils {
    public static final String MK_ID = "marketId"; //Long - Market Id
    public static final String MK_NEW_STATUS = "newStatus"; //MarketStatus.java enum - New Market Status
}

Finally, code would look like this (enforced by code review and good will only):
logger.info("New Market Status {} {}", kv(MK_ID, 1234), kv(MK_NEW_STATUS, "ready"));
[...] 
logger.info("Editing Market Status {} {}", kv(MK_ID, 1234), kv(MK_NEW_STATUS, "ready"));

Cons:
  • The same concept could end up with different names or cases in different log lines. (solved by this iteration)
  • Type mismatch is possible: same name, different type.
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

Iteration #2: Enforcing types


This problem comes from accidentally reusing the same name for different kinds of items, "id" being a paradigmatic example.
logger.info("New Market Status {} {}", kv("id", 1234), kv("newStatus", "ready"));
[...] 
logger.info("User has signed up {}", kv("id", "email@forexample.com"));

Unless forced otherwise, the first occurrence of a field defines the type for that index. Therefore you will find an error like the following in the Logstash logs/output:

logstash_1       | [2018-11-15T22:08:29,392][WARN ][logstash.outputs.elasticsearch] 
Could not index event to Elasticsearch. {
:status=>400, 
:action=>[
  "index", {
    :_id=>nil, 
    :_index=>"vrr-leveluplogging-low-2018-11-15", 
    :_type=>"doc", 
    :_routing=>nil
  }, 
  #<LogStash::Event:0x9b41f3e>
], 
:response=>{
  "index"=>{
    "_index"=>
    "vrr-leveluplogging-low-2018-11-15", 
    "_type"=>"doc", 
    "_id"=>"hJZrGWcBSDbbRtN6HHw3", 
    "status"=>400, 
    "error"=>{
      "type"=>"illegal_argument_exception", 
      "reason"=>"mapper [id] of different type, current_type [long], merged_type [text]"
}}}}


If you check Kibana's Management -> Index Patterns -> vrr-* (in our example), you will see that the field "id" has been defined as "number"; therefore, when Logstash tries to put a "string" into it, it fails.

Having introduced the problem, let's try the solution.


import net.logstash.logback.argument.StructuredArgument;
import static net.logstash.logback.argument.StructuredArguments.kv;

public class LoggerUtils {

    // The only place where the field name "marketId" and its type (Long) are decided.
    public static StructuredArgument logMarketId(Long id) {
        return kv("marketId", id);
    }

    // Likewise, "custId" is always a String, no matter who logs it.
    public static StructuredArgument logCustId(String id) {
        return kv("custId", id);
    }

}

We could create a helper class that centralizes the responsibility of not reusing names and of keeping the name-type relationship.

It would be used like this now (old and new version):
logger.info("User has signed up {}", kv("id", "email@forexample.com"));
logger.info("User has signed up {}", logCustId("email@forexample.com"));

Yet you still need to enforce both creating these functions and using them.

Cons:
  • The same concept could end up with different names or cases in different log lines. (solved)
  • Type mismatch is possible: same name, different type. (solved by this iteration)
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

Please notice that a field accepting a type, e.g. "id" accepting "number", also implies that it can take an array of the same type (and not only a single value); this is handled by ElasticSearch transparently. The array values are indexed so that a search for any one of them returns the same document.
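
For instance, reusing the helper pattern from above (logMarketIds is a hypothetical addition to LoggerUtils):

import java.util.Arrays;
import java.util.List;

public static StructuredArgument logMarketIds(List<Long> ids) {
    // A List serializes as a JSON array under the same "marketId" field;
    // a query for any one of the values returns this document.
    return kv("marketId", ids);
}

[...] 

logger.info("Suspending markets {}", logMarketIds(Arrays.asList(1234L, 5678L)));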


Iteration #3: Enforcing usage


How do we prevent a random developer from adding their own log lines, breaking all the rules above? Good and rogue developers below:

import net.logstash.logback.argument.StructuredArgument;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static example.LoggerUtils.logCustId;
import static net.logstash.logback.argument.StructuredArguments.kv; 

[...] 

logger.info("Good developer using custom LoggerUtils {}", logCustId("email@forexample.com")); 

[...] 

logger.info("Roge developer not using custom LoggerUtils {}", kv("custId",1234L));


Fortunately for us, Checkstyle already allows us to ban arbitrary imports, and also allows suppression of this rule for a given class. Let's put it together.

Checkstyle configuration forbidding usage of kv import in the project:

<module name="Checker">
    <property name="charset" value="UTF-8">

    <module name="SuppressionFilter">
        <property name="file" value="${config_loc}/suppressions.xml">
    </property></module>
    
    <module name="TreeWalker">
        <module name="IllegalImport">
            <property name="illegalClasses" value="net\.logstash\.logback\.argument\.StructuredArguments.*">
            <property name="regexp" value="true">
        </property></property></module>
    </module>
</property></module>

Making an exception with LoggerUtils class:

<suppressions>
    <suppress checks="IllegalImport" files="LoggerUtils\.java"/>
</suppressions>

You could argue that there are ways around this prohibition; I agree, there always are.
The goal here is not to prevent, by all means, a rogue developer from logging anything they like. The goal is to avoid mistakes, and a programmatic bypass of these checks should really shine in a code review.

As a result of applying the above configuration in your favorite build tool (Maven, Gradle, etc.), you will get a broken build (if you configure it so, as I recommend) and an explanation like the following (Gradle output):


:compileJava
:processResources UP-TO-DATE
:classes
[ant:checkstyle] [ERROR] /home/alberto/code/leveluplogging/leveluplogging/src/main/java/example/ErrorType.java:9:1: 
Illegal import - net.logstash.logback.argument.StructuredArguments.kv. [IllegalImport]
:checkstyleMain FAILED

FAILURE: Build failed with an exception.


Cons:
  • The same concept could end up with different names or cases in different log lines. (solved)
  • Type mismatch is possible: same name, different type. (solved)
  • Usage can only be enforced as a policy. (solved by this iteration)
  • No documentation available for sharing with other teams or for your own reference.

 

Iteration #4: Contract-first LoggerUtils code generation


A solution that keeps the goodness of the three previous iterations, but focuses on a "documentation first" approach (an analogy of contract-first for developers), would involve, in my opinion, the following elements:

  1. A description file that explains, in the closest thing to human language possible, what you are logging that could be of interest to anyone.
  2. A code generator application that takes the description file and produces an equivalent of LoggerUtils in your favorite language.
  3. A documentation generator application that takes the description file and produces full human-readable documentation in your favorite markup / text format.
  4. A plugin to hook the previous two steps into your build chain (Gradle plugin, Maven plugin...).

This topic will take me long enough to deserve another article. See the "Next" link below.
 




Next: 5 - Contract first log generator