
Wednesday, 14 November 2018

Level up logs and ELK - Logging Cutting-Edge Practices - Documenting and sharing your most important logged information

 

Articles index:

  1. Introduction (Everyone)
  2. JSON as logs format (Everyone)
  3. Logging best practices with Logback (Targeting Java DEVs)
  4. Logging cutting-edge practices (Targeting Java DEVs)
  5. Contract first log generator (Targeting Java DEVs)
  6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
  7. VRR Java + Logback configuration (Targeting OPS)
  8. VRR FileBeat configuration (Targeting OPS)
  9. VRR Logstash configuration and Index templates (Targeting OPS)
  10. VRR Curator configuration (Targeting OPS)
  11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

Logging cutting-edge practices: Sharing your log strategy with other teams and departments



Once you have your ELK cluster in place, and your team has one or more applications logging into it, you still have a problem to solve.

Given any of the following scenarios:
  • A team newcomer is asked to investigate something that involves logs.
  • The CTO has granted Support access to ELK so they can get some insight into application activity or problems.
  • Another team is investigating a cross-application problem.

Assuming they know the service, host and other common OPS-dependent attributes...
  • What attributes can they query in ELK?
  • What do they mean?
  • What text comes with the data they need?
  • How do they know what data is correlated?

Bonus question
  • How do you enforce that the same attribute has the same name in all your logs? Moreover, how do you enforce it is always the same type?

I'll be proposing a number of different solutions in an iterative evolution; pick the one you prefer.

Baseline:


You are using the latest log format and best practices from Level up logs and ELK - Logging best practices with Logback:
logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));

Cons:
  • The name or case could differ for the same concept in different log lines.
  • Type mismatch is possible: same name, different type.
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

 

Iteration #1: Constant names


The first problem to solve is ensuring we are not using different keys for the same concept.
Initially, there is nothing that prevents this from happening:
logger.info("New Market Status {} {}", kv("marketId", 1234), kv("newStatus", "ready"));
[...] 
logger.info("Editing Market Status {} {}", kv("id", 1234), kv("newstatus", "ready"));

On one hand, no developer is going to manually check all the names of every variable to enforce naming consistency. On the other hand, even a lower/upper case mismatch could ruin your queries, as values are stored in case-sensitive fields.

A first iteration would be a set of constant Strings or enums to help us deal with the variety of names. Documentation about meaning, usage, required type and description could be made explicit in comments.
public class LoggerUtils {
    public static final String MK_ID = "marketId"; //Long - Market Id
    public static final String MK_NEW_STATUS = "newStatus"; //MarketStatus.java enum - New Market Status
}

Finally, the code would look like this (enforced only by code review and goodwill):
logger.info("New Market Status {} {}", kv(MK_ID, 1234), kv(MK_NEW_STATUS, "ready"));
[...] 
logger.info("Editing Market Status {} {}", kv(MK_ID, 1234), kv(MK_NEW_STATUS, "ready"));

Cons:
  • The name or case could differ for the same concept in different log lines.
  • Type mismatch is possible: same name, different type.
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

Iteration #2: Enforcing types


This problem comes from accidentally reusing the same name for different kinds of items, "id" being a paradigmatic example.
logger.info("New Market Status {} {}", kv("id", 1234), kv("newStatus", "ready"));
[...] 
logger.info("User has signed up {}", kv("id", "email@forexample.com"));

Unless forced otherwise, the first occurrence of a field defines its type for that index. Therefore you will find an error like the following in the Logstash logs/output:

logstash_1       | [2018-11-15T22:08:29,392][WARN ][logstash.outputs.elasticsearch] 
Could not index event to Elasticsearch. {
:status=>400, 
:action=>[
  "index", {
    :_id=>nil, 
    :_index=>"vrr-leveluplogging-low-2018-11-15", 
    :_type=>"doc", 
    :_routing=>nil
  }, 
  #<LogStash::Event:0x9b41f3e>
], 
:response=>{
  "index"=>{
    "_index"=>
    "vrr-leveluplogging-low-2018-11-15", 
    "_type"=>"doc", 
    "_id"=>"hJZrGWcBSDbbRtN6HHw3", 
    "status"=>400, 
    "error"=>{
      "type"=>"illegal_argument_exception", 
      "reason"=>"mapper [id] of different type, current_type [long], merged_type [text]"
}}}}


If you check Kibana's Management -> Index Patterns -> vrr-* (in our example), you will see that the field "id" has been defined as "number"; therefore, when Logstash tries to put a "string" into it, it fails.

Having introduced the problem, let's try the solution.


import net.logstash.logback.argument.StructuredArgument;
import static net.logstash.logback.argument.StructuredArguments.kv;

public class LoggerUtils {

    public static StructuredArgument logMarketId(Long id) {
        return kv("marketId", id);
    }

    public static StructuredArgument logCustId(String id) {
        return kv("custId", id);
    }

}

We could create a helper class that centralizes the responsibility of not reusing names and of keeping the name-type relationship.

It would be used like this now (old and new versions):
logger.info("User has signed up {}", kv("id", "email@forexample.com"));
logger.info("User has signed up {}", logCustId("email@forexample.com"));

Yet you still need to enforce that these helper functions are created and used.
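
With these helpers, and assuming the LoggerUtils class above, a type mismatch is caught at compile time instead of surfacing as a mapping error in ElasticSearch:

logger.info("User has signed up {}", logCustId("email@forexample.com")); // compiles: custId is always a String
// logger.info("User has signed up {}", logCustId(1234L));               // does not compile: incompatible types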

Cons:
  • The name or case could differ for the same concept in different log lines.
  • Type mismatch is possible: same name, different type.
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

Please notice that a field accepting a type, e.g. "id" accepting "number", also implies that it can take an array of the same type (and not only a single value); this is handled by ElasticSearch transparently. The array elements are indexed so that a search for any of the values returns the same document.
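
As an illustration of that, and assuming the same logstash-logback-encoder setup, passing a collection as the value produces a JSON array in the log event (the field name is the one from the baseline example):

import java.util.Arrays;
import static net.logstash.logback.argument.StructuredArguments.kv;

// One event carrying several market ids in a single field; ElasticSearch indexes
// each element, so searching marketId:1234 or marketId:5678 returns this same document.
logger.info("Bulk market update {}", kv("marketId", Arrays.asList(1234L, 5678L)));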


Iteration #3: Enforcing usage


How do we prevent a random developer from adding their own log lines, breaking all the rules above? Good and rogue developer below:

import net.logstash.logback.argument.StructuredArgument;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static example.LoggerUtils.logCustId;
import static net.logstash.logback.argument.StructuredArguments.kv; 

[...] 

logger.info("Good developer using custom LoggerUtils {}", logCustId("email@forexample.com")); 

[...] 

logger.info("Roge developer not using custom LoggerUtils {}", kv("custId",1234L));


Fortunately for us, Checkstyle already allows us to control arbitrary imports, and also allows suppression of this rule for a given class. Let's put it together.

Checkstyle configuration forbidding usage of kv import in the project:

<module name="Checker">
    <property name="charset" value="UTF-8"/>

    <module name="SuppressionFilter">
        <property name="file" value="${config_loc}/suppressions.xml"/>
    </module>

    <module name="TreeWalker">
        <module name="IllegalImport">
            <property name="illegalClasses" value="net\.logstash\.logback\.argument\.StructuredArguments.*"/>
            <property name="regexp" value="true"/>
        </module>
    </module>
</module>

Making an exception with LoggerUtils class:

<suppressions>
    <suppress checks="IllegalImport" files="LoggerUtils\.java"></suppress>
</suppressions>

You could argue that there are ways to get around this prohibition; I agree, there always are.
The goal here is not to prevent by all means a rogue developer from logging anything they like. The goal is to avoid mistakes, and a programmatic bypass of these checks should really stand out in a code review.

As a result of applying the above configuration in your favorite build tool (Maven, Gradle, etc.) you would get a broken build (if you configure it so, which I recommend) and an explanation like the following (Gradle output):


:compileJava
:processResources UP-TO-DATE
:classes
[ant:checkstyle] [ERROR] /home/alberto/code/leveluplogging/leveluplogging/src/main/java/example/ErrorType.java:9:1: 
Illegal import - net.logstash.logback.argument.StructuredArguments.kv. [IllegalImport]
:checkstyleMain FAILED

FAILURE: Build failed with an exception.


Cons:
  • The name or case could differ for the same concept in different log lines.
  • Type mismatch is possible: same name, different type.
  • Usage can only be enforced as a policy.
  • No documentation available for sharing with other teams or for your own reference.

 

Iteration #4: Contract-first LoggerUtils code generation


A solution that keeps the goodness of the three previous iterations, but focuses on a "documentation first" approach (the analogy of contract-first for developers), would involve, in my opinion, the following elements:

  1. A description file that explains, in language as close to human as possible, what you are logging that could be of interest to anyone.
  2. A code generator application that takes the description file and produces an equivalent of LoggerUtils in your favorite language.
  3. A documentation generator application that takes the description file and produces full human-readable documentation in your favorite markup / text format.
  4. A plugin to hook the previous two steps into your build chain (Gradle plugin, Maven plugin...).
This topic is long enough to deserve its own article; see the "Next" link below.
 




Next: 5 - Contract first log generator

Wednesday, 5 September 2018

Level up logs and ELK - Introduction

Articles index:

    1. Introduction (Everyone)
    2. JSON as logs format (Everyone)
    3. Logging best practices with Logback (Targeting Java DEVs)
    4. Logging cutting-edge practices (Targeting Java DEVs)
    5. Contract first log generator (Targeting Java DEVs)
    6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
    7. VRR Java + Logback configuration (Targeting OPS)
    8. VRR FileBeat configuration (Targeting OPS)
    9. VRR Logstash configuration and Index templates (Targeting OPS)
    10. VRR Curator configuration (Targeting OPS)
    11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

       

      Introduction

       

      Why this? Why now?

      This is the result of many years as a developer knowing that there was something called "logs":
      A log is something super important that you cannot change, because someone reads them; something you cannot read either, because you don't have ssh access to the boxes they are generated on. Write them, but not too much, the disk may fill up.
      Then I learned how to write them, then suffered how to read them (grep), until I discovered there was a super expensive tool that could collect, sort, query and present them for you.
      Then I loved them; I always thought of them as the ultimate audit tool, but still too many colleagues preferred to use the database for that sort of functionality.

      I got better at logging as if it were a nice story happening in my application, and also learned how to correlate logs across multiple services, but I barely managed to create good dashboards. It so happened that Splunk was too expensive to buy, and ElasticSearch too expensive to maintain. Infamous years without managing logs happened again.

      Finally I got a job that involved architecture-level monitoring decisions and got the opportunity to develop a logging strategy for ElasticSearch (Splunk and other managed platforms didn't require that much hard thinking, as they provide the know-how and setup time). The strategy I will be developing in the next few articles came as a solution to many common restrictions in all the companies I've been around in the last decade.

      It is a long story; I will try to make it concise. Bear with me and, if you belong to that huge 95% of companies that use ElasticSearch as a super-grep, you'll raise your game.

      Objectives of this series of articles:

      1. Save up to 90% disk space based on VRR (Variable Replication factor and Retention) estimations by playing with replication, retention and custom classification.
          • Differentiate important from redundant information and apply different policies to them.
      2. Log useful information, for once, that you will be able to filter, query, plot and alert on (a minimal sketch follows this list).
        • We are covering parameters, structured arguments, and how to avoid using grok to parse them.
      3. Save tons of OPS time by using the right tools to empower DEVs to be responsible for their logs.
        • Let's avoid bothering our heroes with every change in a log line. Minimizing OPS time is paramount.
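
      As a minimal sketch of what objective 2 looks like in practice (the class and field names here are illustrative; the details are developed in articles 3 and 4), a structured argument turns a log parameter into a queryable JSON field, so Logstash needs no grok pattern to extract it:

      import org.slf4j.Logger;
      import org.slf4j.LoggerFactory;
      import static net.logstash.logback.argument.StructuredArguments.kv;

      public class SignupService {
          private static final Logger logger = LoggerFactory.getLogger(SignupService.class);

          public void signUp(String custId) {
              // custId becomes a first-class JSON field you can filter, plot and alert on
              logger.info("User has signed up {}", kv("custId", custId));
          }
      }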

      Some assumptions:

      • All my examples will orbit around Java applications using the SLF4J logging facade, backed by Logback.
      • Logs are dumped to files, read by FileBeat and sent to Logstash.
      • Logstash receives the log lines and sends them to ElasticSearch after some processing.
      • Kibana as the ElasticSearch UI.

      Even if your stack is not 100% identical, I am sure you can apply some bits from here.


      Next:  2 - JSON as logs format