Wednesday, September 5, 2018

Level up logs and ELK - Introduction

Articles index:

    1. Introduction (Everyone)
    2. JSON as logs format (Everyone)
    3. Logging best practices with Logback (Targeting Java DEVs)
    4. Logging cutting-edge practices (Targeting Java DEVs)
    5. Contract first log generator (Targeting Java DEVs)
    6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
    7. VRR Java + Logback configuration (Targeting OPS)
    8. VRR FileBeat configuration (Targeting OPS)
    9. VRR Logstash configuration and Index templates (Targeting OPS)
    10. VRR Curator configuration (Targeting OPS)
    11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

       

      Introduction

       

      Why this? Why now?

      This is the result of many years as a developer knowing only that there was something called "logs":
      a log is something super important that you cannot change, because someone reads it; something you cannot read either, because you don't have SSH access to the boxes where it is generated; something you should write, but not too much, because the disk may fill up.
      Then I learned how to write logs, then suffered reading them (grep), until I found out there was a super expensive tool that could collect, sort, query and present them for you.
      Then I loved them. I always thought of them as the ultimate audit tool, yet too many colleagues still preferred a database for that sort of functionality.

      I got better at logging, treating it like a nice story unfolding in my application, and I also learned how to correlate logs across multiple services, but I barely managed to create good dashboards. Then Splunk turned out to be too expensive to buy, and ElasticSearch too expensive to maintain, and the infamous years without managing logs happened again.

      Finally I got a job that involved architecture-level monitoring decisions, and with it the opportunity to develop a logging strategy for ElasticSearch (Splunk and other managed platforms didn't require that much hard thinking, as they provide the know-how and the setup). The strategy I will develop over the next few articles came as a solution to restrictions common to all the companies I've been around in the last decade.

      It is a long story and I will try to keep it concise. Bear with me and, if you belong to that huge 95% of companies that use ElasticSearch as a super-grep, you'll raise your game.

      Objectives of this series of articles:

      1. Save up to 90% of disk space, based on VRR (Variable Replication factor and Retention) estimations, by playing with replication, retention and custom classification (a back-of-the-envelope illustration follows this list).
          • Differentiate important from redundant information and apply different policies to each.
      2. Log useful information, for once, that you will be able to filter, query, plot and alert on.
          • We will cover parameters, structured arguments, and how to avoid needing Grok to parse them (see the sketch after this list).
      3. Save tons of OPS time by using the right tools to empower DEVs to be responsible for their own logs.
          • Let's avoid bothering our heroes with every change in a log line; minimizing OPS time is paramount.
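
      To give a feel for objective 1, here is a back-of-the-envelope illustration with made-up numbers (the actual estimation strategy is the subject of article 6): if you index 100 GB of logs per day and keep everything for 30 days with 1 replica, that is 100 × 2 × 30 = 6,000 GB on disk. If only 10 GB/day is genuinely important (kept 30 days with 1 replica) and the remaining 90 GB/day is low-value noise kept 4 days with no replica, you need 10 × 2 × 30 + 90 × 1 × 4 = 960 GB, roughly 84% less.

      For objective 2, here is a minimal sketch of what structured arguments look like from the Java side. It assumes the logstash-logback-encoder library and a JSON-emitting Logback encoder; the class and field names are invented for illustration.

        import net.logstash.logback.argument.StructuredArguments;
        import org.slf4j.Logger;
        import org.slf4j.LoggerFactory;

        public class CheckoutService {

            private static final Logger log = LoggerFactory.getLogger(CheckoutService.class);

            public void checkout(String orderId, long amountCents) {
                // Plain message: Logstash needs a grok pattern to pull the values back out of the text.
                log.info("Order {} checked out for {} cents", orderId, amountCents);

                // Structured arguments: orderId and amountCents travel as their own JSON fields,
                // directly filterable, queryable and plottable in Kibana, no grok involved.
                log.info("Order checked out",
                        StructuredArguments.kv("orderId", orderId),
                        StructuredArguments.kv("amountCents", amountCents));
            }
        }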

      Some assumptions:

      • All my examples will orbit around Java applications using the SLF4J logging API, backed by Logback.
      • Logs are dumped to files, read by FileBeat and sent to Logstash.
      • Logstash receives the log lines and, after some processing, sends them to ElasticSearch.
      • Kibana is the UI on top of ElasticSearch.

      Even if your stack is not 100% identical, I am sure you can apply some bits from here.


      Next:  2 - JSON as logs format

