Thursday, September 6, 2018

Level up logs and ELK - JSON logs

Articles index:

  1. Introduction (Everyone)
  2. JSON as logs format (Everyone)
  3. Logging best practices with Logback (Targeting Java DEVs)
  4. Logging cutting-edge practices (Targeting Java DEVs)
  5. Contract first log generator (Targeting Java DEVs)
  6. ElasticSearch VRR Estimation Strategy (Targeting OPS)
  7. VRR Java + Logback configuration (Targeting OPS)
  8. VRR FileBeat configuration (Targeting OPS)
  9. VRR Logstash configuration and Index templates (Targeting OPS)
  10. VRR Curator configuration (Targeting OPS)
  11. Logstash Grok, JSON Filter and JSON Input performance comparison (Targeting OPS)

JSON as logs format

 


JSON is not the only solution, but it's one of them, and the one I am advocating for until I find something better.

Based on my personal experience across almost a dozen companies, the log lifecycle very often looks like this:

Bad log lifecycle:
  1. Logs are created as plain text, using natural language, unstructured.
  2. Lines are batched together if they belong to an exception stacktrace, otherwise treated as individual messages (regex work).
  3. Lines are parsed using Grok; message and timestamp are extracted from the matched parts (more regex).
  4. If logs are structured, or unstructured but in a fixed format, some useful information can be extracted with Grok (e.g. Apache or Nginx logs: they are always the same, easy).
  5. The OPS team sees Grok consume all the CPU it can find and refuses to add more regexes to the expression.
  6. Developers want to extract information from logs to index and plot, so they ask OPS to please add some lines to Grok. Three OPS engineers lose their sanity and another two quit the company. Developers finally get what they want.
  7. The DEV team changes the logs without telling OPS, so previously useful information stops flowing in; dashboards are now empty and nobody cares.
It doesn't really matter if your company doesn't match every bullet point above. As long as you are using Grok, you will struggle to squeeze 50+ different regexes into a single Grok expression (or a series of them) to extract all the information you want. And as long as the team writing the logs is not the team maintaining Logstash, your Grok configuration will become outdated for good, and that will happen soon.
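
To make the Grok pain concrete, this is roughly the kind of filter OPS ends up maintaining for unstructured Java-style log lines. The pattern below is only an illustration, not taken from any real configuration in this series:

    filter {
      grok {
        # one pattern per log layout; real setups accumulate dozens of these
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} \[%{DATA:thread}\] %{JAVACLASS:logger} - %{GREEDYDATA:log_message}" }
      }
    }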

However, if your application produces JSON, that's it: all fields go through Logstash and end up in ElasticSearch without any OPS intervention at all.
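
For example, a single log event, stacktrace included, might look like the following single line. The field names are merely illustrative of what a JSON encoder for Logback can produce, and order_id is a made-up extra field:

    {"@timestamp":"2018-09-06T10:15:30.123+01:00","level":"ERROR","logger_name":"com.example.orders.OrderService","thread_name":"http-nio-8080-exec-1","message":"Order 1234 rejected","order_id":"1234","stack_trace":"java.lang.IllegalStateException: no stock\n\tat com.example.orders.OrderService.place(OrderService.java:42)"}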

Alternative steps using JSON + Logback + Logstash + ElasticSearch:
  1. Logs are created in JSON; it is the developers' responsibility to decide which extra fields need to be exposed from the code itself. Even exceptions with their stacktrace become single-line JSON documents in the log file (see the Logback encoder sketch after this list).
  2. Log files are picked up by FileBeat and sent to Logstash line by line. This configuration is written once and won't change much after that.
  3. Logstash takes these lines and sends them to their index in ElasticSearch without any other processing; again, write once (for all applications, not even once per application; see the minimal pipeline sketch after this list).
  4. ElasticSearch takes this information into an index per application, day and priority, and keeps the extra fields that developers put in the logs in the first place.
  5. When developers want to expose more fields, they don't need to bother anyone: if it's in the logs, it will be in ElasticSearch. (Maybe asking for a reindex in ES every now and then, not too much.)
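
As a sketch of step 1, one common way to get single-line JSON out of Logback is the logstash-logback-encoder library. The appender below is only an illustration (the file paths and appender name are my assumptions); the actual configuration comes later in the series:

    <appender name="JSON" class="ch.qos.logback.core.rolling.RollingFileAppender">
      <file>/var/log/myapp/myapp.json</file>
      <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>/var/log/myapp/myapp.%d{yyyy-MM-dd}.json</fileNamePattern>
      </rollingPolicy>
      <!-- LogstashEncoder writes each event, stacktrace included, as one JSON line -->
      <encoder class="net.logstash.logback.encoder.LogstashEncoder"/>
    </appender>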
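
And for step 3, the Logstash pipeline can stay as small as this sketch. The port, hosts and index naming are assumptions (I am also assuming FileBeat adds an application field under fields); the VRR Logstash article later in the series covers the real configuration:

    input {
      beats {
        port => 5044
      }
    }
    filter {
      # each line shipped by FileBeat is already a JSON document; parse it as-is
      json {
        source => "message"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        index => "logs-%{[fields][application]}-%{+YYYY.MM.dd}"
      }
    }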

Using JSON as your log format is, without a doubt, a central part of the solution I am presenting here.

I haven't explored other topologies yet, like using Fluentd instead of Logstash, or FileBeat sending directly to ElasticSearch.


Next: 3 - Logging best practices with Logback

