Feed Life to Log Chunks

by Sysco LABS Blog 16 October 2019

Written by:

Ishara Madhavi – Intern, Software Engineering

 

Have you ever tried to read the logs from an app, web server or a system test? If you have, then you’d agree that it’s no easy task to understand the system behavior from reading these logs.

Have you ever found yourself reading these logs and thinking, “Is there some way that these tiny letters on the terminals and command prompts can be brought to life?”

If these statements resonate with you, then the ELK Stack could be the answer to your problems!

Working with logs can be tedious, unless there is a proper mechanism to capture log behavior.

Recently, I was assigned the task of analyzing the logs of a test run with the help of the ELK stack.

 

What is an ELK stack?

The ELK stack is the shorthand used to describe a stack containing three open-source project layers: Elasticsearch, Logstash, and Kibana.

Logstash fetches log data in the desired manner and Elasticsearch provides a platform to query through the log data. Kibana is the platform responsible for visualizing the jargon inside your log file.

This stack provides the capability to better analyze log chunks, query through the data, and create visualizations for it. This is a blessing for detecting system errors, and a guide for monitoring system health.

 

A. Getting Started…

First things first! You’ll need to get your hands on the ELK stack. This link leads to the home page, where you can download Elasticsearch, Logstash and Kibana. Of course, you will also need a file that contains logs in order to try it out.

This blog post will cover my work with the ELK stack on a Linux platform.

After downloading the 3 archives, extract them into separate folders and open the bin folders in 3 separate terminals. Use the following commands to start the Elasticsearch and Kibana processes.
./elasticsearch
./kibana
Before starting Logstash, a file has to be created inside the Logstash bin folder. Within this file, there should be 3 specific blocks: input, filter and output. The basic structure of the configuration file should be as follows.
input {}
filter {}
output {}

 

  1. Input

This section specifies the mode of the expected input. It can be a file specified by a path, or standard input provided via the terminal.
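For example, a file input that reads an existing log from the beginning could be sketched as follows (the path is only a placeholder for your own log file):

input {
  file {
    path => "/home/user/logs/test-run.log"
    start_position => "beginning"
    # Ignore the sincedb bookmark so the whole file is re-read on every run.
    sincedb_path => "/dev/null"
  }
}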

 

  2. Filter

This is the most important job in the process. Here, logs have to be categorized based on their formats so that Logstash can extract each log type separately.

Do you remember how we assign a variable ‘x’ to hold a piece of data for future reference when writing a program? Similarly, the important features inside a log message can be extracted and assigned to fields that capture its significant data.

This filtering section specifies how the extraction of different log types should be done, based on their patterns throughout the log file. It uses filters (some of them mentioned below) for log extraction.

  • Grok Filter — Matches the patterns of different log types.
  • Mutate Filter — Removes unnecessary fields, converts field types as per the requirements in Kibana visualization.
  • Date Filter — Converts log-timestamp to @timestamp field, as per the requirements in Kibana visualization.
  • Aggregate Filter – Finds the elapsed time between start and end events.

By now you may have questions such as:

 

How do I Create a Grok Filter?

Filtering a message is equivalent to writing a pattern that can capture the structure of the message, and the Grok Filter does most of it.

An example is shown below. The first block contains a sample INFO log message from the log file. The second block contains a Grok Filter with a regular expression specifying the pattern of the message.
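Since the exact log format differs from system to system, the following is only an illustration: the sample log line and the field names log_timestamp, log_level, thread and log_message are assumed examples, not a prescribed format.

2019-10-16 10:21:43,250 INFO [main] Application started in 5120 ms

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:log_level} \[%{DATA:thread}\] %{GREEDYDATA:log_message}" }
  }
}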

Inside the match block, “message” points out a Grok pattern. A full list of the available grok patterns can be found here. You can check the correctness of the regular expression you write with this tool.

 

What if I need my own patterns?
Simple! Create your own pattern in a configuration file and save it inside logstash-7.2.0/vendor/bundle/jruby/2.5.0/gems/logstash-patterns-core-4.1.2/patterns with any name, in the following format.
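For illustration, such a patterns file contains one pattern per line: a pattern name followed by its regular expression. The names and regexes below are only assumed examples; a pattern defined this way can then be referenced in a Grok match as %{TESTRUN_ID:test_run_id}.

TESTRUN_ID TR-[0-9]{6}
SESSION_TOKEN [A-Za-z0-9]{16}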

How do I categorize logs in filtering?

Let’s say you need to find the response logs, request logs, info logs and error logs inside your log file separately. All you need to do is identify a way to differentiate these logs. If your log messages already contain the words “RESPONSE”, “REQUEST”, “INFO” and “ERROR”, the job is pretty easy.

Using conditional blocks, you can group and add tags to your logs. Let’s say you need to identify the INFO logs group, for which an example log chunk is given above. All you need to do is write an if block to check whether it is an INFO log chunk and, if so, apply the Grok patterns to match it. To specify that it’s an INFO message, you can add a tag.
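A minimal sketch of such conditional blocks, reusing the illustrative Grok pattern from above (the tag names info_log and error_log are only assumed choices):

filter {
  if "INFO" in [message] {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:log_level} \[%{DATA:thread}\] %{GREEDYDATA:log_message}" }
      add_tag => [ "info_log" ]
    }
  } else if "ERROR" in [message] {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:log_timestamp} %{LOGLEVEL:log_level} %{GREEDYDATA:error_message}" }
      add_tag => [ "error_log" ]
    }
  }
}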

A broader understanding of filters can be gained through this documentation.

 

What is a Mutate Filter?

The Mutate filter is used to convert field data types and to remove unnecessary fields captured under the Grok filter.
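A minimal sketch, assuming the hypothetical field names thread and response_time captured by an earlier Grok filter:

filter {
  mutate {
    # Drop fields that are not needed for the Kibana visualizations.
    remove_field => [ "thread", "host" ]
    # Convert a value captured as a string into an integer so Kibana can aggregate on it.
    convert => { "response_time" => "integer" }
  }
}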

 

What does a Date Filter do?

Generally, what the developer needs is not the timestamp of the Logstash records, but the timestamps inside the log messages.

When that timestamp is extracted with a Grok pattern, it automatically becomes a “String”. This is an issue when creating visualizations from the data in Kibana, so it is better practice to convert this log timestamp into a “Date” type. The best way to do so is to match the log timestamp to the “@timestamp” target, and the Date Filter does this job neatly.
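For the illustrative timestamp format used earlier, a Date filter could look like the following; the format string must be adjusted to match your own logs.

filter {
  date {
    # "log_timestamp" is the field captured by the Grok filter above.
    match => [ "log_timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]
    target => "@timestamp"
  }
}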

How do I find the Time Duration between Two Events?

This can be calculated using the “aggregate” filter plugin. To find the elapsed time between two events, there should be two tags indicating a “start-event” and an “end-event” in the logs. Also, a unique field should exist to indicate that a particular “start-event” is related to a particular “end-event”.

Time mapping is done between the “@timestamp” fields. If you need to know the elapsed time in milliseconds, this trick will be helpful.
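A sketch along the lines of the aggregate filter documentation, assuming the tags start-event and end-event and a hypothetical request_id field as the unique identifier:

filter {
  if "start-event" in [tags] {
    aggregate {
      task_id => "%{request_id}"
      code => "map['start_time'] = event.get('@timestamp').to_f"
      map_action => "create"
    }
  }
  if "end-event" in [tags] {
    aggregate {
      task_id => "%{request_id}"
      # Elapsed time in milliseconds between the matching start and end events.
      code => "event.set('elapsed_ms', ((event.get('@timestamp').to_f - map['start_time']) * 1000).round(1))"
      map_action => "update"
      end_of_task => true
      timeout => 120
    }
  }
}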

 

  3. Output

This section specifies the mode of the expected output and the Elasticsearch index name which will hold the configured log data.
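For example, an output section that writes to a local Elasticsearch instance under an assumed index name, and echoes each processed event to the terminal for debugging, could be:

output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    # The index name is only an example; it is what you later select in Kibana.
    index => "test-run-logs"
  }
  stdout { codec => rubydebug }
}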

By combining all 3 sections into one file and saving it with any name and the .conf extension, we can create a Logstash configuration file.
Once done, check if you are receiving the desired output by running the following command in a terminal inside the bin folder of Logstash.
./logstash -f test1.conf

 

B. Next, Create an Index Pattern

Go to localhost:5601, which contains the Kibana visualizations for the log data. Before visualizing the log data, an index pattern needs to be created. This will represent the configured log data from Logstash.

This can be started with the option “Connect to your Elasticsearch index” (which can be found at the bottom right of the Kibana home page). Select the index name given inside the configuration file and create the index pattern.

 

C. Create New Visualization

Select the index pattern created earlier to create a new visualization. A window similar to the one below should appear:

1 — This acts as the main filter. If a filter isn’t specified here, all captured logs will appear in the visualization. KQL (Kibana Query Language) or Lucene can be used to write a query for the visualization.

2 — This specifies what metric/metrics are to be taken for the Y-axis of your visualization.

3, 4 — These contain buckets for the visualization. 3 specifies the bucket to be taken as the X-axis. A date histogram aggregation for the X-axis is what’s commonly used when visualizing logs.

4 — This is used to add sub-buckets (optional). If you need to highlight certain special fields in your logs, you could use the filter aggregation in the sub-buckets under the split-series option. This requires writing a Lucene or KQL query to filter the results.

By clicking “Apply changes” and “Update” on the page, the visualizations of the logs will appear in the right-side panel.
I will post a few of the visualizations I created below so you can get an idea of what they look like. I do recommend you try this on your own as well.


Remote Connections to Kibana
To allow connections from remote users to your Kibana, go to kibana-7.2.0-linux-x86_64/config/kibana.yml, uncomment the server.host line and change it as follows:
server.host: "0.0.0.0"

 

Visualize live logs in Kibana

Using the Logs UI, not just the saved logs but also live logs can be visualized!

 

Final thoughts…

I have experienced how efficient and convenient this mechanism is in identifying possible reasons behind system errors.

The time-consuming part is writing the Logstash configuration file and identifying patterns in logs (in order to classify and extract significant data chunks inside the logs). The rest of the process is interesting and a piece of cake.

 

 

 

References

https://www.elastic.co/guide/en/logstash/current/pipeline.html

https://grokdebug.herokuapp.com/patterns

https://regexr.com/

https://discuss.elastic.co/t/timestamp-in-microseconds/70369/4

https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html

https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html#plugins-filters-date-match

https://www.elastic.co/guide/en/kibana/6.7/logs-ui.html
