
Engineering Tips Series: Create Metrics from Logs for Real-Time Cloud Application Monitoring Without Breaking Your Bank

January 30, 2018

Analyzing logs historically is quite useful for forensic analysis. However, highly dynamic, modern, cloud-native applications require real-time analysis at scale with instant visibility. While log files contain a wealth of information on system and application activity, parsing, processing, and visually rendering them is often slow and costly, which hinders your ability to troubleshoot a high-velocity cloud application environment in real time.

To reach analytics speed and instant visibility, without the cost and efficiency burdens of traditional log processing, many Wavefront customers are creating metrics directly from logs. Our “Log Data Metrics Integration” provides a powerful tool for transforming log data into the type of time series metrics needed for real-time troubleshooting, trend analysis, and capacity planning.

One example of real-time analysis using metrics created from log data is a count of website status codes. This is often a good metric to start with when investigating reports of web application problems. The following screenshot illustrates the tracking of Apache status codes over time; in this case, just codes in the 3xx – 4xx range.


Having captured this data as time-series metrics, you can easily evaluate the current behavior and compare it to past behavior. In the above graph, we see that status code ‘404’ is occurring at a rate of over 100 times per second. With just a few mouse clicks, you can discover whether this is normal for this time of day by comparing these numbers to the same time yesterday, or to the same day last week, month, or year. With this data, you can quickly determine whether this is recurring, normal behavior or an anomaly that needs to be investigated and resolved. And all of this comes without the high costs often associated with extracting, storing, and processing verbose log data.

Creating Metrics from Logs

Log file data enters Wavefront via the Wavefront proxy.  Wavefront supports two methods for sending log data metrics to the proxy: Filebeat and TCP.  The proxy is enabled for log file ingestion via simple changes to the configuration file:

filebeatPort=5044
rawLogsPort=5055
logsIngestionConfigFile=<wavefront_config_path>/logsIngestion.yaml

These entries instruct the Wavefront proxy to listen for log data in two formats: on port 5044 it listens using the Lumberjack protocol, which works with Filebeat, and on port 5055 it listens for raw log data over a TCP socket as a UTF-8 encoded string (which works with Splunk and many others).

If your organization does not use common logging tools such as ELK (Filebeat) or Splunk, all that is required is a simple script or program that can open a socket connection to the port configured as “rawLogsPort”, read log records, and feed them unchanged into the port.
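The kind of script described above can be quite small. What follows is a minimal sketch in Python, not an official Wavefront client: the function name `forward_lines` is hypothetical, the default port matches the `rawLogsPort=5055` entry shown earlier, and it assumes the proxy treats each newline-terminated line as one log record.

```python
import socket
from typing import Iterable


def forward_lines(lines: Iterable[str], host: str = "localhost", port: int = 5055) -> int:
    """Send each log line, unchanged, over one TCP connection; return the count sent.

    Assumption: the receiver treats every newline-terminated line as one record.
    """
    sent = 0
    with socket.create_connection((host, port), timeout=5) as conn:
        for line in lines:
            # Normalize to exactly one trailing newline so records stay separate.
            payload = line.rstrip("\n") + "\n"
            conn.sendall(payload.encode("utf-8"))
            sent += 1
    return sent
```

Because the function accepts any iterable of strings, an open log file can be passed in directly, for example `forward_lines(open(path))` with `path` pointing at your own log file. Following a file as it grows (tail-style) would need additional logic that is not shown here.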

Grok patterns, similar to those used in Logstash, extract fields from your log data. To configure them, open the file you specified in the proxy configuration file, e.g. “logsIngestion.yaml”. The log metrics used in the above screenshot were produced with this configuration:

aggregationIntervalSeconds: 5  # Metrics are aggregated and sent at this interval
counters:
  - pattern: '<%{BASE10NUM}>%{MONTH} %{MONTHDAY} %{TIME} %{HOST} %{DATA:procPin} %{DATA:ipPlaceholder}\s-\s- \[%{DATA:timestamp}\] %{QUOTEDSTRING:httpGet} %{BASE10NUM:status} %{GREEDYDATA:message}'
    metricName: "httpStat-%{status}"

This simple configuration produced status code metrics from our test log file. The data was aggregated over a 5-second period (configurable). There is one entry every five seconds for each distinct status code, e.g. one entry named “httpStat-200” for log entries with status code ‘200’, containing a count for the last 5 seconds.
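To make the aggregation concrete, here is a rough Python sketch of counting status codes per fixed interval. It only illustrates the idea and does not show how the Wavefront proxy is implemented: the function `aggregate_counts` is hypothetical, and bucketing by a timestamp attached to each event is an assumption made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def aggregate_counts(
    events: Iterable[Tuple[float, str]], interval_seconds: int = 5
) -> Dict[Tuple[int, str], int]:
    """Count events per (interval start, metric name).

    `events` yields (unix_timestamp, status_code) pairs.
    """
    counts: Counter = Counter()
    for timestamp, status in events:
        # Align the timestamp to the start of its aggregation interval.
        bucket = int(timestamp // interval_seconds) * interval_seconds
        counts[(bucket, f"httpStat-{status}")] += 1
    return dict(counts)
```

Each key in the result corresponds to one entry for one metric name in one interval, and the value is the number of matching log records seen in that interval.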

Grok is based on regex and was developed by Logstash, part of Elastic.  The development of the pattern needed to match log file records is greatly simplified with the use of the online Grok Debugger – the screenshot below shows the above pattern being applied to a sample log record.

The Grok Debugger site includes patterns for a number of common log formats – e.g. Linux syslogs – as well as definitions for the pattern names used in the above expression.
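Since Grok patterns expand to regular expressions, it can help to see an approximate regex equivalent of the counter pattern shown earlier. The Python sketch below is a simplified approximation, not the exact expansion Grok produces: named Grok patterns such as MONTH, HOST, and BASE10NUM are replaced with looser hand-written fragments, and the sample log record in the usage note is invented for illustration.

```python
import re
from typing import Optional

# Simplified stand-in for the Grok pattern; the real Grok definitions are stricter.
LOG_LINE = re.compile(
    r"<\d+>"                            # <%{BASE10NUM}>: syslog priority
    r"[A-Z][a-z]+ \d{1,2} "             # %{MONTH} %{MONTHDAY}
    r"\d{2}:\d{2}:\d{2} "               # %{TIME}
    r"\S+ "                             # %{HOST}
    r"(?P<procPin>.*?) "                # %{DATA:procPin} (non-greedy)
    r"(?P<ipPlaceholder>.*?)\s-\s- "    # %{DATA:ipPlaceholder}\s-\s-
    r"\[(?P<timestamp>.*?)\] "          # \[%{DATA:timestamp}\]
    r'(?P<httpGet>"[^"]*") '            # %{QUOTEDSTRING:httpGet}
    r"(?P<status>\d+) "                 # %{BASE10NUM:status}
    r"(?P<message>.*)"                  # %{GREEDYDATA:message}
)


def metric_name(line: str) -> Optional[str]:
    """Return the metric name for a log line, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    # Mirrors metricName: "httpStat-%{status}" from the configuration.
    return f"httpStat-{match.group('status')}"
```

For a made-up record such as `<134>Jan 30 10:15:02 web01 apache[1234]: 192.0.2.10 - - [30/Jan/2018:10:15:02 +0000] "GET /index.html HTTP/1.1" 404 512`, the `status` group captures `404`, so the resulting metric name is `httpStat-404`.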

With a few simple configuration changes at the Wavefront proxy, log file data can become a very valuable source of metrics for troubleshooting, analysis, and forecasting.

If the metrics vs. logs topic is of interest to you, check out our logs to metrics blog series or read about a real-life example on the efficiency of extrapolating metrics from logs from one of Wavefront’s larger European customers. In addition, check out our recorded webinar to view a product demo, as well as explore the metrics vs. logs dilemma with actual use cases.


Customer Success Engineer for Wavefront by VMware

@VmwMike