
Observe Your Big Data Performance with Wavefront Analytics and New Hadoop Integrations

Why Hadoop Monitoring?

For Big Data applications in the cloud and elsewhere, Hadoop is widely used to store large data sets and to run many concurrent applications efficiently on computer clusters. The Hadoop ecosystem has many elements; in this blog, however, I will focus on how to use Wavefront to analyze Hadoop performance metrics from its core components: HDFS, MapReduce, and YARN.

Understanding Hadoop performance metrics helps you gain critical insight into Hadoop’s inner workings, enabling Hadoop admins and DevOps/developer teams to detect potential problems and investigate them earlier. For example, monitoring Hadoop provides visibility into job execution and job resubmission. This insight lets Hadoop admins alert on emerging cluster issues before they become a bottleneck for application or job execution. DevOps teams can use metrics on job restarts, failures, and data pipeline entry and exit points to let developers know when and why they should get involved.

In addition, a priority pain point for DevOps teams is managing resources efficiently. Metering Hadoop resource usage supports capacity forecasting and financial budgeting. With this insight, Hadoop operators can make better-informed operational and resource management decisions, including whether to use private or public clouds.

The Wavefront platform ingests, analyzes, visualizes, and correlates Hadoop metrics with all other metrics from the Hadoop ecosystem, such as Spark, Kafka, ZooKeeper, and Mesos. Having full-stack visibility into the inner workings of Hadoop and its ecosystem is essential for DevOps teams. This visibility helps them move from reactive to proactive resource management and can significantly reduce application and infrastructure troubleshooting times.

Hadoop HDFS Analytics

HDFS stores large files by splitting them into smaller blocks, which are distributed across the machines of a computer cluster. Each cluster has a NameNode and many DataNodes. The NameNode stores file system metadata and is critical for cluster availability. DataNodes store the actual data blocks, each of which is replicated three times by default.
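To make the block and replication model concrete, here is a small Python sketch that estimates how a single file is laid out in HDFS. It assumes the Hadoop 2.x/3.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of three (dfs.replication); both are configurable per cluster, and the hdfs_footprint function is a hypothetical helper, not part of any Hadoop API.

```python
# Hadoop 2.x/3.x defaults; clusters can override dfs.blocksize and dfs.replication.
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MiB
DEFAULT_REPLICATION = 3


def hdfs_footprint(file_size_bytes, block_size=DEFAULT_BLOCK_SIZE,
                   replication=DEFAULT_REPLICATION):
    """Return (blocks, block replicas, raw bytes stored on DataNodes) for one file."""
    if file_size_bytes < 0:
        raise ValueError("file size must be non-negative")
    # Ceiling division: a partial final block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    # Each block is stored on `replication` different DataNodes.
    replicas = blocks * replication
    # HDFS does not pad the last block, so raw usage is file size times replication.
    raw_bytes = file_size_bytes * replication
    return blocks, replicas, raw_bytes


if __name__ == "__main__":
    print(hdfs_footprint(1024 ** 3))  # (8, 24, 3221225472)
```

For a 1 GiB file this gives 8 blocks, 24 block replicas, and about 3 GiB of raw DataNode storage, which is why HDFS capacity metrics are best read with replication in mind.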

Wavefront’s HDFS integration comes with prepackaged dashboards that Hadoop admins and DevOps teams can use to gain visibility into HDFS health and utilization, helping them understand capacity trends and isolate issues.

Each of these HDFS components exposes many metrics that you can collect and visualize. Below are some important metrics that you can monitor with Wavefront:

  • Remaining and FSNameSystem.CapacityRemaining – track the remaining storage capacity
  • UnderReplicatedBlocks – many blocks with fewer than the expected number of replicas (three by default) might indicate a failing node
  • UnderFailuresTotal – many failing machines might point to a failing cluster
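The checks behind alerts on these metrics can be sketched in a few lines of Python. This illustrates the logic only; it is not how Wavefront alerts are defined, since those are written in the Wavefront query language. The hdfs_alerts function, its thresholds, and the extra capacity_total input are hypothetical, and the dictionary keys simply reuse the metric names listed above.

```python
def hdfs_alerts(metrics, capacity_total, min_free_fraction=0.10,
                max_under_replicated=0, max_failures=0):
    """Return human-readable alert messages for one snapshot of HDFS metrics.

    `metrics` maps the metric names listed above to their current values;
    `capacity_total` (bytes) and all thresholds are hypothetical inputs.
    """
    alerts = []
    remaining = metrics["FSNameSystem.CapacityRemaining"]
    if capacity_total > 0:
        free_fraction = remaining / capacity_total
        if free_fraction < min_free_fraction:
            alerts.append(f"low capacity: {free_fraction:.1%} free")
    under_replicated = metrics["UnderReplicatedBlocks"]
    if under_replicated > max_under_replicated:
        alerts.append(f"{under_replicated} under-replicated blocks")
    failures = metrics["UnderFailuresTotal"]
    if failures > max_failures:
        alerts.append(f"{failures} failures reported")
    return alerts


if __name__ == "__main__":
    TIB = 1024 ** 4
    snapshot = {
        "FSNameSystem.CapacityRemaining": 5 * TIB,
        "UnderReplicatedBlocks": 12,
        "UnderFailuresTotal": 0,
    }
    # ['low capacity: 5.0% free', '12 under-replicated blocks']
    print(hdfs_alerts(snapshot, capacity_total=100 * TIB))
```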

Before you start pulling these metrics, you need to set up Wavefront’s HDFS integration, outlined here in four easy steps:

  1. Install the Telegraf agent.
  2. Install the Jolokia JVM-Agent on HDFS nodes.
  3. Configure the Telegraf Jolokia Input Plugin.
  4. Restart Telegraf.
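The Jolokia agent exposes JMX MBeans over HTTP, and Telegraf’s Jolokia input plugin polls it. The Python sketch below issues a similar Jolokia “read” request by hand, which can be handy for checking step 2 before configuring Telegraf. It assumes the agent listens on Jolokia’s default port 8778 and reads the NameNode’s Hadoop:service=NameNode,name=FSNamesystem MBean; the host name and the helper functions are hypothetical.

```python
import json
import urllib.request

# Hypothetical NameNode host; 8778 is the Jolokia JVM agent's default port.
JOLOKIA_URL = "http://namenode.example.com:8778/jolokia/"
FSNAMESYSTEM_MBEAN = "Hadoop:service=NameNode,name=FSNamesystem"


def build_read_request(mbean, attributes):
    """Build a Jolokia 'read' request for several attributes of one MBean."""
    return {"type": "read", "mbean": mbean, "attribute": list(attributes)}


def parse_read_response(body):
    """Return the attribute values from a Jolokia response, or raise on error."""
    response = json.loads(body)
    if response.get("status") != 200:
        raise RuntimeError(f"Jolokia error: {response.get('error')}")
    return response["value"]


def read_mbean(url, mbean, attributes, timeout=5.0):
    """POST the read request to the agent and return {attribute: value}."""
    payload = json.dumps(build_read_request(mbean, attributes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_read_response(response.read().decode("utf-8"))


# Example (requires a reachable Jolokia agent):
# read_mbean(JOLOKIA_URL, FSNAMESYSTEM_MBEAN,
#            ["CapacityRemaining", "UnderReplicatedBlocks"])
```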

For detailed setup instructions, visit the Wavefront HDFS documentation.

Once HDFS metrics are flowing into Wavefront, the prepackaged HDFS dashboards auto-populate, as shown in the picture below:

Hadoop MapReduce Analytics

The MapReduce component is responsible for processing large data sets in parallel across the nodes of a computer cluster. The two main MapReduce functions are “map”, which transforms input data into intermediate key-value pairs, and “reduce”, which processes all the values that share the same key.
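The classic word-count example shows the two functions in action. The single-process Python sketch below mimics the map, shuffle (group by key), and reduce phases that Hadoop distributes across a cluster; it illustrates the programming model only and does not use Hadoop’s API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit an intermediate (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share the same key."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))  # {'big': 2, 'data': 1, 'clusters': 1}
```

In a real job, the shuffle step moves intermediate pairs between nodes over the network, which is one reason MapReduce metrics are worth correlating with network metrics.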

There are many MapReduce metrics to choose from, as described here; however, I will focus on some of the important ones that Wavefront can collect: mapsFailedrate, mapsRunning, reduceFailedRate, and reduceRunning. Having these metrics in Wavefront gives you visibility into MapReduce health and performance.

You can also collect garbage collection (GC) metrics. Hadoop components run on the Java Virtual Machine (JVM), so tracking garbage collection is important: it affects task performance. Important GC metrics that Wavefront collects include:

  • GcCount – total GC count
  • GcTimeMillis – total GC time in msec
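Because GcCount and GcTimeMillis are cumulative counters, they are most useful as rates. A common derived value is GC overhead: the increase in GcTimeMillis between two samples divided by the wall-clock time between them. Below is a Python sketch of that calculation; the GcSample type and gc_overhead function are hypothetical, and counters that go backwards are treated as a JVM restart.

```python
from typing import NamedTuple, Optional, Tuple


class GcSample(NamedTuple):
    timestamp_ms: int     # when the sample was taken
    gc_count: int         # cumulative GcCount
    gc_time_millis: int   # cumulative GcTimeMillis


def gc_overhead(prev: GcSample, curr: GcSample) -> Optional[Tuple[int, float]]:
    """Return (collections, fraction of wall-clock time spent in GC) between two samples.

    Returns None if the counters went backwards, which usually means the JVM restarted.
    """
    interval_ms = curr.timestamp_ms - prev.timestamp_ms
    if interval_ms <= 0:
        raise ValueError("samples must be in increasing time order")
    collections = curr.gc_count - prev.gc_count
    gc_ms = curr.gc_time_millis - prev.gc_time_millis
    if collections < 0 or gc_ms < 0:
        return None
    return collections, gc_ms / interval_ms


if __name__ == "__main__":
    # 3 s of GC in a 60 s window: 30 collections, 5% overhead.
    print(gc_overhead(GcSample(0, 100, 5_000), GcSample(60_000, 130, 8_000)))  # (30, 0.05)
```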

Setting up Wavefront’s Hadoop MapReduce integration takes four easy steps:

  1. Install the Telegraf agent.
  2. Create a script to gather Hadoop MapReduce metrics.
  3. Configure the Telegraf Exec Input Plugin.
  4. Restart Telegraf.
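Step 2 leaves the script itself up to you. One possible approach, sketched below in Python, turns per-job MapReduce counters into InfluxDB line protocol on standard output, which the Telegraf Exec plugin can parse when configured with data_format = "influx". The JSON shape and field names are assumptions modeled on the MapReduce ApplicationMaster REST API’s jobs resource, the embedded sample stands in for a real HTTP call, and the measurement name and job_id tag are hypothetical.

```python
import json

# Hypothetical sample response; the shape and field names are assumptions modeled on the
# MapReduce ApplicationMaster REST API jobs resource. A real script would fetch this over HTTP.
SAMPLE_JOBS_JSON = """
{"jobs": {"job": [{"id": "job_1_0001", "mapsRunning": 4, "reducesRunning": 1,
                   "failedMapAttempts": 0, "failedReduceAttempts": 2}]}}
"""

FIELDS = ("mapsRunning", "reducesRunning", "failedMapAttempts", "failedReduceAttempts")


def escape_tag(value):
    """Escape characters that are special in line protocol tag values."""
    return value.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line_protocol(jobs_json, measurement="mapreduce"):
    """Convert a jobs JSON document into one InfluxDB line protocol record per job."""
    data = json.loads(jobs_json)
    jobs = (data.get("jobs") or {}).get("job") or []
    lines = []
    for job in jobs:
        # The "i" suffix marks integer fields in line protocol.
        fields = ",".join(f"{name}={int(job[name])}i" for name in FIELDS if name in job)
        if fields:
            lines.append(f"{measurement},job_id={escape_tag(job['id'])} {fields}")
    return lines


if __name__ == "__main__":
    # Telegraf's Exec plugin reads whatever the command prints to stdout.
    for line in to_line_protocol(SAMPLE_JOBS_JSON):
        print(line)
```

Run as-is, the script prints one record for the sample job: mapreduce,job_id=job_1_0001 mapsRunning=4i,reducesRunning=1i,failedMapAttempts=0i,failedReduceAttempts=2i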

For detailed setup instructions, visit the Wavefront MapReduce documentation.

After you set up this integration, the MapReduce dashboard auto-populates with metrics, providing at-a-glance visibility into cluster and application health. This insight helps you identify infrastructure bottlenecks, for example, a network bottleneck caused by excessive network traffic from “map” and “reduce” tasks.

The combination of Wavefront analytics, the powerful Wavefront query language, and numerous Wavefront integrations helps teams see cross-process interaction issues that are often hidden and hard to detect. To learn more about Wavefront’s correlation capabilities, see one of our demo videos.

Hadoop YARN Analytics

YARN allocates compute resources to each application through the ResourceManager (one per cluster), while the NodeManager (one per node) launches and monitors the containers that use those resources. Each application is managed by its own ApplicationMaster. The full list of YARN component metrics is available from Cloudera and other Hadoop vendors. One essential metric is appsFailed: if it increases quickly, it can point to problems with services. You can monitor this and many other metrics using Wavefront’s Hadoop YARN integration, which can be set up in three easy steps:

  1. Install the Telegraf agent.
  2. Configure the Telegraf HTTPJSON Input Plugin.
  3. Restart Telegraf.
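The HTTPJSON plugin works by polling a JSON endpoint. For YARN, the ResourceManager REST API exposes cluster-wide counters, including appsFailed, at /ws/v1/cluster/metrics. The Python sketch below reads that endpoint and turns two samples of the cumulative appsFailed counter into a failures-per-minute rate, the kind of quickly rising signal worth alerting on. The host name is hypothetical, 8088 is the ResourceManager’s default web port, and the helper functions are illustrative only.

```python
import json
import urllib.request

# Hypothetical host; 8088 is the ResourceManager's default web port.
RM_METRICS_URL = "http://resourcemanager.example.com:8088/ws/v1/cluster/metrics"


def parse_cluster_metrics(body):
    """Extract the clusterMetrics object from a ResourceManager response body."""
    return json.loads(body)["clusterMetrics"]


def fetch_cluster_metrics(url=RM_METRICS_URL, timeout=5.0):
    """GET the cluster metrics endpoint and return its counters as a dict."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_cluster_metrics(response.read().decode("utf-8"))


def apps_failed_per_minute(prev, curr, interval_s):
    """Rate of newly failed applications between two samples of the cumulative appsFailed counter."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = curr["appsFailed"] - prev["appsFailed"]
    if delta < 0:
        return None  # counter went backwards, e.g. after a ResourceManager restart
    return delta * 60.0 / interval_s


# Example (requires a reachable ResourceManager):
# before = fetch_cluster_metrics(); time.sleep(60); after = fetch_cluster_metrics()
# apps_failed_per_minute(before, after, 60)
```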

For detailed setup instructions, visit the Wavefront YARN documentation.

The prepackaged Wavefront dashboard for YARN is populated automatically after you set up the integration and metrics are detected. A sample YARN dashboard is shown in the image below.

Historical YARN metrics can help you detect when a task runs longer than usual and find the cause of the anomaly.
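One simple way to decide whether a run is “longer than usual” is to compare its duration with the historical mean and standard deviation, a z-score rule. The Python sketch below shows that generic technique; it is not Wavefront’s anomaly detection, and the is_unusually_long function, the threshold k = 3, and the minimum history size are hypothetical choices.

```python
from statistics import mean, stdev


def is_unusually_long(history_s, current_s, k=3.0, min_history=5):
    """Return True if current_s exceeds the historical mean by more than k standard deviations.

    history_s holds durations (in seconds) of past runs of the same task or job.
    """
    if len(history_s) < min_history:
        return False  # too little history to judge
    mu = mean(history_s)
    sigma = stdev(history_s)
    if sigma == 0:
        # Every past run took exactly the same time; any longer run stands out.
        return current_s > mu
    return (current_s - mu) / sigma > k


if __name__ == "__main__":
    past_runs = [100, 105, 98, 102, 95, 100]
    print(is_unusually_long(past_runs, 130))  # True
    print(is_unusually_long(past_runs, 104))  # False
```

A fixed z-score threshold works best when run durations are roughly stable; jobs whose input size varies a lot may need a baseline per input size or time of day.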

I hope you find this blog useful. Go beyond these examples and explore for yourself the power of Wavefront analytics and our growing support of integrations. Start now with your free trial.


Gordana Neskovic is a Senior Product Marketing Manager for Wavefront, now VMware. Previously Gordana was a Data Scientist - AI Data Solution Architect at Wells Fargo and Data Scientist at SFO - ITT and Pinterest. Her current interests are at the intersection of analytics, data science, and communications.