Full-Stack Troubleshooting with Analytics/AI

Resolve Incidents Faster Using Shared Insights

Troubleshooting distributed cloud applications is difficult because of their many moving parts, application dependencies, and frequent code updates. VMware Tanzu Observability by Wavefront collects all metrics from your applications, clouds, and infrastructure in one place. With all metrics data in Tanzu Observability, SREs and developers can:

  • Quickly navigate to and isolate production issues using the Tanzu Observability Query Language and its 100+ functions and operators
  • Collaboratively troubleshoot an incident and share live views with a single click
  • Easily add custom metrics to your production code to rapidly isolate issues in newly deployed code
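
As a rough illustration of the last point, the sketch below formats a custom metric as a single text line and sends it over TCP. It assumes the Wavefront data format layout (name, value, timestamp, source, then point tags) and a proxy listening on localhost port 2878; the function names, metric name, and tags are hypothetical, so check them against your own deployment before relying on this.

```python
import socket
import time


def format_metric_line(name, value, source, tags=None, timestamp=None):
    """Build one metric line: <name> <value> <timestamp> source=<source> key="value" ...

    The layout is assumed to follow the Wavefront data format.
    """
    if timestamp is None:
        timestamp = int(time.time())
    parts = [name, str(value), str(int(timestamp)), f"source={source}"]
    # Sort the tags so the output is deterministic.
    for key, val in sorted((tags or {}).items()):
        parts.append(f'{key}="{val}"')
    return " ".join(parts)


def send_metric(line, host="localhost", port=2878):
    """Send one line over plain TCP. Host and port are assumptions, not confirmed defaults."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((line + "\n").encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical metric emitted right after a new release goes out.
    line = format_metric_line(
        "checkout.latency.ms", 42.5, "web-1",
        tags={"env": "prod", "release": "v2"},
    )
    print(line)  # call send_metric(line) once a proxy is reachable
```

Tagging the metric with a release identifier is what makes it possible to compare behavior before and after a deployment.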


Automate Anomaly Detection with AI and Machine Learning

Finding a needle in a haystack is not easy when distributed cloud applications and containerized microservices emit thousands of metrics. AI Genie™ helps you automatically identify “unknown unknowns” so you can quickly get to an incident's root cause, whether it lies in applications, infrastructure, cloud, or edge.

AI Genie uses machine learning-based anomaly detection, helping you to quickly isolate anomalous metrics across all system components.

You don’t need a background in statistics or analytics to use AI Genie’s anomaly and forecasting functions. Highlight anomalies and predict capacity bottlenecks using AI/ML.
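
To give a feel for what anomaly detection on a metric stream involves, here is a minimal sketch using a trailing-window z-score. This is a generic statistical stand-in, not AI Genie's algorithm, which the text above does not describe; the function name, window size, and threshold are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of points that deviate from the mean of the preceding
    `window` points by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    cpu_percent = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
    print(find_anomalies(cpu_percent))  # index 6 is the spike to 50
```

A fixed threshold like this needs manual tuning for every metric, which is the kind of effort a machine learning-based approach aims to remove.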

Detect Root Causes Faster with a Single Place for All Metrics and Analytics

Tanzu Observability collects all metrics and events. It helps you zero in on the root cause faster and lower MTTR. Use Tanzu Observability to:

  • Get full context from key metrics and drill into details with 1-second granularity. For instance, during increased request load, see what is causing CPU utilization spikes or which component is most affected, such as MySQL.
  • Perform historical analysis and retain troubleshooting data at full resolution for 18 months
  • Accelerate code releases by pinpointing issues with unified views of metrics and events for all microservices
  • Use metric tagging to quickly search and iterate through all your performance data, from system-wide views down to individual hosts and machines


Triage Incidents Faster with the Tanzu Observability Alert Viewer

Between sorting through noisy alerts and the complexities created by distributed applications running on ephemeral containers and multicloud, DevOps and SRE teams find themselves overwhelmed as they try to reduce MTTR. With the Tanzu Observability Alert Viewer, they can:

  • Get the full view of an incident immediately with its related alerts and events, surfaced via AI/ML algorithms
  • Accurately assess incident impact, and correlate priority with highlighted shared point tags
  • Speed up time from detection to triage, isolation, and remediation for all types of incidents

“[Tanzu Observability]’s powerful query language allows us to easily visualize and debug our time series telemetry data. Its well-tuned alerting helps us lower MTTD. Our engineers customize their own metrics to fully monitor their systems’ health and performance.” - Jing Zhao, DevOps Software Engineer, DoorDash

 
