Apache Kudu allows users to act quickly on data as it happens
Cloudera aims to simplify the path to real-time analytics with Apache Kudu, an open source storage engine for fast analytics on fast-moving data.
“Real-time data analysis has been a challenge for enterprises because it required a complex lambda architecture to merge together real-time stream processing and batch analytics. Kudu dramatically eases that architecture with a single storage engine that addresses both needs,” said Charles Zedlewski, senior vice president of products at Cloudera. “The high-demand workloads in place today, which include a growing number of new machine-learning models, can identify cybersecurity threats, predict maintenance issues in the Industrial Internet of Things (IIoT), and bring much more accuracy to all types of online reporting.”
Kudu was designed from the ground up to take advantage of innovation in the hardware landscape, where solid-state storage and memory (RAM) have become more affordable. As a standalone storage engine, Kudu has already proven itself in mission-critical production clusters with hundreds of nodes handling many millions of inserts per second. Kudu is purpose-built for use cases that require fast, large-scale analytic scans over rapidly changing data, such as time series data, machine data analytics, online reporting, and other analytic or operational workloads.
“Apache Kudu is a prime example of how the Apache Hadoop® platform is evolving from a sharply defined set of Apache projects to a mixing and matching of open source and proprietary technologies that form, in essence, a big data operating environment,” said Tony Baer, principal analyst at Ovum. “Kudu bypasses the hurdles associated with complex lambda architectures to address use cases involving fast-changing data, where the ability to rapidly modify and update the database is critical.”
Beta programs for select Cloudera customers, run directly and through partners, have already driven Kudu into critical production environments. Cloudera anticipates further adoption across its customer base as the number of use cases that require real-time analytics continues to grow.
“Achieving compliance and operational reporting alongside analytical success requires both the ability to process large amounts of data to find trends, and to detect and respond to anomalies quickly,” said Michael Reed, director of enterprise information management at Meridian Health. “We’re excited about the potential of Kudu to allow us to do analytical and real-time operations in a single place to help us to simplify the systems that we build.”
Alongside Kudu, Cloudera 5.10, together with the release of Cloudera Director 2.3, continues to strengthen enterprise-grade capabilities for cloud deployments and to improve cost efficiency in those environments. New capabilities include:
- Reduced operating costs for batch processing on transient workloads, through improved performance of Apache Hive on Amazon S3
- More comprehensive auditing and lineage in the cloud, through single-cluster Cloudera Navigator support for Amazon S3
- Reduced time to deploy initial use cases, through faster first-run deployments across cloud environments
In September 2015, Cloudera announced the public beta release of Apache Kudu. Two months later, Cloudera donated Kudu to the Apache Software Foundation (ASF) to open it to the broader development community, and the project has since garnered contributions from engineers at State Farm, Xiaomi, Intel, and others. Kudu is now generally available and ships as a standard component of Cloudera Enterprise, giving customers a robust set of storage engines (NoSQL, HDFS, object store, and relational) to meet the specific needs of each use case.