Cloudera has announced that it has further matured Apache Spark integration within Apache Hadoop environments, with critical achievements around usability and interoperability throughout the past year.
Thanks to its ease of development and flexible data processing, Spark has soared in popularity within the open source community and across customer use cases. It is the most active project in the Apache Software Foundation (ASF), with more than 800 developers from more than 200 companies. Cloudera’s team of Spark committers has been actively driving the enterprise capabilities of Spark and uniting Spark with Hadoop to meet customer needs and further production adoption.
“The embrace of Spark by the developer community and Cloudera’s efforts in the past year to drive its mainstream adoption have been nothing short of remarkable,” said Doug Cutting, chief architect at Cloudera. “With the most customers running Spark with Hadoop, we have already made impressive strides in furthering the enterprise capabilities of Spark for Hadoop deployments across industries and use cases. With the addition of Spark SQL and MLlib to Cloudera’s platform, and a clear roadmap with the One Platform Initiative, Spark adoption will continue to soar for batch, streaming, and machine learning use cases.”
As more customers aimed to take advantage of the Internet of Things (IoT) and real-time streaming data, they needed an enterprise-grade stream processing engine to support their applications. To address this, Cloudera led development on Spark Streaming resiliency, ensuring zero data loss and bringing it up to production standards. This critical improvement, paired with the integration of Apache Kafka within the platform, has allowed Cloudera customers to build complete IoT applications within a unified platform and has had a drastic impact on Spark Streaming adoption overall.
Driving Broad Customer Adoption
With the most experience supporting Spark as part of Hadoop, Cloudera has more customers running Spark on Hadoop than all other vendors combined and powers some of the largest multi-tenant Spark clusters today, including deployments of more than 800 nodes.
With more than 170 customers running Spark across a wide range of industries, including finance, healthcare, retail, and insurance, Cloudera has helped customers embrace a variety of next-generation use cases, including:
- Cox Automotive: Leading provider of products and services for automotive dealers and car buyers, moved from hourly analytics to real-time insights into ad campaigns using Spark Streaming
- PRGX: World’s leading provider of accounts payable recovery audit services, stated Spark’s flexible, performant data processing has been a “saving grace” and resulted in a 9-10x performance improvement compared to legacy systems
- Online Retailer: Leveraged Spark to reduce data processing time by 30% and to take advantage of real-time trends for greater engagement
- Allstate: One of the nation’s largest insurance providers, uses Cloudera and Apache Spark to combine more than 80 years of data for highly refined pricing models
- RelayHealth: Healthcare technology solution provider and subsidiary of McKesson, builds predictive models for when payments to healthcare providers will be received, improving their cash flow. The company processes healthcare payment interactions between 200,000 physicians, 2,000 hospitals, and 1,900 health plan subscribers
- Barclays: Multinational banking and financial services company, builds an insights engine that securely analyzes previously disparate transaction data and delivers relevant insights to Barclays customers in an easily digestible manner
In addition, Cloudera’s Accelerator Program for Spark has driven dozens of robust Spark applications and integrations with the leading third-party tools, further expanding the capabilities of Spark to customers. Key partners include Datameer, Informatica, Oracle, Paxata, Pentaho, Platfora, StreamSets, Syncsort, and Talend.
“The opportunity for Informatica and Cloudera to work together to further the development and deployment of Apache Spark alongside Hadoop is great for our joint customers,” said Sanjay Krishnamurthi, senior vice president and chief technology officer, Informatica. “These customers are leveraging Spark inside Informatica’s Big Data Management platform to deliver trusted analytics at scale. Together with Cloudera, we are providing high-speed discovery of data assets for holistic big data governance and security and for simpler big data integration, which ensures trust in the face of ever-growing data volumes.”