Powering IoT Data Management Through Open Source: A Take from Major Cloud Players

1Q 2020 | IN-5752

The Role of Open Source in Top Vendors’ IoT Offerings

NEWS


Enterprises’ drive for digitalization and their rapid move to cloud platforms have intensified the focus on data-centric solutions. From an Internet of Things (IoT) perspective, the age of the hardware-centric approach is passing, and more market opportunities for software and data-enabled IoT applications are opening up. While Big IoT Data, Machine Learning (ML), and real-time technologies are the industry’s leading disruptive buzzwords, the competitive landscape is tightening among top vendors, which are racing to provide sophisticated End-to-End (E2E) IoT data management solutions. Meanwhile, another important driver of IoT technology democratization is open-source projects, which are widely used by the developer community and, in recent years, have also been adopted by major cloud vendors. Several vendors have centered their commercial strategy on open source and shown a long-term commitment to open-source-driven monetization:

  • Cloudera launched Cloudera Streaming Analytics (CSA), a new subset of the Cloudera Data Flow (CDF) platform. CSA is powered by the Apache Flink stream processing engine, which runs on an existing Hadoop cluster and enables the processing and streaming of Big IoT Data. Before this announcement, Cloudera already offered Apache Spark and Spark Streaming functionality, which similarly run on the Hadoop cluster. With the Flink offering, however, Cloudera hopes to position itself as a consolidated E2E IoT platform, since Flink arguably offers a superior architecture with greater processing capacity (Kappa rather than Lambda) compared to Apache Spark.
  • Confluent is a vendor focused on a mature open-source project, Apache Kafka, which has enabled the company to build scalable infrastructure for data management and streaming. Despite its open-source-centric strategy, Confluent still provides a commercial layer and a proprietary toolkit. For example, Confluent Control Center for management and monitoring is a proprietary set of tools available as an upgrade option, which nevertheless builds on the Apache Kafka messaging service.

Understanding Some Apache Offerings

IMPACT


The Apache Software Foundation was founded in 1999 in Wakefield, Massachusetts. It currently includes more than 350 projects and initiatives, along with its own Apache Incubator, and the value of Apache open source software products is put at more than US$20 billion. Its products are distributed under the Apache License as Free and Open-Source Software (FOSS). Apache’s popularity is driven by its open-source community, while its enterprise-level popularity has grown rapidly with the development of digital services and fears of vendor lock-in. At the same time, Apache’s success and wide use within IoT are driven by concerns about the cost of developing proprietary technology and the cost of scaling it; open source infrastructure can also enable interoperability and ease integration with other products without extra investment. When it comes to data management, IoT, and real-time technologies, Apache projects have a number of interesting offerings to consider:

Apache Kafka: Apache Kafka is a purpose-built real-time data pipeline with a mixed architecture that supports both batch and real-time streaming of data in parallel. One of Kafka’s differentiating features is its ability to act as middleware between real-time data pipelines. Additionally, it integrates with systems such as Spark, Storm, and cloud-native Complex Event Processing (CEP) systems.
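
To make the middleware role more concrete, the following is a minimal in-memory sketch in Python of the append-only log idea behind Kafka-style pipelines. It does not use the Kafka client API; the names `TopicLog`, `append`, and `read_from` are hypothetical and chosen for illustration only. Under those assumptions, it shows a batch consumer replaying the full history while a real-time consumer reads only new records from the same log.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class TopicLog:
    """Hypothetical in-memory stand-in for a single-partition topic."""
    records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Append a record and return its offset (position in the log)."""
        self.records.append(record)
        return len(self.records) - 1

    def read_from(self, offset: int) -> Tuple[List[Any], int]:
        """Return all records at or after `offset`, plus the next offset to read."""
        return self.records[offset:], len(self.records)


log = TopicLog()
for temp in (21.5, 22.0, 22.4):
    log.append({"sensor": "s1", "temp_c": temp})

# A real-time consumer that joins now only cares about future records.
_, realtime_offset = log.read_from(len(log.records))

log.append({"sensor": "s1", "temp_c": 23.1})

# A batch consumer replays the full history from offset 0.
batch_records, _ = log.read_from(0)
# The real-time consumer resumes from its own offset and sees only the new record.
new_records, realtime_offset = log.read_from(realtime_offset)
```

Because each consumer tracks its own offset, the producer does not need to know whether a reader works in batch or in real time, which is the decoupling that the middleware description refers to.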

Apache Spark: Apache Spark is a high-speed, parallel, in-memory data processing engine with an extensive Application Programming Interface (API) toolbox, which supports streaming, ML, and SQL workloads (enabling quick integration of datasets). Spark is often considered superior to Storm because it can apply the same code to batch and streaming data and offers higher messaging speed, which arguably decreases cost (through less downtime) and simplifies development.
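
The idea of applying the same code to batch and streaming data can be illustrated with a short, library-free Python sketch. This is not the Spark API; `to_fahrenheit` and `sensor_stream` are hypothetical names, and the generator only stands in for an unbounded source. The point is that the transformation is written once and reused for both a bounded dataset and a stream of events.

```python
from typing import Dict, Iterable, Iterator


def to_fahrenheit(readings: Iterable[Dict]) -> Iterator[Dict]:
    """Transformation written once; works on any iterable of readings."""
    for r in readings:
        yield {**r, "temp_f": r["temp_c"] * 9 / 5 + 32}


def sensor_stream() -> Iterator[Dict]:
    """Hypothetical unbounded source, truncated to two events here."""
    yield {"sensor": "s1", "temp_c": 100.0}
    yield {"sensor": "s2", "temp_c": 0.0}


# Batch: a bounded, already-collected dataset.
stored = [{"sensor": "s1", "temp_c": 100.0}, {"sensor": "s2", "temp_c": 0.0}]
batch_result = list(to_fahrenheit(stored))

# Streaming: the same function consumes events lazily as they arrive.
stream_result = list(to_fahrenheit(sensor_stream()))
```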

Apache Storm: Apache Storm is a very low-latency stream processing engine that supports batch and micro-batch processing and is also suitable for near-real-time processing of IoT data.

Apache Flink: Apache Flink is an open source distributed stream processor that powers large-scale (multi-node throughput) IoT data processing, where the streaming flow is supported by batch, streaming, ML, and graph processing libraries. One of Flink’s main points of differentiation is its streaming computation model, while Apache Spark uses a micro-batching processing model.
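
The difference between a streaming computation model and a micro-batching model can be sketched in plain Python. Neither function below uses the Flink or Spark APIs; `per_event` and `micro_batch` are hypothetical names, and batches are formed by event count rather than by time interval purely to keep the example simple.

```python
from typing import Iterable, Iterator, List


def per_event(events: Iterable[int]) -> Iterator[int]:
    """Record-at-a-time model: emit an updated running sum after every event."""
    total = 0
    for value in events:
        total += value
        yield total


def micro_batch(events: Iterable[int], batch_size: int) -> Iterator[int]:
    """Micro-batch model: buffer events, emit one updated sum per batch."""
    total = 0
    buffer: List[int] = []
    for value in events:
        buffer.append(value)
        if len(buffer) == batch_size:
            total += sum(buffer)
            buffer.clear()
            yield total
    if buffer:  # flush the final partial batch
        total += sum(buffer)
        yield total


events = [1, 2, 3, 4, 5]
streaming_outputs = list(per_event(events))     # one result per event
batched_outputs = list(micro_batch(events, 2))  # one result per batch
```

Both models arrive at the same final sum, but the per-event version produces an updated result after every reading, whereas the micro-batch version only does so once a batch is complete. That gap is the latency trade-off between the two models.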

Cloud Vendors’ Approach to Open Source Technology

RECOMMENDATIONS


IBM’s acquisition of Red Hat, along with Cloudera’s merger with Hortonworks, has drawn industry attention to open source solutions. Arguably, cloud vendors’ adoption of open source is driven by the incentive to reduce the capital investment in solutions for end users. In practice, this is not entirely accurate: open source is not free of cost, since there may still be a need to pay for a license, as well as to invest in expanding and customizing the project.

There are a number of business approaches and models through which cloud vendors engage with open source:

  • The first approach, pioneered by IBM and Confluent, is a business model based on open source technology. IBM has invested more than US$1 million in open source projects, like Apache Spark, instead of proprietary variations of the technology. This investment creates the opportunity to launch microservices across a number of verticals and to monetize purely on proprietary toolsets.
  • The second approach is to add a commercial layer (i.e., a toolkit) on top of the open source offering. For example, Cloudera Data Flow (CDF) offers Storm, Spark Structured Streaming, and Kafka Streams, with the recent addition of Flink. An enterprise can use the basic Apache functionality or opt for Cloudera’s proprietary upgrade, which increases the availability of various tools, services, and data management functions. This approach is one of the most widely used among cloud vendors, because it allows them to appeal to Small and Medium-sized Enterprises (SMEs) through the lower cost of open source, to monetize, and to tailor services to specific verticals and businesses.
  • The third strategic approach is to appeal to the open source community through active contribution and industry-wide standardization. Cloud vendors can position themselves as a driving force for a sustainable long-term architecture, which supports a positive community perception by: a) avoiding vendor lock-in rhetoric and criticism, and b) increasing the standardization and interoperability of technologies, which provides seamless integration. Option b) is especially attractive, since the vendor can gain traction by increasing interoperability and enabling the creation of a wider IoT ecosystem. Such an approach would allow vendors to monetize orchestration and data management services by leveraging multi-cloud and hybrid capabilities. Microsoft Azure’s strategy clearly involves making its IoT offering attractive to a wider range of enterprises and to the Information Technology/Operational Technology (IT/OT) community. Additionally, with growing demand for multi-cloud or hybrid infrastructure, Azure is diversifying beyond revenue from its proprietary technologies by offering open source. Containers offer a lightweight compute choice with a degree of portability across a multi-cloud infrastructure. This is particularly the case if containers are managed via open-source orchestration tools, such as Kubernetes.