Designing Data Platforms to Harness the Power of Fog Computing

Learn how to harness the power of the cloud all the way out to the edge to build the dynamic real-time systems of tomorrow.

Ryan Gross

--

Image Source: Wikimedia

Note: This article originally appeared in The BI Journal, Volume 26, Issue 1, from TDWI under the title “Unleash the Power of Fog Computing”

The Current State of Data Processing Systems

For the last five years, enterprises have been scrambling to centralize their analytics processing in the cloud (hence the current $20B+ valuations of Databricks and Snowflake). Most recently, the major data platform vendors have converged their messaging around the concept of a “LakeHouse” architecture, which takes the best attributes of traditional data warehouses and runs them on platforms with data lake storage architectures. For near-real-time scenarios, several streaming platforms have been built as well (e.g., Storm, Spark Streaming, Pulsar, and Flink). These systems also adopt a cloud-based, centralized architecture and assume that data ingestion will direct edge streams to cloud message brokers like Kafka, Kinesis, or Event Hubs. These systems have been able to scale to handle petabytes of data, but often at great cost.
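The centralized ingestion pattern described above can be illustrated with a minimal, in-memory sketch. A `queue.Queue` stands in for a cloud message broker topic (Kafka, Kinesis, or Event Hubs), and each simulated edge device forwards every raw event to it before any processing happens; all names and numbers here are illustrative, not a real broker client.

```python
import queue
import threading

# Stand-in for a cloud message broker topic: every edge event must
# cross the network into this single centralized buffer before analysis.
broker = queue.Queue()

def edge_device(device_id, readings):
    # Each edge node forwards its raw readings to the central broker,
    # doing no local aggregation or filtering.
    for value in readings:
        broker.put({"device": device_id, "value": value})

# Three simulated edge devices, all funneling into one central queue.
threads = [
    threading.Thread(target=edge_device, args=(i, range(5)))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Centralized processing: the consumer only sees data after full ingestion.
events = []
while not broker.empty():
    events.append(broker.get())
print(len(events))  # all 15 raw events were transported to the "cloud" first
```

The point of the sketch is the shape of the flow, not the volume: every byte produced at the edge is shipped centrally, which is exactly the cost driver the rest of the article examines.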

[Figure 1: Lakehouse Architecture Diagram]: The Data LakeHouse paradigm seeks to leverage the best of lakes and warehouses. However, it is still a fully centralized architecture reliant on ingestion. Image by Author

Pressures on the current model

As organizations realize value through data analytics, the pressure to increase the volume of data being processed grows. This leads to increasing complexity and non-linear costs to process ever-increasing volumes of data. For instance, reprocessing a full dataset to add a calculated column or correct a bug is easy with a few gigabytes of data but extremely expensive over a petabyte.
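A back-of-the-envelope calculation makes the reprocessing point concrete. Even under the generous assumption that cost scales only linearly with volume, the same recomputation that is trivial at gigabyte scale becomes a significant line item at petabyte scale; the per-gigabyte rate below is a hypothetical figure chosen for illustration, not a quoted cloud price.

```python
# Assumed cost to scan and rewrite one gigabyte (hypothetical rate).
cost_per_gb = 0.05

small_job_gb = 5            # "a few gigabytes"
petabyte_gb = 1_000_000     # 1 PB = 1,000,000 GB

small_cost = cost_per_gb * small_job_gb    # cents: nobody thinks twice
large_cost = cost_per_gb * petabyte_gb     # tens of thousands of dollars,
                                           # paid on EVERY reprocessing run
print(f"${small_cost:,.2f} vs ${large_cost:,.2f}")
```

And that is the linear floor: in practice, shuffles, retries, and cluster scaling overhead push the real cost curve above linear as volume grows.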

Moore’s law allows the amount of data that can be processed to grow at a compounded annual rate of 41% (Cross, 2016), while data is growing at a compounded annual growth rate of 61% (Patrizio, 2019). Playing these growth rates forward, if we assume that there is enough power to process all of the useful data in the world today (which likely isn’t…
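The widening gap between those two compounding rates can be computed directly. Using the article's figures (41% annual growth in processing capacity, 61% annual growth in data), the ratio of data produced to data that can be processed grows by the factor below each year:

```python
# Growth rates from the article (Cross, 2016; Patrizio, 2019).
compute_growth = 1.41   # processable data grows 41% per year
data_growth = 1.61      # data volume grows 61% per year

# Ratio of data volume to processing capacity after `years` years,
# assuming the two start in balance today.
def capacity_gap(years):
    return (data_growth / compute_growth) ** years

for years in (1, 5, 10):
    print(years, round(capacity_gap(years), 2))
```

Even starting from parity, the fraction of data that can feasibly be processed centrally shrinks every year under these assumptions, which is the core pressure motivating a move of computation toward the edge.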

--

Ryan Gross

Emerging Tech & Data Leader at Credera | Interested in how people & machines learn, and how to bring them together.