
(Summit Art Creations/Shutterstock)
Companies have long dreamed of a single data platform that can handle their real-time and batch data and workloads, without some clunky interface in between. However, unless your name is Uber or Netflix, you probably don't have the engineering resources to build a modern Lambda architecture yourself.
When he was an engineer at Confluent, Hojjat Jafarpour took a shot at building a system that moved the ball forward on Apache Kafka co-creator Jay Kreps' dream of a Kappa architecture, which solved the Lambda dilemma by reimagining everything as a stream. The result was the 2017 launch of kSQL, which provided a SQL interface atop data flowing through Kafka. By itself, kSQL didn't create a Kappa architecture, but it filled an important gap.
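As a sketch of that idea, a ksqlDB-style query (illustrative syntax; the topic and column names here are invented) registers a Kafka topic as a stream and then runs continuous SQL over it:

```sql
-- Declare a stream over an existing Kafka topic (hypothetical topic/columns)
CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC = 'pageviews', VALUE_FORMAT = 'JSON');

-- A push query that emits updated counts as new events arrive
SELECT url, COUNT(*) AS views
FROM pageviews
GROUP BY url
EMIT CHANGES;
```

The point is that a SQL statement, rather than hand-written consumer code, describes the continuous computation over the Kafka topic.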
While kSQL simplified a lot of things, it still had its limitations, said Jafarpour, who was the engineering lead on the kSQL project. For starters, it was tightly coupled to Kafka itself. If you wanted to read from and write to other streaming data platforms, such as AWS's Amazon Kinesis, then you were back to doing software engineering and integrating multiple distributed systems, which is hard.
After moving up into a more customer-facing role at Confluent, Jafarpour gained a new understanding of the kinds of things that customers and prospects really wanted out of their data infrastructure. When it came to getting value out of real-time processing systems, real-world companies continued to express frustration with the ongoing expense and complexity it entailed.
That's what motivated Jafarpour to jump ship in 2020 and found his own company, called DeltaStream. Jafarpour wasn't ready to give up on SQL or Kafka; instead, he wanted to build a better abstraction for a stream processing product that rode atop the existing Kafka and Kinesis pipelines already out there.
To guide development at DeltaStream, Jafarpour took his inspiration from Snowflake, which managed to provide a very clean interface for its sophisticated cloud data warehouse.
"We want to make it super simple for you to use it, remove all of the complexity of operations and infrastructure, so you just come and use it and get value out of your data," Jafarpour told BigDATAwire at the recent Snowflake conference. "The idea was to build something similar for your streaming data."
Instead of reinventing the wheel, Jafarpour decided to build DeltaStream atop the best stream processing engine on the market: Apache Flink.
"We get the power of Flink, but we abstract the complexity of that from the user," Jafarpour said. "So the user doesn't have to deal with the complexity, but they would be able to get the scalability, elasticity, and all of the things that Flink brings."
Jafarpour observed that one of the most common use cases for Flink deployments is processing fast-moving data to ensure that downstream dashboards, applications, and user-facing analytics are kept up to date with the freshest data possible. That often meant taking streaming data and loading it into some kind of analytics database, where it could be consumed as a materialized view.
"In a lot of use cases, people would run Flink with something like Postgres, ClickHouse, or Pinot, and again, you have two different systems to manage," Jafarpour said. "As I said, we wanted to build a complete data platform for streaming data. We saw that a lot of streaming use cases need that materialized view capability. Why not make it part of the platform?"
So in addition to Apache Flink, DeltaStream also incorporates an OLAP database as part of the offering. Customers are given the option of using either open source ClickHouse or Postgres to build materialized views that serve downstream real-time analytics use cases.
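A hedged sketch of that pattern in SQL (illustrative only, not DeltaStream's documented syntax; the view and column names are invented): the platform keeps the view continuously up to date from the stream, and the bundled OLAP store serves reads against it.

```sql
-- Continuously maintained view over a stream (hypothetical names and syntax)
CREATE MATERIALIZED VIEW clicks_per_page AS
  SELECT url, COUNT(*) AS views
  FROM pageviews
  GROUP BY url;

-- Downstream dashboards then read the view like an ordinary table
SELECT url, views
FROM clicks_per_page
ORDER BY views DESC
LIMIT 10;
```

The design choice is that the streaming engine and the serving database sit behind one SQL surface, so the user never wires Flink output into a separate analytics store by hand.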
"The nice thing is that we're a cloud service, so under the hood we can bring in these components and put them together without customers having to worry about it," Jafarpour said.
DeltaStream, which raised $10 million in 2022, has been adopted by organizations that need to ingest large amounts of incoming data, from IoT sources or change data capture (CDC) logs. The company has customers in gaming, security, and financial services, said Jafarpour, who previously was an engineer at Informatica and Quantcast and has a PhD in computer science.
Earlier this month, the Menlo Park, California company rolled out the next iteration of the product: DeltaStream Fusion. The new edition gives customers the ability to land data in Apache Iceberg tables, and then run queries against those Iceberg tables.

DeltaStream Fusion uses Flink, Spark, and ClickHouse for streaming, batch, and real-time use cases
To power DeltaStream Fusion, Jafarpour surveyed the various open source engines available on the market and picked the one he thought was best suited for the job: Apache Spark.
"Spark is the perfect tool for batch. Flink is good for streaming, though both of them want to do the other side," Jafarpour said. "The nice thing is that we abstracted it from the user. If it's a streaming query, it's going to compile into a Flink job. If it's a query for the Iceberg tables, it's going to use Spark for running that query."
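The routing Jafarpour describes could be sketched as two queries against the same catalog (illustrative SQL; all table, catalog, and column names here are invented): an unbounded query over a stream, which the platform would hand to Flink, and a bounded query over an Iceberg table, which would go to Spark.

```sql
-- Unbounded query over a live stream: compiled into a Flink job
-- (hypothetical stream and columns)
SELECT sensor_id, AVG(temperature) AS avg_temp
FROM sensor_readings
GROUP BY sensor_id;

-- Bounded query over an Iceberg table: executed by Spark
-- (hypothetical catalog, table, and columns)
SELECT sensor_id, MAX(temperature) AS peak_temp
FROM iceberg_catalog.telemetry.daily_readings
WHERE reading_date = DATE '2025-06-01'
GROUP BY sensor_id;
```

In this sketch the user writes SQL either way; whether the source is a stream or an Iceberg table determines which engine runs the query.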
Ironically, Confluent itself would follow Jafarpour's lead by adopting Flink. In 2023, it spent a reported $100 million to acquire Immerok, one of the leading companies behind Apache Flink, and fold Flink into its offerings. The company hasn't completely abandoned kSQL (now called ksqlDB), but it's clear that Flink is the strategic stream processing engine at Confluent today. Databricks has also moved to support Flink within Delta Lake.
Jafarpour is philosophical about the move beyond kSQL.
"That was one of the first products in that space, and usually when you build a product for the first time, you make a lot of decisions, and some of them are good, some of them are bad, depending on the situation," he said. "And as I said, as you build things and as you see how people are using it, you're going to see the shortcomings and strengths of the product. My conclusion was that, okay, it's time for the next generation of these systems."
Related Items:
Slicing and Dicing the Real-Time Analytics Database Market
Confluent Expands Apache Flink Capabilities to Simplify AI and Stream Processing
5 Drivers Behind the Rapid Rise of Apache Flink