Deep Dive into Windowing concepts in Apache Flink

Windows play a major role and form a core part of processing infinite streams. Windows split the incoming stream into buckets of finite size, over which the required transformations can be applied.

Though windows involve many components, including trigger, evictor, allowedLateness, sideOutputLateData, etc., this post focuses on data-driven and time-driven window operations.

Data-Driven Window

A data-driven window is also known as a count window: the window is triggered once the required amount (count) of data has been received from the stream. In Flink, this can be achieved through the countWindow and countWindowAll functions.

Examples for both countWindow and countWindowAll are as follows:

countWindowAll

The code below sums the elements (using the sum function) whenever the number of elements entering the socket reaches 3 (implemented using countWindowAll(3)). The input stream can be anything; here a socket text stream has been created using nc -lk 9999. To have an interactive environment, I have used flink-scala-shell (by running start-scala-shell.sh in local mode) for executing the code. However, this is supported only in a few versions of Flink (refer to flink-scala-shell). If your Flink doesn't support flink-scala-shell, please create a jar for the same piece of code and upload it to the Flink UI.

// senv - Streaming Execution Environment provided by flink-scala-shell
val dataStream = senv.socketTextStream("localhost", 9999, ' ')
val outputSink = dataStream.flatMap(_.split(" "))
.map(_.toInt).countWindowAll(3).sum(0)
outputSink.print()
senv.execute("Program to implement countWindowAll")

countWindowAll can also be referred to as a non-keyed window, due to the fact that the windows are not grouped using a keyBy operation. In other words, there is no need to group the elements for this requirement.

countWindow

countWindow is also referred to as a keyed window, where the elements are grouped together based on a given condition. The example below shows countWindow with elements grouped using the condition % 2. In contrast to countWindowAll, each key's window is triggered whenever that window's count reaches 3. Here, whenever either the odd or the even number count reaches 3, those elements are summed and displayed.

val dataStream = senv.socketTextStream("localhost",9999,' ')
val keyedDataStream = dataStream.flatMap(_.split(" "))
.map(_.toInt).keyBy(a => a%2)
val outputSink = keyedDataStream.countWindow(3).sum(0)
outputSink.print()
senv.execute("Program to implement countWindow through Even/Odd No")
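To make the firing condition concrete, here is a minimal plain-Scala sketch (ordinary collections, not the Flink API; the input values are made up for illustration) of what the keyed count window above computes: elements are bucketed by key, and a sum fires only once a key has accumulated 3 elements.

```scala
// Plain-Scala illustration (not Flink API) of keyed count-window semantics.
// Elements are keyed by a % 2; a window fires once a key collects 3 elements.
val input = Seq(1, 3, 2, 5, 7, 4)

val firedSums = input
  .groupBy(_ % 2)                         // analogous to keyBy(a => a % 2)
  .map { case (key, elems) =>
    // grouped(3) forms count windows of 3; incomplete windows never fire
    key -> elems.grouped(3).filter(_.size == 3).map(_.sum).toList
  }

// Odd elements 1, 3, 5, 7: the first three (1, 3, 5) fire with sum 9.
// Even elements 2, 4: only two seen so far, so no even window fires yet.
println(firedSums)
```

In the real streaming job the incomplete windows simply stay buffered until enough elements for that key arrive.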

Time-Driven Window

Flink supports different types of time windows, such as tumbling time windows, sliding time windows, session time windows, etc. Of these, tumbling and sliding time windows are discussed below:

Tumbling Time Window

Tumbling Window (10 minutes)

A tumbling time window defines a window of a given duration that tumbles: elements are grouped according to their arrival time, and every element belongs to exactly one window. This type of window is non-overlapping, i.e. the events in one window do not overlap with any other window. The picture above shows a tumbling time window of 10 minutes where all elements that fall within a window are grouped together.

The code below creates a tumbling window of 5 seconds. Within each 5-second window, due to the keyBy operation, odd and even numbers are placed into separate windows and their respective sums are computed.

val dataStream = senv.socketTextStream("localhost",9999,' ')
val keyedDataStream = dataStream.flatMap(_.split(" ")).map(_.toInt)
.keyBy(a => a%2)
//Creates tumbling window of 5 seconds
val tumblingWindow = keyedDataStream.timeWindow(Time.seconds(5))
val outputSink = tumblingWindow.sum(0)
outputSink.print()
senv.execute("Program to implement tumbling window")
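As a side note, later Flink releases deprecated the timeWindow shorthand in favour of explicit window assigners. If your Flink version supports it, the same tumbling window can be written as follows (a sketch reusing keyedDataStream from above, assuming processing time):

```scala
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Equivalent tumbling window with an explicit window assigner;
// timeWindow(Time.seconds(5)) is shorthand for this form.
val tumblingWindow = keyedDataStream
  .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
val outputSink = tumblingWindow.sum(0)
```

The explicit form also makes it obvious which notion of time (processing vs. event time) the window uses.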

Sliding Time Window

Sliding Window (20 minutes, slide 10 minutes)

Unlike a tumbling time window, a sliding window slides over the incoming stream of data, so sliding windows can overlap. The picture shows how window operations are carried out for a sliding window of 20 minutes with a slide of 10 minutes.

The program discussed previously has been changed to create a sliding window of 20 seconds with a 10-second slide.

val dataStream = senv.socketTextStream("localhost",9999,' ')
val keyedDataStream = dataStream.flatMap(_.split(" "))
.map(_.toInt).keyBy(a => a%2)
//Creates sliding window of 20 seconds with 10 seconds slide
val slidingWindow = keyedDataStream.timeWindow(Time.seconds(20), Time.seconds(10))
val outputSink = slidingWindow.sum(0)
outputSink.print()
senv.execute("Program to implement sliding window")
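As with the tumbling case, the timeWindow shorthand for sliding windows corresponds to an explicit window assigner in later Flink releases. A sketch reusing keyedDataStream from above, assuming processing time:

```scala
import org.apache.flink.streaming.api.windowing.assigners.SlidingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

// Equivalent sliding window with an explicit window assigner:
// a 20-second window that slides every 10 seconds, so consecutive
// windows overlap and each element can belong to two windows.
val slidingWindow = keyedDataStream
  .window(SlidingProcessingTimeWindows.of(Time.seconds(20), Time.seconds(10)))
val outputSink = slidingWindow.sum(0)
```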

Conclusion

Though there are other concepts and other types of windows, I think this covers the basic, initial part of windowing in Flink. I'm planning to cover the other concepts in a separate post. Meanwhile, please try the code provided here to understand the output of each window.

Happy Learning and please go through my other posts too!!!

Big Data Habitue. Current stint at PlayStation. https://www.linkedin.com/in/balachandar-paulraj-b8a26727
