Balachandar Paulraj

296 Followers


Pinned

Essential Considerations for Data Engineers When Selecting a NoSQL Database

In the realm of modern data engineering, the choices abound, and the stakes are high. Data engineers are the architects of the digital age, tasked with crafting the data foundations upon which businesses build their futures. …

NoSQL

3 min read



Pinned

2022 : Modern Data Stack

You might have seen multiple posts on this subject, since the tech stack keeps evolving over time. This one covers recent developments in data processing frameworks, visualization tools, ETL tools, development notebooks, data catalogs, etc. Over time, we might have come across different terms like ETL, ELT, Reverse…

Datastack

5 min read



Pinned

DuckDB: Primer on the subject and fascinating highlights

Throughout our data engineering journey, we’ve come across a myriad of database management systems (DBMS). But what sets DuckDB apart from the rest? And is it worth delving into? Let’s embark on a quest for answers. What’s DuckDB? The original purpose behind DuckDB’s creation was to empower analytical query workloads and facilitate…

Duckdb

4 min read



Sep 4

Key Database Compaction Strategies Used In Distributed System

In the realm of distributed database systems, the adoption of compaction strategies plays a pivotal role in the effective management of data storage. As the data landscape continues to evolve, a multitude of innovative compaction strategies have emerged, each catering to specific database technologies and their unique demands. …

Compaction

3 min read



Apr 3

Apache Paimon: A fresh face joins the fray

Recently, a few people might have heard about Apache Paimon. Undergoing incubation at the Apache Software Foundation (ASF), Apache Paimon is being sponsored by the Apache Incubator. Apache Paimon, with its key features built around data lake storage with ACID characteristics and support for DML operations, has joined the space with other…

Paimon

3 min read



Jun 28, 2022

Can Snowpark supersede Databricks and AWS EMR?

Introduction: Ever since the introduction of Hadoop, compute and storage have been seen as two separate entities (unlike OLTP databases, where compute and storage are tightly coupled). For example, MapReduce, Spark, Flink, and Storm are frameworks for processing big data in a distributed environment, whereas HDFS, Cloud storage (S3, Azure Blob, Google…

Snowflake

3 min read



Jun 13, 2022

Deep Dive into Windowing concepts in Apache Flink

Windows play a major role in processing infinite streams and are defined as a core part of it. Windows split the incoming stream into buckets of finite size, over which the required transformations can be applied. Though windows have many components, including trigger, evictor, allowedLateness, sideOutputLateData, etc., this post focuses on…

Windows

4 min read



Apr 16, 2022

Delta Lake Clones: Systematic Approach for Testing, Sharing data

Let’s begin with some issues faced in data engineering projects, then look at how Delta Lake clones are used, and finally resolve those issues. What’s a clone in Delta Lake? It’s just a replica of a source table at a given point in time. In other database terminology, we…

Databricks

4 min read



Mar 16, 2022

Simplify ETL Pipelines using Delta Live Tables

Consider a common data engineering pipeline scenario where raw data needs to be cleansed, transformed, or aggregated before being written to a target system. In this case, we usually create 3-4 tables to store raw data, cleansed data, transformed data, and aggregated data respectively. In order to implement, this needs…

Databricks

6 min read



Feb 14, 2022

Expedite Spark Processing using Parquet Bloom Filter

So, what’s a Bloom filter? A Bloom filter index is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It enables data skipping on chosen columns, particularly for fields containing arbitrary text. A Bloom filter can tell you if a key is in a set and with…

Databricks

3 min read



Big Data Habitue. Current stint at PlayStation. https://www.linkedin.com/in/balachandar-paulraj-b8a26727
