6 posts tagged with "apache-spark"

Spark Execution Modes

4 min read
Vibhavari Bellutagi
Data Engineer

In this post, we discuss the three execution modes Apache Spark provides for running applications: cluster mode, client mode, and local mode. Each mode has its own use case and suits different scenarios.
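
As a quick taste of what the post covers, here is a minimal sketch. Local mode is selected in code through the master URL, while client and cluster modes are chosen at submit time; the app name and YARN example below are illustrative assumptions, not taken from the post.

```python
# Minimal sketch: picking an execution mode (app name is hypothetical).
from pyspark.sql import SparkSession

# Local mode: driver and executors run in a single JVM on your machine;
# "local[*]" uses as many worker threads as there are CPU cores.
spark = (
    SparkSession.builder
    .appName("execution-modes-demo")
    .master("local[*]")
    .getOrCreate()
)

print(spark.range(5).count())  # sanity check: prints 5
spark.stop()

# Client and cluster modes are chosen when submitting, not in code, e.g.:
#   spark-submit --master yarn --deploy-mode client  app.py  # driver on the submitting machine
#   spark-submit --master yarn --deploy-mode cluster app.py  # driver inside the cluster
```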

Handling Nulls in Spark

11 min read
Vibhavari Bellutagi
Data Engineer

In SQL, NULL is a special marker used to indicate that a data value does not exist in the database. NULL should not be confused with a value of 0: NULL indicates the absence of a value, which is not the same as a zero value.

For example, consider the question "How many books does Krishna own?" The answer may be zero (we know that he owns none) or null (we do not know how many he owns).

Let's dive deep into handling nulls in Spark.
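
To ground the book-counting example, here is a minimal sketch of common null-handling operations in PySpark; the DataFrame and column names are made up for illustration.

```python
# Minimal sketch of common null handling in PySpark (data is illustrative).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("nulls-demo").master("local[*]").getOrCreate()

# None means "we don't know how many books Krishna owns"; 0 means "owns none".
df = spark.createDataFrame(
    [("Krishna", None), ("Asha", 0), ("Ravi", 3)],
    ["owner", "books"],
)

# NULL compares as unknown, so filter with isNull()/isNotNull() instead of ==.
df.filter(F.col("books").isNull()).show()     # unknown counts
df.filter(F.col("books").isNotNull()).show()  # known counts, including zero

# Replace nulls with a default, or drop the rows that contain them.
df.fillna({"books": 0}).show()
df.na.drop(subset=["books"]).show()

# coalesce() returns the first non-null value among its arguments.
df.select("owner", F.coalesce(F.col("books"), F.lit(0)).alias("books_or_zero")).show()

spark.stop()
```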

Columns and Expressions

4 min read
Vibhavari Bellutagi
Data Engineer

Apache Spark's Column objects and expressions play a big role in making your pipelines more efficient. In this blog we will look at all the possible ways to select columns, use built-in functions, and perform calculations with column objects and expressions in PySpark. So whether you're building an ETL pipeline or doing exploratory data analysis, these techniques will come in handy.
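
As a preview, here is a minimal sketch of a few of those ways to reference columns and build expressions; the employee data is a made-up assumption for illustration.

```python
# Minimal sketch: columns and expressions in PySpark (data is illustrative).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("columns-demo").master("local[*]").getOrCreate()

df = spark.createDataFrame([("Alice", 3000), ("Bob", 4500)], ["name", "salary"])

# Several equivalent ways to reference a column:
df.select("name").show()         # by string name
df.select(df["name"]).show()     # by indexing the DataFrame
df.select(F.col("name")).show()  # via the col() function

# Column objects compose into expressions:
df.select(F.col("name"), (F.col("salary") * 1.1).alias("salary_with_raise")).show()

# expr()/selectExpr() accept SQL-style expression strings:
df.selectExpr("name", "salary * 1.1 AS salary_with_raise").show()
df.select(F.expr("upper(name) AS name_upper")).show()

spark.stop()
```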

Introduction to Apache Spark

5 min read
Vibhavari Bellutagi
Data Engineer

Welcome to my Apache Spark series! I’ll dive deep into Apache Spark, from the basics to advanced concepts. This series is about learning, exploring, and sharing: documenting my journey to mastering Apache Spark (again) while passing along insights, challenges, and tips.

In this first post, we’ll cover the fundamentals of Apache Spark, its history, and why it’s a game-changer in data engineering.

Find all the blogs in the series here.