6 posts tagged with "apache-spark"

Spark Execution Modes

4 min read
Vibhavari Bellutagi
Data Engineer

In this post, we discuss the three execution modes Apache Spark provides for running applications: cluster mode, client mode, and local mode. Each mode has its own use case and suits different scenarios.
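
As a quick taste of what the post covers, here is a minimal sketch. Local mode is selected in code through the master URL, while client and cluster modes are chosen at submit time; the app name and YARN example below are illustrative assumptions, not taken from the post.

```python
# Minimal sketch: picking an execution mode (app name is hypothetical).
from pyspark.sql import SparkSession

# Local mode: driver and executors run in a single JVM on your machine;
# "local[*]" uses as many worker threads as there are CPU cores.
spark = (
    SparkSession.builder
    .appName("execution-modes-demo")
    .master("local[*]")
    .getOrCreate()
)

print(spark.range(5).count())  # sanity check: prints 5
spark.stop()

# Client and cluster modes are chosen when submitting, not in code, e.g.:
#   spark-submit --master yarn --deploy-mode client  app.py  # driver on the submitting machine
#   spark-submit --master yarn --deploy-mode cluster app.py  # driver inside the cluster
```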

Handling Nulls in Spark

11 min read
Vibhavari Bellutagi
Data Engineer

In SQL, NULL is a special marker used to indicate that a data value does not exist in the database. NULL should not be confused with a value of 0: NULL indicates the absence of a value, which is not the same as a zero value.

For example, consider the question "How many books does Krishna own?" The answer may be zero (we know that he owns none) or null (we do not know how many he owns).

Let's dive deep into handling nulls in Spark.
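
To ground the book-counting example, here is a minimal sketch of common null-handling operations in PySpark; the DataFrame and column names are made up for illustration.

```python
# Minimal sketch of common null handling in PySpark (data is illustrative).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("nulls-demo").master("local[*]").getOrCreate()

# None means "we don't know how many books Krishna owns"; 0 means "owns none".
df = spark.createDataFrame(
    [("Krishna", None), ("Asha", 0), ("Ravi", 3)],
    ["owner", "books"],
)

# NULL compares as unknown, so filter with isNull()/isNotNull() instead of ==.
df.filter(F.col("books").isNull()).show()     # unknown counts
df.filter(F.col("books").isNotNull()).show()  # known counts, including zero

# Replace nulls with a default, or drop the rows that contain them.
df.fillna({"books": 0}).show()
df.na.drop(subset=["books"]).show()

# coalesce() returns the first non-null value among its arguments.
df.select("owner", F.coalesce(F.col("books"), F.lit(0)).alias("books_or_zero")).show()

spark.stop()
```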

Columns and Expressions

4 min read
Vibhavari Bellutagi
Data Engineer

Apache Spark's Column objects and expressions play a big role in making your pipelines more efficient. In this blog we will look at all the possible ways to select columns, use built-in functions, and perform calculations with column objects and expressions in PySpark. So whether you're building an ETL pipeline or doing exploratory data analysis, these techniques will come in handy.
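
As a preview, here is a minimal sketch of a few of those ways to reference columns and build expressions; the employee data is a made-up assumption for illustration.

```python
# Minimal sketch: columns and expressions in PySpark (data is illustrative).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("columns-demo").master("local[*]").getOrCreate()

df = spark.createDataFrame([("Alice", 3000), ("Bob", 4500)], ["name", "salary"])

# Several equivalent ways to reference a column:
df.select("name").show()         # by string name
df.select(df["name"]).show()     # by indexing the DataFrame
df.select(F.col("name")).show()  # via the col() function

# Column objects compose into expressions:
df.select(F.col("name"), (F.col("salary") * 1.1).alias("salary_with_raise")).show()

# expr()/selectExpr() accept SQL-style expression strings:
df.selectExpr("name", "salary * 1.1 AS salary_with_raise").show()
df.select(F.expr("upper(name) AS name_upper")).show()

spark.stop()
```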

Introduction to Apache Spark

5 min read
Vibhavari Bellutagi
Data Engineer

Welcome to my Apache Spark series! I’ll dive deep into Apache Spark, from the basics to advanced concepts. This series is about learning, exploring, and sharing: documenting my journey to mastering Apache Spark (again) while passing along insights, challenges, and tips.

In this first post, we’ll cover the fundamentals of Apache Spark, its history, and why it’s a game-changer in data engineering.

Find all the blogs in the series here.