The hidden cost of shuffle – MapReduce

In the previous post, Introduction to batch processing – MapReduce, I introduced the MapReduce framework and gave a high-level rundown of its execution flow. Today, I will focus on the details of the execution flow, like the infamous shuffle. My goal for this post is to cover what a shuffle is, and how it can impact the performance of data pipelines. The Cost ofContinue reading… The hidden cost of shuffle – MapReduce

Things you’re probably not using in Python 3 – but should

Many people started switching their Python versions from 2 to 3 as a result of Python EOL. Unfortunately, most Python 3 I find still looks like Python 2, but with parentheses (even I am guilty of that in my code examples in previous posts – Introduction to web scraping with Python). Below, I show some examples of exciting features you can only use inContinue reading… Things you’re probably not using in Python 3 – but should