Welcome to the third edition of the Dunder Data Challenge series, designed to help you learn Python, data science, and machine learning. Begin working on any of the challenges directly in a Jupyter Notebook courtesy of Binder (mybinder.org).
This challenge is going to be fairly difficult, but it should answer a question that many pandas users face: what is the best way to perform a groupby that does many custom aggregations? In this context, a ‘custom aggregation’ is one that is not directly available in pandas and for which you must write a custom function.
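To make the term concrete, here is a minimal sketch of a groupby that mixes a built-in aggregation with a custom one. The toy DataFrame, its column names (`group`, `value`), and the `value_range` function are all hypothetical and are not taken from the challenge data.

```python
import pandas as pd

# Toy data; these column names are hypothetical, not those of sales.csv
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 5, 2, 8, 4],
})

def value_range(s):
    """Custom aggregation: difference between the largest and smallest value."""
    return s.max() - s.min()

result = df.groupby("group").agg(
    total=("value", "sum"),         # built-in aggregation, referenced by name
    spread=("value", value_range),  # custom aggregation, passed as a function
)
print(result)
```

The built-in `sum` runs in optimized code, while `value_range` is called in Python once per group.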
In Dunder Data Challenge #1, the desired result was a single aggregation that required a custom grouping function. In this challenge, you’ll need to return several aggregations when grouping. There are a few different solutions to this problem, but depending on how you arrive at yours, performance can differ enormously. I am looking for a compact, readable solution with very good performance.
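As a rough illustration of why the approach matters, the sketch below computes the same two columns in two ways on hypothetical toy data (the `group`, `value`, and `flag` columns and the `summarize` function are assumptions for illustration only). The first calls a Python function once per group through `apply`; the second precomputes a helper column and then uses only built-in aggregations, which on large data is often much faster. This is not presented as the intended solution to the challenge.

```python
import pandas as pd

# Toy data; these column names are hypothetical, not those of sales.csv
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [1, 5, 2, 8, 4],
    "flag": [True, False, True, True, False],
})

# Approach 1: a custom function returning a Series, called once per group
def summarize(g):
    return pd.Series({
        "total": g["value"].sum(),
        "flagged_total": g.loc[g["flag"], "value"].sum(),
    })

via_apply = df.groupby("group")[["value", "flag"]].apply(summarize)

# Approach 2: precompute the conditional column, then use built-in aggregations
via_builtin = (
    df.assign(flagged_value=df["value"].where(df["flag"], 0))
      .groupby("group")
      .agg(total=("value", "sum"), flagged_total=("flagged_value", "sum"))
)

print(via_apply)
print(via_builtin)
```

Both produce one row per group with the same values; timing each on a larger DataFrame (for example with `%timeit` in a notebook) shows how far apart they can be.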
In this challenge, you will be working with some mock sales data found in the sales.csv file. It contains 200,000 rows and 9 columns. Here are the first five rows.
There are many aggregations that you will need to return and it will take some time to understand what they are and how to return them. The following definitions for two time periods will be used throughout the aggregations.
I will now list all the aggregations that are expected to be returned. Each bullet point represents a single column. Use the first word after the bullet point as the new column name.
For every country and region, return the following: