Minimally Sufficient Pandas Cheat Sheet

pandas Jan 03, 2020

This article summarizes the very detailed guide presented in Minimally Sufficient Pandas.

Begin Mastering Data Science Now for Free!

Take my free Intro to Pandas course to begin your journey mastering data analysis with Python.

What is Minimally Sufficient Pandas?

It is a small subset of the library that is sufficient to accomplish nearly everything the full library has to offer.
It lets you focus on doing data analysis rather than on the syntax.

How will Minimally Sufficient Pandas benefit you?

All common data analysis tasks will use the same syntax
Fewer commands will be easier to commit to memory
Your code will be easier for others (and for you) to understand
It will be easier to put Pandas code into production
It reduces the chance of running into a Pandas bug

Specific Guidance

Selecting a Single Column of Data

Use brackets, not dot notation, to select a single column of data. Dot notation cannot select a column whose name contains a space, whose name collides with a DataFrame method, or whose name is stored in a variable.

>>> df['colname']  # do this
>>> df.colname     # not that
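A minimal sketch of the three cases where dot notation breaks down, using a small hypothetical DataFrame (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: one column name contains a space and
# another collides with the DataFrame.count method.
df = pd.DataFrame({"state name": ["Ohio", "Texas"], "count": [3, 5]})

# Brackets work for every column name.
states = df["state name"]
counts = df["count"]

# Dot notation cannot express a name with a space, and df.count
# returns the bound method, not the column.
print(callable(df.count))  # True: this is DataFrame.count, not the column

# Brackets also accept a column name stored in a variable;
# df.col would look for a column literally named "col".
col = "count"
same_counts = df[col]
```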

The deprecated `ix` indexer

The ix indexer is ambiguous and confusing (and now deprecated) as it allows selection by both label and integer location. Every trace of ix should be removed and replaced with the explicit loc or iloc indexers.
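A short sketch of why the explicit indexers remove the ambiguity: with a hypothetical Series whose integer labels differ from its positions, `loc` always means label and `iloc` always means position.

```python
import pandas as pd

# Hypothetical Series whose integer labels do not match the positions;
# this is exactly the situation where ix was ambiguous.
s = pd.Series(["a", "b", "c"], index=[10, 20, 30])

by_label = s.loc[20]     # label-based: the element labeled 20 -> "b"
by_position = s.iloc[0]  # position-based: the first element -> "a"

# Slices differ too: loc includes the end label,
# iloc excludes the end position.
label_slice = s.loc[10:20]    # labels 10 and 20
position_slice = s.iloc[0:1]  # only position 0
```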

Selection with at and iat

The at and iat indexers give a small performance increase when selecting a single DataFrame cell. If your application depends on fast single-cell selection, use NumPy arrays instead of at or iat.
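A sketch, on hypothetical data, showing that `at`, `iat`, `loc` and a NumPy array all return the same single cell. Any speed difference depends on the pandas version and hardware, so it is worth timing in your own application before switching.

```python
import pandas as pd

# Hypothetical 3x2 DataFrame with string row labels.
df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]},
    index=["r1", "r2", "r3"],
)

val_at = df.at["r2", "y"]    # label-based scalar access
val_iat = df.iat[1, 1]       # position-based scalar access
val_loc = df.loc["r2", "y"]  # loc returns the same scalar

# For many single-cell lookups, convert once to a NumPy array
# (the values attribute also works) and index it by position.
arr = df.to_numpy()
val_np = arr[1, 1]
```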

`read_csv` vs `read_table`

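The only difference between these two functions is the default delimiter: a comma for read_csv and a tab for read_table. Use read_csv for all cases, passing sep='\t' for tab-delimited files; read_table was deprecated in pandas 0.24, though that deprecation was later reversed.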
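A minimal sketch of reading tab-delimited text with read_csv alone; the in-memory text stands in for a hypothetical `.tsv` file.

```python
import io

import pandas as pd

# Hypothetical tab-separated data held in memory instead of a file.
tsv_text = "name\tscore\nAna\t90\nBo\t85\n"

# read_csv handles tab-delimited data once the delimiter is set explicitly.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")
```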

`isna` vs `isnull` and `notna` vs `notnull`

isna and isnull are the same method under two names, as are notna and notnull. Use isna and notna because they end in 'na', like the other missing-value methods fillna and dropna.
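A quick check, on a hypothetical Series, that the two spellings return identical results and that the 'na' names line up with fillna and dropna:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

missing = s.isna()  # preferred spelling
not_missing = s.notna()

# The older names return identical results.
assert missing.equals(s.isnull())
assert not_missing.equals(s.notnull())

# The "na" suffix matches the other missing-value methods.
filled = s.fillna(0)
dropped = s.dropna()
```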

Arithmetic and Comparison Operators vs Methods

Use the operators (+, *, >, <=, etc.) rather than their corresponding methods (add, mul, gt, le, etc.) in all cases except when a method is truly necessary, such as when you need to change the direction of alignment with the axis parameter.
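A sketch of the one case where the method earns its place. The data and the per-row divisor are hypothetical. With the `/` operator, a Series is aligned against the DataFrame's columns; the `div` method's `axis="index"` aligns it against the rows instead.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
row_scale = pd.Series([10, 100])  # hypothetical per-row divisor, index 0..1

# Operators cover the common cases.
doubled = df * 2
mask = df > 2

# `df / row_scale` would match row_scale's index (0, 1) against the
# column labels ("a", "b"). Aligning with the rows needs div and axis.
per_row = df.div(row_scale, axis="index")
```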

Builtin Python functions vs Pandas methods with the same name

Use the Pandas method over any built-in Python function with the same name.
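A small illustration of why the built-in can mislead, using hypothetical data: iterating over a DataFrame yields its column labels, so the built-in `max` compares labels rather than values.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 5], "b": [3, 2]})

# The built-in max iterates over the column labels and returns "b".
builtin_result = max(df)

# The pandas method returns the maximum of each column.
method_result = df.max()
```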

Standardizing `groupby` aggregation

There are a few different syntaxes available to do a groupby aggregation. Use df.groupby('grouping column').agg({'aggregating column': 'aggregating function'}) as it can handle more complex cases.
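A minimal example of the recommended form on hypothetical sales data, with a different aggregating function for each column:

```python
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "sales": [10, 20, 30, 40],
    "units": [1, 2, 3, 4],
})

# Group by one column, then map each aggregating column to its function.
result = df.groupby("region").agg({"sales": "sum", "units": "max"})
```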

Handling a MultiIndex

A DataFrame with a MultiIndex offers little benefit over one with a single-level index, so I advise against using them. Instead, flatten them after a call to groupby by renaming the columns and resetting the index.
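A sketch of one way to do the flattening on hypothetical data. Joining the column levels with an underscore is an assumed naming convention, not the only option.

```python
import pandas as pd

# Hypothetical data with two grouping columns.
df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "item": ["pen", "pen", "ink", "ink"],
    "sales": [10, 20, 30, 40],
})

# Two grouping columns and two functions give a MultiIndex
# on both the rows and the columns.
grouped = df.groupby(["region", "item"]).agg({"sales": ["sum", "mean"]})

# Rename the columns by joining each column's levels,
# then turn the row index levels back into ordinary columns.
grouped.columns = ["_".join(col) for col in grouped.columns]
flat = grouped.reset_index()
```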

The equivalency of `groupby` aggregation and `pivot_table`

A groupby aggregation and a pivot_table produce exactly the same data in different shapes. Use groupby when you want to continue an analysis and pivot_table when you want to compare groups.
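A sketch, on hypothetical data, of the same sums in both shapes: long from groupby and wide from pivot_table.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "item": ["pen", "pen", "ink", "ink"],
    "sales": [10, 20, 30, 40],
})

# Long: one row per (region, item) pair, handy for further analysis.
long_result = df.groupby(["region", "item"]).agg({"sales": "sum"}).reset_index()

# Wide: regions as rows and items as columns, handy for comparing groups.
wide_result = df.pivot_table(
    index="region", columns="item", values="sales", aggfunc="sum"
)
```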

The equivalency of pivot_table and pd.crosstab

The pivot_table method and the crosstab function are very similar. Only use crosstab when finding the relative frequency.
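A sketch on hypothetical data. pivot_table and crosstab give the same counts, and crosstab's `normalize` parameter gives the relative frequencies directly.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "east", "east"],
    "item": ["pen", "pen", "ink", "pen"],
    "sales": [1, 2, 3, 4],
})

# Counts of each (region, item) pair: the two approaches agree.
counts_pt = df.pivot_table(
    index="region", columns="item", values="sales", aggfunc="count", fill_value=0
)
counts_ct = pd.crosstab(df["region"], df["item"])

# Relative frequency: normalize="index" divides each row by its total.
row_share = pd.crosstab(df["region"], df["item"], normalize="index")
```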

pivot vs pivot_table

The pivot method pivots data without aggregating. It is possible to duplicate its functionality with pivot_table by selecting an aggregation function. Consider using only pivot_table and not pivot.
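A sketch, with hypothetical readings, of the practical difference. pivot raises a ValueError when an index/column pair repeats because it cannot aggregate. pivot_table resolves the duplicate with the chosen aggregation function (mean here). With unique pairs, an aggfunc such as "first" reproduces pivot exactly.

```python
import pandas as pd

# Hypothetical readings with a repeated ("d1", "A") pair.
df = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d2"],
    "city": ["A", "A", "B", "A"],
    "temp": [20, 22, 25, 21],
})

# pivot cannot aggregate, so the duplicate pair makes it fail.
try:
    df.pivot(index="date", columns="city", values="temp")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table averages the duplicate readings instead.
wide = df.pivot_table(index="date", columns="city", values="temp", aggfunc="mean")
```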

The similarity between melt and stack

Both the melt and stack methods reshape the data in a very similar manner. Use melt over stack because it allows you to rename columns and it avoids a MultiIndex.
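A sketch on hypothetical wide data. melt keeps the identifier as a regular column and names the new columns via `var_name` and `value_name`. stack needs the identifier moved into the index first and returns a Series with a two-level MultiIndex.

```python
import pandas as pd

# Hypothetical wide data: one row per student, one column per subject.
wide = pd.DataFrame({"name": ["Ana", "Bo"], "math": [90, 80], "art": [70, 60]})

# melt: flat result with explicitly named columns.
long_melt = wide.melt(id_vars="name", var_name="subject", value_name="score")

# stack: result is a Series indexed by (name, subject).
long_stack = wide.set_index("name").stack()
```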

The similarity between `pivot` and unstack

Both pivot and unstack reshape data in a similar way, but, as noted above, pivot_table can handle every case that pivot can, so I suggest using it instead of either.
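A sketch, on hypothetical data with unique (date, city) pairs, of pivot, unstack and pivot_table producing the same table. The "first" aggfunc is an assumed choice that returns the single value in each group.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["A", "B", "A", "B"],
    "temp": [20, 25, 22, 27],
})

via_pivot = df.pivot(index="date", columns="city", values="temp")

# unstack moves the inner index level ("city") into the columns.
via_unstack = df.set_index(["date", "city"])["temp"].unstack()

via_pivot_table = df.pivot_table(
    index="date", columns="city", values="temp", aggfunc="first"
)
```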

Best of the DataFrame API

The examples above cover the most common areas of Pandas where users have multiple options. There are many other attributes and methods that are not discussed. Below, I provide a categorized list of the minimal set of DataFrame attributes and methods that can accomplish nearly all of your data analysis tasks. It reduces the number from over 240 to fewer than 80.

Attributes

columns
dtypes
index
shape
T
values

Aggregation Methods

These summarize each column; most return a single value per column.

all
any
count
describe
idxmax
idxmin
max
mean
median
min
mode
nunique
sum
std
var

Non-Aggregation Statistical Methods

abs
clip
corr
cov
cummax
cummin
cumprod
cumsum
diff
nlargest
nsmallest
pct_change
prod
quantile
rank
round

Subset Selection

head
iloc
loc
tail

Missing Value Handling

dropna
fillna
interpolate
isna
notna

Grouping

expanding
groupby
pivot_table
resample
rolling

Joining Data

append
merge

Other

asfreq
astype
copy
drop
drop_duplicates
equals
isin
melt
plot
rename
replace
reset_index
sample
select_dtypes
shift
sort_index
sort_values
to_csv
to_json
to_sql

Functions

pd.concat
pd.crosstab
pd.cut
pd.qcut
pd.read_csv
pd.read_json
pd.read_sql
pd.to_datetime
pd.to_timedelta
