In this tutorial, game 7 of the 2016 NBA finals will be animated with Matplotlib one shot at a time within a Jupyter Notebook. This tutorial integrates many different topics including:
Only a portion of the code has been included in this post. Please visit the official Dunder Data Github organization’s Matplotlib Tutorials page to download the complete tutorial.
Below is the video that is created at the very end of the tutorial.
Game 7 between the Cleveland Cavaliers and the Golden State Warriors, held on June 19, 2016, was very exciting, with the outcome determined at the very end. I have seen shot-positional data before and thought it would be a fun challenge to animate the entire game using Matplotlib.
The NBA has a great website at nba.com with most of its data on the stats page. Once you have navigated to the stats page, click on the scores tab and then use the calendar to select the date June 19, 2016. You should see the image below:
The NBA tracks almost every piece of data imaginable during the game. In this tutorial, we will look at the shot positional data, which consists of the following:
From the scores page above, click on Box Score. You will get many traditional stats for both teams.
There is a message above the box score that informs us how to find the Shot Chart. Click the number 82 in the FGA column to get the shot chart for the Cleveland Cavaliers. All the field goals attempted (shots) by them for the entire game are displayed in the following image.
The following section is created from a great blog post by Greg Reda on how to reverse engineer an API.
Oftentimes, we have to scrape the raw HTML to get the data we want, but in this case, the data is fetched through an internal API that the NBA maintains. As far as I know, they do not publicly document how it is used. So, in order to get the data, we will have to take a peek at the requests being made as the page loads.
All browsers have tools for web developers. In Google Chrome, we can open the developer tools by right-clicking the page and selecting Inspect, or from the Chrome menu by selecting More Tools and then Developer Tools.
Click on the Network tab in the developer tools and select XHR below it. XHR stands for XMLHttpRequest and is used to fetch XML or JSON data.
The area is probably going to be blank when you first visit it as seen in the image above. Refresh the page and all the XHR requests will show up.
There are a lot of requests, but the one we are looking for is the shotchartdetail. Click on it, and then click on preview and reveal the parameters.
Eli Uriegas has put together a document containing all the possible API endpoints and their parameters. Since this is not a public API, it is subject to change, and you might need to repeat the procedure above to ensure you make the right request.
Go back to the Headers tab and copy and paste the entire Request URL into a new browser tab.
Your browser should return the request as JSON data.
The above URL returns the shot chart data for just the Cleveland Cavaliers. If we modify it by changing the TeamID parameter to equal 0, we will get data for both teams. We assign this URL to a variable.
game7_url = '''http://stats.nba.com/stats/shotchartdetail?\
AheadBehind=&CFID=&CFPARAMS=&ClutchTime=\
...
&VsPlayerID4=&VsPlayerID5=&VsTeamID='''
The Requests third-party library makes it easy to get data from the web. In order to make this particular request, you will need to provide a header. It appears that providing the User-Agent is enough for our request to be accepted.
To find the header values, look back at the network tab in the developer tools under the Headers tab. Scroll down until you see Request Headers. You can put these values into a dictionary and pass this to the get function. Below, we only use the user agent. Find out more from this Stack Overflow post.
import requests
headers = {'User-Agent': '<your user agent>'}
r = requests.get(base_url, params=params, headers=headers)
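The call above passes a base URL and a separate dictionary of parameters rather than one long URL string. Below is a minimal sketch of one way to split such a URL using only the standard library. The shortened example URL and the helper name `split_url` are illustrative assumptions, not code from the notebook.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Split a full request URL into (base_url, params)."""
    parts = urlsplit(url)
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True preserves empty parameters such as 'CFID='
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base_url, params


# Hypothetical, shortened URL used only for illustration
example_url = ("http://stats.nba.com/stats/shotchartdetail?"
               "AheadBehind=&CFID=&TeamID=0&VsTeamID=")
base_url, params = split_url(example_url)
print(base_url)
print(params)
```

Keeping the parameters in a dictionary makes it easy to change a single value, such as TeamID, without editing the URL string by hand.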
Use the json method from the requests object to retrieve the data. The results will be given to you in a Python dictionary, which we can pass to the Pandas DataFrame constructor.
json_data = r.json()
result = json_data['resultSets'][0]

import pandas as pd
columns = result['headers']
data = result['rowSet']
df_shots = pd.DataFrame(data=data, columns=columns)
df_shots.head()
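To make the shape of the response easier to picture, here is a small sketch that pairs the headers with each row using plain Python. The payload is made up to mimic the structure used above (a `resultSets` list whose first item has `headers` and `rowSet`); the values are not real game data.

```python
# Made-up payload that mimics the structure of the response; not real data
json_data = {
    "resultSets": [
        {
            "headers": ["SHOT_TYPE", "SHOT_MADE_FLAG", "LOC_X", "LOC_Y"],
            "rowSet": [
                ["2PT Field Goal", 1, -12, 35],
                ["3PT Field Goal", 0, 230, 60],
            ],
        }
    ]
}

result = json_data["resultSets"][0]
columns = result["headers"]
# Pair each row's values with the column names, one dict per shot
records = [dict(zip(columns, row)) for row in result["rowSet"]]
print(records[0])
```

Each row is just a list of values in the same order as the headers, which is why the two pieces can be handed to the DataFrame constructor as data and columns.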
Find the count of each type of shot.
>>> df_shots['SHOT_TYPE'].value_counts()
2PT Field Goal 99
3PT Field Goal 66
Name: SHOT_TYPE, dtype: int64
Unfortunately, this API does not have the free throws, which means we will need to make another request to a different endpoint. If we look at the free throws attempted (FTA) column in the box score, there is no link to click on. We need to go hunting on a different page.
If you go back to the page that just had the score of the game, you will see a link for play by play. This section has a description for all the events in the game including free throws.
By completing the same procedure as above, we locate the desired endpoint as ‘playbyplayv2’ along with the following URL:
play_by_play_url = '''https://stats.nba.com/stats/playbyplayv2?\
EndPeriod=10&EndRange=55800&GameID=0041500407\
&RangeType=2&Season=2015-16\
&SeasonType=Playoffs&StartPeriod=1\
&StartRange=0'''
We replicate our previous work (along with some transformations not shown) to get the play by play data in a Pandas DataFrame.
r = requests.get(pbp_base_url, params=pbp_params, headers=headers)
pbp_json_data = r.json()
...
df_pbp.head()
This play by play data has every single recorded play during the game. By manually looking through the data, I noticed that when EVENTMSGTYPE was equal to 3, a free throw was attempted. Let’s create a new DataFrame of just free throws.
df_ft = df_pbp[df_pbp['EVENTMSGTYPE'] == 3]
df_ft.head()
Let’s get all of our data in one single DataFrame.
>>> df_shots_all = pd.concat([df_shots2, df_ft2], sort=False,
ignore_index=True)
We can map the shot type to points with a dictionary and create a new column. We multiply by the shot made flag to only track points if the shot was made.
>>> points = {'Free Throw': 1,
              '2PT Field Goal': 2,
              '3PT Field Goal': 3}
>>> df_shots_all['POINTS'] = (df_shots_all['SHOT_TYPE'].replace(points)
                              * df_shots_all['SHOT_MADE_FLAG'])
Let’s do a sanity check and calculate all the points that were scored in the game.
>>> df_shots_all.groupby('TEAM_NAME').agg({'POINTS': 'sum'})
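The same points-per-team check can be sketched in plain Python to show what the multiplication by the made flag accomplishes. The shots and team names below are invented for illustration and are not from the game.

```python
from collections import defaultdict

points = {"Free Throw": 1, "2PT Field Goal": 2, "3PT Field Goal": 3}

# Made-up shots for illustration only; not actual game data
shots = [
    {"TEAM_NAME": "Team A", "SHOT_TYPE": "3PT Field Goal", "SHOT_MADE_FLAG": 1},
    {"TEAM_NAME": "Team A", "SHOT_TYPE": "2PT Field Goal", "SHOT_MADE_FLAG": 0},
    {"TEAM_NAME": "Team A", "SHOT_TYPE": "Free Throw", "SHOT_MADE_FLAG": 1},
    {"TEAM_NAME": "Team B", "SHOT_TYPE": "2PT Field Goal", "SHOT_MADE_FLAG": 1},
    {"TEAM_NAME": "Team B", "SHOT_TYPE": "3PT Field Goal", "SHOT_MADE_FLAG": 0},
]


def total_points(shots):
    """Sum points per team; a missed shot (flag 0) contributes nothing."""
    totals = defaultdict(int)
    for shot in shots:
        totals[shot["TEAM_NAME"]] += points[shot["SHOT_TYPE"]] * shot["SHOT_MADE_FLAG"]
    return dict(totals)


totals = total_points(shots)
print(totals)
```

If the totals match the final score in the box score, the field goal and free throw data were most likely combined correctly.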
Finally, let’s work on visualizing our data. The columns LOC_X and LOC_Y contain the point on the court where the shot was taken.
>>> df_shots_all.plot('LOC_X', 'LOC_Y', kind='scatter')
Looking at the range of the LOC_X and LOC_Y columns, the units appear to be tenths of a foot, since a basketball court is 94 ft long by 50 ft wide.
There are negative values for LOC_Y, and further investigation shows that the basket is located at LOC_Y equal to 0. The basket is 4 feet from the edge of the court (-40 for LOC_Y), which means the maximum value for LOC_Y would be 900.
The same values for LOC_X and LOC_Y are used regardless of the team. To animate a game, we need to translate the shots of one team over to the other side of the court. For the visiting team we subtract the x-axis location from 900 to move it to the other side of the court. Note, we also transposed our data so that our court will be wide and not long.
import numpy as np

is_home = df_shots_all2['HOME_TEAM'] == 1
x = df_shots_all2['LOC_Y'] + 40
x = np.where(is_home, x, 900 - x)

df_shots_all2['LOC_X_NEW'] = x
df_shots_all2['LOC_Y_NEW'] = df_shots_all2['LOC_X']
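The translation and transposition can also be written as a small function for a single shot, which may make the arithmetic easier to follow. This is a sketch that mirrors the formula used above; the function name `to_court_coords` and the sample coordinates are assumptions for illustration.

```python
def to_court_coords(loc_x, loc_y, is_home):
    """Return (x, y) for a court drawn wide rather than long, in the data's units."""
    x = loc_y + 40          # shift by the 40 units between the basket and the court edge
    if not is_home:
        x = 900 - x         # move the visiting team's shot to the other side
    return x, loc_x         # the original LOC_X becomes the vertical axis


# Hypothetical shot taken at LOC_X=-50, LOC_Y=100
print(to_court_coords(-50, 100, True))
print(to_court_coords(-50, 100, False))
```

The vertical coordinate is unchanged by the flip, so only the lengthwise position depends on which team took the shot.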
A great tutorial on creating an NBA court with shots can be viewed here. Matplotlib lines and patches are used to create the court.
import matplotlib.pyplot as plt
from matplotlib.patches import Arc, Circle

def create_court():
    # Set up the figure
    fig = plt.figure(figsize=(16, 8))
    ax = fig.add_axes([.2, .1, .6, .8], frame_on=False,
                      xticks=[], yticks=[])
    # Draw the borders of the court
    ax.set_xlim(-20, 960)
    ax.vlines([0, 940], -250, 250)
    ax.hlines([-250, 250], 0, 940)
    ax.hlines([-80, -80, 80, 80], [0, 750] * 2, [190, 940] * 2)
    ax.hlines([-60, -60, 60, 60], [0, 750] * 2, [190, 940] * 2)
    ax.vlines([190, 750], -80, 80)
    ax.vlines(470, -250, 250)
    ax.vlines([40, 900], -30, 30)
    # Add the three point arc, free throw circle,
    # midcourt circle and backboard and rim
    ax.add_patch(Arc((190, 0), 120, 120, theta1=-90, theta2=90))
    ax.add_patch(Arc((190, 0), 120, 120, theta1=90, theta2=-90))
    ax.add_patch(Arc((750, 0), 120, 120, theta1=90, theta2=-90))
    ax.hlines([-220, -220, 220, 220], [0, 800] * 2, [140, 940] * 2)
    ax.add_patch(Arc((892.5, 0), 475, 475, theta1=112.5,
                     theta2=-112.5))
    ax.add_patch(Arc((47.5, 0), 15, 15, theta1=0, theta2=360))
    ax.add_patch(Arc((892.5, 0), 15, 15, theta1=0, theta2=360))
    ax.add_patch(Circle((470, 0), 60, facecolor='none', lw=2))
    # Text for score, time and description
    # (full_home_team and full_visitor_team are assumed to be
    # defined earlier in the full notebook)
    ax.text(20, 270, f"{full_home_team} 0",
            fontsize=16, fontweight='bold', label='home')
    ax.text(680, 270, f"{full_visitor_team} 0",
            fontsize=16, fontweight='bold', label='visitor')
    ax.text(0, -270, "Q:1 12:00", fontsize=14, label='time')
    ax.text(200, -270, "", fontsize=14, label='description')
    return fig, ax

fig, ax = create_court()
A Matplotlib scatterplot displays each team’s shots as a different color. Filled in circles are shots that were made. Matplotlib allows us to use strings to refer to column names. To take advantage of this, we create two new columns FACECOLOR and EDGECOLOR and simply use their names for the face and edge color parameters.
fig, ax = create_court()

missed = df_shots_all2['SHOT_MADE_FLAG'] == 0
edgecolor = df_shots_all2['HOME_TEAM'].replace({0: 'r', 1: 'b'})
facecolor = edgecolor.copy()
facecolor[missed] = 'none'

df_shots_all2['FACECOLOR'] = facecolor
df_shots_all2['EDGECOLOR'] = edgecolor
ax.scatter('LOC_X_NEW', 'LOC_Y_NEW', marker='o', s=120,
           facecolors='FACECOLOR', edgecolors='EDGECOLOR',
           lw=2, data=df_shots_all2)
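The color rule can be stated for a single shot as a tiny function. This is only a sketch of the logic described above (one color per team, hollow markers for misses); the function name `shot_colors` is an assumption.

```python
def shot_colors(home_team, shot_made_flag):
    """Return (facecolor, edgecolor) for one shot."""
    # The edge color identifies the team: blue for home, red for visitor
    edgecolor = "b" if home_team == 1 else "r"
    # Made shots are filled with the team color; misses are left hollow
    facecolor = edgecolor if shot_made_flag == 1 else "none"
    return facecolor, edgecolor


print(shot_colors(1, 1))
print(shot_colors(0, 0))
```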
We can add player images to our visualization after every shot they take. When clicking on individual player images, I discovered the pattern for finding the correct player image for the correct year. You need to have the team ID, year, and player ID.
unique_players = (df_shots_all2[['TEAM_ID', 'PLAYER_ID']]
                  .drop_duplicates())
unique_players.head()
We use the imread Matplotlib function to convert an image into a NumPy array of RGBA values and store them in a dictionary. A new Matplotlib Axes object is created for each team to display the players. The below image shows the result after setting a player image for each team.
To create animations in Matplotlib inside the Jupyter notebook, two things must occur. First, the magic command %matplotlib notebook must be run. This changes Matplotlib’s backend to allow for interactive plots. Second, FuncAnimation must be imported from matplotlib.animation; it uses a user-defined function to control the objects in the plot.
The main parameter passed to FuncAnimation is a function that gets called once for each frame and is used to update the figure. The function update is created to do the following:
from matplotlib.animation import FuncAnimation

def update(frame_number):
    # remove old player image data
    im_left.set_data([[]])
    im_right.set_data([[]])
    # get next shot data
    current_row = df_shots_all2.iloc[frame_number - 1]
    # plot shot as a one-point scatter plot
    ax.scatter('LOC_X_NEW', 'LOC_Y_NEW', marker='o', s=120,
               facecolors='FACECOLOR', edgecolors='EDGECOLOR',
               lw=2, data=current_row)
    # update scores
    team_type = 'home' if current_row['HOME_TEAM'] == 1 else 'visitor'
    scores[team_type] += current_row['POINTS']
    texts['home'].set_text(f"{full_home_team} {scores['home']}")
    texts['visitor'].set_text(f"{full_visitor_team} "
                              f"{scores['visitor']}")
    # update time remaining
    per = current_row['PERIOD']
    mr = current_row['MINUTES_REMAINING']
    sr = current_row['SECONDS_REMAINING']
    texts['time'].set_text(f"Q:{per} {mr:02d}:{sr:02d}")
    texts['description'].set_text(current_row['DESCRIPTION'])
    # get player image data
    team_id = current_row['TEAM_ID']
    player_id = current_row['PLAYER_ID']
    image_array = player_image_data[team_id][player_id]
    if team_type == 'home':
        im_left.set_data(image_array)
    else:
        im_right.set_data(image_array)

animation = FuncAnimation(fig, func=update,
                          frames=len(df_shots_all2) + 1,
                          init_func=init, interval=10, repeat=False)
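The per-frame bookkeeping can be tried without Matplotlib by calling a stand-in callback once per row. The sketch below only imitates the score and clock updates; the helper `make_update`, the team labels, and the two rows are made up for illustration.

```python
def make_update(scores, home_name, visitor_name):
    """Build a callback that updates the running score, one shot per call."""
    def update(row):
        team_type = "home" if row["HOME_TEAM"] == 1 else "visitor"
        scores[team_type] += row["POINTS"]
        score_text = f"{home_name} {scores['home']} | {visitor_name} {scores['visitor']}"
        # :02d zero-pads minutes and seconds to two digits
        time_text = (f"Q:{row['PERIOD']} "
                     f"{row['MINUTES_REMAINING']:02d}:{row['SECONDS_REMAINING']:02d}")
        return score_text, time_text
    return update


# Made-up rows standing in for two consecutive shots
rows = [
    {"HOME_TEAM": 1, "POINTS": 2, "PERIOD": 1,
     "MINUTES_REMAINING": 11, "SECONDS_REMAINING": 42},
    {"HOME_TEAM": 0, "POINTS": 3, "PERIOD": 1,
     "MINUTES_REMAINING": 10, "SECONDS_REMAINING": 5},
]

scores = {"home": 0, "visitor": 0}
update = make_update(scores, "Home", "Visitor")
labels = [update(row) for row in rows]  # one call per "frame"
print(labels[-1])
```

Because the scores dictionary lives outside the callback, each call builds on the state left by the previous frame, which is the same pattern the animation relies on.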
Notice that the result was saved to the animation variable. In order to save the animation to a file, the save method is called, which needs a file name and a MovieWriter instance. If no writer is provided, it will default to the writer in the 'animation.writer' value in the rcParams dictionary.
# my default movie writer
>>> plt.rcParams['animation.writer']
'ffmpeg'
Matplotlib can show you a list of all the available writers by importing the helper variable writers.
>>> from matplotlib.animation import writers
>>> writers.list()
['pillow', 'ffmpeg', 'ffmpeg_file', 'imagemagick', 'imagemagick_file', 'html']
We can then import one of the writer classes directly, create an instance of it, and pass it to the save method. Here we choose to create our movie with 3 frames every 4 seconds.
>>> from matplotlib.animation import FFMpegWriter
>>> writer = FFMpegWriter(fps=.75)
>>> animation.save('nba_game_animation.mp4', writer=writer)
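The fps value is simply frames divided by seconds, so 3 frames every 4 seconds gives 0.75. A short sketch of that arithmetic follows, along with the resulting video length for a given number of frames. The frame count of 150 is a hypothetical figure, not the number of shots in the game.

```python
def frames_per_second(frames, seconds):
    """Frame rate for showing `frames` frames every `seconds` seconds."""
    return frames / seconds


def video_length_seconds(n_frames, fps):
    """Length of a video with n_frames frames played at fps frames per second."""
    return n_frames / fps


fps = frames_per_second(3, 4)
print(fps)
# 150 is a hypothetical frame count used only for illustration
print(video_length_seconds(150, fps))
```

A lower fps gives the viewer more time to read each shot description, at the cost of a longer video.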
After successfully saving the file, you can embed it directly into the notebook by using the video HTML tag. We will use HTML from the IPython.display module to do the actual embedding.
from IPython.display import HTML
HTML("""<video width="600" height="450" controls>
<source src="nba_game_animation.mp4" type="video/mp4">
</video>""")
The actual notebook used for this tutorial contains quite a bit more code than what appears in this post. You should be able to replicate this with any other NBA game.
This tutorial covered a wide range of topics including:
FuncAnimation with a custom function that updates the figure each frame