An introduction to Football (Soccer) data analysis using Python

By Indranil Ghosh

Elevator Pitch

Many people want to get started with football data analysis. This talk, accompanied by Jupyter notebooks that walk through the Python code step by step, gives a hands-on introduction to accessing, visualizing, and analyzing football data.


This talk introduces the following concepts in football data analysis:

  1. I will start by showing how to get open-access football event data from the StatsBomb API using Python [3 min],

  2. Next, I will show how to draw a football pitch with the mplsoccer Python module, so that we can make most of our football data visualizations on this pitch [3 min],

  3. I will then cover simple data visualizations such as shot maps, pass maps, and their corresponding heat maps [7 min],

  4. Next, I will show how to visualize a team's passing network for a particular game on this pitch. We will then analyze this pass network with the NetworkX Python module, which is commonly used for complex network analysis. We will learn how to calculate the pass degree of each player and the resulting degree distribution, find out which player was the most central in the pass network by calculating the "centrality" of each player node, and so on [8 min],

  5. After that, I will show how to apply computational geometry concepts such as convex hulls, Voronoi diagrams, and Delaunay triangulations to open-access football tracking data, using the Python packages scipy.spatial and mplsoccer, so that we can analyze how many passes were available to a player at a particular instant of a game, how a group of players divided up space on the pitch at a particular instant, etc. [7 min],

  6. Then, I will talk about how to analyze Expected Goals (xG) using open data from StatsBomb [5 min], and

  7. I will end my talk by pointing the audience to the references I used to get started with football (soccer) data analysis [2 min].
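To give a flavour of the pass network analysis in item 4, here is a minimal sketch using NetworkX. The pass list is hypothetical (invented for illustration, not taken from StatsBomb data), and betweenness centrality is just one of several centrality measures that could be used; the notebooks may use different ones.

```python
import networkx as nx

# Hypothetical completed passes as (passer, recipient) pairs.
# These are made up for illustration, not real match data.
passes = [
    ("GK", "CB"), ("CB", "CM"), ("CM", "ST"), ("CM", "LW"),
    ("LW", "CM"), ("CB", "CM"), ("CM", "ST"), ("ST", "CM"),
    ("RW", "CM"), ("CM", "RW"),
]

# Build a directed graph whose edge weight counts passes between two players.
G = nx.DiGraph()
for passer, recipient in passes:
    if G.has_edge(passer, recipient):
        G[passer][recipient]["weight"] += 1
    else:
        G.add_edge(passer, recipient, weight=1)

# Weighted out-degree = passes made, weighted in-degree = passes received.
passes_made = dict(G.out_degree(weight="weight"))
passes_received = dict(G.in_degree(weight="weight"))

# Betweenness centrality (unweighted here): how often a player lies on the
# shortest passing path between two other players.
betweenness = nx.betweenness_centrality(G)
most_central = max(betweenness, key=betweenness.get)

print("Passes made:", passes_made)
print("Passes received:", passes_received)
print("Most central player:", most_central)
```

With real event data, the pass list would be built from the passer and recipient columns of the pass events instead of being typed in by hand.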

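The computational geometry ideas in item 5 can be sketched in a similar way with scipy.spatial. The player positions below are hypothetical coordinates in metres on a 105 x 68 pitch, not real tracking data, and counting a player's Delaunay neighbours is only one possible proxy for the passing options available to that player.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay, Voronoi

# Hypothetical (x, y) positions in metres for six players of one team.
# Invented for illustration, not taken from real tracking data.
positions = np.array([
    [10.0, 30.0],
    [25.0, 10.0],
    [28.0, 55.0],
    [45.0, 32.0],  # the player we inspect below
    [60.0, 12.0],
    [62.0, 50.0],
])

# Convex hull: the area the team spans at this instant.
# For 2-D input, `volume` is the enclosed area and `area` is the perimeter.
hull = ConvexHull(positions)
hull_area = hull.volume

# Delaunay triangulation: connects each player to nearby teammates.
tri = Delaunay(positions)
indptr, indices = tri.vertex_neighbor_vertices
player = 3
neighbours = indices[indptr[player]:indptr[player + 1]]

# Voronoi diagram: each region is the space closest to one player.
vor = Voronoi(positions)

print("Players on the hull:", sorted(hull.vertices.tolist()))
print("Area spanned (m^2):", hull_area)
print("Delaunay neighbours of player 3:", sorted(neighbours.tolist()))
print("Number of Voronoi regions with a player:", len(vor.point_region))
```

For plotting, these outputs would be drawn on top of an mplsoccer pitch rather than printed.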

  1. GitHub repo with the Jupyter notebooks.

  2. Blog website (still under development).

  3. I have presented Python-related work at SciPy India (2018), PyCon Australia (2020), PyCode Conference (2020), and the SLAS conference (2019). I have also given multiple talks at conferences on the R programming language and on open source. I have been presenting at national and international conferences since 2018 and am comfortable speaking in public. Whenever I learn an interesting new topic, I try to share it with the Python community as a way of disseminating useful knowledge.