The NFL data is stored in CSVs in a way that’s meant to be uploaded to a relational database: plays in one table, players in another table, fourth downs in a third table, punts in a fourth, and so on.

For a given play (that is, for every play of the game), the model calculates: “what is the probability that the team with the ball (on offense at the beginning of the given play) will win the football game.” Based on the choices made in each play of all that historic NFL data, the model then has a good idea of how every single play of a game affects the eventual outcome of that game. The option that would result in the highest win probability (taking into account the chances of that play succeeding) is then bot-approved!

The next step is the tricky part: serializing the model. “It has to be pretty fast, because one of the improvements we made recently was to make the bot tweet out what it would do in that scenario if it were the coach.” Now the bot makes and tweets out a prediction just before every play, then follows it up with a more in-depth analysis shortly after the fact.
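The decision rule described above, weighting each option's win probability by the chance that the play succeeds and then picking the highest, can be sketched in a few lines of Python. This is only an illustration, not the bot's actual code: the `Outcome` class, the function names, and every number below are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float      # chance that this outcome happens
    win_probability: float  # offense's chance of winning the game if it does


def expected_win_probability(outcomes):
    """Weight each outcome's win probability by how likely the outcome is."""
    total = sum(o.probability for o in outcomes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.win_probability for o in outcomes)


def best_option(options):
    """Return the option with the highest expected win probability, plus all scores."""
    scores = {name: expected_win_probability(outs) for name, outs in options.items()}
    return max(scores, key=scores.get), scores


# Hypothetical fourth-down situation; none of these numbers come from real data.
options = {
    "go for it": [Outcome(0.60, 0.55), Outcome(0.40, 0.35)],   # convert / fail
    "punt": [Outcome(1.00, 0.44)],
    "field goal": [Outcome(0.30, 0.52), Outcome(0.70, 0.36)],  # make / miss
}

choice, scores = best_option(options)
print(choice, scores)
```

In a real system the win probabilities would come from the trained model rather than being typed in by hand, and the success chances would be estimated from the historic play data.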

“The data says: ‘here’s what you could be doing to win more games, as it turns out, and you’re not doing it.’”

The 4th Down Bot is written almost entirely in Python. The model it runs on is a regularized logistic regression, trained with scikit-learn on around 13 years of NFL play-by-play data.
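To make that concrete, here is a minimal sketch of fitting a regularized logistic regression with scikit-learn and then serializing it so it can be loaded quickly later. It is not the bot's real pipeline: the three features, the synthetic data, the scaling step, and the choice of `pickle` for serialization are all assumptions made for the example.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_plays = 2000

# Hypothetical per-play features (assumed for illustration only).
score_diff = rng.integers(-21, 22, size=n_plays)     # offense score minus defense score
seconds_left = rng.integers(0, 3601, size=n_plays)   # time remaining in the game
yards_to_goal = rng.integers(1, 100, size=n_plays)   # distance to the end zone
X = np.column_stack([score_diff, seconds_left, yards_to_goal]).astype(float)

# Synthetic labels standing in for "did the offense go on to win the game?"
true_logit = 0.15 * score_diff - 0.01 * (yards_to_goal - 50)
y = (rng.random(n_plays) < 1 / (1 + np.exp(-true_logit))).astype(int)

# LogisticRegression applies L2 regularization by default; C is the inverse
# of the regularization strength, so a smaller C regularizes more strongly.
model = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
model.fit(X, y)

# Serialize the fitted model to bytes and load it back, as a live bot might
# do at startup instead of retraining.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

# Win probability for one hypothetical play: down by 3, two minutes left,
# 35 yards from the end zone.
play = np.array([[-3.0, 120.0, 35.0]])
win_prob = restored.predict_proba(play)[0, 1]
print(f"estimated win probability: {win_prob:.2f}")
```

Loading a pre-trained model this way keeps the per-play work down to a single `predict_proba` call, which matters if a prediction has to go out before the next snap.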

With only one yard needed for a first down, Pats’ coach Bill Belichick decided to go for it. Tom Brady’s pass was caught by wide receiver Julian Edelman, but he was brought down a yard short of the first down.