Kaggle: NFL Big Data Bowl 2020 Official Starter Notebook

Introduction

In this competition you will predict how many yards a team will gain on a rushing play in an NFL regular season game.

You will loop through a series of rushing plays; for each play, you'll receive the position, velocity, orientation, and more for all 22 players on the field at the moment of handing the ball off to the rusher, along with many other features such as teams, stadium, weather conditions, etc.

You'll use this information to predict how many yards the team will gain on the play as a cumulative probability distribution. Once you make that prediction, you can move on to the next rushing play.

This competition is different from most Kaggle Competitions in that:

- You can only submit from Kaggle Notebooks, and you may not use other data sources, GPUs, or internet access.
- This is a two-stage competition. In Stage One you can edit your Notebooks and improve your model; Public Leaderboard scores are based on your predictions on rushing plays from the first few weeks of the 2019 regular season. At the beginning of Stage Two, your Notebooks are locked, and we will re-run them over the following several weeks, scoring them on their predictions against live data as the 2019 regular season unfolds.
- You must use our custom kaggle.competitions.nflrush Python module. The purpose of this module is to control the flow of information so that you cannot use future data to make predictions for the current rushing play. If you do not use this module properly, your code may fail when it is re-run in Stage Two.

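In this starter notebook, we'll show how to use the `nflrush` module to get the training data, get test features and make predictions, and write the submission file.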

TL;DR: End-to-End Usage Example

from kaggle.competitions import nflrush
import pandas as pd

env = nflrush.make_env()

# Training data is in the competition dataset as usual
train_df = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2020/train.csv', low_memory=False)
train_my_model(train_df)

# Predict one rushing play at a time; each play must be predicted before the next is served
for (test_df, sample_prediction_df) in env.iter_test():
    predictions_df = make_my_predictions(test_df, sample_prediction_df)
    env.predict(predictions_df)

env.write_submission_file()

Note that train_my_model and make_my_predictions are functions you need to write for the above example to work.
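
As a minimal sketch of what these two functions might look like, the baseline below ignores the play features and predicts the empirical cumulative distribution of yards seen in training. Several things here are assumptions rather than competition code: the module-level `_empirical_cdf` variable, the target column being named `Yards`, one row per player keyed by `PlayId`, and prediction columns running from `Yards-99` to `Yards99`. The tiny synthetic frames exist only so the example runs outside Kaggle.

```python
import numpy as np
import pandas as pd

_empirical_cdf = None  # hypothetical module-level state shared by the two functions


def train_my_model(train_df):
    """Fit a feature-free baseline: the empirical CDF of yards gained."""
    global _empirical_cdf
    # Assumption: one row per player, so keep a single row per play if PlayId exists.
    plays = train_df.drop_duplicates("PlayId") if "PlayId" in train_df else train_df
    yards = plays["Yards"].to_numpy()
    thresholds = np.arange(-99, 100)  # assumed column range Yards-99 .. Yards99
    # P(Yards <= t) for every threshold t.
    _empirical_cdf = (yards[:, None] <= thresholds[None, :]).mean(axis=0)


def make_my_predictions(test_df, sample_prediction_df):
    """Return a frame shaped like sample_prediction_df holding the baseline CDF."""
    predictions_df = sample_prediction_df.copy()
    predictions_df.iloc[0, :] = _empirical_cdf
    return predictions_df


# Synthetic stand-ins for the real data, for illustration only.
toy_train = pd.DataFrame({"PlayId": [1, 1, 2, 3, 4], "Yards": [3, 3, -2, 10, 3]})
toy_sample = pd.DataFrame([np.zeros(199)], columns=[f"Yards{t}" for t in range(-99, 100)])

train_my_model(toy_train)
toy_predictions = make_my_predictions(toy_train.head(1), toy_sample)
print(toy_predictions[["Yards-3", "Yards-2", "Yards3", "Yards10"]])
```

A real model would replace the body of `make_my_predictions` with something that actually uses `test_df`; the point of the sketch is only the shape of the two functions.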

In-depth Introduction

First let's import the module and create an environment.

from kaggle.competitions import nflrush
import pandas as pd

# You can only call make_env() once, so don't lose it!
env = nflrush.make_env()

Training data is in the competition dataset as usual

train_df = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2020/train.csv', low_memory=False)

train_df

`iter_test` function

Generator which loops through each rushing play in the test set and provides the observations at TimeHandoff just like the training set. Once you call predict to make your yardage prediction, you can continue on to the next play.

Yields:

While there are more rushing plays and predict was called successfully since the last yield, yields a tuple of:
- test_df: DataFrame with player and game observations for the next rushing play.
- sample_prediction_df: DataFrame with an example yardage prediction. Intended to be filled in and passed back to the predict function.

If predict has not been called successfully since the last yield, prints an error and yields None.
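
The handshake described above can be imitated locally with a toy stand-in. The `MockEnv` class below is hypothetical and is not the real `nflrush` implementation; its error message and internals are made up, and it only mirrors the documented behaviour (a new play is served only after `predict` has been called for the previous one, otherwise an error is printed and None is yielded).

```python
import pandas as pd


class MockEnv:
    """Toy stand-in (NOT the real nflrush implementation) for local experiments."""

    def __init__(self, plays, sample_prediction_df):
        self._plays = plays                      # list of per-play DataFrames
        self._sample = sample_prediction_df
        self._awaiting_prediction = False
        self.predictions = []

    def iter_test(self):
        for play_df in self._plays:
            # Mirror the documented behaviour: complain and yield None
            # until predict() has been called for the previous play.
            while self._awaiting_prediction:
                print("You must call predict() before getting the next play.")
                yield None
            self._awaiting_prediction = True
            yield play_df, self._sample.copy()

    def predict(self, predictions_df):
        if not self._awaiting_prediction:
            raise Exception("predict() called without a pending play.")
        self.predictions.append(predictions_df)
        self._awaiting_prediction = False


# Two made-up plays and a one-column sample prediction, for illustration only.
plays = [pd.DataFrame({"PlayId": [i]}) for i in (1, 2)]
sample = pd.DataFrame({"Yards3": [1.0]})

mock_env = MockEnv(plays, sample)
mock_iter = mock_env.iter_test()
first = next(mock_iter)        # (test_df, sample_prediction_df) for play 1
skipped = next(mock_iter)      # predict() was not called yet, so this is None
mock_env.predict(first[1])
second = next(mock_iter)       # now play 2 is served
```

A stand-in like this can help exercise a prediction loop without spending the single `make_env()` call allowed per session.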

# You can only iterate through a result from `env.iter_test()` once,
# so be careful not to lose it once you start iterating.
iter_test = env.iter_test()

Let's get the data for the first test play and check it out.

(test_df, sample_prediction_df) = next(iter_test)

test_df

Note how our predictions need to take the form of a cumulative probability distribution over the range of possible yardages. Each column indicates the probability that the team gains <= that many yards on the play. For example, the value for Yards-2 should be your prediction for the probability that the team gains at most -2 yards, and Yards10 is the probability that the team gains at most 10 yards. Theoretically, Yards99 should equal 1.0.
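
To make the format concrete, here is a small sketch of two ways to build such a row: a step function for a single predicted yardage, and a running sum over per-yard probabilities. The helper names `point_to_cdf` and `pmf_to_cdf` are illustrative, and the column range `Yards-99` to `Yards99` is an assumption about the layout of sample_prediction_df.

```python
import numpy as np
import pandas as pd

YARD_VALUES = np.arange(-99, 100)              # assumed range: Yards-99 .. Yards99
YARD_COLUMNS = [f"Yards{y}" for y in YARD_VALUES]


def point_to_cdf(yards):
    """Step CDF that puts all probability mass on a single yardage."""
    return (YARD_VALUES >= yards).astype(float)


def pmf_to_cdf(pmf):
    """Turn per-yard probabilities into a valid cumulative distribution."""
    pmf = np.clip(np.asarray(pmf, dtype=float), 0.0, None)
    cdf = np.cumsum(pmf / pmf.sum())
    return np.clip(cdf, 0.0, 1.0)  # guard against floating-point overshoot


# "Exactly 3 yards": 0.0 for every column below Yards3, 1.0 from Yards3 onward.
step = pd.DataFrame([point_to_cdf(3)], columns=YARD_COLUMNS)
print(step[["Yards2", "Yards3", "Yards4"]])

# A made-up distribution: mass spread evenly over 0..5 yards.
pmf = np.where((YARD_VALUES >= 0) & (YARD_VALUES <= 5), 1.0, 0.0)
smooth = pd.DataFrame([pmf_to_cdf(pmf)], columns=YARD_COLUMNS)
print(smooth[["Yards-1", "Yards0", "Yards2", "Yards5"]])
```

Because the running sum of non-negative values never decreases, the second helper always produces a row that is monotone and ends at (approximately) 1.0.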

sample_prediction_df

The sample prediction here just predicts that exactly 3 yards were gained on the play.

sample_prediction_df[sample_prediction_df.columns[98:108]]

Note that we'll get an error if we try to continue on to the next test play without making our prediction for the current play.

next(iter_test)

`predict` function

Stores your predictions for the current rushing play. Expects the same format as you saw in sample_prediction_df returned from the iter_test generator.

Args:

predictions_df: DataFrame which must have the same format as sample_prediction_df.

This function will raise an Exception if not called after a successful iteration of the iter_test generator.
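
Before handing a frame to predict, it can be worth checking it yourself. The sketch below is one possible pre-flight check; `check_prediction_format` is a hypothetical helper, and the range and monotonicity rules are my reading of what a cumulative distribution requires, not a documented list of what the module validates.

```python
import numpy as np
import pandas as pd


def check_prediction_format(predictions_df, sample_prediction_df):
    """Return a list of problems found; an empty list means the frame looks valid."""
    problems = []
    if list(predictions_df.columns) != list(sample_prediction_df.columns):
        problems.append("columns differ from sample_prediction_df")
    if predictions_df.shape != sample_prediction_df.shape:
        problems.append("shape differs from sample_prediction_df")
    values = predictions_df.to_numpy(dtype=float)
    if np.isnan(values).any():
        problems.append("contains NaN")
    if (values < 0).any() or (values > 1).any():
        problems.append("probabilities outside [0, 1]")
    if (np.diff(values, axis=1) < 0).any():
        problems.append("CDF decreases from one column to the next")
    return problems


# Illustrative frames; the Yards-99 .. Yards99 column range is an assumption.
columns = [f"Yards{y}" for y in range(-99, 100)]
good = pd.DataFrame([(np.arange(-99, 100) >= 3).astype(float)], columns=columns)
bad = good.copy()
bad["Yards10"] = 0.5   # drops below the 1.0 already reached at Yards9

good_problems = check_prediction_format(good, good)
bad_problems = check_prediction_format(bad, good)
print(good_problems)
print(bad_problems)
```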

Let's make a dummy prediction using the sample provided by iter_test.

env.predict(sample_prediction_df)

Main Loop

Let's loop through all the remaining plays in the test set generator and make the default prediction for each. The iter_test generator will simply stop returning values once you've reached the end.

When writing your own Notebooks, be sure to write robust code that makes as few assumptions about the iter_test/predict loop as possible. For example, the number of iterations will change during Stage Two of the competition, since you'll be tested on rushing plays which hadn't even occurred when you wrote your code. There may also be players in the updated test set who never appeared in any Stage One training or test data.

You may assume that the structure of sample_prediction_df will not change in this competition.
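
One way to follow this advice is to wrap the model call so that a single unexpected play cannot end the run. The sketch below is an assumption-laden illustration: `safe_predictions`, `lookup_model`, and `known_players` are hypothetical names, the player column is assumed to be `NflId`, and falling back to the sample prediction is just one possible policy.

```python
import numpy as np
import pandas as pd


def safe_predictions(predict_fn, test_df, sample_prediction_df):
    """Run predict_fn, falling back to the sample prediction if anything goes wrong."""
    try:
        predictions_df = predict_fn(test_df, sample_prediction_df)
        values = predictions_df.to_numpy(dtype=float)
        if predictions_df.shape != sample_prediction_df.shape or np.isnan(values).any():
            raise ValueError("malformed prediction")
        return predictions_df
    except Exception as error:  # deliberately broad: one bad play should not end the run
        print(f"Falling back to the sample prediction: {error}")
        return sample_prediction_df


# Hypothetical model that only knows players seen during training.
known_players = {101, 102}


def lookup_model(test_df, sample_prediction_df):
    unseen = set(test_df["NflId"].tolist()) - known_players
    if unseen:
        raise KeyError(f"unseen players {sorted(unseen)}")
    return sample_prediction_df.copy()


# Made-up plays: one with familiar players, one with a newcomer.
sample = pd.DataFrame({"Yards3": [1.0], "Yards4": [1.0]})
seen_play = pd.DataFrame({"NflId": [101, 102]})
new_play = pd.DataFrame({"NflId": [101, 999]})

ok = safe_predictions(lookup_model, seen_play, sample)
fallback = safe_predictions(lookup_model, new_play, sample)
```

In the notebook loop this would be used as `env.predict(safe_predictions(make_my_predictions, test_df, sample_prediction_df))`, so every play receives some prediction even when the model raises.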

for (test_df, sample_prediction_df) in iter_test:
    env.predict(sample_prediction_df)

`write_submission_file` function

Writes your predictions to a CSV file (submission.csv) in the Notebook's output directory.

You must call this function and not generate your own submission.csv file manually.

Can only be called once you've completed the entire iter_test/predict loop.

env.write_submission_file()

# We've got a submission file!
import os
print([filename for filename in os.listdir('/kaggle/working') if '.csv' in filename])

As indicated by the helper message, calling write_submission_file on its own does not make a submission to the competition. It merely tells the module to write the submission.csv file as part of the Notebook's output. To make a submission to the competition, you'll have to Commit your Notebook and find the generated submission.csv file in that Notebook Version's Output tab (note this is outside of the Notebook Editor), then click "Submit to Competition".

When we re-run your Notebook during Stage Two, we will run the Notebook Version(s) (generated when you hit "Commit") linked to your chosen Submission(s).

Restart the Notebook to run your code again

In order to combat cheating, you are only allowed to call make_env or iterate through iter_test once per Notebook run. However, while you're iterating on your model it's reasonable to try something out, change the model a bit, and try it again.

Unfortunately, if you try to simply re-run the code, or even refresh the browser page, you'll still be running on the same Notebook execution session you had been running before, and the nflrush module will still throw errors.

To get around this, you need to explicitly restart your Notebook execution session, which you can do by clicking "Run"->"Restart Session" in the Notebook Editor's menu bar at the top.
