Kaggle: NFL Big Data Bowl 2020 Official Starter Notebook
Introduction
In this competition you will predict how many yards a team will gain on a rushing play in an NFL regular season game.
You will loop through a series of rushing plays; for each play, you'll receive the position, velocity, orientation, and more for all 22 players on the field at the moment the ball is handed off to the rusher, along with many other features such as teams, stadium, weather conditions, etc.
You'll use this information to predict how many yards the team will gain on the play as a cumulative probability distribution. Once you make that prediction, you can move on to the next rushing play.
This competition is different from most Kaggle Competitions in that:
- You can only submit from Kaggle Notebooks, and you may not use other data sources, GPU, or internet access.
- This is a two-stage competition. In Stage One you can edit your Notebooks and improve your model, where Public Leaderboard scores are based on your predictions on rushing plays from the first few weeks of the 2019 regular season. At the beginning of Stage Two, your Notebooks are locked, and we will re-run your Notebooks over the following several weeks, scoring them based on their predictions relative to live data as the 2019 regular season unfolds.
- You must use our custom kaggle.competitions.nflrush Python module. The purpose of this module is to control the flow of information to ensure that you are not using future data to make predictions for the current rushing play. If you do not use this module properly, your code may fail when it is re-run in Stage Two.
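In this starter notebook, we'll show how to use the nflrush module to get the training data, get the test features, make predictions, and write the submission file.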
TL;DR: End-to-End Usage Example
```python
import pandas as pd
from kaggle.competitions import nflrush

env = nflrush.make_env()

# Training data is in the competition dataset as usual
train_df = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2020/train.csv', low_memory=False)
# train_my_model and make_my_predictions are placeholders you must write yourself
train_my_model(train_df)

for (test_df, sample_prediction_df) in env.iter_test():
    predictions_df = make_my_predictions(test_df, sample_prediction_df)
    env.predict(predictions_df)

env.write_submission_file()
```
Note that train_my_model and make_my_predictions are functions you need to write for the above example to work.
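As one possible starting point, here is a minimal sketch of those two functions. It assumes that train.csv has one row per player per play with PlayId and Yards columns, and that every prediction column is named Yards followed by an integer (Yards-99 ... Yards99). The function names are just the placeholders from the example above. This baseline ignores test_df entirely and predicts the empirical training-set CDF of yardage for every play, so treat it as a sanity-check baseline rather than a real model.

```python
import numpy as np
import pandas as pd

# Hypothetical baseline. Module-level state keeps the signatures identical to the
# end-to-end example (train_my_model returns nothing).
_train_yards = np.array([])

def train_my_model(train_df: pd.DataFrame) -> None:
    """Store the sorted per-play yardage from the training data."""
    global _train_yards
    # Assumption: each play is repeated once per player, so keep one row per PlayId.
    per_play = train_df.drop_duplicates("PlayId")["Yards"].to_numpy()
    _train_yards = np.sort(per_play)

def make_my_predictions(test_df: pd.DataFrame,
                        sample_prediction_df: pd.DataFrame) -> pd.DataFrame:
    """Predict the empirical CDF P(Yards <= y) for every 'Yards<y>' column."""
    # Assumption: column names look like 'Yards-99', ..., 'Yards0', ..., 'Yards99'.
    thresholds = np.array([int(col[len("Yards"):]) for col in sample_prediction_df.columns])
    # searchsorted(side="right") counts the training plays with Yards <= threshold.
    cdf = np.searchsorted(_train_yards, thresholds, side="right") / len(_train_yards)
    return pd.DataFrame([cdf], index=sample_prediction_df.index,
                        columns=sample_prediction_df.columns)
```

Reading the thresholds from the column names, instead of hard-coding the range, keeps the prediction aligned with whatever columns sample_prediction_df actually has.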
In-depth Introduction
First let's import the module and create an environment.
```python
from kaggle.competitions import nflrush
import pandas as pd

# You can only call make_env() once, so don't lose it!
env = nflrush.make_env()
```
Training data is in the competition dataset as usual.
```python
train_df = pd.read_csv('/kaggle/input/nfl-big-data-bowl-2020/train.csv', low_memory=False)
train_df
```
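Since each play comes with observations for all 22 players on the field, a quick sanity check is that train_df has 22 rows per play and a single Yards value per play. This is a minimal sketch: check_play_structure is a hypothetical helper (not part of nflrush), and it assumes the PlayId and Yards column names.

```python
import pandas as pd

def check_play_structure(train_df: pd.DataFrame, players_per_play: int = 22) -> dict:
    """Summarize how rows group into plays (hypothetical helper, not part of nflrush)."""
    by_play = train_df.groupby("PlayId")
    rows_per_play = by_play.size()                      # rows observed for each play
    yards_values_per_play = by_play["Yards"].nunique()  # distinct targets per play
    return {
        "n_plays": int(rows_per_play.shape[0]),
        "all_plays_have_expected_rows": bool((rows_per_play == players_per_play).all()),
        "one_yards_value_per_play": bool((yards_values_per_play == 1).all()),
    }
```

Calling check_play_structure(train_df) returns a small dict; if either flag is False, per-play aggregation (for example, keeping one Yards value per PlayId) needs more care.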
iter_test function
Generator that loops through each rushing play in the test set and provides the observations at TimeHandoff, just like the training set. Once you call predict to make your yardage prediction, you can continue on to the next play.
Yields:
- While there are more rushing play(s) and predict was called successfully since the last yield, yields a tuple of:
  - test_df: DataFrame with player and game observations for the next rushing play.
  - sample_prediction_df: DataFrame with an example yardage prediction. Intended to be filled in and passed back to the predict function.
- If predict has not been called successfully since the last yield, prints an error and yields None.
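To make this yield/predict contract concrete, here is a toy mock. It is a hypothetical stand-in written for illustration, not the real kaggle.competitions.nflrush implementation: it passes strings instead of DataFrames, and its printed message is made up.

```python
from typing import Iterator, List, Optional, Tuple

class MockRushEnv:
    """Toy model of the iter_test/predict contract (NOT the real nflrush module)."""

    def __init__(self, plays: List[str]) -> None:
        self._plays = plays
        self._awaiting_prediction = False
        self.predictions: List[str] = []

    def iter_test(self) -> Iterator[Optional[Tuple[str, str]]]:
        i = 0
        while i < len(self._plays):
            if self._awaiting_prediction:
                # predict() was not called since the last yield: complain and yield None
                print("mock error: call predict() before requesting the next play")
                yield None
                continue
            play = self._plays[i]
            i += 1
            self._awaiting_prediction = True
            yield play, f"sample prediction for {play}"

    def predict(self, prediction: str) -> None:
        # Mirrors the documented rule: predict is only valid right after a successful yield.
        if not self._awaiting_prediction:
            raise Exception("predict() called without a pending play from iter_test")
        self.predictions.append(prediction)
        self._awaiting_prediction = False
```

Stepping through it by hand with next() shows why skipping predict produces None instead of the next play, and why the generator simply stops once every play has been predicted.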
```python
# You can only iterate through a result from `env.iter_test()` once
# so be careful not to lose it once you start iterating.
iter_test = env.iter_test()
```
Let's get the data for the first test play and check it out.
```python
(test_df, sample_prediction_df) = next(iter_test)
test_df
```
Note how our predictions need to take the form of a cumulative probability distribution over the range of possible yardages. Each column indicates the probability that the team gains <= that many yards on the play. For example, the value for Yards-2 should be your prediction for the probability that the team gains at most -2 yards, and Yards10 is the probability that the team gains at most 10 yards. Theoretically, Yards99 should equal 1.0.
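One way to build such a row is to start from a probability for each individual yardage and take a running sum. This is a minimal sketch that assumes the 199 columns run from Yards-99 to Yards99 in order; pmf_to_cdf_row is a hypothetical helper name. A point prediction of exactly 3 yards, like the sample prediction, is the special case where all the probability sits on 3, so the row is 0 below Yards3 and 1 from Yards3 onward.

```python
import numpy as np
import pandas as pd

YARDAGES = np.arange(-99, 100)               # -99, -98, ..., 99
COLUMNS = [f"Yards{y}" for y in YARDAGES]    # assumed column naming and order

def pmf_to_cdf_row(pmf, play_id) -> pd.DataFrame:
    """Turn P(Yards == y) for y in -99..99 into one CDF row P(Yards <= y)."""
    pmf = np.asarray(pmf, dtype=float)
    if pmf.shape != YARDAGES.shape:
        raise ValueError(f"expected {YARDAGES.size} probabilities, got shape {pmf.shape}")
    total = pmf.sum()
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    cdf = np.cumsum(pmf / total)             # running sum of the normalized probabilities
    cdf = np.clip(cdf, 0.0, 1.0)             # guard against floating-point drift past 1.0
    return pd.DataFrame([cdf], index=[play_id], columns=COLUMNS)

# The sample's "exactly 3 yards" prediction expressed as a point mass:
point_mass = (YARDAGES == 3).astype(float)
row = pmf_to_cdf_row(point_mass, play_id=0)
```

As long as the input probabilities are non-negative, the running sum is non-decreasing and, after normalization, ends at 1.0 in the Yards99 column.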
```python
sample_prediction_df
```
The sample prediction here just predicts that exactly 3 yards were gained on the play.
```python
sample_prediction_df[sample_prediction_df.columns[98:108]]
```
Note that we'll get an error if we try to continue on to the next test play without making our predictions for the current play.
```python
next(iter_test)
```
predict function
Stores your predictions for the current rushing play. Expects the same format as you saw in sample_prediction_df returned from the iter_test generator.
Args:
- predictions_df: DataFrame which must have the same format as sample_prediction_df.
This function will raise an Exception if not called after a successful iteration of the iter_test generator.
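Because predict expects predictions_df in exactly the format of sample_prediction_df, it can help to check a prediction before passing it on. This is a minimal sketch: validate_prediction is a hypothetical helper, and its checks (same columns and index, no NaN, values in [0, 1], non-decreasing across columns) are assumptions drawn from the cumulative-distribution description, not the module's own validation rules.

```python
import numpy as np
import pandas as pd

def validate_prediction(predictions_df: pd.DataFrame,
                        sample_prediction_df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if list(predictions_df.columns) != list(sample_prediction_df.columns):
        problems.append("columns differ from sample_prediction_df")
    if not predictions_df.index.equals(sample_prediction_df.index):
        problems.append("index differs from sample_prediction_df")
    values = predictions_df.to_numpy(dtype=float)
    if np.isnan(values).any():
        problems.append("contains NaN")
    elif ((values < 0) | (values > 1)).any():
        problems.append("values outside [0, 1]")
    # A cumulative distribution must never decrease from one yardage column to the next.
    if (np.diff(values, axis=1) < 0).any():
        problems.append("CDF decreases across columns")
    return problems
```

Returning a list of messages instead of raising lets you log problems during development; inside the live loop you could fall back to sample_prediction_df when the list is non-empty.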
Let's make a dummy prediction using the sample provided by iter_test.
```python
env.predict(sample_prediction_df)
```
Main Loop
Let's loop through all the remaining plays in the test set generator and make the default prediction for each. The iter_test generator will simply stop returning values once you've reached the end.
When writing your own Notebooks, be sure to write robust code that makes as few assumptions about the iter_test/predict loop as possible. For example, the number of iterations will change during Stage Two of the competition, since you'll be tested on rushing plays which hadn't even occurred when you wrote your code. There may also be players in the updated test set who never appeared in any Stage One training or test data.
You may assume that the structure of sample_prediction_df will not change in this competition.
```python
for (test_df, sample_prediction_df) in iter_test:
    env.predict(sample_prediction_df)
```
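One way to follow the robustness advice is to let the loop run until the generator is exhausted (no hard-coded play count) and fall back to the sample prediction if your own model fails on an unexpected play, for example one involving a player never seen in training. This is a minimal sketch: run_prediction_loop and predict_fn are hypothetical names. Catching every exception is a deliberate trade-off that keeps a Stage Two re-run alive at the cost of hiding bugs, so the sketch prints what went wrong.

```python
from typing import Callable, Iterable, Tuple
import pandas as pd

def run_prediction_loop(env,
                        test_iter: Iterable[Tuple[pd.DataFrame, pd.DataFrame]],
                        predict_fn: Callable[[pd.DataFrame, pd.DataFrame], pd.DataFrame]) -> int:
    """Predict every play test_iter yields; return how many plays used the fallback."""
    fallbacks = 0
    for (test_df, sample_prediction_df) in test_iter:
        try:
            predictions_df = predict_fn(test_df, sample_prediction_df)
        except Exception as exc:  # deliberately broad: one bad play must not end the run
            print(f"prediction failed ({exc!r}); using sample_prediction_df")
            predictions_df = sample_prediction_df
            fallbacks += 1
        # Always call predict, so the generator can move on to the next play.
        env.predict(predictions_df)
    return fallbacks
```

In place of the default loop above, this would be called as run_prediction_loop(env, iter_test, make_my_predictions), followed by env.write_submission_file().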
write_submission_file function
Writes your predictions to a CSV file (submission.csv) in the Notebook's output directory.
You must call this function and not generate your own submission.csv file manually.
Can only be called once you've completed the entire iter_test/predict loop.
```python
env.write_submission_file()

# We've got a submission file!
import os
print([filename for filename in os.listdir('/kaggle/working') if '.csv' in filename])
```
As indicated by the helper message, calling write_submission_file on its own does not make a submission to the competition. It merely tells the module to write the submission.csv file as part of the Notebook's output. To make a submission to the competition, you'll have to Commit your Notebook and find the generated submission.csv file in that Notebook Version's Output tab (note this is outside of the Notebook Editor), then click "Submit to Competition".
When we re-run your Notebook during Stage Two, we will run the Notebook Version(s) (generated when you hit "Commit") linked to your chosen Submission(s).
Restart the Notebook to run your code again
In order to combat cheating, you are only allowed to call make_env or iterate through iter_test once per Notebook run. However, while you're iterating on your model, it's reasonable to try something out, change the model a bit, and try it again.
Unfortunately, if you try to simply re-run the code, or even refresh the browser page, you'll still be running on the same Notebook execution session you had been running before, and the nflrush module will still throw errors.
To get around this, you need to explicitly restart your Notebook execution session, which you can do by clicking "Run"->"Restart Session" in the Notebook Editor's menu bar at the top.