2018-02-figure-skating-analysis
Category: Automatic programming
Development tool: Jupyter Notebook
File size: 1542KB
Downloads: 0
Upload date: 2018-06-05 19:54:52
Uploader: sh-1993
Description: The data, code, and methodologies supporting the February 8, 2018 BuzzFeed News article, "The Edge."
File list:
Makefile (253, 2018-06-06)
data (0, 2018-06-06)
data\processed (0, 2018-06-06)
data\processed\figure-skating-goe-adj-2016-17.csv (14102, 2018-06-06)
data\processed\figure-skating-goe-adj-2017-18.csv (14162, 2018-06-06)
data\processed\ice-dancing-goe-adj-2016-17.csv (14912, 2018-06-06)
data\processed\ice-dancing-goe-adj-2017-18.csv (14869, 2018-06-06)
data\processed\judge-country.csv (4659, 2018-06-06)
data\processed\judge-goe.csv (2358556, 2018-06-06)
data\processed\judges.csv (429039, 2018-06-06)
data\raw (0, 2018-06-06)
data\raw\judge-scores.csv (3922314, 2018-06-06)
data\raw\judged-aspects.csv (1420124, 2018-06-06)
data\raw\performances.csv (212490, 2018-06-06)
notebooks (0, 2018-06-06)
notebooks\home-country-preference-analysis.ipynb (135718, 2018-06-06)
notebooks\ice-dance-team-scoring.ipynb (93989, 2018-06-06)
notebooks\isu-deviation-points.ipynb (106519, 2018-06-06)
notebooks\progressive-skate-america-2016-example.ipynb (15313, 2018-06-06)
notebooks\translate-goe.ipynb (31556, 2018-06-06)
requirements.txt (29, 2018-06-06)
# Data Analysis: Do figure skating judges score their home countries more favorably?
A closer look at the data, code, and methodologies supporting the BuzzFeed News article, "[The Edge](http://www.buzzfeed.com/johntemplon/the-edge)," published February 8, 2018. Please read that article, which contains important context, before proceeding.
## Glossary
This repository uses the following definitions:
- __ISU__: The abbreviation for the [International Skating Union](http://www.isu.org/), the group that runs figure skating's highest-level international competitions.
- __Protocol PDF__: The final score sheet released at the end of each competition. ([Example](http://www.isuresults.com/results/season1718/gpf1718/gpf2017_protocol.pdf?).)
- __Program__: One segment of a figure skating competition. International figure skating competitions are divided into Short and Free programs.
- __Elements__: The individual technical pieces of a program, such as jumps and spins. Each judge gives each element a score, known as a "Grade of Execution."
- __Components__: For each performance, each judge provides scores for five "components": Skating Skills, Transitions, Performance, Composition, and Interpretation.
- __Aspect__: This is the term used by this analysis to refer to an individual component or element.
- __Scale of Values__: The ISU's conversion table between the "Grade of Execution" and the points awarded for each type of element. See below for more details.
- __Home country__: The country the judge or skater represents in competition. _Note:_ This is not necessarily a judge or skater's country of birth.
- __Home-country preference__: The difference in the number of points a judge awards to skaters from their home country versus those from other countries. Higher home-country scores do not in and of themselves show a judge is deliberately trying to raise a compatriot’s standing; the scores could reflect a preference for a regional style of skating, for example, or an inclination toward skaters the judge has taken special note of, or even just patriotism.
## Data
### Raw Scoring Data
The raw data for this project was collected from the protocol PDFs for every major international competition from October 2016 through December 2017. You can find a list of those 17 competitions below.
The raw data is contained in the following three "[tidy](http://vita.had.co.nz/papers/tidy-data.html)" CSV files:
- `performances.csv`: Metadata about each performance.
- `judged-aspects.csv`: Metadata about the elements and components of each performance.
- `judge-scores.csv`: Each judge's score for each element and component of each performance.
You can find a list of all the fields and their definitions, as well as the scripts used to extract the data from the PDFs, in [`github.com/BuzzFeedNews/figure-skating-scores`](https://github.com/BuzzFeedNews/figure-skating-scores).
### Competitions Included
2016–17 season:
- ISU GP 2016 Progressive Skate America (Oct. 20-23, 2016)
- ISU GP 2016 Skate Canada International (Oct. 27-30, 2016)
- ISU GP Rostelecom Cup 2016 (Nov. 4-6, 2016)
- ISU GP Trophee de France 2016 (Nov. 11-13, 2016)
- ISU GP Audi Cup of China 2016 (Nov. 17-20, 2016)
- ISU GP NHK Trophy 2016 (Nov. 25-27, 2016)
- ISU Grand Prix of Figure Skating Final 2016 (Dec. 8-11, 2016)
- ISU European Figure Skating Championships 2017 (Jan. 23-29, 2017)
- ISU Four Continents Championships 2017 (Feb. 14-19, 2017)
- ISU World Figure Skating Championships 2017 (Mar. 27 - Apr. 2, 2017)
2017–18 season:
- ISU GP Rostelecom Cup 2017 (Oct. 20-22, 2017)
- ISU GP 2017 Skate Canada International (Oct. 27-29, 2017)
- ISU GP Audi Cup of China 2017 (Nov. 3-5, 2017)
- ISU GP NHK Trophy 2017 (Nov. 10-12, 2017)
- ISU GP Internationaux de France de Patinage 2017 (Nov. 17-19, 2017)
- ISU GP 2017 Bridgestone Skate America (Nov. 24-26, 2017)
- Grand Prix Final 2017 Senior and Junior (Dec. 7-10, 2017)
### Processed Data
#### Translations of "Grades of Execution"
The protocol PDFs include a rating for each element, between `-3` and `+3`, given by each judge. This **Grade of Execution** is translated by the **Scale of Values** to the actual number of points awarded for that element. (More difficult elements receive more points for the same Grade of Execution.) The [`translate-goe` notebook](./notebooks/translate-goe.ipynb) includes the code BuzzFeed News used to translate each Grade of Execution for each element. That notebook produces the [`judge-goe.csv`](data/processed/judge-goe.csv) file, which contains the translated values for each score.
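Conceptually, the translation is a table lookup. The following is a minimal sketch, assuming a hypothetical Scale of Values stored as a dictionary; the element codes and point values are illustrative placeholders rather than ISU figures, and `translate_goe` is not a function from the notebook. It models only the per-grade point adjustment.

```python
# Hypothetical Scale of Values: element code -> points for each Grade of Execution.
# Element names and numbers are illustrative only, not taken from the ISU tables.
SCALE_OF_VALUES = {
    "HARD_JUMP": {-3: -3.0, -2: -2.0, -1: -1.0, 0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0},
    "EASY_SPIN": {-3: -0.9, -2: -0.6, -1: -0.3, 0: 0.0, 1: 0.5, 2: 1.0, 3: 1.5},
}


def translate_goe(element, grade):
    """Return the points that one judge's grade is worth for one element."""
    if grade not in range(-3, 4):
        raise ValueError(f"Grade of Execution must be between -3 and +3, got {grade}")
    return SCALE_OF_VALUES[element][grade]


# The same grade is worth more points on the more difficult element.
hard = translate_goe("HARD_JUMP", 2)
easy = translate_goe("EASY_SPIN", 2)
```

In this sketch a `+2` is worth more on the harder element, which mirrors the point made above about difficulty.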
Each season of figure skating and ice dancing uses a slightly different Scale of Values. The ISU documents these scales in a series of PDFs, which BuzzFeed News converted to CSV files for easier analysis:
- 2016-17 Figure Skating: [PDF](http://www.isu.org/docman-documents-links/isu-files/documents-communications/isu-communications/459-2000-sptc-sov-and-goe-2016-2017-revised-july-14/file), [CSV](./data/processed/figure-skating-goe-adj-2016-17.csv)
- 2017-18 Figure Skating: [PDF](http://www.isu.org/docman-documents-links/isu-files/documents-communications/isu-communications/14352-isu-communication-2089/file), [CSV](./data/processed/figure-skating-goe-adj-2017-18.csv)
- 2016-17 Ice Dancing: [PDF](http://isu.org/docman-documents-links/isu-files/documents-communications/isu-communications/476-isu-communication-2015/file), [CSV](./data/processed/ice-dancing-goe-adj-2016-17.csv)
- 2017-18 Ice Dancing: [PDF](http://www.isu.org/docman-documents-links/isu-files/documents-communications/isu-communications/589-isu-communication-2094/file), [CSV](./data/processed/ice-dancing-goe-adj-2017-18.csv)
#### Judge Names and Home Countries
The ISU posts the names of the judges for each program on a series of HTML pages on its website. BuzzFeed News collected all of the judge names from those pages and standardized that data. You can find the results in [`judges.csv`](data/processed/judges.csv).
On the ISU's website, judges are often labeled simply as members of the ISU instead of their home country. BuzzFeed News used the ISU's lists of officials for [the 2017-18 season](https://www.isu.org/communications/12127-isu-communication-2111/file) and [the 2016-17 season](https://www.isu.org/docman-documents-links/isu-files/documents-communications/isu-communications/490-2027-list-officials-fs-id-sys-2016-2017-updated-oct-6-rev/file) to identify each judge's home country. You can find the results in [`judge-country.csv`](data/processed/judge-country.csv).
## Analyses
The analysis of home-country preference can be found in [this Jupyter Notebook](./notebooks/home-country-preference-analysis.ipynb). The notebook takes a reader through the process of:
- Loading the data
- Calculating the total points awarded by each judge
- Calculating the difference between a judge's total points and the average total points awarded by the other judges for the same performance
- Calculating that judges have a roughly 3.4-point home-country preference, overall, and that the difference is statistically significant
- Calculating overall home-country preferences by country
- Finding that 27 individual judges exhibit a highly statistically significant home-country preference
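The core of the steps above can be sketched in a few lines. This is a rough illustration using only the Python standard library, not the notebook's code: the record layout, the function names `deviations` and `home_country_preference`, and all of the numbers are assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records:
# (performance, skater_country, judge, judge_country, total_points).
# All names and numbers are invented for illustration.
SCORES = [
    ("perf1", "AAA", "judge1", "AAA", 72.0),
    ("perf1", "AAA", "judge2", "BBB", 68.0),
    ("perf1", "AAA", "judge3", "CCC", 70.0),
    ("perf2", "BBB", "judge1", "AAA", 60.0),
    ("perf2", "BBB", "judge2", "BBB", 65.0),
    ("perf2", "BBB", "judge3", "CCC", 61.0),
]


def deviations(scores):
    """Subtract from each total the mean total of the other judges on that performance."""
    by_performance = defaultdict(list)
    for row in scores:
        by_performance[row[0]].append(row)
    out = []
    for rows in by_performance.values():
        for _perf, skater_country, judge, judge_country, total in rows:
            others = [r[4] for r in rows if r[2] != judge]
            out.append({
                "judge": judge,
                "home": judge_country == skater_country,
                "diff": total - mean(others),
            })
    return out


def home_country_preference(devs):
    """Mean deviation for home-country skaters minus mean deviation for all others."""
    home = [d["diff"] for d in devs if d["home"]]
    away = [d["diff"] for d in devs if not d["home"]]
    return mean(home) - mean(away)


devs = deviations(SCORES)
preference = home_country_preference(devs)
```

Comparing each judge against the other judges on the same performance means the quality of the skate itself cancels out, so what remains is how far one judge departs from the rest of the panel.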
In developing the methodology, BuzzFeed News consulted with three statisticians. Two of them, [Jay Emerson (Yale)](http://www.stat.yale.edu/~jay/) and [Eric Zitzewitz (Dartmouth)](https://www.dartmouth.edu/~ericz/), have worked closely with figure skating data in the past. The third, [Abraham Wyner (Penn)](https://statistics.wharton.upenn.edu/profile/ajw/), has extensive experience applying similar analyses to data from other sports.
In addition to the home-country preference analysis, BuzzFeed News also recreated the ISU's "Deviation Points" system for identifying outlier judgments, the rules for which can be found in [ISU Communication 20***](http://www.isu.org/communications/593-isu-communication-20***/file). That code can be found in [the `isu-deviation-points` notebook](./notebooks/isu-deviation-points.ipynb).
A fourth notebook — [`progressive-skate-america-2016-example`](./notebooks/progressive-skate-america-2016-example.ipynb) — provides details about Maira Abasova's scores at the ISU GP Progressive Skate America 2016.
_Update, February 18, 2018:_ This repository now also includes a fifth notebook — [`ice-dance-team-scoring`](./notebooks/ice-dance-team-scoring.ipynb) — supporting a second BuzzFeed News article, "[In Bitter Ice Dancing Rivalry Judges Favor Their Own](https://www.buzzfeed.com/johntemplon/in-bitter-ice-dancing-rivalry-judges-favor-their-own)." That notebook uses the same data, and much of the same code, as the analyses above.
## Technical Notes
All of the analyses above are coded in Python 3, using the libraries listed in [`requirements.txt`](./requirements.txt).
The individual-judge analysis in the [`home-country-preference-analysis` notebook](./notebooks/home-country-preference-analysis.ipynb) uses bootstrapping to test for statistical significance. Depending on your computer's processing power, this step can take several hours to run. To run it more quickly, you can reduce the number of simulations in `find_judge_prob`, but doing so will decrease the accuracy of those calculations.
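To illustrate the trade-off between simulation count and accuracy, here is one way a resampling test of this kind could look. It is a sketch under stated assumptions, not the notebook's `find_judge_prob`: the helper name `bootstrap_probability`, the choice to resample from the judge's pooled deviations, and the sample data are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_probability(home_diffs, other_diffs, n_simulations=10_000, seed=0):
    """Estimate how often a random resample matches or beats the observed home mean.

    Hypothetical helper for illustration; not the notebook's `find_judge_prob`.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(home_diffs)
    pooled = list(home_diffs) + list(other_diffs)
    hits = 0
    for _ in range(n_simulations):
        # Draw with replacement, ignoring which skaters were compatriots.
        sample = rng.choices(pooled, k=len(home_diffs))
        if mean(sample) >= observed:
            hits += 1
    return hits / n_simulations


# Made-up deviations (in points) for a single judge.
home = [3.0, 4.5, 2.5, 5.0]
others = [-1.0, 0.5, -2.0, 1.0, -0.5, 0.0, -1.5, 0.5]
p_value = bootstrap_probability(home, others, n_simulations=2_000)
```

Runtime grows linearly with `n_simulations`, while the smallest nonzero probability the estimate can report is `1 / n_simulations`. Fewer simulations therefore run faster but give a coarser estimate.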
## Licensing
All code in this repository is available under the [MIT License](https://opensource.org/licenses/MIT). All data files are available under the [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) (CC BY 4.0) license.
## Questions / Feedback
Contact John Templon at [john.templon@buzzfeed.com](mailto:john.templon@buzzfeed.com).
Looking for more from BuzzFeed News? [Click here for a list of our open-sourced projects, data, and code.](https://github.com/BuzzFeedNews/everything)