2017-12-eeoc-harassment-charges
Category: Data mining / data warehousing
Development tool: Jupyter Notebook
File size: 10304 KB
Downloads: 0
Upload date: 2017-12-05 19:09:21
Uploader: sh-1993
Description: Data and analysis for the BuzzFeed News article, "We Got Government Data On 20 Years Of Workplace Sexual Harassment Claims. These Charts Break It Down," published Dec. 5, 2017.
File list:
data (0, 2017-12-06)
data\CES_avg_hrly_earnings.csv (53610, 2017-12-06)
data\CES_total_employment.csv (74947, 2017-12-06)
data\CES_women_employment.csv (53071, 2017-12-06)
data\SH Charge Receipts.xlsx (9346511, 2017-12-06)
data\graphics_info.csv (7587, 2017-12-06)
data\naics_sectors.csv (994, 2017-12-06)
data\oes-natsector_M2016_dl.xlsx (2336116, 2017-12-06)
notebooks (0, 2017-12-06)
notebooks\01-merge-bls-data.ipynb (93591, 2017-12-06)
notebooks\02-analyze-eeoc-claims.ipynb (70302, 2017-12-06)
output (0, 2017-12-06)
output\bls_sector_metrics.csv (2139, 2017-12-06)
output\ces_industry_metrics.csv (50046, 2017-12-06)
output\d3_claims_by_industry.csv (324247, 2017-12-06)
output\d3_claims_by_sector.csv (7913, 2017-12-06)
output\d3_information_naics_bls_info.csv (9421, 2017-12-06)
output\graphics_info_bls.csv (9421, 2017-12-06)
# Sexual Harassment Charges Analysis — Oct. 1995 to Sept. 2016
This repository contains data, analytic code, and findings that support portions of the BuzzFeed News article, "[We Got Government Data On 20 Years Of Workplace Sexual Harassment Claims. These Charts Break It Down](https://www.buzzfeed.com/lamvo/eeoc-sexual-harassment-data?utm_term=.mlLxr4pR7#.reAEQ4VD1)," published Dec. 5, 2017. Please read that article, which contains important context and details, before proceeding.
## Data
### Sexual harassment charges
Anonymized data on sexual harassment charges filed with the [U.S. Equal Employment Opportunity Commission](https://www.eeoc.gov/) (EEOC) were provided by a spokesperson for the commission.
The data include the following column headers:
- `CHARGE_FILING_DATE`
- `CP_SEX`
- `CP_NATIONAL_ORIGIN`
- `CP_DOB`
- `HISPANIC_CP`
- `CP_RACE_STRING`
- `R_NAICS_CODE`
- `R_NAICS_DESCRIPTION`
- `R_NUMBER_OF_EMPLOYEES`
- `R_TYPE`
Regarding the data, the following notes were provided by an EEOC spokesperson:
> CP_National_Origin: We greatly expanded the national origin options in 2008. Prior to that, this field will most likely be blank or “Other National Origin”.
> CP Hispanic_CP: This field is populated with “Y” if the Charging Party has identified themselves as Hispanic but, like the National Origin, this field was added in 2008.
> CP_Race_String: Charging Parties may select multiple races that they identify with. This string includes all selected races as a string of codes. See below for code decryption.
> R_Type: This is the basic type of respondent (Private, State/Local Agency, School, etc.)
> Race Codes:
- A — Asian or Pacific Islander - Obsolete
- B — Black or African American
- H — Native Hawaiian or Other Pacific Islander
- I — American Indian or Alaska Native
- N — Unable to Obtain Information from Charging Party
- O — Other Race - Obsolete
- S — Asian
- W — White
- Z — Charging Party Declined to Provide
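The race codes above can be turned into readable labels with a small lookup. The following is a minimal sketch, not code from the notebooks: the function name `decode_race_string` is hypothetical, and it assumes that each letter in `CP_RACE_STRING` is one code and that any other character (a space or comma, say) is only a separator. The actual layout of the field should be checked against the data.

```python
# Labels copied from the EEOC spokesperson's notes above.
RACE_CODES = {
    "A": "Asian or Pacific Islander (obsolete)",
    "B": "Black or African American",
    "H": "Native Hawaiian or Other Pacific Islander",
    "I": "American Indian or Alaska Native",
    "N": "Unable to Obtain Information from Charging Party",
    "O": "Other Race (obsolete)",
    "S": "Asian",
    "W": "White",
    "Z": "Charging Party Declined to Provide",
}


def decode_race_string(race_string):
    """Return the list of race labels for one CP_RACE_STRING value.

    Assumption: every letter is a single code. Characters that are not
    known codes (separators, unexpected letters) are skipped, and
    repeated codes are reported once, in order of first appearance.
    """
    if not race_string:
        return []
    labels = []
    for char in str(race_string).upper():
        label = RACE_CODES.get(char)
        if label is not None and label not in labels:
            labels.append(label)
    return labels
```

For example, `decode_race_string("BW")` returns the labels for Black or African American and White, while an empty or missing value returns an empty list.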
### Economic data
The industry and sector metrics (total workforce, female workforce, and average hourly earnings) use seasonally adjusted summary data from the [Bureau of Labor Statistics](https://www.bls.gov) (BLS).
No single BLS dataset contains those metrics for every industry and sector. The numbers were chiefly sourced from the [Current Employment Statistics](https://www.bls.gov/ces/) survey and [Occupational Employment Statistics](https://www.bls.gov/oes/current/oes_stru.htm) program, in that order of preference.
The Current Employment Statistics data can be accessed through [this data portal](https://data.bls.gov/cgi-bin/dsrv?ce). The Occupational Employment Statistics data come from the "National industry-specific and by ownership" download [here](https://www.bls.gov/oes/tables.htm), specifically `natsector_M2016_dl.xlsx` (stored in this repository as `data/oes-natsector_M2016_dl.xlsx`).
For one sector (agriculture), workforce gender was sourced from the [Current Population Survey](https://www.bls.gov/cps/cpsaat11.htm).
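The "order of preference" between the two main sources can be illustrated with a small fallback routine. This is a sketch under stated assumptions rather than the notebooks' actual logic: the function names, the dictionary layout (NAICS code mapped to a value), and the numbers are all hypothetical, and the notebooks may use a data-frame library instead of plain dictionaries.

```python
def pick_metric(ces_value, oes_value):
    """Return (value, source), preferring CES and falling back to OES."""
    if ces_value is not None:
        return ces_value, "CES"
    if oes_value is not None:
        return oes_value, "OES"
    return None, None


def merge_metrics(ces, oes):
    """Combine two {naics_code: value} dicts, preferring CES values."""
    merged = {}
    for code in sorted(set(ces) | set(oes)):
        merged[code] = pick_metric(ces.get(code), oes.get(code))
    return merged


# Made-up values, for illustration only (not real BLS figures).
ces_employment = {"51": 100.0, "52": None}
oes_employment = {"52": 200.0, "61": 300.0}

combined = merge_metrics(ces_employment, oes_employment)
for code, (value, source) in combined.items():
    print(code, value, source)
```

Keeping the source label next to each value makes it easy to see later which figures came from the fallback dataset.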
### NAICS descriptions
NAICS sector descriptions came from the [Census Bureau](https://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2017), and were supplemented by [this guide to "NAICS Supersectors" for the CES data](https://www.bls.gov/sae/saesuper.htm).
## Code
This repository uses Python code to process the data. That code can be found in the following two notebooks:
### [`01-merge-bls-data.ipynb`](notebooks/01-merge-bls-data.ipynb)
- Merges the economic data described above, and combines it with the NAICS descriptions
- Merges that data with additional, manually curated information needed for the graphics
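As a rough illustration of the merge steps listed above, the sketch below attaches sector descriptions to metric rows by NAICS code and reports any codes that have no description. It is not taken from the notebook: the column names `naics_code`, `total_employment`, and `description`, the helper names, and the sample rows are hypothetical, and the real CSV files may be laid out differently.

```python
import csv
import io

# Hypothetical file contents; the real column names and values may differ.
METRICS_CSV = """naics_code,total_employment
51,100.0
52,200.0
99,300.0
"""

SECTORS_CSV = """naics_code,description
51,Information
52,Finance and Insurance
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_descriptions(metric_rows, sector_rows):
    """Left-join sector descriptions onto metric rows by naics_code.

    Returns the joined rows plus the codes that had no description,
    so gaps are visible instead of silently dropped.
    """
    descriptions = {row["naics_code"]: row["description"] for row in sector_rows}
    joined, unmatched = [], []
    for row in metric_rows:
        code = row["naics_code"]
        if code not in descriptions:
            unmatched.append(code)
        joined.append({**row, "description": descriptions.get(code)})
    return joined, unmatched


joined, unmatched = attach_descriptions(read_rows(METRICS_CSV), read_rows(SECTORS_CSV))
```

Every metric row is kept, and the `unmatched` list shows which codes still need a description before the table is used for graphics.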
### [`02-analyze-eeoc-claims.ipynb`](notebooks/02-analyze-eeoc-claims.ipynb)
- Combines the three separate spreadsheets supplied by the EEOC into one data table
- Aggregates the claims by industry and sector
- Merges the industry aggregates with the economic data described above
- Calculates the gender distribution of EEOC claims, to support this passage: "Overall, 83% of the claims were filed by women, and 15% by men. The remainder did not specify a gender."
- Generates summary data for the article's graphics: `d3_claims_by_industry.csv` and `d3_claims_by_sector.csv`
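The aggregation and gender-distribution steps listed above might look roughly like the following. This is a hedged sketch using only the standard library, not the notebook's code: the helper names are hypothetical, the sample rows are invented, and the `CP_SEX` values `"F"` and `"M"` are an assumption about how the field is coded. It relies on the fact that the first two digits of a NAICS code identify the sector, although a few sectors span several two-digit prefixes (for example, 31 to 33 for manufacturing), which a fuller version would need to group.

```python
from collections import Counter


def sector_of(naics_code):
    """Return the two-digit NAICS sector prefix, or None if missing."""
    code = str(naics_code).strip() if naics_code is not None else ""
    return code[:2] if len(code) >= 2 else None


def count_by_sector(charges):
    """Count charges per two-digit sector prefix of R_NAICS_CODE."""
    return Counter(sector_of(charge.get("R_NAICS_CODE")) for charge in charges)


def gender_shares(charges):
    """Percentage of charges per CP_SEX value; blanks become 'Unspecified'."""
    counts = Counter((charge.get("CP_SEX") or "Unspecified") for charge in charges)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sex: 100.0 * n / total for sex, n in counts.items()}


# Invented rows, for illustration only.
charges = [
    {"CP_SEX": "F", "R_NAICS_CODE": "722511"},
    {"CP_SEX": "F", "R_NAICS_CODE": "722513"},
    {"CP_SEX": "M", "R_NAICS_CODE": "452112"},
    {"CP_SEX": "", "R_NAICS_CODE": None},
]

sector_counts = count_by_sector(charges)
shares = gender_shares(charges)
```

Charges with no NAICS code are counted under `None` rather than discarded, so the sector totals still add up to the number of charges.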
## Feedback / Questions?
Contact Lam Thuy Vo at lam.vo@buzzfeed.com.
Looking for more from BuzzFeed News? [Click here for a list of our open-sourced projects, data, and code](https://github.com/BuzzFeedNews/everything).