2017-12-eeoc-harassment-charges

Category: Data mining / data warehousing
Development tool: Jupyter Notebook
File size: 10304KB
Downloads: 0
Upload date: 2017-12-05 19:09:21
Uploader: sh-1993
Description: Data and analysis for the BuzzFeed News article, "We Got Government Data On 20 Years Of Workplace Sexual Harassment Claims. These Charts Break It Down," published Dec. 5, 2017.

File list:
data (0, 2017-12-06)
data\CES_avg_hrly_earnings.csv (53610, 2017-12-06)
data\CES_total_employment.csv (74947, 2017-12-06)
data\CES_women_employment.csv (53071, 2017-12-06)
data\SH Charge Receipts.xlsx (9346511, 2017-12-06)
data\graphics_info.csv (7587, 2017-12-06)
data\naics_sectors.csv (994, 2017-12-06)
data\oes-natsector_M2016_dl.xlsx (2336116, 2017-12-06)
notebooks (0, 2017-12-06)
notebooks\01-merge-bls-data.ipynb (93591, 2017-12-06)
notebooks\02-analyze-eeoc-claims.ipynb (70302, 2017-12-06)
output (0, 2017-12-06)
output\bls_sector_metrics.csv (2139, 2017-12-06)
output\ces_industry_metrics.csv (50046, 2017-12-06)
output\d3_claims_by_industry.csv (324247, 2017-12-06)
output\d3_claims_by_sector.csv (7913, 2017-12-06)
output\d3_information_naics_bls_info.csv (9421, 2017-12-06)
output\graphics_info_bls.csv (9421, 2017-12-06)

# Sexual Harassment Charges Analysis: Oct. 1995 to Sept. 2016

This repository contains data, analytic code, and findings that support portions of the BuzzFeed News article, "[We Got Government Data On 20 Years Of Workplace Sexual Harassment Claims. These Charts Break It Down](https://www.buzzfeed.com/lamvo/eeoc-sexual-harassment-data?utm_term=.mlLxr4pR7#.reAEQ4VD1)," published Dec. 5, 2017. Please read that article, which contains important context and details, before proceeding.

## Data

### Sexual harassment charges

Anonymized data on sexual harassment charges filed with the [U.S. Equal Employment Opportunity Commission](https://www.eeoc.gov/) (EEOC) were provided by a spokesperson for the commission. The data include the following columns:

- `CHARGE_FILING_DATE`
- `CP_SEX`
- `CP_NATIONAL_ORIGIN`
- `CP_DOB`
- `HISPANIC_CP`
- `CP_RACE_STRING`
- `R_NAICS_CODE`
- `R_NAICS_DESCRIPTION`
- `R_NUMBER_OF_EMPLOYEES`
- `R_TYPE`

Regarding the data, the following notes were provided by an EEOC spokesperson:

> CP_National_Origin: We greatly expanded the national origin options in 2008. Prior to that, this field will most likely be blank or “Other National Origin”.
>
> CP Hispanic_CP: This field is populated with “Y” if the Charging Party has identified themselves as Hispanic but, like the National Origin, this field was added in 2008.
>
> CP_Race_String: Charging Parties may select multiple races that they identify with. This string includes all selected races as a string of codes. See below for code decryption.
>
> R_Type: This is the basic type of respondent (Private, State/Local Agency, School, etc.)
>
> Race Codes:
>
> - A: Asian or Pacific Islander - Obsolete
> - B: Black or African American
> - H: Native Hawaiian or Other Pacific Islander
> - I: American Indian or Alaska Native
> - N: Unable to Obtain Information from Charging Party
> - O: Other Race - Obsolete
> - S: Asian
> - W: White
> - Z: Charging Party Declined to Provide

### Economic data

The industry and sector metrics (total workforce, female workforce, and average hourly earnings) use seasonally adjusted summary data from the [Bureau of Labor Statistics](https://www.bls.gov) (BLS). No single BLS dataset contains those metrics for every industry and sector. The numbers were chiefly sourced from the [Current Employment Statistics](https://www.bls.gov/ces/) survey and the [Occupational Employment Statistics](https://www.bls.gov/oes/current/oes_stru.htm) program, in that order of preference.

The Current Employment Statistics data can be accessed through [this data portal](https://data.bls.gov/cgi-bin/dsrv?ce). The Occupational Employment Statistics data come from the "National industry-specific and by ownership" download [here](https://www.bls.gov/oes/tables.htm), specifically `natsector_M2016_dl.xlsx` (listed in this repository as `data/oes-natsector_M2016_dl.xlsx`). For one sector (agriculture), workforce gender was sourced from the [Current Population Survey](https://www.bls.gov/cps/cpsaat11.htm).

### NAICS descriptions

NAICS sector descriptions came from the [Census Bureau](https://www.census.gov/cgi-bin/sssd/naics/naicsrch?chart=2017), and were supplemented by [this guide to "NAICS Supersectors" for the CES data](https://www.bls.gov/sae/saesuper.htm).

## Code

This repository uses Python code to process the data. That code can be found in the following two notebooks:

### [`01-merge-bls-data.ipynb`](notebooks/01-merge-bls-data.ipynb)

- Merges the economic data described above, and combines it with the NAICS descriptions
- Merges that data with additional, manually curated information needed for the graphics

### [`02-analyze-eeoc-claims.ipynb`](notebooks/02-analyze-eeoc-claims.ipynb)

- Combines the three separate spreadsheets supplied by the EEOC into one data table
- Aggregates the claims by industry and sector
- Merges the industry aggregates with the economic data described above
- Calculates the gender distribution of EEOC claims, to support this passage: "Overall, 83% of the claims were filed by women, and 15% by men. The remainder did not specify a gender."
- Generates summary data for the article's graphics: `d3_claims_by_industry.csv` and `d3_claims_by_sector.csv`

## Feedback / Questions?

Contact Lam Thuy Vo at lam.vo@buzzfeed.com.

Looking for more from BuzzFeed News? [Click here for a list of our open-sourced projects, data, and code](https://github.com/BuzzFeedNews/everything).
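
As a rough illustration of two steps described above (reading `CP_RACE_STRING` values and calculating the gender distribution of claims), here is a minimal Python sketch. It is not the code from the notebooks. The function names `decode_race_string` and `gender_shares`, the sample rows, and the assumed value formats (single-letter race codes run together, and `F`, `M`, or blank for `CP_SEX`) are assumptions made for illustration and should be checked against the actual spreadsheet.

```python
from collections import Counter

# Race codes as listed in the EEOC notes in this README.
RACE_CODES = {
    "A": "Asian or Pacific Islander - Obsolete",
    "B": "Black or African American",
    "H": "Native Hawaiian or Other Pacific Islander",
    "I": "American Indian or Alaska Native",
    "N": "Unable to Obtain Information from Charging Party",
    "O": "Other Race - Obsolete",
    "S": "Asian",
    "W": "White",
    "Z": "Charging Party Declined to Provide",
}


def decode_race_string(race_string):
    """Return the labels for each known code found in a CP_RACE_STRING value.

    Assumption: the value is a run of single-letter codes, optionally
    separated by spaces or punctuation. Unrecognized characters are skipped.
    """
    if not race_string:
        return []
    return [RACE_CODES[ch] for ch in race_string.upper() if ch in RACE_CODES]


def gender_shares(rows):
    """Return each CP_SEX value's share of all rows, as a percentage.

    Blank or missing values are grouped under "Unspecified".
    """
    counts = Counter(
        (row.get("CP_SEX") or "").strip() or "Unspecified" for row in rows
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: 100 * n / total for value, n in counts.items()}


# Made-up rows for illustration only; these are not EEOC records.
sample_rows = [
    {"CP_SEX": "F", "CP_RACE_STRING": "BW"},
    {"CP_SEX": "F", "CP_RACE_STRING": "S"},
    {"CP_SEX": "M", "CP_RACE_STRING": "W"},
    {"CP_SEX": "", "CP_RACE_STRING": ""},
]

print(decode_race_string("BW"))
print(gender_shares(sample_rows))
```

With real data, the rows would come from the EEOC spreadsheet rather than a hand-written list, and the values that actually appear in `CP_SEX` may differ from the ones assumed here.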
