MATLAB character-receiving code - InsightDataEngineering: InsightDataEngineering

  • q8_123527
    Uploader
  • 14.8KB
    File size
  • zip
    File format
  • 2022-04-14 04:04
    Upload date
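MATLAB character-receiving code. Table of contents. Introduction: You are a data engineer working for political consultants whose clients are cash-strapped political candidates. They have asked for help analyzing loyalty trends in campaign contributions, namely identifying areas of repeat donors and calculating how much they spend. The Federal Election Commission regularly publishes campaign contributions, and while you do not want to pull specific donors from those files (because using that information for fundraising or commercial purposes is illegal), you want to identify areas (zip codes) that could be sources of repeat campaign contributions. Challenge summary: For this challenge, we ask you to take a file listing individual campaign contributions for multiple years, determine which ones came from repeat donors, calculate a few values, and distill the results into a single output file, repeat_donors.txt. For each recipient, zip code and calendar year, calculate the following three values for contributions from repeat donors: total dollars received, total number of contributions received, and the donation amount at a given percentile. The political consultants, who are primarily interested in donors who have contributed in multiple years, are concerned about possible outliers in the data, so they have asked that your program allow for a variable percentile. That way, the program can calculate the median (the 50th percentile) in one run and the 99th percentile in another. Another developer has been placed in charge of building the graphical user interface with a dashboard showing the latest metrics on repeat donors. Your role on the project is to work on the data...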
InsightDataEngineering-master.zip
Content overview
# Table of Contents
1. [Introduction](README.md#introduction)
2. [Challenge summary](README.md#challenge-summary)
3. [Details of challenge](README.md#details-of-challenge)
4. [Input files](README.md#input-files)
5. [Output file](README.md#output-file)
6. [Percentile computation](README.md#percentile-computation)
7. [Example](README.md#example)
8. [Writing clean, scalable and well-tested code](README.md#writing-clean-scalable-and-well-tested-code)
9. [Repo directory structure](README.md#repo-directory-structure)
10. [Testing your directory structure and output format](README.md#testing-your-directory-structure-and-output-format)
11. [Instructions to submit your solution](README.md#instructions-to-submit-your-solution)
12. [FAQ](README.md#faq)

# Introduction
You’re a data engineer working for political consultants whose clients are cash-strapped political candidates. They've asked for help analyzing loyalty trends in campaign contributions, namely identifying areas of repeat donors and calculating how much they're spending.

The Federal Election Commission regularly publishes campaign contributions, and while you don’t want to pull specific donors from those files (because using that information for fundraising or commercial purposes is illegal), you want to identify areas (zip codes) that could be sources of repeat campaign contributions.

# Challenge summary
For this challenge, we're asking you to take a file listing individual campaign contributions for multiple years, determine which ones came from repeat donors, calculate a few values and distill the results into a single output file, `repeat_donors.txt`.
For each recipient, zip code and calendar year, calculate these three values for contributions coming from repeat donors:

* total dollars received
* total number of contributions received
* donation amount in a given percentile

The political consultants, who are primarily interested in donors who have contributed in multiple years, are concerned about possible outliers in the data. So they have asked that your program allow for a variable percentile. That way the program could calculate the median (or the 50th percentile) in one run and the 99th percentile in another.

Another developer has been placed in charge of building the graphical user interface with a dashboard showing the latest metrics on repeat donors, among other things. Your role on the project is to work on the data pipeline that will hand off the information to the front-end. As the backend data engineer, you do **not** need to display the data or work on the dashboard, but you do need to provide the information.

You can assume there is another process that takes what is written to the output file and sends it to the front-end. If we were building this pipeline in real life, we’d probably have another mechanism to send the output to the GUI rather than writing to a file. However, for the purposes of grading this challenge, we just want you to write the output to files.

# Details of challenge
You’re given two input files.

1. `percentile.txt` holds a single value: the percentile value (1-100) that your program will be asked to calculate.
2. `itcont.txt` has a line for each campaign contribution that was made on a particular date from a donor to a political campaign, committee or other similar entity.

Out of the many fields listed on the pipe-delimited lines of the `itcont.txt` file, you’re primarily interested in the contributor's name, the zip code associated with the donor, the amount contributed, the date of the transaction and the ID of the recipient.
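
A minimal sketch of how the variable percentile might be handled. The table of contents lists a "Percentile computation" section that is not reproduced in this excerpt, so the nearest-rank method used here is an assumption, and the function names `read_percentile` and `nearest_rank_percentile` are hypothetical.

```python
import math


def read_percentile(path):
    """Read the single percentile value (1-100) stored in a file such as percentile.txt."""
    with open(path) as handle:
        value = float(handle.read().strip())
    if not 1 <= value <= 100:
        raise ValueError(f"percentile must be between 1 and 100, got {value}")
    return value


def nearest_rank_percentile(sorted_amounts, percentile):
    """Return the amount at the given percentile of an ascending list.

    Assumption: nearest-rank method, i.e. the value at ordinal rank
    ceil(percentile / 100 * N) in the sorted list (1-based).
    """
    if not sorted_amounts:
        raise ValueError("cannot take a percentile of an empty list")
    rank = math.ceil(percentile * len(sorted_amounts) / 100)
    return sorted_amounts[max(rank, 1) - 1]
```

With this method the result is always one of the actual contribution amounts, so no interpolation between values is needed.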
#### Identifying repeat donors
For the purposes of this challenge, if a donor had previously contributed to any recipient listed in the `itcont.txt` file in any prior calendar year, that donor is considered a repeat donor. Also, for the purposes of this challenge, you can assume two contributions are from the same donor if the names and zip codes are identical.

#### Calculations
Each line of `itcont.txt` should be treated as a record. Your code should process each line as if that record was sequentially streaming into your program. In other words, your program processes every line of `itcont.txt` in the same order as it is listed in the file.

For each record that you identify as coming from a donor who has contributed to a campaign in a prior calendar year, calculate the running percentile of contributions from repeat donors, the total number of transactions from repeat donors and the total amount of donations streaming in from repeat donors so far for that calendar year, recipient and zip code.

Write the calculated fields out onto a pipe-delimited line and then print it to an output file named `repeat_donors.txt` in the same order as the donation appeared in the input file.

## Input files
The Federal Election Commission provides data files stretching back years, which are [regularly updated](http://classic.fec.gov/finance/disclosure/ftpdet.shtml).

For the purposes of this challenge, we’re interested in individual contributions. While you're welcome to run your program using the data files found at the FEC's website, you should not assume that we'll be testing your program on any of those data files or that the lines will be in the same order as what can be found in those files. Our test data files, however, will conform to the data dictionary [as described by the FEC](http://classic.fec.gov/finance/disclosure/metadata/DataDictionaryContributionsbyIndividuals.shtml).
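
The repeat-donor rule and the running calculations described under "Identifying repeat donors" and "Calculations" could be sketched roughly as follows. This is one possible reading, not a reference solution: it assumes records have already been parsed into `(recipient, name, zip5, year, amount)` tuples, it treats a record as coming from a repeat donor only if an earlier-seen record from the same name and zip code has a strictly earlier calendar year, and it assumes a nearest-rank percentile. The function name `stream_repeat_donors` and the order of the yielded values are hypothetical, since the output field order is not given in this excerpt.

```python
import bisect
import math
from collections import defaultdict


def stream_repeat_donors(records, percentile):
    """Yield running statistics for every record that comes from a repeat donor.

    records: iterable of (recipient, name, zip5, year, amount) in file order.
    Yields (recipient, zip5, year, percentile_amount, total_amount, count).
    """
    earliest_year = {}            # (name, zip5) -> earliest year seen so far
    amounts = defaultdict(list)   # (recipient, zip5, year) -> ascending amounts
    totals = defaultdict(int)     # (recipient, zip5, year) -> running total

    for recipient, name, zip5, year, amount in records:
        donor = (name, zip5)
        first = earliest_year.get(donor)
        if first is None or year < first:
            earliest_year[donor] = year
        if first is None or year <= first:
            # No contribution from a prior calendar year has been seen yet.
            continue

        key = (recipient, zip5, year)
        bisect.insort(amounts[key], amount)   # keep the group sorted as we go
        totals[key] += amount

        group = amounts[key]
        # Assumed nearest-rank percentile over the repeat-donor amounts so far.
        rank = max(math.ceil(percentile * len(group) / 100), 1)
        yield recipient, zip5, year, group[rank - 1], totals[key], len(group)
```

Because the function is a generator, each result can be written out as soon as its input record has been read, which matches the streaming order the challenge asks for. Keeping each group sorted with `bisect.insort` is a simple choice; a heap-based structure might scale better for very large groups.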
Also, while there are many fields in the file that may be interesting, below are the ones that you’ll need to complete this challenge:

* `CMTE_ID`: identifies the filer, which for our purposes is the recipient of this contribution
* `NAME`: name of the donor
* `ZIP_CODE`: zip code of the contributor (we only want the first five digits/characters)
* `TRANSACTION_DT`: date of the transaction
* `TRANSACTION_AMT`: amount of the transaction
* `OTHER_ID`: a field that denotes whether contribution came from a person or an entity

### Input file considerations
Here are some considerations to keep in mind:

1. While the data dictionary has the `ZIP_CODE` occupying nine characters, for the purposes of the challenge, we only consider the first five characters of the field as the zip code
2. Because the data set doesn't contain a unique donor id, you should use the combination of `NAME` and `ZIP_CODE` (again, first five digits) to identify a unique donor
3. For the purposes of this challenge, you can assume the input file follows the data dictionary noted by the FEC for the 2015-current election years, although you should not assume the year field holds any particular value
4. The transactions noted in the input file are not in any particular order, and in fact, can be out of order chronologically
5. Because we are only interested in individual contributions, we only want records that have the field, `OTHER_ID`, set to empty. If the `OTHER_ID` field contains any other value, you should completely ignore and skip the entire record
6. Other situations you can completely ignore and skip an entire record:
    * If `TRANSACTION_DT` is an invalid date (e.g., empty, malformed)
    * If `ZIP_CODE` is an invalid zip code (i.e., empty, fewer than five digits)
    * If the `NAME` is an invalid name (e.g., empty, malformed)
    * If any lines in the input file contain empty cells in the `CMTE_ID` or `TRANSACTION_AMT` fields

Except for the considerations noted above with respect to `CMTE_ID`, `NAME`, `ZIP_CODE`, `TRANSACTION_DT`, `TRANSACTION_AMT`, `OTHER_ID`, data in any of the other fields (whether the data is valid, malformed, or empty) should not affect your processing. That is, as long as the previously noted considerations apply, you should process the record as if it was a valid, newly arriving transaction. (For instance, campaigns sometimes retransmit transactions as amendments; however, for the purposes of this challenge, you can ignore that distinction and treat all of the
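
The field list and skip rules above could be sketched roughly as follows. Several details are assumptions not stated in this excerpt: the zero-based column positions and the `MMDDYYYY` date format are taken from the FEC data dictionary for individual contributions, a "malformed" name is only checked for emptiness, a valid zip code is read as five leading digits, and an amount that cannot be parsed as a number is skipped. The function name `parse_record` is hypothetical.

```python
from datetime import datetime

# Zero-based column positions, assumed from the FEC data dictionary.
CMTE_ID, NAME, ZIP_CODE = 0, 7, 10
TRANSACTION_DT, TRANSACTION_AMT, OTHER_ID = 13, 14, 15


def parse_record(line):
    """Return (recipient, name, zip5, year, amount), or None if the record should be skipped."""
    fields = line.rstrip("\n").split("|")
    if len(fields) <= OTHER_ID:
        return None                       # too few columns to hold the needed fields
    if fields[OTHER_ID].strip():
        return None                       # not an individual contribution

    recipient = fields[CMTE_ID].strip()
    name = fields[NAME].strip()
    zip5 = fields[ZIP_CODE].strip()[:5]   # only the first five characters count
    amount_text = fields[TRANSACTION_AMT].strip()

    if not recipient or not name or not amount_text:
        return None
    if len(zip5) < 5 or not zip5.isdigit():
        return None

    try:
        # Assumed date layout: MMDDYYYY, e.g. 01312017.
        year = datetime.strptime(fields[TRANSACTION_DT].strip(), "%m%d%Y").year
        amount = float(amount_text)
    except ValueError:
        return None                       # empty or malformed date or amount
    return recipient, name, zip5, year, amount
```

Returning `None` for every skipped record keeps the caller simple: it can filter out `None` values and pass the remaining tuples on in file order, without the other columns ever affecting processing.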