• u2_259629
    了解作者
  • 7.8MB
    文件大小
  • zip
    文件格式
  • 0
    收藏次数
  • VIP专享
    资源类型
  • 0
    下载次数
  • 2022-06-13 05:47
    上传日期
SQL家庭作业-员工数据库:两部分之谜 背景 这是一个美丽的春天,距您被聘为Pewlett Hackard的新数据工程师已有两个星期。 您的第一个主要任务是对1980年代和1990年代公司员工的研究项目。 从那个时期起,雇员数据库中剩下的全部是六个CSV文件。 在此分配中,您将设计表以将数据保存在CSV中,将CSV导入到SQL数据库中,并回答有关数据的问题。 换句话说,您将执行: 数据工程 数据分析 注意:您可能会听到用术语“数据建模”代替“数据工程”,但是它们是相同的术语。 数据工程是更现代的措辞,而不是数据建模。 在你开始之前 为此项目创建一个名为sql-challenge的新存储库。 不要将此作业添加到现有存储库中。 将新的存储库克隆到您的计算机。 在本地git存储库中,为SQL挑战创建一个目录。 使用文件夹名称来对应挑战: EmployeeSQL 。 将文件添加到此文件夹
sql-challenge-main.zip
  • sql-challenge-main
  • EmployeeSQL
  • data
  • employees.csv
    14.6MB
  • titles.csv
    154B
  • dept_manager.csv
    326B
  • salaries.csv
    3.9MB
  • departments.csv
    210B
  • dept_emp.csv
    4MB
  • data-analysis.sql
    1.1KB
  • ERD-table-schema.png
    81KB
  • table-schemata.sql
    1.9KB
  • README.md
    5.3KB
内容介绍
# SQL Homework - Employee Database: A Mystery in Two Parts ## Background It is a beautiful spring day, and it is two weeks since you have been hired as a new data engineer at Pewlett Hackard. Your first major task is a research project on employees of the corporation from the 1980s and 1990s. All that remain of the database of employees from that period are six CSV files. In this assignment, you will design the tables to hold data in the CSVs, import the CSVs into a SQL database, and answer questions about the data. In other words, you will perform: 1. Data Engineering 3. Data Analysis Note: You may hear the term "Data Modeling" in place of "Data Engineering," but they are the same terms. Data Engineering is the more modern wording instead of Data Modeling. ### Before You Begin 1. Create a new repository for this project called `sql-challenge`. **Do not add this homework to an existing repository**. 2. Clone the new repository to your computer. 3. Inside your local git repository, create a directory for the SQL challenge. Use a folder name to correspond to the challenge: **EmployeeSQL**. 4. Add your files to this folder. 5. Push the above changes to GitHub. ## Instructions #### Data Modeling Inspect the CSVs and sketch out an ERD of the tables. Feel free to use a tool like [http://www.quickdatabasediagrams.com](http://www.quickdatabasediagrams.com). #### Data Engineering * Use the information you have to create a table schema for each of the six CSV files. Remember to specify data types, primary keys, foreign keys, and other constraints. * For the primary keys check to see if the column is unique, otherwise create a [composite key](https://en.wikipedia.org/wiki/Compound_key). Which takes to primary keys in order to uniquely identify a row. * Be sure to create tables in the correct order to handle foreign keys. * Import each CSV file into the corresponding SQL table. **Note** be sure to import the data in the same order that the tables were created and account for the headers when importing to avoid errors. #### Data Analysis Once you have a complete database, do the following: 1. List the following details of each employee: employee number, last name, first name, sex, and salary. 2. List first name, last name, and hire date for employees who were hired in 1986. 3. List the manager of each department with the following information: department number, department name, the manager's employee number, last name, first name. 4. List the department of each employee with the following information: employee number, last name, first name, and department name. 5. List first name, last name, and sex for employees whose first name is "Hercules" and last names begin with "B." 6. List all employees in the Sales department, including their employee number, last name, first name, and department name. 7. List all employees in the Sales and Development departments, including their employee number, last name, first name, and department name. 8. In descending order, list the frequency count of employee last names, i.e., how many employees share each last name. ## Bonus (Optional) As you examine the data, you are overcome with a creeping suspicion that the dataset is fake. You surmise that your boss handed you spurious data in order to test the data engineering skills of a new employee. To confirm your hunch, you decide to take the following steps to generate a visualization of the data, with which you will confront your boss: 1. Import the SQL database into Pandas. (Yes, you could read the CSVs directly in Pandas, but you are, after all, trying to prove your technical mettle.) This step may require some research. Feel free to use the code below to get started. Be sure to make any necessary modifications for your username, password, host, port, and database name: ```sql from sqlalchemy import create_engine engine = create_engine('postgresql://localhost:5432/<your_db_name>') connection = engine.connect() ``` * Consult [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/latest/core/engines.html#postgresql) for more information. * If using a password, do not upload your password to your GitHub repository. See [https://www.youtube.com/watch?v=2uaTPmNvH0I](https://www.youtube.com/watch?v=2uaTPmNvH0I) and [https://help.github.com/en/github/using-git/ignoring-files](https://help.github.com/en/github/using-git/ignoring-files) for more information. 2. Create a histogram to visualize the most common salary ranges for employees. 3. Create a bar chart of average salary by title. ## Epilogue Evidence in hand, you march into your boss's office and present the visualization. With a sly grin, your boss thanks you for your work. On your way out of the office, you hear the words, "Search your ID number." You look down at your badge to see that your employee ID number is 499942. ## Submission * Create an image file of your ERD. * Create a `.sql` file of your table schemata. * Create a `.sql` file of your queries. * (Optional) Create a Jupyter Notebook of the bonus analysis. * Create and upload a repository with the above files to GitHub and post a link on BootCamp Spot. * Ensure your repository has regular commits (i.e. 20+ commits) and a thorough README.md file ### Copyright © 2021 Trilogy Education Services, LLC, a 2U, Inc. brand. Confidential and Proprietary. All Rights Reserved.
评论
    相关推荐
    • GitLearning
      GitLearning 此存储库供git学习更多评论
    • git-em-all:ClonePull尽可能快地存储git存储库的数组
      git-em-all :warning: 工作正在进行中 尽可能快地克隆/拉取git repostories数组。 安装 npm install git-em-all 用法 var gitEmAll = require ( 'git-em-all' ) 贡献 欢迎捐款! 请先阅读。 执照
    • challenge-git
      离群工程Git挑战 在Outlier,您将能够提供功能和修复程序而不会引起冲突和其他版本控制麻烦。 保持代码库整洁的重要工具是git rebase 。 这项挑战将测试您对...将您的新存储库设置为源: git remote set-url origin $
    • challenge-git
      离群工程Git挑战 在Outlier,您将能够提供功能和修复程序而不会引起冲突和其他版本控制麻烦。 保持代码库整洁的重要工具是git rebase 。 这项挑战将测试您对...将您的新存储库设置为源: git remote set-url origin $
    • git-troubles
      git reset file_name HEAD 重置文件内容(对该文件的所有更改都将丢失) git checkout -- file_name 将整个存储库重置为上一次提交的状态(所有本地更改都将丢失) git reset --hard HEAD 删除最后的提交,但...
    • test:Git 存储库测试
      测试 这是文件自述文件,我将做一些更改以学习如何使用 GitGit 存储库测试
    • weijie:git项目存储
      weijie git项目存储
    • 挑战git
      离群工程Git挑战 在Outlier,您将能够提供功能和修复程序而不会引起冲突和其他版本控制麻烦。 保持代码库干净的重要工具是git rebase 。...将您的新存储库设置为源: git remote set-url origin ${y
    • git-challenge
      离群工程Git挑战 在Outlier,您将能够提供功能和修复程序而不会引起冲突和其他版本控制麻烦。 保持代码库整洁的重要工具是git rebase 。 这项挑战将测试您对...将您的新存储库设置为源: git remote set-url origin $
    • 挑战git
      离群工程Git挑战 在Outlier,您将能够提供功能和修复程序而不会引起冲突和其他版本控制麻烦。 保持代码库干净的重要工具是git rebase 。...将您的新存储库设置为源: git remote set-url origin ${y