RecruitInfoCrawlAndDisplay
Category: Data collection/Crawlers
Development tool: JavaScript
File size: 7706 KB
Downloads: 0
Upload date: 2018-01-26 07:02:49
Uploader: sh-1993
Description: Crawling and analysis of recruitment listings from 51job, Zhilian Zhaopin and similar sites, built with Scrapy-Redis + Django + MySQL + Celery + HTML5 + JavaScript + ECharts.
File list:
JobEvaluating (0, 2018-01-26)
JobEvaluating\.idea (0, 2018-01-26)
JobEvaluating\.idea\JobEvaluating.iml (1362, 2018-01-26)
JobEvaluating\.idea\libraries (0, 2018-01-26)
JobEvaluating\.idea\libraries\R_User_Library.xml (123, 2018-01-26)
JobEvaluating\.idea\misc.xml (205, 2018-01-26)
JobEvaluating\.idea\modules.xml (278, 2018-01-26)
JobEvaluating\.idea\workspace.xml (18156, 2018-01-26)
JobEvaluating\App (0, 2018-01-26)
JobEvaluating\App\__init__.py (0, 2018-01-26)
JobEvaluating\App\admin.py (63, 2018-01-26)
JobEvaluating\App\apps.py (122, 2018-01-26)
JobEvaluating\App\handlesql.py (13981, 2018-01-26)
JobEvaluating\App\migrations (0, 2018-01-26)
JobEvaluating\App\migrations\0001_initial.py (2073, 2018-01-26)
JobEvaluating\App\migrations\0002_auto_20180125_2310.py (446, 2018-01-26)
JobEvaluating\App\migrations\__init__.py (0, 2018-01-26)
JobEvaluating\App\models.py (1347, 2018-01-26)
JobEvaluating\App\tasks.py (1628, 2018-01-26)
JobEvaluating\App\tests.py (60, 2018-01-26)
JobEvaluating\App\views.py (3370, 2018-01-26)
JobEvaluating\JobCrawl (0, 2018-01-26)
JobEvaluating\JobCrawl\.idea (0, 2018-01-26)
JobEvaluating\JobCrawl\.idea\.name (8, 2018-01-26)
JobEvaluating\JobCrawl\.idea\JobCrawl.iml (398, 2018-01-26)
JobEvaluating\JobCrawl\.idea\encodings.xml (159, 2018-01-26)
JobEvaluating\JobCrawl\.idea\misc.xml (687, 2018-01-26)
JobEvaluating\JobCrawl\.idea\modules.xml (268, 2018-01-26)
JobEvaluating\JobCrawl\.idea\workspace.xml (34956, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\__init__.py (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\commands (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\commands\__init__.py (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\commands\crawlall.py (1478, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib\__init__.py (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib\google_cache.py (2386, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib\rotate_useragent.py (2863, 2018-01-26)
... ...
# Web-Crawler-Based Information Analysis System for IT Industry Job Requirements
------
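With graduation approaching and job hunting being tough, I put together a simple analysis of IT industry recruitment listings, focusing on required education level, city and keywords in the job requirements, and displayed the results on a web page. The project mainly uses Python 2.7, Django, Scrapy, Redis, Celery, MySQL, jieba word segmentation, ECharts, Bootstrap and jQuery. Redis and Celery are mainly used to tie the Scrapy framework to the Django framework so that the analysis charts are refreshed at regular intervals.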
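
The analysis described above boils down to counting listings per education level, per city and per requirement keyword. A minimal sketch in Python 3 (the project itself targets Python 2.7), assuming records are plain dicts with hypothetical `city`, `education` and `description` keys, and using naive substring matching over a fixed keyword list as a stand-in for the jieba segmentation the project actually uses:

```python
from collections import Counter

# Hypothetical sample records; in the project these would be rows read from
# the MySQL table that the Scrapy spiders fill.
JOBS = [
    {"city": "Wuhan", "education": "Bachelor",
     "description": "Familiar with Python and Django; MySQL a plus"},
    {"city": "Beijing", "education": "Associate",
     "description": "Proficient in Java; experience with MySQL and Redis"},
    {"city": "Wuhan", "education": "Bachelor",
     "description": "Python crawler development with Scrapy"},
]

# Hypothetical keyword vocabulary (stand-in for jieba word segmentation).
KEYWORDS = ["Python", "Java", "Django", "Scrapy", "MySQL", "Redis"]


def count_field(jobs, field):
    """Count how many listings share each value of `field` (e.g. "city")."""
    return Counter(job[field] for job in jobs)


def count_keywords(jobs, keywords):
    """Count listings whose description mentions each keyword.

    Case-insensitive substring match, so "Java" would also match
    "JavaScript"; real tokenization avoids that.
    """
    counts = Counter()
    for job in jobs:
        text = job["description"].lower()
        for kw in keywords:
            if kw.lower() in text:
                counts[kw] += 1
    return counts


if __name__ == "__main__":
    print(count_field(JOBS, "education").most_common())
    print(count_field(JOBS, "city").most_common())
    print(count_keywords(JOBS, KEYWORDS).most_common(3))
```

`Counter.most_common()` already returns the (label, count) pairs sorted by frequency, which is the shape a bar or pie chart needs.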
## Dependencies
> * Python environment: Django 1.8 or later + BeautifulSoup4 (4.5.1) + Celery (3.1.25) + django-celery (3.1.17) + lxml + MySQLdb + redis (2.10.5) + Scrapy (1.2.0) + scrapy-redis (0.6.3) + Unipath (1.1) + Twisted (16.6.0)
> * Databases: MySQL 5 or later, Redis 3 or later
> * Operating system: Windows XP or later
## Data sources
> * [Baicai Recruitment (百才招聘网)](http://wuhan.baicai.com/)
> * [ChinaHR (中华英才网)](http://www.chinahr.com/wuhan/)
> * [51job (前程无忧)](http://www.51job.com/)
> * [Zhilian Zhaopin (智联招聘)](https://www.zhaopin.com/)
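## Record format
**URL**, **job title**, **job category**, **average monthly salary**, **company name**, **work location**, **work experience**, **education level**, **degree**, **job description**
## Architecture diagram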
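
A minimal sketch of one crawled record with the fields listed above, plus a helper that turns a salary string into an average monthly salary. The field names are hypothetical English renderings of the list, and the salary formats handled (plain ranges such as `8000-12000`, and 51job-style strings such as `1-1.5万/月` or `12-24万/年`, where 万 = 10,000 and 千 = 1,000 yuan) are assumptions about what the sites return, not taken from the project's code:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed unit suffixes in Chinese salary strings.
UNIT = {"万": 10000, "千": 1000}


@dataclass
class JobRecord:
    """One crawled listing (hypothetical field names)."""
    url: str
    title: str
    category: str
    avg_monthly_salary: Optional[float]  # yuan per month; None if unknown
    company: str
    location: str
    experience: str
    education: str
    degree: str
    description: str


def average_monthly_salary(text: str) -> Optional[float]:
    """Average a salary range such as "8000-12000" or "1-1.5万/月".

    Returns None for strings without one or two numbers (e.g. "面议",
    "negotiable"). Yearly figures ("/年") are divided by 12.
    """
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    if not numbers or len(numbers) > 2:
        return None
    value = sum(numbers) / len(numbers)  # midpoint of the range
    for suffix, factor in UNIT.items():
        if suffix in text:
            value *= factor
            break
    if "年" in text:  # per-year salary -> per-month
        value /= 12
    return value
```

Taking the midpoint of the advertised range is a simplification; it makes listings with different range widths comparable on a single number.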
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/frame.png)
## Deployment
### 1. Install the required libraries
+ bs4
+ scrapy
+ redis
+ scrapy-redis
+ pywin32
+ jieba
+ MySQLdb
+ django
+ celery (3.1.25) [**version 4 does not support Windows**]
+ unipath
+ django-celery
### 2. Start the required services
> * **Redis service**
> * **MySQL service**
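
Before starting the crawler and Celery, it can help to confirm that both services accept connections. A small sketch, assuming both run locally on their default ports (Redis 6379, MySQL 3306); it only checks that the TCP port is open, not that the service behind it is healthy:

```python
import socket

# Assumed default host/ports; adjust to match your Redis and MySQL setup.
SERVICES = {"Redis": ("127.0.0.1", 6379), "MySQL": ("127.0.0.1", 3306)}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_port_open(host, port) else "NOT reachable"
        print(f"{name} on {host}:{port}: {status}")
```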
### 3. Configure the environment
+ Configure the database connection
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/mysql_django.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/scrapy_databases.png)
+ Configure how often the crawler runs
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/celery_django.png)
+ Push the initial (seed) URLs into Redis (JobCrawl/lpush.py)
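
The crawl schedule and the seed URLs from this step could look roughly like the sketch below. It assumes Celery 3.1 with django-celery, where `celery beat` reads the `CELERYBEAT_SCHEDULE` setting, and scrapy-redis's default start-URL key pattern `%(name)s:start_urls`, from which a `RedisSpider` pops its start URLs. The task path `App.tasks.crawl_jobs`, the spider names `zhaopin` and `job51`, the 12-hour interval and the homepage seed URLs are all hypothetical placeholders, not taken from the project:

```python
from datetime import timedelta

# --- Django settings.py (Celery 3.1 / django-celery) ----------------------
# Hypothetical entry: run a crawl task every 12 hours. The task path is a
# placeholder; the project's real tasks are defined in App/tasks.py.
CELERYBEAT_SCHEDULE = {
    "crawl-jobs-every-12-hours": {
        "task": "App.tasks.crawl_jobs",
        "schedule": timedelta(hours=12),
    },
}

# --- lpush.py ----------------------------------------------------------------
# Hypothetical spider names with placeholder seed URLs (the sites' homepages).
START_URLS = {
    "zhaopin": ["https://www.zhaopin.com/"],
    "job51": ["http://www.51job.com/"],
}


def start_urls_key(spider_name):
    """Build the Redis key scrapy-redis reads by default: '<name>:start_urls'."""
    return "%(name)s:start_urls" % {"name": spider_name}


def push_start_urls(client, start_urls):
    """LPUSH each spider's seed URLs; `client` is a redis-py client or a stub.

    Returns a dict mapping each Redis key to the number of URLs pushed.
    """
    pushed = {}
    for spider, urls in start_urls.items():
        if urls:  # LPUSH needs at least one value
            key = start_urls_key(spider)
            client.lpush(key, *urls)
            pushed[key] = len(urls)
    return pushed
```

With redis-py installed, calling `push_start_urls(redis.StrictRedis(host="127.0.0.1", port=6379), START_URLS)` once before starting the spiders would seed both queues; passing the client in rather than creating it inside the function keeps the logic testable with a stub.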
### 4. Start the application
+ Start the scheduled tasks
    + **celery -A JobEvaluating worker --loglevel=INFO**
    + **celery -A JobEvaluating beat -s celerybeat-schedule**
+ Synchronize the database
    + **python manage.py makemigrations**
    + **python manage.py migrate**
+ Start Django (**python manage.py runserver**)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/index.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/index1.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/index2.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/display.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/search.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/analyse.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/asks.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/job_hot.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/lan.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/salary.png)
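
The screenshots show the results as ECharts charts. One way to feed such a chart (an assumption, not necessarily how `App/views.py` does it) is to have a Django view return an ECharts `option` object as JSON, for example via `JsonResponse(option)`, which the page then passes to `echarts.init(...).setOption(option)`. A sketch of building a bar-chart option from aggregated counts:

```python
import json
from collections import Counter


def bar_chart_option(title, counts, top_n=10):
    """Turn a Counter into an ECharts bar-chart `option` dict (top_n bars)."""
    items = counts.most_common(top_n)  # sorted by count, descending
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": [label for label, _ in items]},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [n for _, n in items]}],
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    cities = Counter({"Wuhan": 120, "Beijing": 300, "Shanghai": 250})
    print(json.dumps(bar_chart_option("Listings by city", cities)))
```

Building the option on the server keeps the front end generic: the same JavaScript can render any of the education, city or keyword charts.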