RecruitInfoCrawlAndDisplay

Category: Data collection / web crawlers
Development tool: JavaScript
File size: 7706KB
Downloads: 0
Upload date: 2018-01-26 07:02:49
Uploader: sh-1993
Description: Crawling and analysis of recruitment listings from 51job (前程无忧), Zhaopin (智联招聘), and similar sites, built with Scrapy-Redis + Django + MySQL + Celery + HTML5 + JavaScript + ECharts.

File list:
JobEvaluating (0, 2018-01-26)
JobEvaluating\.idea (0, 2018-01-26)
JobEvaluating\.idea\JobEvaluating.iml (1362, 2018-01-26)
JobEvaluating\.idea\libraries (0, 2018-01-26)
JobEvaluating\.idea\libraries\R_User_Library.xml (123, 2018-01-26)
JobEvaluating\.idea\misc.xml (205, 2018-01-26)
JobEvaluating\.idea\modules.xml (278, 2018-01-26)
JobEvaluating\.idea\workspace.xml (18156, 2018-01-26)
JobEvaluating\App (0, 2018-01-26)
JobEvaluating\App\__init__.py (0, 2018-01-26)
JobEvaluating\App\admin.py (63, 2018-01-26)
JobEvaluating\App\apps.py (122, 2018-01-26)
JobEvaluating\App\handlesql.py (13981, 2018-01-26)
JobEvaluating\App\migrations (0, 2018-01-26)
JobEvaluating\App\migrations\0001_initial.py (2073, 2018-01-26)
JobEvaluating\App\migrations\0002_auto_20180125_2310.py (446, 2018-01-26)
JobEvaluating\App\migrations\__init__.py (0, 2018-01-26)
JobEvaluating\App\models.py (1347, 2018-01-26)
JobEvaluating\App\tasks.py (1628, 2018-01-26)
JobEvaluating\App\tests.py (60, 2018-01-26)
JobEvaluating\App\views.py (3370, 2018-01-26)
JobEvaluating\JobCrawl (0, 2018-01-26)
JobEvaluating\JobCrawl\.idea (0, 2018-01-26)
JobEvaluating\JobCrawl\.idea\.name (8, 2018-01-26)
JobEvaluating\JobCrawl\.idea\JobCrawl.iml (398, 2018-01-26)
JobEvaluating\JobCrawl\.idea\encodings.xml (159, 2018-01-26)
JobEvaluating\JobCrawl\.idea\misc.xml (687, 2018-01-26)
JobEvaluating\JobCrawl\.idea\modules.xml (268, 2018-01-26)
JobEvaluating\JobCrawl\.idea\workspace.xml (34956, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\__init__.py (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\commands (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\commands\__init__.py (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\commands\crawlall.py (1478, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib\__init__.py (0, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib\google_cache.py (2386, 2018-01-26)
JobEvaluating\JobCrawl\JobCrawl\contrib\rotate_useragent.py (2863, 2018-01-26)
... ...

# Information Analysis System for IT Industry Job Requirements Based on a Web Crawler

With graduation approaching and job hunting proving tough, I ran a simple analysis of recruitment listings in the IT industry. It focuses on education level, city, and job-requirement keywords, and shows the results in a web front end. The project mainly uses Python 2.7, Django, Scrapy, Redis, Celery, MySQL, jieba word segmentation, ECharts, Bootstrap, and jQuery. Redis and Celery tie the Scrapy and Django frameworks together so that the analysis charts are refreshed at regular intervals.

## Dependencies

> * Python environment: Django 1.8 or later, BeautifulSoup4 (4.5.1), Celery (3.1.25), django-celery (3.1.17), lxml, MySQLdb, redis (2.10.5), Scrapy (1.2.0), scrapy-redis (0.6.3), Unipath (1.1), Twisted (16.6.0)
> * Database environment: MySQL 5 or later, Redis 3 or later
> * Operating system: Windows XP or later

## Data Sources

> * [Baicai Recruitment (百才招聘网)](http://wuhan.baicai.com/)
> * [ChinaHR (中华英才网)](http://www.chinahr.com/wuhan/)
> * [51job (前程无忧)](http://www.51job.com/)
> * [Zhaopin (智联招聘)](https://www.zhaopin.com/)

## Record Format

Each scraped listing contains: **URL**, **job title**, **job category**, **average monthly salary**, **company name**, **work location**, **work experience**, **education**, **degree**, **job description**

## Architecture Diagram

![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/frame.png)

## Deployment

### 1. Install the required libraries

+ bs4
+ scrapy
+ redis
+ scrapy-redis
+ pywin32
+ jieba
+ MySQLdb
+ django
+ celery (3.1.25) [**Celery 4 does not support Windows**]
+ unipath
+ django-celery

### 2. Start the required services

> * **Redis**
> * **MySQL**

### 3. Configure the environment

+ Configure the database settings

  ![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/mysql_django.png)
  ![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/scrapy_databases.png)

+ Configure the schedule on which the crawlers run

  ![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/celery_django.png)

+ Push the initial URLs into Redis (JobCrawl/lpush.py)

### 4. Start the program

+ Start the scheduled tasks
  + `celery -A JobEvaluating worker --loglevel=INFO`
  + `celery -A JobEvaluating beat -s celerybeat-schedule`
+ Synchronize the database
  + `python manage.py makemigrations`
  + `python manage.py migrate`
+ Start Django (`python manage.py runserver`)

![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/index.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/index1.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/index2.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/display.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/search.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/analyse.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/asks.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/job_hot.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/lan.png)
![](https://github.com/CaryXiang/Information-Analysis-system-of-IT-Industry-requirement-based-on-Web-crawler/blob/master/imgs/salary.png)
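The README says Redis links Scrapy to the rest of the system but does not show the crawler settings. As a sketch only, a Scrapy 1.2 / scrapy-redis 0.6 project usually routes scheduling and duplicate filtering through Redis with settings like the ones below. The host, port, and the choice to persist the queue are assumptions, not values taken from the project. Spiders that subclass `scrapy_redis.spiders.RedisSpider` then read their start URLs from the Redis list named by `redis_key`, which defaults to `<spider name>:start_urls`.

```python
# Fragment of a Scrapy settings.py for scrapy-redis (a sketch, not the project's file).

# Store the request queue and the request fingerprints in Redis, so several
# crawler processes can share one queue and skip already-seen URLs.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"
DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"

# Assumed choice: keep the queue and fingerprint set in Redis between runs
# instead of clearing them when a spider closes.
SCHEDULER_PERSIST = True

# Assumed local Redis instance; adjust to your deployment.
REDIS_HOST = "127.0.0.1"
REDIS_PORT = 6379
```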
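Step 3 of the deployment ends by pushing the initial URLs into Redis via `JobCrawl/lpush.py`, whose contents are not shown here. A minimal sketch of what such a seeding script could look like, assuming the spiders rely on scrapy-redis's default `<spider name>:start_urls` key. The function names are hypothetical, and the Redis client is passed in, so any object with redis-py's `lpush(name, *values)` method works.

```python
# Hypothetical seeding helpers in the spirit of JobCrawl/lpush.py.


def start_urls_key(spider_name):
    """Build scrapy-redis's default start-URL key ('%(name)s:start_urls')."""
    return "%s:start_urls" % spider_name


def seed_start_urls(client, spider_name, urls):
    """LPUSH the seed URLs onto the spider's start-URL list.

    `client` is assumed to be a redis-py client (or anything with the same
    lpush(name, *values) signature). Returns the list length after the push.
    LPUSH prepends, so the last URL given ends up at the head of the list.
    """
    urls = list(urls)
    if not urls:
        # Redis rejects LPUSH with no values, so skip the call entirely.
        return 0
    return client.lpush(start_urls_key(spider_name), *urls)
```

With redis-py installed and a Redis server running, a call could look like `seed_start_urls(redis.StrictRedis(host='127.0.0.1', port=6379), 'zhaopin', ['https://www.zhaopin.com/'])`, where the spider name `zhaopin` is hypothetical.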
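Step 3 also configures the MySQL connection and the crawl schedule, which the README shows only as screenshots. A hedged sketch of a matching `JobEvaluating/settings.py` fragment in Django 1.8 / Celery 3.1 style follows. The database name, credentials, broker URL, the task name `App.tasks.crawl_jobs`, and the 12-hour interval are all hypothetical. It assumes `JobEvaluating/celery.py` loads these settings with `app.config_from_object('django.conf:settings')`, as in the Celery 3.1 guide for Django; that is what lets `celery -A JobEvaluating beat` pick up `CELERYBEAT_SCHEDULE`.

```python
from datetime import timedelta

# Fragment of JobEvaluating/settings.py (a sketch; all values are placeholders).

# MySQL through Django's built-in backend (uses MySQLdb on Python 2.7).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "job_evaluating",   # hypothetical database name
        "USER": "root",             # hypothetical credentials
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "3306",
    }
}

# Celery 3.1 reads uppercase settings. Using Redis as the broker is an
# assumption; the README only says Redis and Celery connect Scrapy and Django.
BROKER_URL = "redis://127.0.0.1:6379/0"

# Periodic task table read by `celery beat`.
CELERYBEAT_SCHEDULE = {
    "crawl-jobs-every-12-hours": {          # hypothetical entry name
        "task": "App.tasks.crawl_jobs",     # hypothetical task in App/tasks.py
        "schedule": timedelta(hours=12),    # assumed refresh interval
    },
}
```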
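The record format includes an average monthly salary, and the salary chart compares it across groups such as city or education. Because the source sites publish salaries in mixed formats, some normalization is needed before averaging. The sketch below shows one way to do it, assuming strings such as `6001-8000` (yuan per month), `6-8千/月`, `1-1.5万/月`, and `10-20万/年`. These formats and the function names are assumptions, not the project's `handlesql.py` logic, and the code is written for Python 3.

```python
import re

# Multipliers for the assumed unit characters: yuan, thousand, ten-thousand.
_UNIT = {"元": 1, "千": 1000, "万": 10000}

# Assumed shape: number, optional "-number" range, optional unit, optional /月 or /年.
_PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)(?:\s*-\s*(\d+(?:\.\d+)?))?\s*(元|千|万)?(?:/(月|年))?\s*$"
)


def monthly_salary(text):
    """Return the midpoint monthly salary in yuan, or None if unparseable."""
    m = _PATTERN.match(text)
    if m is None:
        return None  # e.g. "面议" (negotiable) or an unsupported format
    low = float(m.group(1))
    high = float(m.group(2)) if m.group(2) else low
    value = (low + high) / 2 * _UNIT[m.group(3) or "元"]
    if m.group(4) == "年":
        value /= 12  # convert an annual figure to a monthly one
    return value


def average_by(records, key):
    """Average monthly salary grouped by a field such as 'city' or 'education'."""
    totals = {}
    for rec in records:
        salary = monthly_salary(rec["salary"])
        if salary is None:
            continue  # skip listings without a usable salary
        total, count = totals.get(rec[key], (0.0, 0))
        totals[rec[key]] = (total + salary, count + 1)
    return {k: total / count for k, (total, count) in totals.items()}
```

Taking the midpoint of each range is a simplifying choice; the lower or upper bound would work the same way if the charts need it.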
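For the job-requirement keyword analysis, the README names jieba for segmenting Chinese job descriptions. A minimal sketch of the counting step, assuming the descriptions are already loaded as strings. The tokenizer is passed in (for example `jieba.lcut`), and the stop-word set and the two-character minimum are illustrative assumptions.

```python
from collections import Counter

# Hypothetical stop-word set; a real run would likely need a much longer list.
STOP_WORDS = {"的", "和", "与", "及", "等", "有", "能"}


def top_keywords(descriptions, tokenize, n=10):
    """Return the n most common tokens across job descriptions.

    `tokenize` maps a string to a list of tokens (e.g. jieba.lcut).
    Tokens are lowercased; single-character tokens and stop words are dropped.
    """
    counts = Counter()
    for text in descriptions:
        for token in tokenize(text):
            token = token.strip().lower()
            if len(token) > 1 and token not in STOP_WORDS:
                counts[token] += 1
    return counts.most_common(n)
```

With jieba installed, `top_keywords(descriptions, jieba.lcut, n=20)` would return the 20 most frequent tokens as `(token, count)` pairs, ready to feed a word-frequency chart.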

近期下载者

相关文件


收藏者