BaiduIndexSpider

Category: Data collection / Crawlers
Development tool: Java
File size: 163KB
Downloads: 0
Upload date: 2022-09-01 23:07:03
Uploader: sh-1993
Description: Baidu Index crawler, using the keyword "北京房价" (Beijing housing prices) as the example: http://index.baidu.com/v2/main/index.html#/trend/北京房价?words=北京房价
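The example URL in the description carries the keyword twice, once in the fragment path and once in the `words` parameter, percent-encoded as UTF-8. A minimal sketch of building such a URL for an arbitrary keyword follows. It assumes the separators stripped from the description's URL were `/` and `?`. The class and method names are hypothetical and are not part of the project.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of BaiduIndexSpider.
final class BaiduIndexUrlSketch {

    // Assumed URL prefix, reconstructed from the description's example link.
    private static final String BASE = "http://index.baidu.com/v2/main/index.html#/trend/";

    /** Builds the trend-page URL for a keyword, percent-encoding it as UTF-8. */
    static String trendUrl(String keyword) {
        try {
            // URLEncoder produces form encoding, where a space becomes "+";
            // switch those to "%20" so the value is also valid in a path segment.
            String encoded = URLEncoder.encode(keyword, "UTF-8").replace("+", "%20");
            return BASE + encoded + "?words=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

For the keyword "北京房价", the encoded segment is `%E5%8C%97%E4%BA%AC%E6%88%BF%E4%BB%B7`, the same byte sequence that appears in the description's link target.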

File list:
.idea (0, 2020-03-25)
.idea\compiler.xml (729, 2020-03-25)
.idea\encodings.xml (172, 2020-03-25)
.idea\libraries (0, 2020-03-25)
.idea\libraries\Maven__antlr_antlr_2_7_7.xml (450, 2020-03-25)
.idea\libraries\Maven__asm_asm_3_3_1.xml (428, 2020-03-25)
.idea\libraries\Maven__cglib_cglib_2_2_2.xml (450, 2020-03-25)
.idea\libraries\Maven__ch_qos_logback_logback_classic_1_2_3.xml (556, 2020-03-25)
.idea\libraries\Maven__ch_qos_logback_logback_core_1_2_3.xml (535, 2020-03-25)
.idea\libraries\Maven__com_codeborne_phantomjsdriver_1_3_0.xml (552, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_classmate_1_3_4.xml (510, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_jackson_core_jackson_annotations_2_9_0.xml (632, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_jackson_core_jackson_core_2_9_6.xml (583, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_jackson_core_jackson_databind_2_9_6.xml (611, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_jackson_datatype_jackson_datatype_jdk8_2_9_6.xml (662, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_jackson_datatype_jackson_datatype_jsr310_2_9_6.xml (676, 2020-03-25)
.idea\libraries\Maven__com_fasterxml_jackson_module_jackson_module_parameter_names_2_9_6.xml (717, 2020-03-25)
.idea\libraries\Maven__com_github_jai_imageio_jai_imageio_core_1_3_1.xml (595, 2020-03-25)
.idea\libraries\Maven__com_google_code_findbugs_jsr305_1_3_9.xml (533, 2020-03-25)
.idea\libraries\Maven__com_google_code_gson_gson_2_8_5.xml (503, 2020-03-25)
.idea\libraries\Maven__com_google_errorprone_error_prone_annotations_2_1_3.xml (640, 2020-03-25)
.idea\libraries\Maven__com_google_guava_guava_23_6_jre.xml (515, 2020-03-25)
.idea\libraries\Maven__com_google_j2objc_j2objc_annotations_1_1.xml (575, 2020-03-25)
.idea\libraries\Maven__com_hankcs_hanlp_portable_1_7_0.xml (533, 2020-03-25)
.idea\libraries\Maven__com_huaban_jieba_analysis_1_0_2.xml (533, 2020-03-25)
.idea\libraries\Maven__com_jayway_jsonpath_json_path_2_4_0.xml (534, 2020-03-25)
.idea\libraries\Maven__com_lowagie_itext_2_1_7.xml (474, 2020-03-25)
.idea\libraries\Maven__com_squareup_okhttp3_okhttp_3_9_1.xml (517, 2020-03-25)
.idea\libraries\Maven__com_squareup_okio_okio_1_13_0.xml (498, 2020-03-25)
.idea\libraries\Maven__com_vaadin_external_google_android_json_0_0_20131108_vaadin1.xml (688, 2020-03-25)
.idea\libraries\Maven__com_zaxxer_HikariCP_2_7_9.xml (491, 2020-03-25)
.idea\libraries\Maven__commons_beanutils_commons_beanutils_1_9_2.xml (582, 2020-03-25)
.idea\libraries\Maven__commons_codec_commons_codec_1_11.xml (531, 2020-03-25)
.idea\libraries\Maven__commons_collections_commons_collections_3_2_1.xml (604, 2020-03-25)
.idea\libraries\Maven__commons_io_commons_io_2_4.xml (491, 2020-03-25)
.idea\libraries\Maven__commons_logging_commons_logging_1_2.xml (546, 2020-03-25)
.idea\libraries\Maven__commons_net_commons_net_3_6.xml (502, 2020-03-25)
.idea\libraries\Maven__dom4j_dom4j_1_6_1.xml (450, 2020-03-25)
.idea\libraries\Maven__io_micrometer_micrometer_core_1_0_6.xml (552, 2020-03-25)
... ...

# Baidu Index crawler / BaiduIndexSpider

## Version
+ 1.0.1
+ Publish datetime: 2019-05-16 15:23

## License (LGPL, GNU Lesser General Public License)
+ Based on MIT
  + 1. MIT is a permissive license in the same spirit as BSD: the author only wants to retain copyright and imposes no other restrictions.
    + That is, you must include the original license notice in your distribution, whether you distribute binaries or source code.
  + 2. The MIT License takes its name from the Massachusetts Institute of Technology, where it was originally developed.
  + 3. Licensee rights: the licensee may use, copy, modify, merge, publish, distribute, sublicense, and sell the software and copies of the software.
  + 4. The licensee may modify the license terms as appropriate for the needs of the program.
  + 5. Licensee obligations: [the copyright notice and the license notice must be included in the software and in all copies of the software].

## Basic Principle
+ Basic principle
  + Simulate mouse movement so that the Baidu Index page resolves its data, which is then stored in the HTML page
  + Use ChromeDriver to control the browser and read the data that the JS generated in the previous step
+ Improvements over the previous version
  + The script is modularized following a fully object-oriented design
    + It contains four classes: Point, Mouse, BaiduIndex, BaiduIndexTask
  + It uses JavaScript ES6 syntax such as class and let
+ The script only supports IE8+ browsers
  + Chrome is supported first
+ The mouse coordinate parameters in resolveBaiduIndexByJs.js must be calculated by yourself for the screen size of each machine
+ Project highlight: the implementation of the basic principle itself offers a new approach to crawling.

## Dependency
+ Install a browser (recommended: Chrome)
+ Install a WebDriver (recommended for this project: ChromeDriver)
  + After installing, edit the `systemPropertyValueOfBrowser` property in [BrowserDriverSpiderUtil.java](https://github.com/Johnny-ZTSD/BaiduIndexSpider/blob/master/src/main/java/cn/johnnyzen/util/spider/BrowserDriverSpiderUtil.java) so that it points to where the WebDriver .exe is stored
  + [Note] Preferably install a WebDriver version that matches or is compatible with your browser version

## Method of use
+ Download the core class [BaiduIndexService.java](https://github.com/Johnny-ZTSD/BaiduIndexSpider/blob/master/src/main/java/cn/johnnyzen/app/spider/BaiduIndexService.java) from this project
+ Download the other classes that this class imports

## Test
+ [Video demo: Baidu Netdisk]
  + link: [https://pan.baidu.com/s/1iQbWHfT5_SKA3omK9nFgYg](https://pan.baidu.com/s/1iQbWHfT5_SKA3omK9nFgYg)
  + code: 5gpi

``` java
@Test
public void resolveBaiduIndexValuesTest() {
    String query = "北京房价";
    // WebDriver webDriver = BaiduIndex.loadBaiduIndexPageByWebDriver(query);
    // System.out.println(webDriver.findElement(By.cssSelector("html")).getText());
    Print.print(BaiduIndexService.resolveBaiduIndexValues(query)); // fetch the parsed data
}
```

```
// output
BaiduIndex{
    keyword='北京房价',
    date=java.util.GregorianCalendar[time=1555372800000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2019,MONTH=3,WEEK_OF_YEAR=16,WEEK_OF_MONTH=3,DAY_OF_MONTH=16,DAY_OF_YEAR=106,DAY_OF_WEEK=3,DAY_OF_WEEK_IN_MONTH=3,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],
    indexValue=2569
}
BaiduIndex{
    keyword='北京房价',
    date=java.util.GregorianCalendar[time=1555459200000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2019,MONTH=3,WEEK_OF_YEAR=16,WEEK_OF_MONTH=3,DAY_OF_MONTH=17,DAY_OF_YEAR=107,DAY_OF_WEEK=4,DAY_OF_WEEK_IN_MONTH=3,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],
    indexValue=2311
}
BaiduIndex{
    keyword='北京房价',
    date=java.util.GregorianCalendar[time=1555545600000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2019,MONTH=3,WEEK_OF_YEAR=16,WEEK_OF_MONTH=3,DAY_OF_MONTH=18,DAY_OF_YEAR=108,DAY_OF_WEEK=5,DAY_OF_WEEK_IN_MONTH=3,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],
    indexValue=2318
}
BaiduIndex{
    keyword='北京房价',
    date=java.util.GregorianCalendar[time=1555632000000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2019,MONTH=3,WEEK_OF_YEAR=16,WEEK_OF_MONTH=3,DAY_OF_MONTH=19,DAY_OF_YEAR=109,DAY_OF_WEEK=6,DAY_OF_WEEK_IN_MONTH=3,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],
    indexValue=2207
}
//... more
BaiduIndex{
    keyword='北京房价',
    date=java.util.GregorianCalendar[time=1557878400000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="UTC",offset=0,dstSavings=0,useDaylight=false,transitions=0,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2019,MONTH=4,WEEK_OF_YEAR=20,WEEK_OF_MONTH=3,DAY_OF_MONTH=15,DAY_OF_YEAR=135,DAY_OF_WEEK=4,DAY_OF_WEEK_IN_MONTH=3,AM_PM=0,HOUR=0,HOUR_OF_DAY=0,MINUTE=0,SECOND=0,MILLISECOND=0,ZONE_OFFSET=0,DST_OFFSET=0],
    indexValue=2142
}
```

## Author
+ Johnny Zen
+ [github] https://github.com/Johnny-ZTSD

## Email
+ johnnyztsd@gmail.com
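
The sample output in the Test section prints each `date` as a raw `java.util.GregorianCalendar` dump, which is hard to read. A minimal sketch of rendering such a value as an ISO-8601 date follows. It assumes only what the dump shows: a `Calendar` in the UTC zone. The class and method names are hypothetical and are not part of the project.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper, not part of BaiduIndexSpider.
final class IndexDateSketch {

    /** Renders a Calendar (such as the `date` field in the output) as yyyy-MM-dd in UTC. */
    static String isoDate(Calendar date) {
        return Instant.ofEpochMilli(date.getTimeInMillis())
                .atZone(ZoneOffset.UTC)
                .toLocalDate()
                .toString();
    }

    /** Rebuilds a UTC calendar from the `time=` epoch milliseconds shown in the dump. */
    static Calendar utcCalendar(long epochMillis) {
        Calendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        calendar.setTimeInMillis(epochMillis);
        return calendar;
    }
}
```

With the first record's `time=1555372800000`, `isoDate(utcCalendar(1555372800000L))` returns `2019-04-16`, which agrees with the dump's `YEAR=2019`, `MONTH=3` (months are zero-based in `Calendar`, so 3 is April) and `DAY_OF_MONTH=16`.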
