wechat-crawlers

Category: WeChat Mini Program
Development tool: Python
File size: 0KB
Downloads: 0
Upload date: 2023-06-27 05:37:41
Uploader: sh-1993

Description: WeChat data "crawling", implemented with [uiautomator2](https://github.com/openatx/uiautomator2#stop-an-app) and a real Android device. stars: 8, updated: 2020-07-15 07:19:19

File list:

CONFIG.ini (364, 2020-07-15)
connect-test.py (518, 2020-07-15)
doc/ (0, 2020-07-15)
doc/just-do-it.md (1195, 2020-07-15)
example/ (0, 2020-07-15)
example/connect-test.png (130833, 2020-07-15)
example/result-1594507938241.jpg (44776, 2020-07-15)
example/result-1594507943103.jpg (160204, 2020-07-15)
example/result-1594507948130.jpg (28243, 2020-07-15)
example/result-1594507953723.jpg (6174, 2020-07-15)
example/result.txt (4069, 2020-07-15)
example/wc_moment_list.txt (5616, 2020-07-15)
example/wc_moment_list_wordcloud.txt (2815, 2020-07-15)
example/wc_moment_list_wordcloud_bak.txt (2815, 2020-07-15)
example/weditor.png (158412, 2020-07-15)
requirement.txt (110, 2020-07-15)
res/ (0, 2020-07-15)
res/baidu_stopwords.txt (9131, 2020-07-15)
res/cn_stopwords.txt (4717, 2020-07-15)
res/hit_stopwords.txt (5273, 2020-07-15)
res/images/ (0, 2020-07-15)
res/images/baidu.jpg (40427, 2020-07-15)
res/images/juejin.jpg (48306, 2020-07-15)
res/images/moments.jpg (57808, 2020-07-15)
res/images/wechat.jpg (4705, 2020-07-15)
res/scu_stopwords.txt (7597, 2020-07-15)
res/simhei.ttf (10044356, 2020-07-15)
res/stopwords.txt (30600, 2020-07-15)
wc_moments.py (11947, 2020-07-15)
wc_wordcloud.py (8377, 2020-07-15)

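# WeChat Crawlers

Just some crawlers for WeChat by [uiautomator2](https://github.com/openatx/uiautomator2).

A small experiment: it uses the cross-platform automation testing tool [uiautomator2](https://github.com/openatx/uiautomator2) together with [weditor](https://github.com/openatx/weditor) to crawl WeChat-related data, then applies data analysis to that data to build things such as a word cloud of Moments posts.

This repository uses a **Windows** environment with a **real Android device** to scrape data from a specific **app** (such as WeChat), so you may need to install **Python 3.7+**, **Java 8**, **Node.js 10+**, the **Android SDK**, and other required dependencies. Because the author develops **apps** with **React Native** at work, these were already installed, so setup is not covered in detail here; see **[just-do-it.md](./doc/just-do-it.md)**.

## Results

See the files in the **example** directory. **wc_moment_list.txt** and **wc_moment_list_wordcloud.txt** are the crawled source data and the processed data, respectively, and **result-XX.jpg** are the generated word clouds. **wc_moment_list_wordcloud_bak.txt** exists because the source data contained some meaningless entries that skewed the result: after correcting the data and changing the **needManual** setting, the script was run again to produce a new result, and the previous result was kept as this backup.

| moments | juejin | baidu |
| ------- | ------ | ----- |
| ![moments](https://cdn.jsdelivr.net/gh/hu-qi/wechat-crawlers/example/result-1594507948130.jpg) | ![juejin](https://cdn.jsdelivr.net/gh/hu-qi/wechat-crawlers/example/result-1594507943103.jpg) | ![baidu](https://cdn.jsdelivr.net/gh/hu-qi/wechat-crawlers/example/result-1594507938241.jpg) |

## Configuration

```
[APPINFO]
platformName = 'Android'        # platform
deviceName = '621da8320804'     # device name, obtained via adb devices
appPackage = 'com.tencent.mm'   # app package name
appActivity = '.ui.LauncherUI'  # app entry activity

[WECHAT]
wechatUser = '******'           # WeChat account (preferably a phone number)
wechatPWd = '**********'        # WeChat password
wechatWho = 'Hugi66'            # WeChat account of the target user

[WORDCLOUD]
needManual=                     # whether the source data needs manual editing
stopwords=stopwords             # stop-word file, default stopwords.txt
wordsCount=180                  # number of words to display, default 180
inputImg=juejin                 # single template image, default juejin.jpg
fontFamily=simhei.ttf           # font, default simhei.ttf
MODE=single                     # generation mode, single or all; default single
```

## wc_moments.py

Scrapes Moments data. First, [weditor](https://github.com/openatx/weditor) starts a service that exposes the **app**'s internal UI nodes in a browser; inspecting nodes and editing atx scripts there prepares the automation scripts for the **app**. [uiautomator2](https://github.com/openatx/uiautomator2) then runs the actual flow and collects and stores the data.

## wc_wordcloud.py

Generates the word cloud. By default it processes the Moments data collected in the previous step and derives keywords from word frequency and similar measures; it can also be used on its own. The default background images are stored in the **res/images** directory. For standalone use, create a **data** directory and place **wc_moment_list.txt** in it; you can specify a single background image or generate with all of them.

## TODO

- [x] Scrape Moments data
- [x] Generate a keyword word cloud
- [ ] Slider-puzzle CAPTCHA

## WeChat Official Account: 胡琦

![huqi](https://www.fashaoge.com/img/weixinCode.jpg)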
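As a supplement to the configuration section, here is a minimal sketch of how a script might read a file shaped like CONFIG.ini with Python's standard-library `configparser`. The README does not say how the project's scripts load the file, so the helper name `load_config`, the quote stripping, the trimmed-down `SAMPLE` text, and the reading of an empty `needManual` as "no" are all assumptions made for illustration.

```python
import configparser

# Shortened stand-in for CONFIG.ini (hypothetical sample, not the real file).
SAMPLE = """
[APPINFO]
platformName = 'Android'        # platform
deviceName = '621da8320804'     # device name from adb devices
appPackage = 'com.tencent.mm'   # app package name

[WORDCLOUD]
needManual=                     # assumed: empty means no manual editing
wordsCount=180                  # number of words to display
MODE=single                     # single or all
"""


def load_config(text):
    """Parse INI text into {section: {option: value}} with quotes removed."""
    # inline_comment_prefixes lets "# ..." after a value be treated as a comment.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    # Keep option names as written (e.g. wordsCount) instead of lowercasing them.
    parser.optionxform = str
    parser.read_string(text)
    return {
        section: {key: value.strip("'\"") for key, value in parser[section].items()}
        for section in parser.sections()
    }


config = load_config(SAMPLE)
# Fall back to the documented default of 180 when the option is empty or missing.
words_count = int(config["WORDCLOUD"].get("wordsCount") or 180)
need_manual = bool(config["WORDCLOUD"].get("needManual"))
print(config["APPINFO"]["appPackage"], words_count, need_manual)
```

Without `inline_comment_prefixes`, the trailing `# ...` text would become part of each value, and without the quote stripping a value such as `'Android'` would keep its quotation marks.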
