Correted-ProgrammableWeb-dataset

Category: Data mining / data warehousing
Development tool: Jupyter Notebook
File size: 11429 KB
Downloads: 0
Upload date: 2021-09-27 02:52:22
Uploader: sh-1993
Description: Data Correction and Evolution Analysis of the ProgrammableWeb Service Ecosystem

File list:
crawler (0, 2021-09-27)
crawler\__init__.py (0, 2021-09-27)
crawler\crawler.ipynb (80945, 2021-09-27)
crawler\data_objects.py (4294, 2021-09-27)
crawler\mashup_crawler.ipynb (31393, 2021-09-27)
data (0, 2021-09-27)
data\api_nodes_estimator.csv (2701651, 2021-09-27)
data\m-a_edges.csv (525331, 2021-09-27)
data\mashup_nodes_estimator.csv (898033, 2021-09-27)
data\raw (0, 2021-09-27)
data\raw\accessibility (0, 2021-09-27)
data\raw\accessibility\api_accessibility (0, 2021-09-27)
data\raw\accessibility\api_accessibility\api_version_accessbiliby-1.txt (6175036, 2021-09-27)
data\raw\accessibility\api_accessibility\api_version_accessbiliby-2.txt (5673184, 2021-09-27)
data\raw\accessibility\api_accessibility\api_version_accessbiliby-3.txt (5142386, 2021-09-27)
data\raw\accessibility\api_accessibility\api_version_accessbiliby-4.txt (4539876, 2021-09-27)
data\raw\accessibility\api_accessibility\api_version_accessbiliby-5.txt (1916781, 2021-09-27)
data\raw\accessibility\mashup_accessibility.txt (954901, 2021-09-27)
data\raw\all_pairs.txt (1509821, 2021-09-27)
data\raw\api_mashup (0, 2021-09-27)
data\raw\api_mashup\active_apis_data.txt (20150975, 2021-09-27)
data\raw\api_mashup\active_mashups_data.txt (17845491, 2021-09-27)
data\raw\api_mashup\deadpool_apis_data.txt (1879907, 2021-09-27)
data\raw\api_mashup\deadpool_mashups_data.txt (4302091, 2021-09-27)
data\split_nodes.csv (2030, 2021-09-27)
data\tranfer_nodes.csv (699, 2021-09-27)
visualization (0, 2021-09-27)
visualization\visualization.ipynb (6192, 2021-09-27)
visualization\visualization2.ipynb (219166, 2021-09-27)
visualization\visualization3.ipynb (176521, 2021-09-27)

# Corrected ProgrammableWeb dataset

## Introduction

This repo contains the data and utilities used in *Data Correction and Evolution Analysis of the ProgrammableWeb Service Ecosystem*. The folders are organized as follows:

- `crawler`: The crawlers that fetch the data from ProgrammableWeb, including an API crawler and a Mashup crawler.
- `data`: The formatted datasets used in the paper. Its subfolder `raw` contains the raw data fetched by the crawlers in JSON format.
- `visualization`: Some of the statistical results and the visualization process.

## Data

We put our raw and corrected datasets in the `data` folder.

### Formatted Dataset

The raw data has been formatted as CSV. Some common column names and their meanings are as follows:

| Column name | Meaning                                       |
| ----------- | --------------------------------------------- |
| tp          | Type: API or Mashup                           |
| url         | The URL of an API or Mashup                   |
| name        | The title of the API or Mashup                |
| st          | Submit date                                   |
| et          | Corrected dead date                           |
| oet         | Dead date provided by ProgrammableWeb (PW)    |
| c           | Category                                      |
| oac         | Corrected accessibility                       |
| ac          | Accessibility provided by PW                  |

- `api_nodes_estimator.csv` is the API data. Each line represents an API.
- `m-a_edges.csv` records which Mashups invoke which APIs.
- `mashup_nodes_estimator.csv` is the Mashup data.
- `split_nodes.csv` is the split API data.
- `tranfer_nodes.csv` (file name as spelled in the repository) is the transferred API data.

### Raw data

- `active_apis_data.txt`, `deadpool_apis_data.txt`, `active_mashups_data.txt`, and `deadpool_mashups_data.txt` contain the APIs and Mashups that are marked as active or dead on ProgrammableWeb.
- The `accessibility` subfolder contains the accessibility of APIs and Mashups. These data were collected by visiting the Homepage URL, API Endpoint, or API Portal. With this dataset we can tell whether a service is still working.
- `all_pairs.txt`: A Mashup may invoke many APIs, so we collect the APIs that are used together, along with their co-occurrence frequencies, as API pairs.
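
The following is a minimal sketch, not part of the repository, of how the formatted node files and the API-pair idea described above might be explored using only the Python standard library. It assumes that the CSV files have a header row using the column names from the table (for example `tp` and `c`). The helper names (`load_nodes`, `count_by_column`, `count_api_pairs`), the sample rows, and the Mashup-to-API mapping are all hypothetical and serve only as illustration; the actual layouts of `m-a_edges.csv` and `all_pairs.txt` are not documented here and may differ.

```python
import csv
import io
from collections import Counter
from itertools import combinations


def load_nodes(handle):
    """Read a node CSV (API or Mashup) into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def count_by_column(rows, column):
    """Count how many rows share each value of `column` (e.g. "tp" or "c")."""
    return Counter(row[column] for row in rows)


def count_api_pairs(mashup_to_apis):
    """Count how often two APIs are invoked by the same Mashup."""
    pairs = Counter()
    for apis in mashup_to_apis.values():
        # Sort and deduplicate so (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(set(apis)), 2):
            pairs[pair] += 1
    return pairs


# Hypothetical sample rows; the real files live under data/ and may
# contain more columns or different value formats.
SAMPLE = """tp,url,name,st,et,c
API,/api/a,A,2010-01-01,2015-01-01,Mapping
API,/api/b,B,2011-01-01,,Social
Mashup,/mashup/m,M,2012-01-01,,Mapping
"""

rows = load_nodes(io.StringIO(SAMPLE))
type_counts = count_by_column(rows, "tp")

# Hypothetical Mashup -> invoked APIs mapping, standing in for edge data.
pair_counts = count_api_pairs({"m1": ["a", "b", "c"], "m2": ["b", "a"]})
```

To try this on the real data, one could pass an opened file instead of the in-memory sample, for example `open("data/api_nodes_estimator.csv", newline="", encoding="utf-8")`; the encoding is an assumption. Storing each pair as a sorted tuple keeps the counts independent of the order in which a Mashup lists its APIs.
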
### Citation

If you make use of this code, we would appreciate it if you cite our paper as follows:

```
@article{liu2021data,
  title={Data correction and evolution analysis of the ProgrammableWeb service ecosystem},
  author={Liu, Mingyi and Tu, Zhiying and Zhu, Yeqi and Xu, Xiaofei and Wang, Zhongjie and Sheng, Quan Z},
  journal={Journal of Systems and Software},
  pages={111066},
  year={2021},
  publisher={Elsevier}
}
```
