Category: Other
Development tool: Others
File size: 38KB
Downloads: 0
Upload date: 2023-10-11 10:07:36
Uploader: rainjack
Description: An engineering deployment practice, built on OpenMLDB, for the KPI time-series anomaly detection competition of the 2018 International AIOps Challenge.
(This is open-source code and is free to use.)
File list:
code (0, 2023-05-14)
code\netman2018kpianomalydetection (0, 2023-05-14)
code\netman2018kpianomalydetection\src (0, 2023-05-14)
code\netman2018kpianomalydetection\src\compute_offline_feats.py (2466, 2023-05-14)
code\netman2018kpianomalydetection\src\metrics (0, 2023-05-14)
code\netman2018kpianomalydetection\src\metrics\metrics.py (7376, 2023-05-14)
code\netman2018kpianomalydetection\src\test (0, 2023-05-14)
code\netman2018kpianomalydetection\src\test\test_pipeline.py (2969, 2023-05-14)
code\netman2018kpianomalydetection\src\test\test_online_deployments.py (3740, 2023-05-14)
code\netman2018kpianomalydetection\src\preprocessing_train_test.py (5448, 2023-05-14)
code\netman2018kpianomalydetection\src\create_offline_table.py (2727, 2023-05-14)
code\netman2018kpianomalydetection\src\utils (0, 2023-05-14)
code\netman2018kpianomalydetection\src\utils\configs.py (3404, 2023-05-14)
code\netman2018kpianomalydetection\src\utils\metric_utils.py (6809, 2023-05-14)
code\netman2018kpianomalydetection\src\utils\triton_utils.py (3861, 2023-05-14)
code\netman2018kpianomalydetection\src\utils\io_utils.py (5787, 2023-05-14)
code\netman2018kpianomalydetection\src\utils\logger.py (4557, 2023-05-14)
code\netman2018kpianomalydetection\src\deploy_triton_server.py (1041, 2023-05-14)
code\netman2018kpianomalydetection\src\start_openmldb_cluster.sh (565, 2023-05-14)
code\netman2018kpianomalydetection\src\train_xgb.py (8040, 2023-05-14)
code\netman2018kpianomalydetection\src\deploy_inference_pipeline.py (4612, 2023-05-14)
code\netman2018kpianomalydetection\src\sql_feature_engineering.sql (1324, 2023-05-14)
code\netman2018kpianomalydetection\src\explore_data.ipynb (42613, 2023-05-14)
code\netman2018kpianomalydetection\src\.vscode (0, 2023-05-14)
code\netman2018kpianomalydetection\src\.vscode\launch.json (309, 2023-05-14)
code\netman2018kpianomalydetection\src\start_triton_xgb_server.sh (850, 2023-05-14)
code\netman2018kpianomalydetection\src\deploy_realtime_fe.py (2793, 2023-05-14)
## An Engineering Deployment Practice for KPI Anomaly Detection from the 2018 International AIOps Challenge
---
### **About the Author**
Author: 鱼丸粗面 (zhuoyin94@163.com). The code follows the coding conventions of [this project](https://github.com/MichaelYin1994/python-style-guide) and is still being refined (as of 2022-06-17).
---
### **System Environment and Component Dependencies**
**System environment:**
- Ubuntu 20.04 LTS
- GPU: NVIDIA Corporation GP104GL [Quadro P5000](16G)
- CPU: Intel Core™ i9-9920X CPU @ 3.50GHz × 24
- RAM: 94G
- CUDA: 11.4
- swap: 96G
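**Open-source components:**
- OpenMLDB: real-time feature engineering on streaming data.
- cAdvisor: container status monitoring.
- Triton Inference Server: serving the XGBoost model and DL-based models.
- Prometheus + Grafana: visual monitoring of container status.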
---
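### **Source Code Overview**
- **start_openmldb_cluster.sh**: Starts the OpenMLDB docker container service. Before running it, set the number of local threads for `spark.master` in `/work/openmldb/conf/taskmanager.properties` so that data is written with multiple threads.
- **preprocessing_train_test.py**: Re-preprocesses the raw competition data, splits it, and stores it again.
- **create_offline_table.py**: Creates the offline tables and imports the offline data into the database.
- **compute_offline_feats.py**: Reads and executes the SQL script to create the offline feature set.
- **train_xgb.py**: Reads the offline feature set, trains the XGBoost model, and exports the model's basic parameters.
- **deploy_realtime_fe.py**: Reads the SQL script (the same one used for the offline features), creates the online tables, imports the online data, and deploys the online feature script.
- **deploy_triton_server.py && start_triton_xgb_server.sh**: Generate the Inference Server configuration files and deploy the Triton Inference Server backend.
- **deploy_inference_pipeline.py**: Deploys the inference pipeline with Flask: the Flask server receives data --> preprocessing --> OpenMLDB feature engineering and real-time data insertion --> postprocessing --> returns the inference result.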
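
The request flow described for `deploy_inference_pipeline.py` can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the in-memory `RollingFeatureStore` stands in for the OpenMLDB online table, the z-score in `score` stands in for the model served by Triton, and every name, the window size, and the threshold are hypothetical choices made for illustration only.

```python
from collections import deque
from statistics import mean, pstdev

WINDOW_SIZE = 5   # hypothetical window length, not taken from the project
THRESHOLD = 1.5   # hypothetical decision threshold


class RollingFeatureStore:
    """In-memory stand-in for the online feature store (OpenMLDB in the project)."""

    def __init__(self, window_size=WINDOW_SIZE):
        self.window_size = window_size
        self.history = {}  # kpi_id -> deque of the most recent values

    def compute_and_insert(self, kpi_id, value):
        window = self.history.setdefault(kpi_id, deque(maxlen=self.window_size))
        # Window features cover the stored rows plus the incoming row.
        values = (list(window) + [value])[-self.window_size:]
        features = {"value": value, "mean": mean(values), "std": pstdev(values)}
        # Insert the incoming row so that later requests can see it.
        window.append(value)
        return features


def preprocess(payload):
    """Validate and cast the raw request fields."""
    return {
        "kpi_id": str(payload["kpi_id"]),
        "timestamp": int(payload["timestamp"]),
        "value": float(payload["value"]),
    }


def score(features):
    """Stand-in for model inference: absolute z-score of the current value."""
    if features["std"] == 0:
        return 0.0
    return abs(features["value"] - features["mean"]) / features["std"]


def postprocess(point, anomaly_score, threshold=THRESHOLD):
    """Turn the raw score into the response returned to the caller."""
    return {
        "kpi_id": point["kpi_id"],
        "timestamp": point["timestamp"],
        "score": anomaly_score,
        "is_anomaly": anomaly_score > threshold,
    }


def handle_request(payload, store):
    """Receive data --> preprocessing --> features + insertion --> postprocessing."""
    point = preprocess(payload)
    features = store.compute_and_insert(point["kpi_id"], point["value"])
    return postprocess(point, score(features))
```

In a Flask deployment, a function like `handle_request` would be called from the route handler, with the feature step replaced by a request to the deployed OpenMLDB SQL and the scoring step replaced by a call to the inference server.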
### **Todo List**
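- Manage the YAML configuration files in one place
- An njit implementation of the official metric for XGBoost early stopping
- Unit tests for the correctness of the deployed feature engineering
- Inference tests for the XGBoost model on Triton Inference Server
- Apart from the OpenMLDB window features, the scripts for the feature-engineering pre-processing and post-processing of the raw data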
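
As a rough illustration of the metric item above, the sketch below computes a delay-adjusted F1 score in plain Python. It assumes the segment rule commonly described for this competition: a ground-truth anomaly segment counts as detected only if a positive prediction falls within its first `delay + 1` points, and the whole segment is then marked as hit or missed. The rule, the default `delay=7`, and all function names are assumptions; the project's `metrics.py` is not shown here and may differ. Numba compilation and the XGBoost early-stopping hookup are deliberately left out.

```python
def adjust_predictions(labels, preds, delay=7):
    """Mark each true anomaly segment as fully detected or fully missed."""
    adjusted = list(preds)
    n = len(labels)
    i = 0
    while i < n:
        if labels[i] == 1:
            # Find the end (exclusive) of the current anomaly segment.
            j = i
            while j < n and labels[j] == 1:
                j += 1
            # Only predictions in the first `delay + 1` points can trigger a hit.
            hit = any(preds[k] == 1 for k in range(i, min(i + delay + 1, j)))
            for k in range(i, j):
                adjusted[k] = 1 if hit else 0
            i = j
        else:
            i += 1
    return adjusted


def f1_score(labels, preds):
    """Point-wise F1 on binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def delay_adjusted_f1(labels, preds, delay=7):
    """F1 computed after the segment-level adjustment."""
    return f1_score(labels, adjust_predictions(labels, preds, delay))
```

A smaller `delay` is stricter: a detection that arrives late in a long anomaly segment no longer counts, and the whole segment becomes false negatives.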
---
### **References**
[1] https://github.com/MichaelYin1994/tianchi-pakdd-aiops-2021
[2] Lam, Siu Kwan, Antoine Pitrou, and Stanley Seibert. "Numba: A llvm-based python jit compiler." Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC. 2015.
[3] https://github.com/johannfaouzi/pyts
[4] https://github.com/blue-yonder/tsfresh
[5] Goldstein, Markus, and Andreas Dengel. "Histogram-based outlier score (hbos): A fast unsupervised anomaly detection algorithm." KI-2012: Poster and Demo Track. 2012: 59-63.
[6] Bu, Jiahao, et al. "Rapid deployment of anomaly detection models for large number of emerging kpi streams." 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 2018.
[7] Ma, Minghua, et al. "Robust and rapid adaption for concept drift in software system anomaly detection." 2018 IEEE 29th International Symposium on Software Reliability Engineering (ISSRE). IEEE, 2018: 13-24.
[8] Li, Zhihan, et al. "Robust and rapid clustering of kpis for large-scale anomaly detection." 2018 IEEE/ACM 26th International Symposium on Quality of Service (IWQoS). IEEE, 2018.
[9] Li, Zeyan, Wenxiao Chen, and Dan Pei. "Robust and unsupervised kpi anomaly detection based on conditional variational autoencoder." 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). IEEE, 2018.
[10] Liu, Dapeng, et al. "Opprentice: Towards practical and automatic anomaly detection through machine learning." Proceedings of the 2015 Internet Measurement Conference. 2015.
[11] Zhao, Nengwen, et al. "Label-less: A semi-automatic labelling tool for kpi anomalies." IEEE INFOCOM 2019-IEEE Conference on Computer Communications. IEEE, 2019.