# eLife's Data Hub Airflow Image
Create a custom Airflow image that is deployed in Kubernetes and used to run data hub data pipelines.
The image contains a set of Airflow DAGs that are installed during the image build, after their git repos (listed in `repo-list.json`) are cloned and copied into the Docker image.
## How the image is created
The [clone.sh](https://github.com/elifesciences/data-hub-airflow-image/blob/master/clone.sh) script clones each of the git repos listed in the [repo-list.json](https://github.com/elifesciences/data-hub-airflow-image/blob/master/repo-list.json) file into the target directory specified for that repo in `repo-list.json`.
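The clone step above can be sketched as follows. This is a minimal illustration, not the actual `clone.sh`: the field names (`repoUrl`, `directory`, `commit`) and the example entry are assumptions about the shape of `repo-list.json`.

```python
import json

# Hypothetical repo-list.json content; the real field names and values
# (repoUrl, directory, commit) are assumptions for illustration only.
repo_list = json.loads("""
[
  {"repoUrl": "https://github.com/example/pipeline-a.git",
   "directory": "pipeline-a",
   "commit": "abc1234"}
]
""")

# clone.sh effectively performs one clone per entry, checking out the
# pinned commit; here we only build the equivalent git commands.
commands = []
for repo in repo_list:
    commands.append(f"git clone {repo['repoUrl']} {repo['directory']}")
    commands.append(f"git -C {repo['directory']} checkout {repo['commit']}")

print(commands)
```

Pinning each repo to a specific commit keeps the image build reproducible: rebuilding from the same `repo-list.json` always produces the same set of DAGs.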
During the image build:
- The cloned git repos are copied into the Docker image, along with the scripts in `scripts/`.
- The script `scripts/install_dag_in_docker.sh` recursively invokes the `install.sh` script found in the root directory of each cloned data pipeline repo. Each repo's `install.sh` should:
  - install the Python packages required by the data pipeline,
  - copy the DAG files into the appropriate DAGs directory,
  - copy the data pipeline application files to the appropriate directory,
  - install the cloned data pipeline application as a Python package.
- `scripts/worker.sh` starts a web server that serves the log files created by Airflow task execution workers. It should be run when the worker pod/container is created.
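The install step can be sketched like this. It is a simplified stand-in for `scripts/install_dag_in_docker.sh`, assuming a flat layout where each cloned repo sits in one directory with an executable `install.sh` at its root; the example repo and its output are fabricated for the demonstration.

```python
import os
import stat
import subprocess
import tempfile

# Simulate the directory the cloned repos are copied into (the real
# path inside the image is an implementation detail of the Dockerfile).
repos_dir = tempfile.mkdtemp()

# One fake cloned pipeline repo with an install.sh at its root.
repo = os.path.join(repos_dir, "example-pipeline")
os.makedirs(repo)
script = os.path.join(repo, "install.sh")
with open(script, "w") as f:
    f.write("#!/bin/sh\necho installed\n")
os.chmod(script, os.stat(script).st_mode | stat.S_IEXEC)

# Core logic: run each repo's install.sh from inside that repo,
# failing the build (check=True) if any install script fails.
results = []
for name in sorted(os.listdir(repos_dir)):
    install = os.path.join(repos_dir, name, "install.sh")
    if os.access(install, os.X_OK):
        out = subprocess.run(
            ["./install.sh"],
            cwd=os.path.join(repos_dir, name),
            capture_output=True, text=True, check=True,
        )
        results.append((name, out.stdout.strip()))

print(results)
```

Running each `install.sh` with `check=True` means a broken pipeline repo aborts the image build early, rather than producing an image with a partially installed DAG.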
## CI/CD
Points to note:
- Every merge to the `develop` branch builds and pushes an image to Docker Hub.
- Each such merge also triggers another CI pipeline that re-deploys the running data hub application in the staging environment using the newly built image.
- To build and deploy an image into the data hub production environment, create a release of the GitHub repo.
- The git commit ref of each repo in the repo list is typically updated by another CI pipeline, which is expected to be invoked whenever there is a merge to the `develop` branch of the corresponding data pipeline repo.
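The ref-updating step described in the last bullet can be sketched as below. This is a hypothetical illustration of the update, not the actual CI pipeline: the field names (`repoUrl`, `commit`) and the example values are assumptions.

```python
import json

# Hypothetical repo-list.json entry (field names are assumptions).
repo_list = [
    {"repoUrl": "https://github.com/example/pipeline-a.git",
     "directory": "pipeline-a",
     "commit": "abc1234"},
]

def update_commit_ref(repos, repo_url, new_sha):
    """Re-pin one repo to a new commit, as a CI pipeline might do
    after a merge to that repo's develop branch."""
    for entry in repos:
        if entry["repoUrl"] == repo_url:
            entry["commit"] = new_sha
    return repos

updated = update_commit_ref(
    repo_list, "https://github.com/example/pipeline-a.git", "def5678")
print(json.dumps(updated, indent=2))
```

Committing the updated `repo-list.json` back to this repo is what causes the next image build to pick up the new version of that data pipeline.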