data-hub-airflow-image

Description: create custom airflow image used for running data-hub pipeline

File list:
Dockerfile (827, 2023-12-18)
Jenkinsfile (1795, 2023-12-18)
Jenkinsfile.update-repo-list (594, 2023-12-18)
Makefile (1901, 2023-12-18)
clone.sh (443, 2023-12-18)
maintainers.txt (20, 2023-12-18)
repo-list.json (746, 2023-12-18)
requirements.build.txt (25, 2023-12-18)
requirements.txt (35, 2023-12-18)
scripts/ (0, 2023-12-18)
scripts/install_dag_in_docker.sh (580, 2023-12-18)

# eLife's Data Hub Airflow Image

Create a custom Airflow image which is deployed in Kubernetes and used for running Data Hub data pipelines. The image contains a set of Airflow DAGs, which are installed into the image after their Git repositories (listed in `repo-list.json`) are cloned and copied into the Docker image.

## How the image is created

[clone.sh](https://github.com/elifesciences/data-hub-airflow-image/blob/master/clone.sh) clones each of the Git repositories specified in the [repo-list.json](https://github.com/elifesciences/data-hub-airflow-image/blob/master/repo-list.json) file into the directory specified for that repository in `repo-list.json`.

During the image build:

- The cloned Git repositories are copied into the Docker image, along with the scripts in `scripts`.
- The script `scripts/install_dag_in_docker.sh` invokes the `install.sh` script found in the root directory of each cloned data pipeline repository. Each repository's `install.sh` should:
  - install the required Python packages for the data pipeline,
  - copy the DAG files into the appropriate DAGs directory,
  - copy the data pipeline application files to the appropriate directory,
  - install the cloned data pipeline application as a Python package.
- `scripts/worker.sh` starts a web server that serves the log files created by Airflow task execution workers. It should be run when the worker pod/container is created.

## CI/CD

Points to note:

- Every merge to the `develop` branch creates and pushes an image to Docker Hub.
- Such a merge also triggers another CI pipeline that re-deploys the running Data Hub application in the staging environment using the newly created image.
- To create and deploy an image to the Data Hub production environment, create a release of the GitHub repository.
- The Git commit ref for each repository in the repo list is typically updated by another CI pipeline, which is expected to be invoked whenever there is a merge to the `develop` branch of the corresponding data pipeline repository.
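The clone-and-install flow during the image build can be sketched roughly as follows. This is a hypothetical illustration, not the actual `clone.sh` or `scripts/install_dag_in_docker.sh`: the real `repo-list.json` schema and scripts may differ. For self-containment, the JSON repo list is replaced here by a plain `url directory commit` list, and `git clone` is mocked by creating the target directory with a stub `install.sh`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Step 1: "clone" each listed repository into its configured directory.
# The real script would run something like:
#   git clone "$url" "$dir" && git -C "$dir" checkout "$commit"
# Here we mock the clone by creating the directory and a stub install.sh.
while read -r url dir commit; do
  mkdir -p "$dir"
  printf '#!/bin/sh\necho "installing %s"\n' "$dir" > "$dir/install.sh"
  chmod +x "$dir/install.sh"
done <<'EOF'
https://github.com/example/pipeline-a.git cloned/pipeline-a abc1234
https://github.com/example/pipeline-b.git cloned/pipeline-b def5678
EOF

# Step 2: as scripts/install_dag_in_docker.sh does, invoke each cloned
# repository's install.sh from its root directory.
for dir in cloned/*/; do
  (cd "$dir" && ./install.sh)
done
```

Each repository's `install.sh` is then responsible for installing its own Python dependencies and copying its DAG and application files into place, which keeps pipeline-specific installation logic out of this image's build scripts.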
