# crew.aws.batch: a crew launcher plugin for AWS Batch
[![CRAN](https://www.r-pkg.org/badges/version/crew.aws.batch)](https://CRAN.R-project.org/package=crew.aws.batch)
[![status](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#Active)
[![check](https://github.com/wlandau/crew.aws.batch/workflows/check/badge.svg)](https://github.com/wlandau/crew.aws.batch/actions?query=workflow%3Acheck)
[![codecov](https://codecov.io/gh/wlandau/crew.aws.batch/branch/main/graph/badge.svg)](https://app.codecov.io/gh/wlandau/crew.aws.batch)
[![lint](https://github.com/wlandau/crew.aws.batch/workflows/lint/badge.svg)](https://github.com/wlandau/crew.aws.batch/actions?query=workflow%3Alint)
In computationally demanding analysis projects, statisticians and data
scientists asynchronously deploy long-running tasks to distributed
systems, ranging from traditional clusters to cloud services. The
`crew.aws.batch` package extends the
[`mirai`](https://github.com/shikokuchuo/mirai)-powered ‘crew’ package
with a worker launcher plugin for [AWS
Batch](https://aws.amazon.com/batch/). Inspiration also comes from
packages [`mirai`](https://github.com/shikokuchuo/mirai),
[`future`](https://future.futureverse.org/),
[`rrq`](https://mrc-ide.github.io/rrq/),
[`clustermq`](https://mschubert.github.io/clustermq/), and
[`batchtools`](https://mllg.github.io/batchtools/).
# Installation
| Type | Source | Command |
|-------------|------------|--------------------------------------------------------------------------------|
| Release | CRAN | `install.packages("crew.aws.batch")` |
| Development | GitHub | `remotes::install_github("wlandau/crew.aws.batch")` |
| Development | R-universe | `install.packages("crew.aws.batch", repos = "https://wlandau.r-universe.dev")` |
# Documentation
Please see <https://wlandau.github.io/crew.aws.batch/> for
documentation, including a full function reference and usage tutorial.
# Prerequisites
`crew.aws.batch` launches [AWS Batch](https://aws.amazon.com/batch/)
jobs to run [`crew`](https://wlandau.github.io/crew/) workers. This
comes with a set of special requirements:
1. Understand [AWS Batch](https://aws.amazon.com/batch/) and its
[official documentation](https://aws.amazon.com/batch/).
2. Your [job
definitions](https://docs.aws.amazon.com/batch/latest/userguide/job_definitions.html)
must each have a [Docker](https://www.docker.com/)-compatible
container image with R and `crew.aws.batch` installed. You may wish
to inherit from an existing
[rocker](https://github.com/rocker-org/rocker-versioned2) image.
3. In the [compute
environment](https://docs.aws.amazon.com/batch/latest/userguide/compute_environments.html),
the [security
group](https://docs.aws.amazon.com/vpc/latest/userguide/security-groups.html)
must permit all inbound and outbound TCP traffic within itself.[^1]
The controller and the workers must run in this security group so
they can communicate within the firewalled local network.[^2] If
your security group ID is `sg-00000` and belongs to
[VPC](https://aws.amazon.com/vpc/) `vpc-00000`, then your inbound
and outbound rules may look something like this:
![](./man/figures/inbound.png)
![](./man/figures/outbound.png)
``` r
client <- paws.compute::ec2()
groups <- client$describe_security_groups(GroupIds = "sg-00000")
str(groups$SecurityGroups[[1L]])
#> List of 8
#> $ Description : chr "Allow TCP traffic on ephemeral ports"
#> $ GroupName : chr "self-pointing-group"
#> $ IpPermissions :List of 1
#> ..$ :List of 7
#> .. ..$ FromPort : num 1024
#> .. ..$ IpProtocol : chr "tcp"
#> .. ..$ IpRanges : list()
#> .. ..$ Ipv6Ranges : list()
#> .. ..$ PrefixListIds : list()
#> .. ..$ ToPort : num 65535
#> .. ..$ UserIdGroupPairs:List of 1
#> .. .. ..$ :List of 7
#> .. .. .. ..$ Description : chr "Accept traffic from other jobs in group."
#> .. .. .. ..$ GroupId : chr "sg-00000"
#> .. .. .. ..$ GroupName : chr(0)
#> .. .. .. ..$ PeeringStatus : chr(0)
#> .. .. .. ..$ UserId : chr "CENSORED"
#> .. .. .. ..$ VpcId : chr(0)
#> .. .. .. ..$ VpcPeeringConnectionId: chr(0)
#> $ OwnerId : chr "CENSORED"
#> $ GroupId : chr "sg-00000"
#> $ IpPermissionsEgress:List of 1
#> ..$ :List of 7
#> .. ..$ FromPort : num 1024
#> .. ..$ IpProtocol : chr "tcp"
#> .. ..$ IpRanges : list()
#> .. ..$ Ipv6Ranges : list()
#> .. ..$ PrefixListIds : list()
#> .. ..$ ToPort : num 65535
#> .. ..$ UserIdGroupPairs:List of 1
#> .. .. ..$ :List of 7
#> .. .. .. ..$ Description : chr "Allow traffic to other jobs in group."
#> .. .. .. ..$ GroupId : chr "sg-00000"
#> .. .. .. ..$ GroupName : chr(0)
#> .. .. .. ..$ PeeringStatus : chr(0)
#> .. .. .. ..$ UserId : chr "CENSORED"
#> .. .. .. ..$ VpcId : chr(0)
#> .. .. .. ..$ VpcPeeringConnectionId: chr(0)
#> $ Tags : list()
#> $ VpcId : chr "vpc-00000"
```
# Job management
With `crew.aws.batch`, your `crew` controller automatically submits jobs
to AWS Batch. These jobs may fail or linger for any number of reasons,
which could impede work and increase costs. So before you use
`crew_controller_aws_batch()`, please learn how to monitor and terminate
AWS Batch jobs manually.
`crew.aws.batch` defines a “monitor” class to help you take control of
jobs and job definitions. Create a monitor object with
`crew_aws_batch_monitor()`. You will need to supply a job definition
name and a job queue name.
``` r
monitor <- crew_aws_batch_monitor(
job_definition = "YOUR_JOB_DEFINITION_NAME",
job_queue = "YOUR_JOB_QUEUE_NAME"
)
```
The job definition may or may not exist at this point. If it does not
exist, you can register it with `register()`, a deliberately simple,
limited-scope method which creates container-based job definitions with
the `"awslogs"` log driver (for CloudWatch).[^3] Below, your container
image can be as simple as a Docker Hub identifier (like
`"alpine:latest"`) or the full URI of an ECR image.[^4]
``` r
monitor$register(
image = "AWS_ACCOUNT_ID.dkr.ecr.AWS_REGION.amazonaws.com/ECR_REPOSITORY_NAME:IMAGE_TAG",
platform_capabilities = "EC2",
memory_units = "gigabytes",
memory = 8,
cpus = 2
)
```
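The monitor also has a `deregister()` method to remove a job definition you created through `register()`. The call below is a sketch; please check the documentation of your installed version for the exact interface.

``` r
# Remove the job definition registered above (sketch; see the
# package documentation for the exact method signature).
monitor$deregister()
```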
You can submit individual AWS Batch jobs to test your computing
environment.
``` r
job1 <- monitor$submit(name = "job1", command = c("echo", "hello\nworld"))
job2 <- monitor$submit(name = "job2", command = c("echo", "job\nsubmitted"))
job2
#> # A tibble: 1 × 3
#> name id arn
#>   <chr> <chr>                                <chr>
#> 1 job2 c38d55ad-4a86-4371-9994-6ea8882f5726 arn:aws:batch:us-east-2:0…
```
Method `status()` checks the status of an individual job.
``` r
monitor$status(id = job2$id)
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#>   <chr> <chr>             <chr> <chr>    <chr>    <dbl>   <dbl>   <dbl>
#> 1 job2 c38d55ad-4a86-43… arn:… runnable NA 1.70e12 NA NA
```
The `jobs()` method gets the status of all the jobs within the job queue
and job definition you originally supplied to
`crew_aws_batch_monitor()`. This may include many more jobs than the
ones you submitted during the life cycle of the current `monitor`
object.
``` r
monitor$jobs()
#> # A tibble: 2 × 8
#> name id arn status reason created started stopped
#>   <chr> <chr>             <chr> <chr>     <chr>    <dbl>   <dbl>   <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded Essen… 1.70e12 1.70e12 1.70e12
#> 2 job2 c38d55ad-4a86-43… arn:… runnable NA 1.70e12 NA NA
```
The [job
state](https://docs.aws.amazon.com/batch/latest/userguide/job_states.html)
can be `"submitted"`, `"pending"`, `"runnable"`, `"starting"`,
`"running"`, `"succeeded"`, or `"failed"`. The monitor has a method for
each job state to get only the jobs with that state.
``` r
monitor$succeeded()
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#>   <chr> <chr>             <chr> <chr>     <chr>    <dbl>   <dbl>   <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded NA 1.70e12 1.70e12 1.70e12
```
In addition, there is an `active()` method for just states
`"submitted"`, `"pending"`, `"runnable"`, `"starting"`, and `"running"`,
and there is an `inactive()` method for just the `"succeeded"` and
`"failed"` states.
``` r
monitor$inactive()
#> # A tibble: 1 × 8
#> name id arn status reason created started stopped
#>   <chr> <chr>             <chr> <chr>     <chr>    <dbl>   <dbl>   <dbl>
#> 1 job1 653df636-ac74-43… arn:… succeeded NA 1.70e12 1.70e12 1.70e12
```
To terminate a job, use the `terminate()` method. This has the effect of
both canceling and terminating the job, although you may not see the
change right away if the job is currently `"runnable"`. Manually
terminated jobs are listed as failed.
``` r
monitor$terminate(id = job2$id)
```
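Combining `terminate()` with the status methods above, you can cancel every job still active in the queue. The loop below is a sketch that assumes `terminate()` accepts one job ID at a time, as in the call above.

``` r
# Sketch: terminate each job that is still submitted, pending,
# runnable, starting, or running.
active <- monitor$active()
for (id in active$id) {
  monitor$terminate(id = id)
}
```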
To get the CloudWatch logs of a job, use the `log()` method. This method
returns a `tibble` with the log messages and numeric timestamps.
``` r
log <- monitor$log(id = job1$id)
log
#> # A tibble: 2 × 3
#> message timestamp ingestion_time
#>   <chr>       <dbl>          <dbl>
#> 1 hello 1702068378163 1702068378245
#> 2 world 1702068378163 1702068378245
```
If the log messages are too long to conveniently view in the `tibble`,
you can print them to your screen with `cat()` or `writeLines()`.
``` r
writeLines(log$message)
#> hello
#> world
```
# Using `crew` with AWS Batch workers
To start using `crew.aws.batch` in earnest, first create a controller
object. Supply the names of your job queue and job definition, along
with any optional flags and settings you may need. If you do not
already have a job definition, the monitor object described earlier can
help you create one.
``` r
library(crew.aws.batch)
controller <- crew_controller_aws_batch(
name = "my_workflow", # for informative job names
workers = 16,
tasks_max = 2, # to avoid reaching wall time limits
seconds_launch = 600, # to allow a 10-minute startup window
seconds_idle = 60, # to release resources when they are not needed
processes = NULL, # See the "Asynchronous worker management" section below.
aws_batch_job_definition = "YOUR_JOB_DEFINITION_NAME",
aws_batch_job_queue = "YOUR_JOB_QUEUE_NAME"
)
controller$start()
```
At this point, usage is exactly the same as basic
[`crew`](https://wlandau.github.io/crew). The `push()` method submits
tasks and auto-scales [AWS Batch](https://aws.amazon.com/batch/) workers
to meet demand.
``` r
controller$push(name = "do work", command = do_work())
```
The `pop()` method retrieves available tasks.
``` r
controller$pop()
#> # A tibble: 1 × 11
#> name command result seconds seed error trace warni… launc… worker insta…
#>   <chr> <chr>   <list>   <dbl> <int> <chr> <chr> <chr>  <chr>  <int> <chr>
#> 1 do work … do_work… 0 1.56e8 NA NA NA 79e71c… 1 7686b2…
#> # … with abbreviated variable names warnings, launcher, instance
```
Remember to terminate the controller when you are done.
``` r
controller$terminate()
```
# Asynchronous worker management
HTTP requests to submit and terminate jobs may take up to 1 or 2
seconds, and this overhead may be burdensome if there are many workers.
To run these requests asynchronously, set the `processes` argument of
`crew_controller_aws_batch()` to the number of local `mirai` daemons you
want to process the requests. These processes will start on
`controller$start()` and end on `controller$terminate()` or when your
local R session ends. `controller$launcher$async$errors()` shows the
most recent error messages generated on launch or termination for all
workers.
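For example, a controller that offloads these HTTP requests to four local `mirai` daemons might look like the sketch below (the job definition and queue names are placeholders).

``` r
library(crew.aws.batch)
controller <- crew_controller_aws_batch(
  name = "my_workflow",
  workers = 16,
  processes = 4, # 4 local mirai daemons handle submit/terminate requests
  aws_batch_job_definition = "YOUR_JOB_DEFINITION_NAME",
  aws_batch_job_queue = "YOUR_JOB_QUEUE_NAME"
)
```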
# Troubleshooting
Setting `processes = NULL` disables asynchronous requests, which makes
launch and termination errors surface immediately and makes them easier
to see. You may also wish to set `options(paws.log_level = 3L)` to
increase the verbosity of `paws` messages.
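As a sketch, a debugging session might start like this (the controller name, queue, and definition names are placeholders):

``` r
# Surface errors synchronously and make paws more verbose.
options(paws.log_level = 3L)
controller <- crew_controller_aws_batch(
  name = "debug_run",
  processes = NULL, # synchronous requests: errors appear immediately
  aws_batch_job_definition = "YOUR_JOB_DEFINITION_NAME",
  aws_batch_job_queue = "YOUR_JOB_QUEUE_NAME"
)
```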
# Thanks
- [Charlie Gao](https://github.com/shikokuchuo) created
[`mirai`](https://github.com/shikokuchuo/mirai) and
[`nanonext`](https://github.com/shikokuchuo/nanonext) and graciously
accommodated the complicated and demanding feature requests that made
`crew` and its ecosystem possible.
- Thanks to [Henrik Bengtsson](https://github.com/HenrikBengtsson),
[David Kretch](https://github.com/davidkretch), [Adam
Banker](https://github.com/adambanker), and [Michael
Schubert](https://github.com/mschubert) for edifying conversations
about cloud computing in R.
# Code of Conduct
Please note that the `crew` project is released with a [Contributor Code
of
Conduct](https://github.com/wlandau/crew/blob/main/CODE_OF_CONDUCT.md).
By contributing to this project, you agree to abide by its terms.
# Citation
``` r
citation("crew.aws.batch")
To cite package 'crew.aws.batch' in publications use:
Landau WM (????). _crew.aws.batch: A Crew Launcher Plugin for AWS
Batch_. R package version 0.0.1,
https://github.com/wlandau/crew.aws.batch,
<https://wlandau.github.io/crew.aws.batch/>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {crew.aws.batch: A Crew Launcher Plugin for AWS Batch},
author = {William Michael Landau},
note = {R package version 0.0.1,
https://github.com/wlandau/crew.aws.batch},
url = {https://wlandau.github.io/crew.aws.batch/},
}
```
[^1]: If you already know the TCP port you will supply to the `port`
argument of `crew_controller_aws_batch()`, you can restrict the port
range to only use that port number.
[^2]: Please read about the
[risks](https://wlandau.github.io/crew/articles/risks.html) and keep
TLS encryption turned on (default:
`tls = crew_tls(mode = "automatic")`). Please understand and comply
with all the security policies of your organization.
[^3]: The log group supplied to `crew_aws_batch_monitor()` must be
valid. The default is `"/aws/batch/log"`, which may not exist if
your system administrator has a custom logging policy.
[^4]: For the `crew` controller, you will definitely want an image with
R and `crew` installed. For the purposes of testing the monitor,
`"alpine:latest"` will work.