Create a Task

Tasks allow the same assembler to be run with different combinations of parameters. You can also think of a task as a command bundle that groups different command line flags together into a simpler interface. This guide describes using tasks in your image.

The biobox interface requires that a container be called with a task parameter. The task follows the name of the biobox when using docker run.

docker run [OPTIONS] BIOBOX_NAME TASK

You should use tasks to describe the different ways your software can be run. Using genome assembly as an example, there might be one way to run an assembler that creates large contigs which may contain errors. There may also be a separate, more careful way to run the assembler that results in smaller but more correct contigs. These different ways of running the same software are the reason tasks are used.

Each biobox should provide a default task, which should be the set of command line flags that works best in most situations. The name of the task is later passed to your run command.
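For instance, selecting a task at run time might look like the sketch below. The image name my-velvet-biobox and the helper run_task are hypothetical placeholders, and the task names default and careful are the ones defined in the example that follows. The sketch only prints the commands, and it leaves out any docker options (such as volume mounts) that your biobox needs.

```shell
# Hypothetical image name; substitute the tag you built your image with.
BIOBOX_NAME="my-velvet-biobox"

# Print the docker command for a task. The task name follows the image name.
# Remove the `echo` to actually run the container.
run_task() {
  echo docker run --rm "${BIOBOX_NAME}" "$1"
}

run_task default
run_task careful
```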

Example

In this velvet example we'll create a file that provides two tasks. We'll call this file Taskfile. Place it in the same directory as the Dockerfile from the preceding section, because the next sections depend on this location. The default task contains the commands to run velvet, with the environment variables ${TMP_DIR} and ${READS} used in place of fixed paths.

default: velveth ${TMP_DIR} 31 -fastq.gz ${READS} && velvetg ${TMP_DIR} -cov_cutoff auto
careful: velveth ${TMP_DIR} 91 -fastq.gz ${READS} && velvetg ${TMP_DIR} -cov_cutoff 10

The second task uses a larger kmer size (91 instead of 31) and sets a fixed coverage cutoff of 10 instead of auto. The Taskfile is added to the image with an ADD command, so your Dockerfile now looks like this:

FROM ubuntu:14.04
MAINTAINER Michael Barton, mail@michaelbarton.me.uk

ENV PACKAGES make gcc wget libc6-dev zlib1g-dev ca-certificates xz-utils
RUN apt-get update -y && apt-get install -y --no-install-recommends ${PACKAGES}

ENV ASSEMBLER_DIR /tmp/assembler
ENV ASSEMBLER_URL https://www.ebi.ac.uk/~zerbino/velvet/velvet_1.2.10.tgz
ENV ASSEMBLER_BLD make 'MAXKMERLENGTH=100' && mv velvet* /usr/local/bin/ && rm -r ${ASSEMBLER_DIR}

RUN mkdir ${ASSEMBLER_DIR}
RUN cd ${ASSEMBLER_DIR} &&\
    wget --quiet ${ASSEMBLER_URL} --output-document - |\
    tar xzf - --directory . --strip-components=1 && eval ${ASSEMBLER_BLD}

# Locations for biobox file validator
ENV VALIDATOR /bbx/validator/
ENV BASE_URL https://s3-us-west-1.amazonaws.com/bioboxes-tools/validate-biobox-file
ENV VERSION  0.x.y
RUN mkdir -p ${VALIDATOR}

# download the validate-biobox-file binary and extract it to the directory $VALIDATOR
RUN wget \
      --quiet \
      --output-document - \
      ${BASE_URL}/${VERSION}/validate-biobox-file.tar.xz \
    | tar xJf - \
      --directory ${VALIDATOR} \
      --strip-components=1

ENV PATH ${PATH}:${VALIDATOR}

# download the assembler schema
RUN wget \
    --output-document /schema.yaml \
    https://raw.githubusercontent.com/bioboxes/rfc/master/container/short-read-assembler/input_schema.yaml

ENV CONVERT https://github.com/bronze1man/yaml2json/raw/master/builds/linux_386/yaml2json
# download yaml2json and make it executable
RUN cd /usr/local/bin && wget --quiet ${CONVERT} && chmod 700 yaml2json

ENV JQ http://stedolan.github.io/jq/download/linux64/jq
# download jq and make it executable
RUN cd /usr/local/bin && wget --quiet ${JQ} && chmod 700 jq

# Add Taskfile to /
ADD Taskfile /

In the next section you will see how you can access the task with a simple shell command.
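
As a preview, a lookup of this kind could be sketched as follows. This is a minimal sketch under assumptions, not necessarily the command used in the next section: the helper name task_command is hypothetical, and it assumes every line of the Taskfile has the form NAME: COMMAND, as in the velvet example above.

```shell
# Print the command line stored for a task name in a Taskfile.
# Usage: task_command TASKFILE TASK
# (task_command is a hypothetical helper, not part of the biobox tooling.)
task_command() {
  taskfile="$1"
  task="$2"
  # On the line that starts with "TASK:", strip the name and any
  # following whitespace, then print what is left. Other lines are ignored.
  sed -n "s/^${task}:[[:space:]]*//p" "${taskfile}" | head -n 1
}
```

Inside the container, task_command /Taskfile careful would print the velveth and velvetg command line for the careful task without expanding ${TMP_DIR} or ${READS}. An unknown task name prints nothing, so a calling script can check for an empty result before running anything.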