Flowbster

AutoDock Vina

In this tutorial, Flowbster is used to set up the infrastructure for processing the Vina workflow. The setup is as follows: one VM acts as the Generator, 5 VMs act as Vina processing nodes, and one VM acts as the Collector node.

The application used to execute the performance measurements was a workflow based on the AutoDock Vina application. The workflow consists of three kinds of nodes: a Generator, a set of Vina processing nodes, and a Collector. The input of the workflow consists of the following: a receptor molecule, a Vina configuration file, and a set of molecules to dock against the receptor molecule.

The task of the Generator node is to split the set of molecules to dock into a number of parts. The task of the Vina nodes is to process these parts, iterating through each molecule in the given part and performing the docking simulation. The result of each docking includes an energy level; the user is ultimately interested in the dockings with the lowest energy levels.

The task of the Collector node is to get the processing result of each molecule part from the Vina nodes and to select the 5 best (lowest) energy levels.
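The selection step performed by the Collector can be illustrated with a minimal Python sketch. This is not Flowbster's actual code: the function name `best_dockings`, the `(ligand, energy)` tuple layout, and the sample values are all assumptions made purely for illustration.

```python
import heapq


def best_dockings(results, count=5):
    """Return the `count` results with the lowest energy, lowest first.

    `results` is an iterable of (ligand_name, energy) tuples. This layout is
    a hypothetical stand-in for whatever the Vina nodes really send.
    """
    return heapq.nsmallest(count, results, key=lambda item: item[1])


# Made-up energy values, standing in for results merged from several parts.
sample = [
    ("lig1", -7.2), ("lig2", -9.1), ("lig3", -6.4), ("lig4", -8.8),
    ("lig5", -7.9), ("lig6", -5.0), ("lig7", -8.1),
]
top = best_dockings(sample)
```

`heapq.nsmallest` avoids sorting the full result list, which matters little for 60 molecules but keeps the sketch reasonable for larger sets.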

For running the experiment, we selected a molecule set of 60 molecules. This set was split into 10 parts, so each part included 6 molecules to dock against the receptor molecule.
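The split described above can be sketched in a few lines of Python. The function `split_into_parts` and the placeholder ligand names are hypothetical; the real Generator executable may divide its input differently (for example, when the set does not divide evenly).

```python
def split_into_parts(items, parts):
    """Split `items` into `parts` consecutive chunks of near-equal size."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for index in range(parts):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# 60 placeholder molecule names split into 10 parts of 6 molecules each.
molecules = ["ligand_%02d" % n for n in range(60)]
chunks = split_into_parts(molecules, 10)
```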

Features

  • creating nodes through contextualisation

  • using the ec2 resource handler

  • utilising a health check against a predefined port and URL

  • using parameters to scale up worker nodes

Prerequisites

  • accessing an Occopus compatible interface

  • target cloud contains an Ubuntu 14.04 image with cloud-init support

Download

You can download the example as tutorial.examples.flowbster-autodock-vina.

Steps

The following steps are suggested to be performed:

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the flowbster_node labelled by node_def:.

    The downloadable package for this example contains a resource template for the ec2 plugin.

  2. Make sure your authentication information is set correctly in your authentication file. You must set your email and password in the authentication file. Setting authentication information is described here.

  3. Components in the infrastructure connect to each other, therefore the ports they use must be opened for the VMs executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following port in your cloud:

    Protocol   Port(s)   Service
    TCP        5000      This is used by nodes to handle incoming requests from other agents

  4. Please note that in order to receive the results, you have to run a Gather service (part of Flowbster), which will finally gather the results (the docking simulations with the lowest energy levels) from the Collector (last node in the workflow). Start the Gather service using the following command:

    scripts/flowbster-gather.sh -s
    

    By default the Gather service is listening on port 5001.

    Note

    The scripts in the scripts directory need Python 2.7. Alternatively you can activate the Occopus virtualenv!

  5. Edit the “variables” section of the infra-autodock-vina.yaml file. Set the following attributes:

    • gather_ip is the IP address of the host where you have started the Gather service

    • gather_port is the port the Gather service is listening on

    gather_ip: &gatherip "<External IP of the host executing the Gather service>"
    gather_port: &gatherport "5001"
    
  6. Update the number of VINA nodes if necessary. For this, edit the infra-autodock-vina.yaml file and modify the min parameter under the scaling keyword. Currently, it is set to 5.

    - &VINA
        name: VINA
        type: flowbster_node
        scaling:
            min: 5
    
  7. Load the node definition for flowbster_node nodes into the database.

    Important

    Occopus takes node definitions from its database when it builds the infrastructure, so importing is necessary whenever the node definition (file) changes!

    occopus-import nodes/node_definitions.yaml
    
  8. Start deploying the infrastructure. Make sure the proper virtualenv is activated!

    occopus-build infra-autodock-vina.yaml
    
  9. After the build finishes successfully, the nodes are listed with their IP addresses and node IDs at the end of the logging messages, and the identifier of the newly built infrastructure is printed. You can store the identifier of the infrastructure to perform further operations on it, or you can query the identifier later using the occopus-maintain command.

    List of nodes/ip addresses:
    VINA:
      <ip-address> (2f7d3d7e-c90c-4f33-831d-91e987e8e8b2)
      <ip-address> (49bed8d2-94b0-4a7e-9672-744921dacac0)
      <ip-address> (10664026-0b31-4848-9f7a-98f880f98be7)
      <ip-address> (a0f5d091-aecc-488c-94f2-34e546f87832)
      <ip-address> (285d7efd-84a7-4ed5-a6fa-73db47bc2e87)
    COLLECTOR:
      <ip-address> (4ca11ad3-a6ec-411b-89e6-d516169df9c7)
    GENERATOR:
      <ip-address> (9b8dc4f1-bed4-4d1c-ba9e-45c18ee2523d)
    30bc1d09-8ed5-4b7e-9e51-24ed881fc166
    
  10. Once the infrastructure is ready, the input files can be sent to the Generator node of the workflow (check the address of the node at the end of the output of the occopus-build command). Use the following command in the flowbster-autodock-vina/inputs directory:

    ../scripts/flowbster-feeder.sh -h <ip of GENERATOR node> -i input-description-for-vina.yaml -d input-ligands.zip -d input-receptor.pdbqt -d vina-config.txt
    

    The -h parameter is the Generator node’s address, -i is the input description file and with -d we can define data file(s).

    Note

    The scripts in the scripts directory need Python 2.7. Alternatively you can activate the Occopus virtualenv!

    Note

    It may take quite a few minutes until the processes end. Please be patient!

  11. Step 10 starts the data processing. The total processing time depends on the overall performance of the VINA nodes. The VINA nodes process 10 molecule packages, which are collected by the Collector node. You can follow the progress on the Collector node by checking the number of files under the /var/flowbster/jobs/<id of workflow>/inputs directory. When the number of files reaches 10, the Collector node combines them and sends one package to the Gather service, which stores it under the /tmp/flowbster/results directory.

  12. Once the molecules have been processed, you may stop the Gather service:

    scripts/flowbster-gather.sh -d
    
  13. Finally, you can destroy the infrastructure using the infrastructure ID returned by occopus-build:

    occopus-destroy -i 30bc1d09-8ed5-4b7e-9e51-24ed881fc166
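The progress check described in step 11 (counting the files in the Collector's inputs directory) can be scripted. The following is a minimal sketch, assuming that every regular file directly under the inputs directory corresponds to one received molecule package; the helper names `count_received_parts` and `progress` are hypothetical and not part of Flowbster.

```python
import os


def count_received_parts(inputs_dir):
    """Count the regular files directly under `inputs_dir`."""
    return sum(
        1
        for name in os.listdir(inputs_dir)
        if os.path.isfile(os.path.join(inputs_dir, name))
    )


def progress(inputs_dir, expected=10):
    """Return (received, done); `done` is True once `expected` files exist.

    The default of 10 matches the number of parts used in this tutorial.
    """
    received = count_received_parts(inputs_dir)
    return received, received >= expected
```

To use it, run it on the Collector node and pass the full path of the /var/flowbster/jobs/<id of workflow>/inputs directory to `progress`, with the workflow ID filled in for your run.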
    

Note

You can run a bigger application with more input files. This application runs for approximately 4 hours with 5 VINA nodes. Edit the Generator node’s variables section in the infra-autodock-3node.yaml file: change the jobflow/app/args variable from 10 to 240, then repeat the tutorial using the input2 directory. For this experiment, we selected a molecule set of 3840 molecules. This set is split into 240 parts, so each part includes 16 molecules to dock against the receptor molecule.

nodes:
    - &GENERATOR
        name: GENERATOR
        type: flowbster_node
        variables:
            flowbster:
                app:
                    exe:
                        filename: execute.bin
                        tgzurl: https://github.com/occopus/flowbster/raw/devel/examples/vina/bin/generator_exe.tgz
                    args: '240'