Big Data and AI applications

Apache Hadoop cluster

This tutorial sets up a complete Apache Hadoop infrastructure. It contains a Hadoop Master node and Hadoop Slave worker nodes, which can be scaled up or down. Consul is used to register the Hadoop Slave nodes.

Features

  • creating two types of nodes through contextualisation

  • utilising health check against a predefined port

  • using scaling parameters to limit the number of Hadoop Slave nodes

  • managing cluster nodes with Consul

Prerequisites

  • accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, OCCI, etc.)

  • target cloud contains a base Ubuntu OS image with cloud-init support

  • a generated SSH key pair (for testing purposes, one is attached)

Download

You can download the example as tutorial.examples.hadoop-cluster .

Note

In this tutorial, we will use nova cloud resources (based on our nova tutorials in the basic tutorial section). However, feel free to use any Occopus-compatible cloud resource for the nodes; we suggest instantiating all nodes in the same cloud.

Steps

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the nodes labelled by node_def:.

    The downloadable package for this example contains a resource template for the Nova plugin.

    Important

    Do not modify the values of the contextualisation and the health_check section’s attributes!

    Important

    Do not specify the server_name attribute for slaves so they are named automatically by Occopus to make sure node names are unique!

    Note

    If you want Occopus to monitor (health_check) your Hadoop Master and it is to be deployed in a different network, make sure you assign public (floating) IP to the Master node.

  2. Components in the infrastructure connect to each other, therefore several port ranges must be opened for the VMs executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following port ranges in your cloud; an example using the OpenStack CLI is sketched after the table:

    Protocol    Port(s)        Service
    TCP         22             SSH
    TCP         8025
    TCP         8042
    TCP         8088
    TCP         8300-8600
    TCP         9000
    TCP         50000-51000
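
    If your target cloud is OpenStack, these ports can be opened, for example, with the OpenStack CLI. The sketch below is only an illustration: it assumes the python-openstackclient tool and uses a security group name (hadoop-cluster) chosen for this example; your cloud may require a different mechanism.

    # create a security group for the cluster nodes (name is illustrative)
    openstack security group create hadoop-cluster
    # open the single ports
    for port in 22 8025 8042 8088 9000; do
      openstack security group rule create --protocol tcp --dst-port $port hadoop-cluster
    done
    # open the port ranges
    openstack security group rule create --protocol tcp --dst-port 8300:8600 hadoop-cluster
    openstack security group rule create --protocol tcp --dst-port 50000:51000 hadoop-cluster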

  3. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the resource you would like to use. Setting authentication information is described here.

  4. Update the number of Hadoop Slave worker nodes if necessary. For this, edit the infra-hadoop-cluster.yaml file and modify the min and max parameters under the scaling keyword. Scaling defines the interval (min, max) in which the number of nodes can vary. Currently, the minimum is set to 2 (which will be the initial number at startup) and the maximum is set to 10.

    - &S
       name: hadoop-slave
       type: hadoop_slave_node
       scaling:
               min: 2
               max: 10
    

    Important

    Keep in mind that Occopus has to start at least one node from each node type to work properly, and scaling can be applied only to the Hadoop Slave nodes in this example!

  5. Load the node definitions into the database. Make sure the proper virtualenv is activated!

    Important

    Occopus takes node definitions from its database when building up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

    occopus-import nodes/node_definitions.yaml
    
  6. Start deploying the infrastructure.

    occopus-build infra-hadoop-cluster.yaml
    
  7. After a successful build, the nodes are listed with their IP addresses and node IDs at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store this identifier to perform further operations on your infrastructure, or query it later with the occopus-maintain command.

    List of nodes/ip addresses:
    hadoop-master:
        192.168.xxx.xxx (3116eaf5-89e7-405f-ab94-9550ba1d0a7c)
    hadoop-slave:
        192.168.xxx.xxx (23f13bd1-25e7-30a1-c1b4-39c3da15a456)
        192.168.xxx.xxx (7b387348-b3a3-5556-83c3-26c43d498f39)
    
    14032858-d628-40a2-b611-71381bd463fa
    
  8. You can check the health and statistics of the cluster through the following web pages (a Consul-based check of node registration is also sketched after this list):

    • Health of nodes: http://[HadoopMasterIP]:50070

    • Job statistics: http://[HadoopMasterIP]:8088
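
    Since the Hadoop Slave nodes register themselves via Consul, you can also list the cluster members from the command line. This is a minimal check, assuming the consul binary installed by the contextualisation is available on the Master node’s PATH:

    # log in to the Hadoop Master first (see the ssh command in step 9), then:
    consul members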

  9. To launch a Hadoop MapReduce job, copy your input and executable files to the Hadoop Master node and perform the submission as described here (a minimal word-count walkthrough is also sketched below). To log in to the Hadoop Master node, use the private key attached to the tutorial package:

    ssh -i builtin_hadoop_private_key hduser@[HadoopMasterIP]
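
    After logging in, you can verify the cluster with the classic word-count example. The commands below are a minimal sketch: the input file name is illustrative and the location of the examples jar may differ in your Hadoop installation.

    # create an HDFS input directory and upload a text file (file name is illustrative)
    hdfs dfs -mkdir -p input
    hdfs dfs -put mytext.txt input/
    # run the word-count example shipped with Hadoop (jar path may differ)
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount input output
    # inspect the result
    hdfs dfs -cat output/part-r-00000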
    
  10. Finally, you may destroy the infrastructure using the infrastructure id returned by occopus-build

    occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa
    

Apache Spark cluster

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. For more information visit the official Apache Spark page .

This tutorial sets up a complete Apache Spark infrastructure. It contains a Spark Master node and Spark Worker nodes, which can be scaled up or down.

Features

  • creating two types of nodes through contextualisation

  • utilising health check against a predefined port

  • using scaling parameters to limit the number of Spark Worker nodes

Prerequisites

  • accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, OCCI, etc.)

  • target cloud contains a base Ubuntu OS image with cloud-init support

Download

You can download the example as tutorial.examples.spark-cluster .

Note

In this tutorial, we will use nova cloud resources (based on our nova tutorials in the basic tutorial section). However, feel free to use any Occopus-compatible cloud resource for the nodes; we suggest instantiating all nodes in the same cloud.

Steps

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the nodes labelled by node_def:.

    The downloadable package for this example contains a resource template for the Nova plugin.

    Important

    Do not modify the values of the contextualisation and the health_check section’s attributes!

    Important

    Do not specify the server_name attribute for workers so they are named automatically by Occopus to make sure node names are unique!

    Note

    If you want Occopus to monitor (health_check) your Spark Master and it is to be deployed in a different network, make sure you assign public (floating) IP to the Master node.

  2. Components in the infrastructure connect to each other, therefore several port ranges must be opened for the VMs executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following port ranges in your cloud:

    Protocol    Port(s)    Service
    TCP         22         SSH
    TCP         4040       Web UI
    TCP         8080       Web UI (Standalone mode)

  3. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the resource you would like to use. Setting authentication information is described here.

  4. Update the number of Spark Worker nodes if necessary. For this, edit the infra-spark-cluster.yaml file and modify the min and max parameters under the scaling keyword. Scaling defines the interval (min, max) in which the number of nodes can vary. Currently, the minimum is set to 2 (which will be the initial number at startup) and the maximum is set to 10.

    - &W
       name: spark-worker
       type: spark_worker_node
       scaling:
               min: 2
               max: 10
    

    Important

    Keep in mind that Occopus has to start at least one node from each node type to work properly, and scaling can be applied only to the Spark Worker nodes in this example!

  5. Load the node definitions into the database. Make sure the proper virtualenv is activated!

    Important

    Occopus takes node definitions from its database when building up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

    occopus-import nodes/node_definitions.yaml
    
  6. Start deploying the infrastructure.

    occopus-build infra-spark-cluster.yaml
    
  7. After a successful build, the nodes are listed with their IP addresses and node IDs at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store this identifier to perform further operations on your infrastructure, or query it later with the occopus-maintain command.

    List of nodes/ip addresses:
    spark-master:
        192.168.xxx.xxx (3116eaf5-89e7-405f-ab94-9550ba1d0a7c)
    spark-worker:
        192.168.xxx.xxx (23f13bd1-25e7-30a1-c1b4-39c3da15a456)
        192.168.xxx.xxx (7b387348-b3a3-5556-83c3-26c43d498f39)
    
    14032858-d628-40a2-b611-71381bd463fa
    
  8. You can check the health and statistics of the cluster through the following web pages:

    • Spark UI: http://<SparkMasterIP>:8080

    • Application UI: http://<SparkMasterIP>:4040

  9. You can find examples to test your cluster by submitting an application to the Apache Spark cluster at this site; a minimal command-line check is also sketched below.
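
    As a quick command-line check, you can also run the SparkPi example shipped with Spark from the Spark Master node. This is a minimal sketch: the Spark installation directory and the exact name of the examples jar depend on the deployed Spark version.

    # submit the bundled SparkPi example to the standalone master (update the placeholder)
    $SPARK_HOME/bin/spark-submit \
      --master spark://<SparkMasterIP>:7077 \
      --class org.apache.spark.examples.SparkPi \
      $SPARK_HOME/examples/jars/spark-examples_*.jar 100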

  10. Finally, you may destroy the infrastructure using the infrastructure id returned by occopus-build

    occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa
    

Apache Spark cluster with RStudio Stack

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. For more information visit the official Apache Spark page .

Apache Spark cluster together with HDFS (Hadoop Distributed File System) represents one of the most important tools for Big Data and machine learning applications, enabling the parallel processing of large data sets on many virtual machines running Spark workers. On the other hand, setting up a Spark cluster with HDFS on clouds is not straightforward, requiring deep knowledge of both cloud and Apache Spark architecture. To save scientists this hard work, we have created and made public the required infrastructure descriptors by which Occopus can automatically deploy Spark clusters with the number of workers specified by the user. One of the most typical application areas of Big Data technology is statistical data processing, which is usually done in the programming language R. In order to facilitate the work of statisticians using Spark on cloud, we have created an extended version of the Spark infrastructure descriptors that places the sparklyr library on the Spark workers, too. Finally, we have also integrated the user-friendly RStudio user interface into the Spark system. As a result, researchers using the statistical R package can easily and quickly deploy a complete R-oriented Spark cluster on clouds, containing the following components: RStudio, R, sparklyr, Spark and HDFS.

This tutorial sets up a complete Apache Spark infrastructure integrated with HDFS, R, RStudio and sparklyr. It contains a Spark Master node and Spark Worker nodes, which can be scaled up or down.

Features

  • creating two types of nodes through contextualisation

  • utilising health check against a predefined port

  • using scaling parameters to limit the number of Spark Worker nodes

Prerequisites

  • accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, OCCI, etc.)

  • target cloud contains a base Ubuntu OS image with cloud-init support

Download

You can download the example as tutorial.examples.spark-cluster-with-r .

Note

In this tutorial, we will use nova cloud resources (based on our nova tutorials in the basic tutorial section). However, feel free to use any Occopus-compatible cloud resource for the nodes; we suggest instantiating all nodes in the same cloud.

Steps

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the nodes labelled by node_def:.

    The downloadable package for this example contains a resource template for the Nova plugin.

    Important

    Do not modify the values of the contextualisation and the health_check section’s attributes!

    Important

    Do not specify the server_name attribute for workers so they are named automatically by Occopus to make sure node names are unique!

    Note

    If you want Occopus to monitor (health_check) your Spark Master and it is to be deployed in a different network, make sure you assign public (floating) IP to the Master node.

  2. Components in the infrastructure connect to each other, therefore several port ranges must be opened for the VMs executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following port ranges in your cloud:

    Protocol    Port(s)    Service
    TCP         22         SSH
    TCP         4040       Web UI
    TCP         8080       Web UI (Standalone mode)
    TCP         8787       RStudio web UI
    TCP         50070      HDFS NameNode web UI

  3. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the resource you would like to use. Setting authentication information is described here.

  4. Update the number of Spark Worker nodes if necessary. For this, edit the infra-spark-cluster.yaml file and modify the min and max parameters under the scaling keyword. Scaling defines the interval (min, max) in which the number of nodes can vary. Currently, the minimum is set to 2 (which will be the initial number at startup) and the maximum is set to 10.

    - &W
       name: spark-worker
       type: spark_worker_node
       scaling:
               min: 2
               max: 10
    

    Important

    Keep in mind that Occopus has to start at least one node from each node type to work properly, and scaling can be applied only to the Spark Worker nodes in this example!

  5. Load the node definitions into the database. Make sure the proper virtualenv is activated!

    Important

    Occopus takes node definitions from its database when building up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

    occopus-import nodes/node_definitions.yaml
    
  6. Start deploying the infrastructure.

    occopus-build infra-spark-cluster.yaml
    
  7. After a successful build, the nodes are listed with their IP addresses and node IDs at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store this identifier to perform further operations on your infrastructure, or query it later with the occopus-maintain command.

    List of nodes/ip addresses:
    spark-master:
        192.168.xxx.xxx (3116eaf5-89e7-405f-ab94-9550ba1d0a7c)
    spark-worker:
        192.168.xxx.xxx (23f13bd1-25e7-30a1-c1b4-39c3da15a456)
        192.168.xxx.xxx (7b387348-b3a3-5556-83c3-26c43d498f39)
    
    14032858-d628-40a2-b611-71381bd463fa
    
  8. You can check the health and statistics of the cluster through the following web pages:

    • HDFS NameNode UI: http://<SparkMasterIP>:50070

    • Spark UI: http://<SparkMasterIP>:8080

    • Spark Application UI: http://<SparkMasterIP>:4040 (active only if a Spark application is running)

    Note

    The web UIs are protected; access requires a login. The default username/password is spark/lpds, which can be changed before deployment.

  9. Testing RStudio

RStudio’s web interface can be accessed via http://<SparkMasterIP>:8787, logging in with the sparkuser/lpds username/password pair.

9.1. Testing R package

install.packages('txtplot')
library('txtplot')
txtplot(cars[,1], cars[,2], xlab = "speed", ylab = "distance")

In this test, we download an R package called “txtplot” from CRAN, load it into R and then draw an XY plot.

9.2. Testing R with Spark on local mode

install.packages("sparklyr")
library(sparklyr)
Sys.setenv(SPARK_HOME = '/home/sparkuser/spark')
sc <- spark_connect(master = "local")
sdf_len(sc, 5, repartition = 1) %>%
spark_apply(function(e) I(e))
spark_disconnect_all()

In this test, we download the “sparklyr” package for Spark, load it into R, set the path to our Spark directory, and create the Spark Context to run the code. When the Spark Context is created, our application is also displayed on the Application UI under Running Applications, available at http://<SparkMasterIP>:4040. An active Spark Context session can also be found on the RStudio interface: in the upper right corner, under the “Connections” tab, the Spark logo appears with the Spark Context configuration.

Note

Downloading new packages may take a few minutes.

The result of the test is the numbers listed from 1 to 5. This test shows that the Spark Master ran with Spark R. The last line closes the application; otherwise the Spark Context would run forever and a new application would not get new resources (see Figure 1).

Figure 1. Result of the first test

9.3. Testing R with Spark on cluster mode

install.packages("sparklyr")
library(sparklyr)
Sys.setenv(SPARK_HOME = '/home/sparkuser/spark')
sc <- spark_connect(master = "spark://<SparkMasterIP>:7077")
sdf_len(sc, 5, repartition = 1) %>%
spark_apply(function(e) I(e))
spark_disconnect_all()

The first three lines are the same as those of the second test, but we have repeated them for the sake of completeness. In this test, we download the “sparklyr” package required to use Spark, load it into R, set the path of our Spark directory and create the Spark Context to run the code.

Note

Downloading new packages may take a few minutes.

Important

Do not forget to update placeholders.

When the Spark Context is created, the application is also displayed on the Application UI under Running Applications, available at http://<SparkMasterIP>:4040.

An active Spark Context session can also be seen on the RStudio interface: in the upper right corner, under the “Connections” tab, the Spark logo appears with the Spark Context configuration, now with the Spark Master IP address.

Figure 2. Spark Context session on RStudio UI

The test results are the same, the numbers 1 through 5 (see Figure 1). This test shows that the task was run in parallel in the Spark cluster, distributed together with R. The last line closes the application; otherwise the Spark Context would run indefinitely and new applications would not get new resources.
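
To analyse your own data with sparklyr, the data is usually staged in HDFS first. The commands below are a minimal sketch, assuming the hdfs client is available to the sparkuser account on the Spark Master; the directory and file names are illustrative.

# upload a CSV file into HDFS (paths and file name are illustrative)
hdfs dfs -mkdir -p /user/sparkuser/data
hdfs dfs -put flights.csv /user/sparkuser/data/
hdfs dfs -ls /user/sparkuser/data

The uploaded file can then be loaded from RStudio with sparklyr (for example with spark_read_csv()) and processed on the cluster in the same way as in the tests above.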

Note

For more examples, visit spark.rstudio.com.

  10. Finally, you may destroy the infrastructure using the infrastructure id returned by occopus-build

    occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa

Apache Spark cluster with Python Stack

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming. For more information visit the official Apache Spark page .

Apache Spark cluster together with HDFS (Hadoop Distributed File System) represents one of the most important tools for Big Data and machine learning applications, enabling the parallel processing of large data sets on many virtual machines running Spark workers. On the other hand, setting up a Spark cluster with HDFS on clouds is not straightforward, requiring deep knowledge of both cloud and Apache Spark architecture. To save scientists this hard work, we have created and made public the required infrastructure descriptors by which Occopus can automatically deploy Spark clusters with the number of workers specified by the user. Spark also provides a special library called “Spark MLlib” for supporting machine learning applications. Similarly to the R-oriented Spark environment, we have developed infrastructure descriptors for the creation of a machine learning environment in the cloud. Here, the programming language is Python and the user programming environment is Jupyter. The complete machine learning environment consists of the following components: Jupyter, Python, Spark and HDFS. Deploying this machine learning environment is also done automatically by Occopus, and the number of Spark workers can be defined by the user.

This tutorial sets up a complete Apache Spark infrastructure integrated with HDFS, Python and Jupyter Notebook. It contains a Spark Master node and Spark Worker nodes, which can be scaled up or down.

Features

  • creating two types of nodes through contextualisation

  • utilising health check against a predefined port

  • using scaling parameters to limit the number of Spark Worker nodes

Prerequisites

  • accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, OCCI, etc.)

  • target cloud contains a base Ubuntu OS image with cloud-init support

Download

You can download the example as tutorial.examples.spark-cluster-with-python .

Note

In this tutorial, we will use nova cloud resources (based on our nova tutorials in the basic tutorial section). However, feel free to use any Occopus-compatible cloud resource for the nodes; we suggest instantiating all nodes in the same cloud.

Steps

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the nodes labelled by node_def:.

    The downloadable package for this example contains a resource template for the Nova plugin.

    Important

    Do not modify the values of the contextualisation and the health_check section’s attributes!

    Important

    Do not specify the server_name attribute for workers so they are named automatically by Occopus to make sure node names are unique!

    Note

    If you want Occopus to monitor (health_check) your Spark Master and it is to be deployed in a different network, make sure you assign public (floating) IP to the Master node.

  2. Components in the infrastructure connect to each other, therefore several port ranges must be opened for the VMs executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following port ranges in your cloud:

    Protocol    Port(s)    Service
    TCP         22         SSH
    TCP         4040       Web UI
    TCP         8080       Web UI (Standalone mode)
    TCP         8888       Jupyter Notebook
    TCP         50070      HDFS NameNode web UI

  3. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the resource you would like to use. Setting authentication information is described here.

  4. Update the number of Spark Worker nodes if necessary. For this, edit the infra-spark-cluster.yaml file and modify the min and max parameters under the scaling keyword. Scaling defines the interval (min, max) in which the number of nodes can vary. Currently, the minimum is set to 2 (which will be the initial number at startup) and the maximum is set to 10.

    - &W
       name: spark-worker
       type: spark_worker_node
       scaling:
               min: 2
               max: 10
    

    Important

    Keep in mind that Occopus has to start at least one node from each node type to work properly, and scaling can be applied only to the Spark Worker nodes in this example!

  5. Load the node definitions into the database. Make sure the proper virtualenv is activated!

    Important

    Occopus takes node definitions from its database when building up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

    occopus-import nodes/node_definitions.yaml
    
  6. Start deploying the infrastructure.

    occopus-build infra-spark-cluster.yaml
    
  7. After a successful build, the nodes are listed with their IP addresses and node IDs at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store this identifier to perform further operations on your infrastructure, or query it later with the occopus-maintain command.

    List of nodes/ip addresses:
    spark-master:
        192.168.xxx.xxx (3116eaf5-89e7-405f-ab94-9550ba1d0a7c)
    spark-worker:
        192.168.xxx.xxx (23f13bd1-25e7-30a1-c1b4-39c3da15a456)
        192.168.xxx.xxx (7b387348-b3a3-5556-83c3-26c43d498f39)
    
    14032858-d628-40a2-b611-71381bd463fa
    
  8. You can check the health and statistics of the cluster through the following web pages:

    • HDFS NameNode UI: http://<SparkMasterIP>:50070

    • Spark UI: http://<SparkMasterIP>:8080

    • Spark Application UI: http://<SparkMasterIP>:4040 (active only if a Spark application is running)

    Note

    The web UIs are protected; access requires a login. The default username/password is spark/lpds, which can be changed before deployment.

  9. Testing with Jupyter Notebook

    The Jupyter Notebook web interface can be accessed via http://<SparkMasterIP>:8888. Here, you can upload and run Jupyter notebooks. A command-line check of the Spark cluster is also sketched after this step.

    Note

    The web UIs are protected; access requires a login. The default password is “lpds”, which can be changed before deployment.
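
    Besides the notebooks, you can run a quick command-line check of the Spark cluster from the Spark Master node using the Python example shipped with Spark. This is a minimal sketch: the Spark installation directory is an assumption and may differ in your deployment.

    # submit the bundled PySpark Pi example to the standalone master (update the placeholder)
    $SPARK_HOME/bin/spark-submit \
      --master spark://<SparkMasterIP>:7077 \
      $SPARK_HOME/examples/src/main/python/pi.py 100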

  10. Finally, you may destroy the infrastructure using the infrastructure id returned by occopus-build

    occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa
    

TensorFlow and Keras with Jupyter Notebook Stack

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache License 2.0 on November 9, 2015. For more information visit the official TensorFlow page .

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. Keras contains numerous implementations of commonly used neural-network building blocks such as layers, objectives, activation functions, optimizers, and a host of tools to make working with image and text data easier. In addition to standard neural networks, Keras has support for convolutional and recurrent neural networks. It supports other common utility layers like dropout, batch normalization, and pooling. For more information visit the official Keras page .

The complete machine learning environment consists of the following components: Jupyter, Keras (version 2.2.4) and TensorFlow (version 1.13.1).

Features

  • creating a node through contextualisation

  • utilising health check against a predefined port

Prerequisites

  • accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, OCCI, etc.)

  • target cloud contains a base Ubuntu OS image with cloud-init support

Download

You can download the example as tutorial.examples.tensorflow-keras-jupyter .

Note

In this tutorial, we will use nova cloud resources (based on our nova tutorials in the basic tutorial section). However, feel free to use any Occopus-compatible cloud resource for the nodes; we suggest instantiating all nodes in the same cloud.

Steps

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the nodes labelled by node_def:.

    The downloadable package for this example contains a resource template for the Nova plugin.

    Important

    Do not modify the values of the contextualisation and the health_check section’s attribute!

    Note

    If you want Occopus to monitor (health_check) your initiated virtual machine and it is to be deployed in a different network, make sure you assign public (floating) IP to the node.

  2. Services on the virtual machine should be available from outside, therefore some ports must be opened for the VM executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following ports in your cloud:

    Protocol    Port(s)    Service
    TCP         22         SSH
    TCP         8888       Jupyter Notebook

  3. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the resource you would like to use. Setting authentication information is described here.

  4. Load the node definitions into the database. Make sure the proper virtualenv is activated!

    Important

    Occopus takes node definitions from its database when building up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

    occopus-import nodes/node_definitions.yaml
    
  5. Start deploying the infrastructure.

    occopus-build infra-jupyter-server.yaml
    
  6. After a successful build, the node is listed with its IP address and node ID at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store this identifier to perform further operations on your infrastructure, or query it later with the occopus-maintain command.

    List of nodes/ip addresses:
    jupyter-server:
        192.168.xxx.xxx (3116eaf5-89e7-405f-ab94-9550ba1d0a7c)
    
    14032858-d628-40a2-b611-71381bd463fa
    
  7. You can start using the TensorFlow/Keras stack through the Jupyter Notebook using your web browser at the following URL (an SSH tunnel alternative is sketched after this step):

    • Jupyter notebook: http://<JupyterServerIP>:8888

    Note

    The web UIs are protected; access requires a login. The default password is “lpds”, which can be changed before deployment.
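
    If you cannot reach port 8888 directly from your workstation (for example, it is only reachable inside the cloud network), you can access Jupyter through an SSH tunnel instead. This is a minimal sketch: the login user depends on your image (e.g. ubuntu on Ubuntu cloud images) and the key is the one you configured for the node.

    # forward the remote Jupyter port to your local machine, then browse http://localhost:8888
    ssh -i <your-private-key> -N -L 8888:localhost:8888 <user>@<JupyterServerIP>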

  8. Run a demo ML application. Select the tensorflow-demo/TensorFlowDemoWithPictures.ipynb file within the Jupyter Notebook interface, and select Cell/Run All to run all of its cells, or use Shift+Enter within a cell to run the cells one by one.

  9. Finally, you may destroy the infrastructure using the infrastructure id returned by occopus-build

    occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa
    

TensorFlow and Keras with Jupyter Notebook Stack using NVIDIA GPU card

TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets researchers push the state-of-the-art in ML and developers easily build and deploy ML powered applications. TensorFlow was developed by the Google Brain team for internal Google use. It was released under the Apache License 2.0 on November 9, 2015. For more information visit the official TensorFlow page .

Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research. Keras contains numerous implementations of commonly used neural-network building blocks such as layers, objectives, activation functions, optimizers, and a host of tools to make working with image and text data easier. In addition to standard neural networks, Keras has support for convolutional and recurrent neural networks. It supports other common utility layers like dropout, batch normalization, and pooling. For more information visit the official Keras page .

The complete machine learning environment consists of the following components: Jupyter, Keras and TensorFlow utilizing the power of a GPU card.

Important

If you want to use this tutorial, your virtual machine must have an attached NVIDIA GPU card. If you would like to alter the CUDA driver, feel free to personalize the install-cuda.sh script within the nodes/cloud_init_jupyter_server_gpu.yaml file.
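
Once the node is deployed (see step 6), you can verify that the GPU and the driver are visible on it. This is a minimal check, assuming you can SSH into the node and the NVIDIA driver was installed successfully by install-cuda.sh; nvcc is only present if the CUDA toolkit itself is installed.

# on the deployed node
nvidia-smi        # should list the attached GPU and the driver version
nvcc --version    # CUDA toolkit version, if the toolkit is installed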

Features

  • creating a node through contextualisation

  • utilising health check against a predefined port

Prerequisites

  • accessing a cloud through an Occopus-compatible interface (e.g. EC2, Nova, OCCI, etc.)

  • target cloud contains a base Ubuntu OS image with cloud-init support

Download

You can download the example as tutorial.examples.tensorflow-keras-jupyter-gpu .

Note

In this tutorial, we will use nova cloud resources (based on our nova tutorials in the basic tutorial section). However, feel free to use any Occopus-compatible cloud resource for the nodes; we suggest instantiating all nodes in the same cloud.

Steps

  1. Open the file nodes/node_definitions.yaml and edit the resource section of the nodes labelled by node_def:.

    The downloadable package for this example contains a resource template for the Nova plugin.

    Important

    Do not modify the values of the contextualisation and the health_check section’s attribute!

    Note

    Make sure you assign public (floating) IP to the node.

  2. Services on the virtual machine should be available from outside, therefore some ports must be opened for the VM executing the components. Clouds implement port opening in various ways (e.g. security groups for OpenStack). Make sure you open the following ports in your cloud:

    Protocol    Port(s)    Service
    TCP         22         SSH
    TCP         8888       Jupyter Notebook

  3. Make sure your authentication information is set correctly in your authentication file. You must set your authentication data for the resource you would like to use. Setting authentication information is described here.

  4. Load the node definitions into the database. Make sure the proper virtualenv is activated!

    Important

    Occopus takes node definitions from its database when building up the infrastructure, so importing is necessary whenever the node definition or any imported (e.g. contextualisation) file changes!

    occopus-import nodes/node_definitions.yaml
    
  5. Start deploying the infrastructure.

    occopus-build infra-jupyter-server-gpu.yaml
    
  6. After a successful build, the node is listed with its IP address and node ID at the end of the log messages, and the identifier of the newly built infrastructure is printed. Store this identifier to perform further operations on your infrastructure, or query it later with the occopus-maintain command.

    List of nodes/ip addresses:
    jupyter-server-gpu:
        192.168.xxx.xxx (3116eaf5-89e7-405f-ab94-9550ba1d0a7c)
    
    14032858-d628-40a2-b611-71381bd463fa
    
  7. You can start using the TensorFlow/Keras stack through the Jupyter Notebook using your web browser at the following URL:

    • Jupyter notebook: http://<JupyterServerIP>:8888

    Note

    The web UIs are protected; access requires a login. The default password is “lpds”, which can be changed before deployment.

  8. Run a demo ML application. In this short demo application we can test our Anaconda development environment. We are building a convolutional neural network (CNN or ConvNet) which is able to classify animals into 4 categories (dog, cat, bird and fish). Classification is a type of supervised learning, which means we need to provide labels for all the training data. We are going to utilize Keras with the TensorFlow backend to achieve this goal, more precisely for creating and compiling the model, training and testing. Keras enables fast experimentation with deep neural networks and focuses on being user-friendly and modular, so it is a natural choice for this task, while TensorFlow is responsible for managing the underlying structures and calculations.

    Select the keras-gpu-demo/Simple_image_classifier.ipynb file (see Figure 1) within the Jupyter Notebook interface, and select Cell/Run All to run all of its cells, or use Shift+Enter within a cell to run the cells one by one.

    Figure 1: Jupyter Notebook for testing TensorFlow/Keras environment with GPU

  9. Finally, you may destroy the infrastructure using the infrastructure id returned by occopus-build

    occopus-destroy -i 14032858-d628-40a2-b611-71381bd463fa