Docker GPU
How to get Docker working with GPU support
Ref:
https://github.com/NVIDIA/nvidia-docker
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-driver
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker
https://github.com/floydhub/dl-docker
First, clone the tensorflow repository. The directory tensorflow/tools/docker inside it contains the Dockerfile we will use to build TensorFlow for GPU.
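Concretely, the clone step looks like this (assumes git is installed; the repo URL is from the references above):

```shell
# Clone the TensorFlow repo and move to the Docker tooling directory
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow/tensorflow/tools/docker
```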
However, the TensorFlow installation via pip in the stock Dockerfile does not work, so comment out those lines as shown:
# Install TensorFlow GPU version.
#RUN pip --no-cache-dir install \
#http://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.0.0-cp27-none-linux_x86_64.whl
And add this at the top of the file:
FROM gcr.io/tensorflow/tensorflow:latest-gpu
Latest setup for an AWS p2.xlarge instance (run as root, or prefix each command with sudo):
apt-get remove docker docker-engine docker.io
apt-get update
apt-get install \
apt-transport-https \
ca-certificates \
curl \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
apt-key fingerprint 0EBFCD88
add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
apt-get update
apt-get install docker-ce
./aws_gpu_instance_setup.sh
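After the install, a quick sanity check that Docker and the NVIDIA runtime can see the GPU (this assumes nvidia-docker was installed by the setup script above):

```shell
docker --version
# should print the nvidia-smi GPU table from inside a container
nvidia-docker run --rm nvidia/cuda nvidia-smi
```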
The modified Dockerfile for GPU looks like the following. Note that with two FROM lines, the instructions that follow are applied on top of the last base image, gcr.io/tensorflow/tensorflow:latest-gpu:
FROM nvidia/cuda:8.0-cudnn6-devel-ubuntu16.04
FROM gcr.io/tensorflow/tensorflow:latest-gpu
MAINTAINER Craig Citro <craigcitro@google.com>
ARG DEBIAN_FRONTEND=noninteractive
# Pick up some TF dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
curl \
libfreetype6-dev \
libpng12-dev \
libzmq3-dev \
pkg-config \
python \
python-dev \
rsync \
software-properties-common \
unzip \
&& \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
#==================================
# Amit - custom installs
#==================================
#RUN apt-get install -y apt-utils
# Install nodejs
RUN curl -sL https://deb.nodesource.com/setup_7.x | bash -
RUN apt-get install -y nodejs
# for scp
RUN apt-get install -y openssh-client
# for git
RUN apt-get install -y git
# Install mysql-server
RUN apt-get install -y mysql-server
# Note: a service started during "docker build" does not persist into the image;
# run "service mysql start" inside the container at runtime instead.
#RUN service mysql start
# Amit - custom installs
RUN apt-get install -y vim
RUN apt-get install -y cython
RUN apt-get install -y python-pandas
RUN apt-get install -y python-cairosvg
RUN apt-get install -y python-pydot
RUN apt-get install -y python-pygraphviz
RUN apt-get install -y s3cmd
RUN apt-get install -y python-boto
RUN apt-get install -y python-mysqldb
RUN pip install --upgrade pip
RUN pip install pydotplus
RUN pip install graphviz
RUN pip install keras
RUN curl -O https://bootstrap.pypa.io/get-pip.py && \
python get-pip.py && \
rm get-pip.py
RUN pip --no-cache-dir install \
ipykernel \
jupyter \
matplotlib \
numpy \
scipy \
sklearn \
pandas \
Pillow \
&& \
python -m ipykernel.kernelspec
# --- DO NOT EDIT OR DELETE BETWEEN THE LINES --- #
# These lines will be edited automatically by parameterized_docker_build.sh. #
# COPY _PIP_FILE_ /
# RUN pip --no-cache-dir install /_PIP_FILE_
# RUN rm -f /_PIP_FILE_
# Install TensorFlow GPU version.
#RUN pip --no-cache-dir install \
#http://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-0.0.0-cp27-none-linux_x86_64.whl
# --- ~ DO NOT EDIT OR DELETE BETWEEN THE LINES --- #
# RUN ln -s /usr/bin/python3 /usr/bin/python
# Set up our notebook config.
COPY jupyter_notebook_config.py /root/.jupyter/
# Copy sample notebooks.
COPY notebooks /notebooks
# Jupyter has issues with being run directly:
# https://github.com/ipython/ipython/issues/7062
# We just add a little wrapper script.
COPY run_jupyter.sh /
# For CUDA profiling, TensorFlow requires CUPTI.
ENV LD_LIBRARY_PATH /usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
# TensorBoard
EXPOSE 6006
# IPython
EXPOSE 8888
WORKDIR "/notebooks"
#CMD ["/run_jupyter.sh", "--allow-root"]
CMD ["/bin/bash"]
#CMD ["nohup" "./run_jupyter.sh" "--allow-root" ">" "tf_files/nohup.out" "2>&1" "<" "/dev/null" "&"]
Next execute:
BUILD THE IMAGE:
docker build -t custom-agshift-docker-image-gpu .
RUN THE IMAGE:
nvidia-docker run -p 8888:8888 -p 6006:6006 --name custy-agshift-gpu -it -v /home/ubuntu/tf_files:/tf_files custom-agshift-docker-image-gpu
THEN, FROM THE DOCKER BASH SHELL, RUN:
nohup ./run_jupyter.sh --allow-root > tf_files/nohup.out 2>&1 < /dev/null &
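The redirections in the nohup line are what let Jupyter survive the shell exiting: stdout and stderr are captured in tf_files/nohup.out, and stdin is detached from the terminal. The same pattern, demonstrated with a harmless placeholder command in place of run_jupyter.sh:

```shell
# nohup + redirections, with an echo standing in for run_jupyter.sh
nohup sh -c 'echo jupyter-placeholder' > /tmp/nohup_demo.out 2>&1 < /dev/null &
wait                                  # wait for the background job to finish
captured=$(cat /tmp/nohup_demo.out)   # the command's output ended up in the log file
echo "$captured"
```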
INSTALL OPENCV
Follow the instructions here: http://milq.github.io/install-opencv-ubuntu-debian/
The above did not work for me, so I created an install script following the instructions on the OpenCV website.
Make a directory called /tf_files/opencv_install and change into it.
Create the bash script below, then execute it.
#install_opencv_mine.sh
=========================
git clone https://github.com/Itseez/opencv.git
git clone https://github.com/Itseez/opencv_contrib.git
cd opencv
mkdir build
cd build
# point cmake at the contrib modules cloned above (otherwise they are unused)
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules ..
make -j7 # runs 7 jobs in parallel
make install
After this, install the Python OpenCV bindings:
apt-get install -y python-opencv
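To verify the bindings work (run inside the container):

```shell
# import the cv2 module and print the installed OpenCV version
python -c "import cv2; print(cv2.__version__)"
```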
AWS GPU instance (example Jupyter URL with token):
http://ec2-35-164-187-208.us-west-2.compute.amazonaws.com:8888/?token=927d17004747094e5d7aa123342e99e5afcd3ee6a65ea1ea
Note:
If the web browser cannot connect to the Jupyter session, make sure the EC2 instance has the correct 'security group' with the required ports opened up.
Example security group (at least these ports should be open): launch-wizard-3
===================================================================================
Type              Protocol   Port    Source
Custom TCP Rule   TCP        8888    0.0.0.0/0
Custom TCP Rule   TCP        8888    ::/0
Custom TCP Rule   TCP        6006    0.0.0.0/0
Custom TCP Rule   TCP        6006    ::/0
SSH               TCP        22      0.0.0.0/0
HTTPS             TCP        443     0.0.0.0/0
HTTPS             TCP        443     ::/0
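These rules can also be added from the AWS CLI rather than the console. A sketch, assuming the CLI is configured with credentials and the security group is named launch-wizard-3 as above:

```shell
# open the Jupyter (8888) and TensorBoard (6006) ports to IPv4 traffic
aws ec2 authorize-security-group-ingress --group-name launch-wizard-3 \
    --protocol tcp --port 8888 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name launch-wizard-3 \
    --protocol tcp --port 6006 --cidr 0.0.0.0/0
```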
STOP INSTANCE, RESTART INSTANCE, START DOCKER, then RE-RUN JUPYTER
When the GPU instance is stopped and restarted, it is assigned a new public DNS, so the previous way of connecting to the EC2 instance won't work; just substitute the new DNS name in the ssh command.
Everything else remains the same: all files and the Docker container stay intact. Running the following will show that everything is unchanged:
sudo su
cd /home/ubuntu/workdir/tensorflow/tensorflow/tools/docker
docker images -a
docker ps -a # will show that the previous docker process has exited. Copy the container id from here
# NOW JUST START THE DOCKER CONTAINER AND THEN START jupyter
docker start <CONTAINER ID> # for example: f5e826b541b4
docker attach <CONTAINER ID> # for example: f5e826b541b4
cd /
nohup ./run_jupyter.sh --allow-root > tf_files/nohup.out 2>&1 < /dev/null &
To start another docker shell using the same container id, avoiding clashes with a co-user:
docker exec -it f5e8 bash
This opens another bash shell in the same container.
# To start jupyter inside docker from an xterm (say on your own Mac) and then
# detach silently without stopping docker:
> cd /
> nohup ./run_jupyter.sh --allow-root > tf_files/nohup.out 2>&1 < /dev/null &
> Ctrl-P, then Ctrl-Q   # detach key sequence; the container keeps running