GCP Production
Objectives: DO THIS ON THE HOST
The NVIDIA driver, CUDA, and cuDNN should not update automatically and break things
Uptime: 24 hours a day
Manual upgrade/update of libraries over long weekends, with prior notification to our clients (Driscoll's/Olam)
First bring up a parallel machine with the new updates/upgrades, then switch the redirection in Route 53 in AWS to point to the new machine. After that, upgrade/update the old machine or delete that instance.
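A sketch of the Route 53 switch from the AWS CLI (the hosted zone ID, record name, and IP below are placeholders, not our actual values):
# flip the A record for the service to the new machine's address
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"api.example.com","Type":"A","TTL":300,"ResourceRecords":[{"Value":"203.0.113.10"}]}}]}'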
To switch off automatic updates, edit the config: sudo vim /etc/apt/apt.conf.d/10periodic and turn off the following:
APT::Periodic::Update-Package-Lists "0";
There is more to switching off auto-updates/upgrades than this one file. Turn them off by running the following command.
sudo dpkg-reconfigure -plow unattended-upgrades
# Files to look out for:
/etc/apt/apt.conf.d/10periodic
/etc/apt/apt.conf.d/20auto-upgrades
/etc/apt/apt.conf.d/50unattended-upgrades
/etc/cron.daily/apt-compat
/usr/lib/apt/apt.systemd.daily
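After disabling, the two periodic settings should end up like this (a typical result; verify the actual files on the host):
# expected contents of /etc/apt/apt.conf.d/20auto-upgrades once auto-updates are off
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";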
Ref (driver and CUDA installation):
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
See the section for Ubuntu.
1. Go to: https://www.nvidia.com/Download/index.aspx?lang=en-us
Find the compatible driver for the GPU/OS
Download the driver to your machine (for example, your Mac or laptop)
scp the downloaded driver to the GCP machine:
Example:
gcloud compute --project agshift-dev-dl scp --zone us-west1-b ~/Downloads/nvidia-diag-driver-local-repo-ubuntu1604-410.72_1.0-1_amd64.deb ubuntu@dldev5hydra1:/tmp/
Log in to the GCP machine:
mkdir ~/Downloads
cd ~/Downloads
mv /tmp/nvidia-diag-driver-local-repo-ubuntu1604-410.72_1.0-1_amd64.deb .
2. Install the driver:
sudo su
dpkg -i nvidia-diag-driver-local-repo-ubuntu1604-410.72_1.0-1_amd64.deb
(It might give the error that the key is not installed. So, run the following command and then run the dpkg command again)
sudo apt-key add /var/nvidia-diag-driver-local-repo-410.72/7fa2af80.pub
apt-get update
apt-get install -y cuda-drivers
reboot
Check nvidia driver version
cat /proc/driver/nvidia/version
nvidia-smi
3. Install cuda toolkit
sudo apt-get install -y nvidia-cuda-toolkit
Run the following command to see the CUDA compiler version
nvcc -V
4. Install CUDA (the website lists this package, but apt cannot find it. Step 2 already installed cuda-drivers, which is probably the current way of doing it.)
#sudo apt-get install -y cuda
#Check cuda version: cat /usr/local/cuda/version.txt
apt list | grep cuda-drivers
=========================================
DOING IT THIS WAY
=========================================
Ref: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal
Go to the above website and find the compatible cuda toolkit (example: 10.0)
cd ~/Downloads
# the -O saves the download with the .deb extension so the dpkg command below matches
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64 -O cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
Installation Instructions:
sudo dpkg -i cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
After this, CUDA shows up in /usr/local/cuda-10.0
The version shows up in: cat /usr/local/cuda/version.txt
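An optional sanity check, assuming the CUDA samples were installed with the toolkit (the cuda meta-package normally includes them):
# build and run the deviceQuery sample; it should list the GPU and end with "Result = PASS"
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery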
5. docker-ce installation (for Ubuntu machines, 16.04 xenial distro; do not follow the Debian instructions)
Ref: https://docs.docker.com/install/linux/docker-ce/debian/#install-docker-ce-1
Repository configuration:
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg2 \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
Installation:
sudo apt-get update
sudo apt-get install docker-ce
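A quick check that the Docker engine works:
# pulls the hello-world image and prints a confirmation message
sudo docker run --rm hello-world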
6. nvidia-docker2 installation
Ref: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
Ref: https://nvidia.github.io/nvidia-docker/
Repository configuration:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
Installation:
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
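A quick check that the NVIDIA runtime is wired up (the CUDA image tag is an example; pick one that matches the host driver):
# nvidia-smi run inside a container should list the host GPU
sudo docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi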
7. Instance setup
Ref: README file for installing the tensorflow server at:
https://github.com/agshift/cashewDL/tree/feature/mask_rcnn_segmentation/deep_learning_repo/utils/instance_setup
Build docker container:
sudo docker build -f tensorflow_server.Dockerfile -t tf_serving_dev .
sudo usermod -a -G docker ${USER}
Run docker container:
sudo docker run -it -d --privileged --runtime=nvidia -p 8500:8500 -p 52022:22 -p 6001:80 -p 7006-7025:6006-6025 -p 8090:8090 -v /usr/lib/nvidia-410/bin:/usr/local/nvidia/bin -v /usr/lib/nvidia-410:/usr/local/nvidia/lib -v /home/ubuntu/tf_files:/tf_files tf_serving_dev /bin/bash
Execute the docker terminal script to get into the container:
./start_docker_terminal.sh
This gets you inside the container. Then run configure.sh inside the docker container to set it up.
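A hypothetical equivalent of start_docker_terminal.sh (the real script lives in the repo referenced above); it just opens a shell in the running container:
# find the container started from the tf_serving_dev image and exec into it
CONTAINER_ID=$(sudo docker ps -q --filter ancestor=tf_serving_dev)
sudo docker exec -it "$CONTAINER_ID" /bin/bash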
1. Pull the tensorflow-serving image
https://www.tensorflow.org/tfx/serving/docker
docker pull tensorflow/serving:latest-devel-gpu
Or, build your own Docker image from a Dockerfile (this takes a long time, but worked the best)
git clone https://github.com/tensorflow/serving.git
cd serving/tensorflow_serving/tools/docker/
vim Dockerfile.devel-gpu
Then use the above Dockerfile.devel-gpu to build the image from scratch
docker build --pull -t $USER/tensorflow-serving-devel-gpu -f Dockerfile.devel-gpu .
The above TensorFlow Serving image takes care of the CUDA and cuDNN installations, based on the NVIDIA driver installed on the host.
2. Run the serving docker image
docker run -it --runtime=nvidia -p 8501:8501 -v /home/ubuntu/tf_files:/tf_files tensorflow/serving:latest-devel-gpu
3. Inside the Docker container: install from requirements.txt with pip to get custom TensorFlow, OpenCV, and other
important packages
git clone https://github.com/agshift/yobiDL.git
cd yobiDL
git checkout feature/strawberry_segmentation
pip install --upgrade -r requirements.txt
4. Inside Docker container: Install opencv (unofficial pre-built opencv packages)
https://pypi.org/project/opencv-python/
pip install opencv-python
# Install the following if you get an error like the one below when typing import cv2 in python2.7
Error: ImportError: libSM.so.6: cannot open shared object file: No such file or directory
Remedy: install the following
apt update && apt install -y libsm6 libxext6
5. Run a standard tensorflow job to make sure everything is working
# train the mnist model
python tensorflow_serving/example/mnist_saved_model.py /tmp/mnist_model
# serve the model
tensorflow_model_server --port=8500 --model_name=mnist --model_base_path=/tmp/mnist_model/ &
# test the client
python tensorflow_serving/example/mnist_client.py --num_tests=1000 --server=localhost:8500
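If the installed tensorflow_model_server supports --rest_api_port (recent TF Serving builds do), the model status can also be checked over HTTP; a sketch:
# serve with both gRPC and REST ports (assumes --rest_api_port is available)
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=mnist --model_base_path=/tmp/mnist_model/ &
# query the model status
curl http://localhost:8501/v1/models/mnist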
6. If an error is encountered, uninstall TensorFlow and re-install another version of TensorFlow
pip uninstall tensorflow
pip install --user tensorflow-gpu==1.13.1 (or some other version)
# install the correct tensorflow-serving build for the CUDA version
# (e.g. cuda-9.0; check /usr/local/cuda)
# then inside the docker container install requirements.txt as
pip install --upgrade -r requirements.txt
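A quick check that the reinstalled TensorFlow actually sees the GPU (TF 1.x API):
# prints True if a GPU is visible to TensorFlow
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"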
# to install OpenCV
pip install opencv-python
or
pip install opencv-contrib-python
# To avoid "libSM.so" error after doing import cv2 in python do the following
apt update && apt install -y libsm6 libxext6
# to avoid ImportError: libgthread-2.0.so.0 after doing import cv2 in python do the following
apt-get update && apt-get install -y libgtk2.0-dev
# you may also attempt to locate the file:
apt-get install mlocate
sudo updatedb
locate libgthread-2.0
# If you fail to run the 'scp' command from the GCP machine, that means you have
# insufficient privileges
# First check 'auth'
gcloud auth list
(if your ID is not showing up, then you need to log in)
gcloud auth login
=> follow the instructions that appear on the terminal
# Further note from the ticket:
the default service account with default scopes does not have the necessary permission to run gcloud
compute scp. Using your own account will work.
You might want to consider using OS Login[1]. This will allow you to register multiple ssh keys which
you can use to authenticate to a GCE instance using any ssh client like Putty or OpenSSH. You can then
use ssh-agent to forward your credentials for a seamless experience.
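If you go the OS Login route, a minimal sketch (project-wide metadata; the key path is a placeholder):
# enable OS Login for the project and register an ssh public key
gcloud compute project-info add-metadata --metadata enable-oslogin=TRUE
gcloud compute os-login ssh-keys add --key-file ~/.ssh/id_rsa.pub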
# copy image from CLI
# this will also create the folder 'test' if it does not exist
gsutil cp op_pic_1527058743670.jpg gs://enclosure/test/
gsutil cp gs://enclosure/test/op_pic_1527058743670.jpg /tmp/
# list objects from gcloud bucket
gsutil ls gs://enclosure/
gsutil ls -l gs://enclosure/test
# Delete object
gsutil rm gs://enclosure/test/op_pic_1527058743670.jpg
# to check ACL privilege of bucket or a folder
gsutil acl get gs://enclosure
# to make publicly accessible
gsutil acl ch -u AllUsers:R gs://my-awesome-bucket/kitten.png
# to delete public permission
gsutil acl ch -d AllUsers gs://my-awesome-bucket/kitten.png
# to give IAM access to an individual
gsutil iam ch user:jane@gmail.com:objectCreator,objectViewer gs://my-awesome-bucket
# to remove the IAM access
gsutil iam ch -d user:jane@gmail.com:objectCreator,objectViewer gs://my-awesome-bucket
# Technical overview
Cloud Storage FUSE works by translating object storage names into a file and directory system,
interpreting the “/” character in object names as a directory separator so that objects with the
same common prefix are treated as files in the same directory. Applications can interact with the
mounted bucket like a simple file system, providing virtually limitless file storage running in the cloud.
While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the
backend. Cloud Storage FUSE retains the same fundamental characteristics of Cloud Storage, preserving
the scalability of Cloud Storage in terms of size and aggregate performance while maintaining the same
latency and single object performance. As with the other access methods, Cloud Storage does not support
concurrency and locking. For example, if multiple Cloud Storage FUSE clients are writing to the same file,
the last flush wins.
# to install gcsfuse
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse
# to set correct permissions
gcloud auth application-default login
then follow the steps to create the respective JSON credential file
then set the following ENV variable
export GOOGLE_APPLICATION_CREDENTIALS="/home/ubuntu/.config/gcloud/application_default_credentials.json"
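# to mount gcsfuse (a sketch using the same bucket and mount point as the fstab entry below, and the credential file created above)
mkdir -p /home/ubuntu/tf_files/gcs_fuse_mnt
gcsfuse --key-file=/home/ubuntu/.config/gcloud/application_default_credentials.json enclosure /home/ubuntu/tf_files/gcs_fuse_mnt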
# to unmount gcsfuse
fusermount -u /home/ubuntu/tf_files/gcs_fuse_mnt
# /etc/fstab entry looks like this
enclosure /home/ubuntu/tf_files/gcs_fuse_mnt gcsfuse rw,noauto,allow_other,uid=1000,gid=1000,key_file=/home/ubuntu/.config/gcloud/application_default_credentials.json
# 1000 and 1000 are the uid and gid of the ubuntu user
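# because the fstab entry uses 'noauto', mount it manually when needed (the gcsfuse package installs the mount helper):
sudo mount /home/ubuntu/tf_files/gcs_fuse_mnt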
Ref: https://askubuntu.com/questions/320996/how-to-make-python-program-command-execute-python-3
$ sudo update-alternatives --config python
If you get the error "no alternatives for python" then set up an alternative yourself with the following command:
$ sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
docker pull ubuntu:16.04
# launch ubuntu docker container
docker run -d -it --privileged -p 8090:8099 -p 7006:6006 -v /home/ubuntu/tf_files:/tf_files ubuntu:16.04 /bin/bash
docker pull tensorflow/serving:latest-gpu
export MODEL_FILE=/home/ubuntu/tf_files/common/model_configs/serving_config_23.txt
# launch optimized tensorflow serving
docker run --runtime=nvidia --privileged -p 9000-9008:9000-9008 -p 8501:8501 -p 7007-7027:6007-6027 --mount type=bind,source=${MODEL_FILE},target=/models/models.config --mount type=bind,source=/home/ubuntu/tf_files,target=/tf_files -it -d tensorflow/serving:latest-gpu --model_config_file=/models/models.config --port=9000 /bin/bash
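A sanity check over the REST port (the model name is a placeholder; it must match an entry in serving_config_23.txt):
curl http://localhost:8501/v1/models/<model_name>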
# Notes on uWSGI / Nginx worker configuration
You should share more context about your application if you want good feedback on how to configure it.
What kind of environment are you running this in? What bottleneck are you encountering?
What are the performance characteristics of your service? QPS, CPU bound, IO bound, memory bound, etc.
Nginx is a reverse proxy, web server, load balancer, etc. etc. Whatever you want to call it and however
you're using it, the important thing to understand is that it's independent of python concurrency. All it
does is forward requests. If you suspect that nginx is misconfigured or the source of your bottleneck,
I can go into a little more detail (I have the least experience working with this layer), but it's unlikely
to be your problem.
uWSGI works by creating an instance of the python interpreter and importing the python files related to your
application. If you configure it with more than one process, it will fork that instance of the interpreter
until there are the required number of processes. This is roughly equivalent to just starting that number of
python interpreter instances by hand, except that uWSGI will handle incoming HTTP requests and forward them
to your application. This also means that each process has memory isolation—no state is shared, so each
process gets its own GIL.
Thus, using only processes for workers will give you the best performance if your aim is just to optimize
throughput. However, processes come with tradeoffs. The main problem is that if your application benefits
from sharing state and resources within the process, pre-forking makes this untenable. If you want an
in-process cache for something, for example, your cache hit ratio would be much greater if all of your
workers were housed in one process and could share the same cache. An important implication of this is that
processes are very memory inefficient—memory isolation often requires that a lot of data is duplicated.
(As a sidenote, there are ways to take advantage of copy on write semantics by loading things at import time,
but that's a story for another day.)
For this reason, uWSGI also allows your workers to live within threads in the same process. These threads
solve the problems mentioned above regarding shared state—now your workers can share the same cache, for
example. However, it also means they share the same GIL. When more than one thread needs CPU time, it will
not be possible for them to make progress concurrently. In fact, GIL contention and context switching will
make your application run slower, on net.
For IO-bound applications where workers spend so much time waiting for IO that GIL contention is rare, this
sounds like it shouldn't be a problem. And if your application is like most web applications that spend a
large part of their time talking to other services or a database, it's probably IO bound. So all good, right?
In reality, thread based uWSGI workers almost never work flawlessly for any python web application of even
moderate complexity. The reason for this is primarily the ecosystem and the assumptions that people make
writing python code—many libraries and internal code are flagrantly, unapologetically, and inconsolably NOT
threadsafe. Even if your application is running smoothly today with thread based workers, you'll likely run
into some hard-to-debug problem involving thread safety sooner than later.
Moreover, so called "IO bound" applications spend way more time on CPU than most developers realize,
especially in python. Python code executes very slowly compared to most runtimes, and it's not uncommon for
simple CRUD apps to spend ~20% of their time running python code as opposed to blocking on IO. Even with two
threads, that's a lot of opportunity for GIL contention and further slowdowns.
My main point is this: Whatever bottleneck you're running into likely has to do with the fact that you're
running 2 threads for each process. So unless shared state or memory utilization is very important to you,
consider replacing that configuration with 4x processes, 1x threads instead and see what effect it has.
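A hedged example of that suggested setup as a plain uWSGI invocation (the module name, port, and timeout are placeholders; the same options can go in the ini file):
# 4 worker processes, 1 thread each; the master process supervises the workers
uwsgi --http :8090 --module app:application --master --processes 4 --threads 1 --harakiri 60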