GCP Production
Objectives: DO THIS ON THE HOST
The NVIDIA driver, CUDA, and cuDNN should not update automatically and break things
Uptime: 24 hours a day
Manual upgrade/update of libraries over long weekends, with prior notification to our clients (Driscoll's/Olam)
First bring up a parallel machine with the new updates/upgrades, then switch the redirection in Route 53 in AWS to point to the new machine. After that, upgrade/update the old machine or delete that instance.
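A sketch of the Route 53 switch from the AWS CLI (the hosted zone ID, record name, and IP below are placeholders, not our actual values):
# flip the A record for the service to the new machine's address
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1EXAMPLE \
  --change-batch '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"api.example.com","Type":"A","TTL":300,"ResourceRecords":[{"Value":"203.0.113.10"}]}}]}'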
To switch off automatic updates, edit the config: sudo vim /etc/apt/apt.conf.d/10periodic and turn off the following:
APT::Periodic::Update-Package-Lists "0";
There is more to switching off auto-updates/upgrades than this one file. Turn them off by running the following command.
sudo dpkg-reconfigure -plow unattended-upgrades
# Files to look out for:
/etc/apt/apt.conf.d/10periodic
/etc/apt/apt.conf.d/20auto-upgrades
/etc/apt/apt.conf.d/50unattended-upgrades
/etc/cron.daily/apt-compat
/usr/lib/apt/apt.systemd.daily
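After disabling, the two periodic settings should end up like this (a typical result; verify the actual files on the host):
# expected contents of /etc/apt/apt.conf.d/20auto-upgrades once auto-updates are off
APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Unattended-Upgrade "0";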
Ref (driver and CUDA installation):
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
See the section for Ubuntu.
1. Go to: https://www.nvidia.com/Download/index.aspx?lang=en-us
Find the compatible driver for the GPU/OS
Download the driver to your machine (for example, your Mac or laptop)
scp the downloaded driver to the GCP machine:
Example:
gcloud compute --project agshift-dev-dl scp --zone us-west1-b ~/Downloads/nvidia-diag-driver-local-repo-ubuntu1604-410.72_1.0-1_amd64.deb ubuntu@dldev5hydra1:/tmp/
Log in to the GCP machine:
mkdir ~/Downloads
cd ~/Downloads
mv /tmp/nvidia-diag-driver-local-repo-ubuntu1604-410.72_1.0-1_amd64.deb .
2. Install the driver:
sudo su
dpkg -i nvidia-diag-driver-local-repo-ubuntu1604-410.72_1.0-1_amd64.deb
(It might give the error that the key is not installed. So, run the following command and then run the dpkg command again)
sudo apt-key add /var/nvidia-diag-driver-local-repo-410.72/7fa2af80.pub
apt-get update
apt-get install -y cuda-drivers
reboot
Check nvidia driver version
cat /proc/driver/nvidia/version
nvidia-smi
3. Install cuda toolkit
sudo apt-get install -y nvidia-cuda-toolkit
Run the following command to see the CUDA compiler version
nvcc -V
4. Install CUDA (the website lists this package, but apt cannot find it. Step 2 already installed cuda-drivers, which is probably the current way of doing it.)
#sudo apt-get install -y cuda
#Check cuda version: cat /usr/local/cuda/version.txt
apt list | grep cuda-drivers
=========================================
DOING IT THIS WAY
=========================================
Ref: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=deblocal
Go to the above website and find the compatible cuda toolkit (example: 10.0)
cd ~/Downloads
# the -O saves the download with the .deb extension so the dpkg command below matches
wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64 -O cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
Installation Instructions:
sudo dpkg -i cuda-repo-ubuntu1604-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
After this, CUDA shows up in /usr/local/cuda-10.0
The version shows up in: cat /usr/local/cuda/version.txt
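An optional sanity check, assuming the CUDA samples were installed with the toolkit (the cuda meta-package normally includes them):
# build and run the deviceQuery sample; it should list the GPU and end with "Result = PASS"
cd /usr/local/cuda-10.0/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery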
5. docker-ce installation (for Ubuntu machines, 16.04 xenial distro; do not follow the Debian instructions)
Ref: https://docs.docker.com/install/linux/docker-ce/debian/#install-docker-ce-1
Repository configuration:
sudo apt-get update
sudo apt-get install \
apt-transport-https \
ca-certificates \
curl \
gnupg2 \
software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo apt-key fingerprint 0EBFCD88
sudo add-apt-repository \
"deb [arch=amd64] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) \
stable"
Installation:
sudo apt-get update
sudo apt-get install docker-ce
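A quick check that the Docker engine works:
# pulls the hello-world image and prints a confirmation message
sudo docker run --rm hello-world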
6. nvidia-docker2 installation
Ref: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
Ref: https://nvidia.github.io/nvidia-docker/
Repository configuration:
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
Installation:
sudo apt-get install nvidia-docker2
sudo pkill -SIGHUP dockerd
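A quick check that the NVIDIA runtime is wired up (the CUDA image tag is an example; pick one that matches the host driver):
# nvidia-smi run inside a container should list the host GPU
sudo docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi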
7. Instance setup
Ref: README file for installing the tensorflow server at:
https://github.com/agshift/cashewDL/tree/feature/mask_rcnn_segmentation/deep_learning_repo/utils/instance_setup
Build docker container:
sudo docker build -f tensorflow_server.Dockerfile -t tf_serving_dev .
sudo usermod -a -G docker ${USER}
Run docker container:
sudo docker run -it -d --privileged --runtime=nvidia -p 8500:8500 -p 52022:22 -p 6001:80 -p 7006-7025:6006-6025 -p 8090:8090 -v /usr/lib/nvidia-410/bin:/usr/local/nvidia/bin -v /usr/lib/nvidia-410:/usr/local/nvidia/lib -v /home/ubuntu/tf_files:/tf_files tf_serving_dev /bin/bash
Execute the docker terminal script to get into the container:
./start_docker_terminal.sh
This gets you inside the container. Then run configure.sh inside the docker container to set it up.
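A hypothetical equivalent of start_docker_terminal.sh (the real script lives in the repo referenced above); it just opens a shell in the running container:
# find the container started from the tf_serving_dev image and exec into it
CONTAINER_ID=$(sudo docker ps -q --filter ancestor=tf_serving_dev)
sudo docker exec -it "$CONTAINER_ID" /bin/bash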
1. Pull the tensorflow-serving image
https://www.tensorflow.org/tfx/serving/docker
docker pull tensorflow/serving:latest-devel-gpu
Or, build your own Docker image from a Dockerfile (this takes a long time, but worked the best)
git clone https://github.com/tensorflow/serving.git
cd serving/tensorflow_serving/tools/docker/
vim Dockerfile.devel-gpu
Then use the above Dockerfile.devel-gpu to build the image from scratch
docker build --pull -t $USER/tensorflow-serving-devel-gpu -f Dockerfile.devel-gpu .
The above TensorFlow Serving image takes care of the CUDA and cuDNN installations, based on the NVIDIA driver installed on the host.
2. Run the serving docker image
docker run -it --runtime=nvidia -p 8501:8501 -v /home/ubuntu/tf_files:/tf_files tensorflow/serving:latest-devel-gpu
3. Inside the Docker container: install from requirements.txt with pip to get custom TensorFlow, OpenCV, and other
important packages
git clone https://github.com/agshift/yobiDL.git
cd yobiDL
git checkout feature/strawberry_segmentation
pip install --upgrade -r requirements.txt
4. Inside Docker container: Install opencv (unofficial pre-built opencv packages)
https://pypi.org/project/opencv-python/
pip install opencv-python
# Install the following if you get an error like the one below when typing import cv2 in python2.7
Error: ImportError: libSM.so.6: cannot open shared object file: No such file or directory
Remedy: install the following
apt update && apt install -y libsm6 libxext6
5. Run a standard tensorflow job to make sure everything is working
# train the mnist model
python tensorflow_serving/example/mnist_saved_model.py /tmp/mnist_model
# serve the model
tensorflow_model_server --port=8500 --model_name=mnist --model_base_path=/tmp/mnist_model/ &
# test the client
python tensorflow_serving/example/mnist_client.py --num_tests=1000 --server=localhost:8500
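If the installed tensorflow_model_server supports --rest_api_port (recent TF Serving builds do), the model status can also be checked over HTTP; a sketch:
# serve with both gRPC and REST ports (assumes --rest_api_port is available)
tensorflow_model_server --port=8500 --rest_api_port=8501 --model_name=mnist --model_base_path=/tmp/mnist_model/ &
# query the model status
curl http://localhost:8501/v1/models/mnist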
6. If an error is encountered, uninstall TensorFlow and re-install another version of TensorFlow
pip uninstall tensorflow
pip install --user tensorflow-gpu==1.13.1 (or some other version)
# install the correct tensorflow-serving build for the CUDA version
# (e.g. cuda-9.0; check /usr/local/cuda)
# then inside the docker container install requirements.txt as
pip install --upgrade -r requirements.txt
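A quick check that the reinstalled TensorFlow actually sees the GPU (TF 1.x API):
# prints True if a GPU is visible to TensorFlow
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"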
# to install OpenCV
pip install opencv-python
or
pip install opencv-contrib-python
# To avoid "libSM.so" error after doing import cv2 in python do the following
apt update && apt install -y libsm6 libxext6
# to avoid ImportError: libgthread-2.0.so.0 after doing import cv2 in python do the following
apt-get update && apt-get install -y libgtk2.0-dev
# you may also attempt to locate the file:
apt-get install mlocate
sudo updatedb
locate libgthread-2.0
# If you fail to run the 'scp' command from the GCP machine, that means you have
# insufficient privileges
# First check 'auth'
gcloud auth list
(if your ID is not showing up, then you need to log in)
gcloud auth login
=> follow the instructions that appear on the terminal
# Further note from the ticket:
the default service account with default scopes does not have the necessary permission to run gcloud
compute scp. Using your own account will work.
You might want to consider using OS Login[1]. This will allow you to register multiple ssh keys which
you can use to authenticate to a GCE instance using any ssh client like Putty or OpenSSH. You can then
use ssh-agent to forward your credentials for a seamless experience.
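If you go the OS Login route, a minimal sketch (project-wide metadata; the key path is a placeholder):
# enable OS Login for the project and register an ssh public key
gcloud compute project-info add-metadata --metadata enable-oslogin=TRUE
gcloud compute os-login ssh-keys add --key-file ~/.ssh/id_rsa.pub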
# copy image from CLI
# this will also create the folder 'test' if it does not exist
gsutil cp op_pic_1527058743670.jpg gs://enclosure/test/
gsutil cp gs://enclosure/test/op_pic_1527058743670.jpg /tmp/
# list objects from gcloud bucket
gsutil ls gs://enclosure/
gsutil ls -l gs://enclosure/test
# Delete object
gsutil rm gs://enclosure/test/op_pic_1527058743670.jpg
# to check ACL privilege of bucket or a folder
gsutil acl get gs://enclosure
# to make publicly accessible
gsutil acl ch -u AllUsers:R gs://my-awesome-bucket/kitten.png
# to delete public permission
gsutil acl ch -d AllUsers gs://my-awesome-bucket/kitten.png
# to give IAM access to an individual
gsutil iam ch user:jane@gmail.com:objectCreator,objectViewer gs://my-awesome-bucket
# to remove the IAM access
gsutil iam ch -d user:jane@gmail.com:objectCreator,objectViewer gs://my-awesome-bucket
# Technical overview
Cloud Storage FUSE works by translating object storage names into a file and directory system,
interpreting the “/” character in object names as a directory separator so that objects with the
same common prefix are treated as files in the same directory. Applications can interact with the
mounted bucket like a simple file system, providing virtually limitless file storage running in the cloud.
While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the
backend. Cloud Storage FUSE retains the same fundamental characteristics of Cloud Storage, preserving
the scalability of Cloud Storage in terms of size and aggregate performance while maintaining the same
latency and single object performance. As with the other access methods, Cloud Storage does not support
concurrency and locking. For example, if multiple Cloud Storage FUSE clients are writing to the same file,
the last flush wins.
# to install gcsfuse
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse
# to set correct permissions
gcloud auth application-default login
then follow the steps to create the respective JSON credential file
then set the following ENV variable
export GOOGLE_APPLICATION_CREDENTIALS="/home/ubuntu/.config/gcloud/application_default_credentials.json"
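# to mount gcsfuse (a sketch using the same bucket and mount point as the fstab entry below, and the credential file created above)
mkdir -p /home/ubuntu/tf_files/gcs_fuse_mnt
gcsfuse --key-file=/home/ubuntu/.config/gcloud/application_default_credentials.json enclosure /home/ubuntu/tf_files/gcs_fuse_mnt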
# to unmount gcsfuse
fusermount -u /home/ubuntu/tf_files/gcs_fuse_mnt
# /etc/fstab entry looks like this
enclosure /home/ubuntu/tf_files/gcs_fuse_mnt gcsfuse rw,noauto,allow_other,uid=1000,gid=1000,key_file=/home/ubuntu/.config/gcloud/application_default_credentials.json
# 1000 and 1000 are the uid and gid of the ubuntu user
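# because the fstab entry uses 'noauto', mount it manually when needed (the gcsfuse package installs the mount helper):
sudo mount /home/ubuntu/tf_files/gcs_fuse_mnt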
Ref: https://askubuntu.com/questions/320996/how-to-make-python-program-command-execute-python-3
$ sudo update-alternatives --config python
If you get the error "no alternatives for python" then set up an alternative yourself with the following command:
$ sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
docker pull ubuntu:16.04
# launch ubuntu docker container
docker run -d -it --privileged -p 8090:8099 -p 7006:6006 -v /home/ubuntu/tf_files:/tf_files ubuntu:16.04 /bin/bash
docker pull tensorflow/serving:latest-gpu
export MODEL_FILE=/home/ubuntu/tf_files/common/model_configs/serving_config_23.txt
# launch optimized tensorflow serving
docker run --runtime=nvidia --privileged -p 9000-9008:9000-9008 -p 8501:8501 -p 7007-7027:6007-6027 --mount type=bind,source=${MODEL_FILE},target=/models/models.config --mount type=bind,source=/home/ubuntu/tf_files,target=/tf_files -it -d tensorflow/serving:latest-gpu --model_config_file=/models/models.config --port=9000 /bin/bash
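A sanity check over the REST port (the model name is a placeholder; it must match an entry in serving_config_23.txt):
curl http://localhost:8501/v1/models/<model_name>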
# Notes on uWSGI / Nginx worker configuration
You should share more context about your application if you want good feedback on how to configure it.
What kind of environment are you running this in? What bottleneck are you encountering?
What are the performance characteristics of your service? QPS, CPU bound, IO bound, memory bound, etc.
Nginx is a reverse proxy, web server, load balancer, etc. etc. Whatever you want to call it and however
you're using it, the important thing to understand is that it's independent of python concurrency. All it
does is forward requests. If you suspect that nginx is misconfigured or the source of your bottleneck,
I can go into a little more detail (I have the least experience working with this layer), but it's unlikely
to be your problem.
uWSGI works by creating an instance of the python interpreter and importing the python files related to your
application. If you configure it with more than one process, it will fork that instance of the interpreter
until there are the required number of processes. This is roughly equivalent to just starting that number of
python interpreter instances by hand, except that uWSGI will handle incoming HTTP requests and forward them
to your application. This also means that each process has memory isolation—no state is shared, so each
process gets its own GIL.
Thus, using only processes for workers will give you the best performance if your aim is just to optimize
throughput. However, processes come with tradeoffs. The main problem is that if your application benefits
from sharing state and resources within the process, pre-forking makes this untenable. If you want an
in-process cache for something, for example, your cache hit ratio would be much greater if all of your
workers were housed in one process and could share the same cache. An important implication of this is that
processes are very memory inefficient—memory isolation often requires that a lot of data is duplicated.
(As a sidenote, there are ways to take advantage of copy on write semantics by loading things at import time,
but that's a story for another day.)
For this reason, uWSGI also allows your workers to live within threads in the same process. These threads
solve the problems mentioned above regarding shared state—now your workers can share the same cache, for
example. However, it also means they share the same GIL. When more than one thread needs CPU time, it will
not be possible for them to make progress concurrently. In fact, GIL contention and context switching will
make your application run slower, on net.
For IO-bound applications where workers spend so much time waiting for IO that GIL contention is rare, this
sounds like it shouldn't be a problem. And if your application is like most web applications that spend a
large part of their time talking to other services or a database, it's probably IO bound. So all good, right?
In reality, thread based uWSGI workers almost never work flawlessly for any python web application of even
moderate complexity. The reason for this is primarily the ecosystem and the assumptions that people make
writing python code—many libraries and internal code are flagrantly, unapologetically, and inconsolably NOT
threadsafe. Even if your application is running smoothly today with thread based workers, you'll likely run
into some hard-to-debug problem involving thread safety sooner than later.
Moreover, so called "IO bound" applications spend way more time on CPU than most developers realize,
especially in python. Python code executes very slowly compared to most runtimes, and it's not uncommon for
simple CRUD apps to spend ~20% of their time running python code as opposed to blocking on IO. Even with two
threads, that's a lot of opportunity for GIL contention and further slowdowns.
My main point is this: Whatever bottleneck you're running into likely has to do with the fact that you're
running 2 threads for each process. So unless shared state or memory utilization is very important to you,
consider replacing that configuration with 4x processes, 1x threads instead and see what effect it has.
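A hedged example of that suggested setup as a plain uWSGI invocation (the module name, port, and timeout are placeholders; the same options can go in the ini file):
# 4 worker processes, 1 thread each; the master process supervises the workers
uwsgi --http :8090 --module app:application --master --processes 4 --threads 1 --harakiri 60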