TensorFlow on iOS
Inspired by Pete Warden's blog:
Bazel guide:
Assuming that we have already run retrain.py on train_images like this:
root@15a68539db77:/tf_files# python retrain_ORIG.py --bottleneck_dir=/tf_files/bottlenecks --how_many_training_steps=500 --model_dir=/tf_files/inception --output_graph=/tf_files/retrained_graph.pb --output_labels=/tf_files/retrained_labels.txt --image_dir=/tf_files/train_images/ --summaries_dir=/tmp/retrain_logs
Using a Docker container where TensorFlow is installed (on AWS)
Build the label image tool:
---------------------------
cd /tensorflow
bazel build --jobs=1 tensorflow/examples/label_image:label_image
* The --jobs=1 flag helps reduce the VM memory footprint; otherwise the default mode
spawns 200 jobs and the VM crashes with an out-of-memory error
This generates a 'label_image' binary at:
bazel-bin/tensorflow/examples/label_image/label_image
With the help of this tool we can run the model and get labels for images
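For reference, the same check can also be done directly from Python (a minimal sketch, assuming TensorFlow 1.x and the Inception-v3 input conventions used by retrain.py: a 299x299 image fed to the 'Mul' tensor after (pixel - 128) / 128 normalization; the paths are the examples from this walkthrough):

import numpy as np
import tensorflow as tf

GRAPH = '/tf_files/retrained_graph.pb'
LABELS = '/tf_files/retrained_labels.txt'
IMAGE = '/tf_files/flower_photos/daisy/5547758_eea9edfd54_n.jpg'

# Load the retrained GraphDef and import it into the default graph
graph_def = tf.GraphDef()
with tf.gfile.GFile(GRAPH, 'rb') as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')

labels = [line.strip() for line in tf.gfile.GFile(LABELS).readlines()]

with tf.Session() as sess:
    # Decode, resize, and normalize the image the way the 'Mul' input expects
    jpeg = tf.gfile.GFile(IMAGE, 'rb').read()
    decoded = tf.cast(tf.image.decode_jpeg(jpeg, channels=3), tf.float32)
    resized = tf.image.resize_images(tf.expand_dims(decoded, 0), [299, 299])
    normalized = sess.run((resized - 128.0) / 128.0)
    # 'final_result:0' is the output layer name created by retrain.py
    preds = sess.run('final_result:0', feed_dict={'Mul:0': normalized})
    for i in np.argsort(preds[0])[::-1][:5]:
        print(labels[i], preds[0][i])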
Test it out, before proceeding forward. This is a sanity check:
----------------------------------------------------------------
bazel-bin/tensorflow/examples/label_image/label_image \
--output_layer=final_result \
--labels=/tf_files/retrained_labels.txt \
--graph=/tf_files/retrained_graph.pb \
--input_layer=Mul \
--image=/tf_files/flower_photos/daisy/5547758_eea9edfd54_n.jpg
In my case:
-----------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels_11.txt --graph=/tf_files/retrained_graph_11.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/retrained_graph.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
Using a test image from the internet:
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/retrained_graph.pb --input_layer=Mul --image=/tf_files/test_images_internet/strawberry_m3.jpg
Tip: The --input_layer=Mul flag is needed to avoid the "FeedInputs: unable to find feed output input" error
This also prints some 'W' (warning) messages. The community advice for suppressing these warning
messages is to set: export TF_CPP_MIN_LOG_LEVEL=2. But this did not work here, since it also suppressed the
prediction output on screen
Freeze Graph step: (NOT NEEDED)
------------------
bazel build --jobs=1 tensorflow/python/tools:freeze_graph
Generate the frozen graph model as:
-------------------------------------
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/voice/graph.pb --input_checkpoint=/tmp/voice/model \
--output_node_names=model/y_pred,inference/inference --input_binary \
--output_graph=/tmp/voice/frozen.pb
In my case:
-------------------
bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=/tf_files/retrained_graph.pb --input_checkpoint=/tmp/checkpoint_amit/model --input_binary --output_graph=/tf_files/frozen_graph.pb --output_node_names=final_result
Optimization step:
------------------
This removes operations that are not supported on iOS and also helps reduce the memory
footprint. For example, it removes ops like DecodeJpeg, which uses the memory-intensive 'libjpeg' library.
It also performs other optimizations, such as folding explicit batch normalization steps
into the convolutional weights to minimize calculations
bazel build --jobs=1 tensorflow/python/tools:optimize_for_inference
Generate the optimized model as:
---------------------------------
bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=/tf_files/retrained_graph.pb \
--output=/tf_files/optimized_graph.pb \
--input_names=Mul \
--output_names=final_result
In my case:
-------------------
bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/tf_files/retrained_graph_11.pb --output=/tf_files/optimized_graph_11.pb --input_names=Mul --output_names=final_result
bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/tf_files/retrained_graph.pb --output=/tf_files/optimized_graph.pb --input_names=Mul --output_names=final_result
To check that it hasn't altered the output of the network, run label_image again on the
updated model (optimized_graph_11.pb):
Test as:
-------------------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels_11.txt --graph=/tf_files/optimized_graph_11.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
In my case:
-----------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/optimized_graph.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
Using a test image from the internet:
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/optimized_graph.pb --input_layer=Mul --image=/tf_files/test_images_internet/strawberry_m3.jpg
The optimized model is still ~87MB
root@15a68539db77:/tf_files# du -sk optimized_graph_11.pb
85124 optimized_graph_11.pb
Quantization step:
--------------------
Since ~87MB is still a huge size to include in the .ipa Apple package, it's important to reduce the
size of the model/graph with minimal impact on accuracy. A simple approach is to quantize the floating
point weights by rounding them to 256 discrete levels. The repeated values this creates help the compression
algorithm work much better
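To see why this helps, here is a small numpy sketch of the idea behind --mode=weights_rounded (an illustration of the effect, not quantize_graph's actual implementation): each weight is snapped to one of 256 evenly spaced levels between the tensor's min and max, but is still stored as float32, so the file fills with repeated values that compress well:

import numpy as np

def round_to_256_levels(w):
    # Snap each float to one of 256 evenly spaced levels in [min, max]
    w_min, w_max = w.min(), w.max()
    if w_max == w_min:
        return w
    levels = np.round((w - w_min) / (w_max - w_min) * 255.0)
    return (w_min + levels / 255.0 * (w_max - w_min)).astype(np.float32)

w = np.random.randn(100000).astype(np.float32)
print(np.unique(w).size, '->', np.unique(round_to_256_levels(w)).size)  # ~100000 -> at most 256
print(np.abs(w - round_to_256_levels(w)).max())  # small per-weight rounding error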
bazel build --jobs=1 tensorflow/tools/quantization:quantize_graph
Generate the quantized/rounded model (graph) as:
-----------------------------------------
bazel-bin/tensorflow/tools/quantization/quantize_graph \
--input=/tf_files/optimized_graph.pb \
--output=/tf_files/rounded_graph.pb \
--output_node_names=final_result \
--mode=weights_rounded
In my case:
bazel-bin/tensorflow/tools/quantization/quantize_graph --input=/tf_files/optimized_graph_11.pb --output=/tf_files/quantized_rounded_graph_11.pb --output_node_names=final_result --mode=weights_rounded
bazel-bin/tensorflow/tools/quantization/quantize_graph --input=/tf_files/optimized_graph.pb --output=/tf_files/quantized_rounded_graph.pb --output_node_names=final_result --mode=weights_rounded
Note:
------
If you look on disk, the raw size of the quantized_rounded_graph_11.pb file is the same at ~87MB, but if you
right-click on it in the Finder and choose "Compress", you should see it results in a file that's only
about 24MB or so. That reflects the size increase you'd actually see in a compressed .ipa on iOS,
or an .apk on Android.
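The same check can be scripted instead of using the Finder (a sketch; the path is the filename produced above):

import gzip, os, shutil

src = '/tf_files/quantized_rounded_graph_11.pb'
with open(src, 'rb') as f_in, gzip.open(src + '.gz', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)  # gzip approximates .ipa/.apk compression
print('raw : %.1f MB' % (os.path.getsize(src) / 1024.0 / 1024.0))
print('gzip: %.1f MB' % (os.path.getsize(src + '.gz') / 1024.0 / 1024.0))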
Test as:
--------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels_11.txt --graph=/tf_files/quantized_rounded_graph_11.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/quantized_rounded_graph.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
Using a test image from the internet:
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/quantized_rounded_graph.pb --input_layer=Mul --image=/tf_files/test_images_internet/strawberry_m3.jpg
We see that it suffers a slight loss in accuracy due to the quantization step
Memory mapping step:
---------------------
To avoid memory pressure on the iPhone when all the weights are loaded, this memory mapping step will help.
Memory mapping rearranges the model so that the weights are held in sections that can be
easily loaded separately from the main GraphDef, though they're all still in one file.
The weight buffers are read-only, so it's possible to map them into memory in a way that lets the OS
discard them behind the scenes when there's memory pressure, avoiding the possibility of those crashes.
bazel build --jobs=1 tensorflow/contrib/util:convert_graphdef_memmapped_format
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format \
--in_graph=/tf_files/rounded_graph.pb \
--out_graph=/tf_files/mmapped_graph.pb
In my case:
-----------
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=/tf_files/quantized_rounded_graph_11.pb --out_graph=/tf_files/mmapped_graph_11.pb
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=/tf_files/quantized_rounded_graph.pb --out_graph=/tf_files/mmapped_graph.pb
Note:
--------
One thing to watch out for is that the file on disk is no longer a plain GraphDef protobuf,
so if you try loading it into a program like label_image that expects one, you’ll see errors.
You need to load the model file slightly differently.
Compile tensorFlow for iOS
1. Switch to your Mac machine
2. Make sure you have Xcode 7.3 or later, automake, and the Xcode command-line tools installed
brew install automake
3. Make a new directory and 'cd' to the directory
4. Clone the latest tensorFlow from git:
git clone https://github.com/tensorflow/tensorflow
5. cd tensorflow
6. Run iOS compile script:
tensorflow/contrib/makefile/build_all_ios.sh
The iOS part in the video runs from 11:28 to 15:40
Time: 13:50
Link against:
tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a
Link against:
the generated protobuf libraries
Add include paths
Compile the app with -force_load
(or --whole-archive on Linux), otherwise a 'no session found' error will occur
Simple example:
================
Load Xcode project from:
tensorflow/contrib/ios_examples/simple/simple.xcodeproj
This is a minimal app (code and UI) that runs an Inception model on images
Inception-v3 is 94MB. The following will help reduce the model size:
With quantization this goes down to ~24MB
Using an older version like quantized Inception-v1, we can get down to ~7MB
While exporting models, do the following:
1. freeze_graph
tensorflow/python/tools:freeze_graph
This converts the existing model GraphDef (architecture) and checkpoints (trained weights) into a single large file,
which is also a simpler format for loading and further processing
2. Graph transform tool
Useful toolbox for rewriting graphs. The options that are helpful in this toolbox (see the Python sketch after this list):
-> "strip_unused_nodes" : gets rid of "OpKernel not found" errors
-> "remove_nodes(op=Identity, op=CheckNumerics)" : gets rid of debug nodes
-> "fold_batch_norms" : optimizes away some ops
3. Quantize weights
Shrinks file size by 75%. Can use "round_weights" to shrink the compressed bundle size instead, but still use float
Helps convert the default 32-bit float weights to 8-bit values
Penalty: Small accuracy loss
Gain: Big reduction in size
4. Quantize calculations
This takes it even further: the calculations themselves are also quantized
-> "quantize_nodes" transform switches math to 8-bit
-> Subset of ops supported
-> Only optimized for ARM
-> Works on inception model
5. Memory mapping
To avoid a huge memory footprint when the app is run on the device. If it is huge, the app might be killed
by iOS
tensorflow/contrib/util/convert_graphdef_memmapped_format.cc
This will:
-> speed up loading
-> save memory usage
Requires altered loading code, as in: tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
To reduce the executable size:
==============================
i.e., to reduce the binary size, since TensorFlow can increase the binary size by 12MB before tuning.
But there are optimization techniques to reduce the executable size as well.
This is needed since we do not want to increase the app download size; most space is taken up by op kernel
implementations. To keep the size under control, only common ops and datatypes are included by default
For "No OpKernel found" errors
Add:
tensorflow/core/kernels/*.cc to the
tensorflow/contrib/makefile/tf_op_files.txt if using the makefile
Also, use selective registration by only including the ops you need:
-> Run: python/tools/print_selective_registration_header.py on the model
-> Put: ops_to_register.h file in the root of your tensorflow source tree
-> Build with: -DSELECTIVE_REGISTRATION
After all these tunings, the executable size increases only by 2MB
Ref: