TensorFlow on iOS
Inspired by Pete Warden's blog:
Bazel guide:
Assuming that we have already run retrain.py on train_images like this:
root@15a68539db77:/tf_files# python retrain_ORIG.py --bottleneck_dir=/tf_files/bottlenecks --how_many_training_steps=500 --model_dir=/tf_files/inception --output_graph=/tf_files/retrained_graph.pb --output_labels=/tf_files/retrained_labels.txt --image_dir=/tf_files/train_images/ --summaries_dir=/tmp/retrain_logs
Using a Docker container where TensorFlow is installed (on AWS)
Build the label image tool:
---------------------------
cd /tensorflow
bazel build --jobs=1 tensorflow/examples/label_image:label_image
* The --jobs=1 flag helps reduce the VM memory footprint; otherwise the default mode
spawns 200 jobs and the VM crashes with an out-of-memory error
This generates a 'label_image' binary at:
bazel-bin/tensorflow/examples/label_image/label_image
With the help of this tool we can run the model and get labels for images
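For reference, the same check can also be done directly from Python (a minimal sketch, assuming TensorFlow 1.x and the Inception-v3 input conventions used by retrain.py: a 299x299 image fed to the 'Mul' tensor after (pixel - 128) / 128 normalization; the paths are the examples from this walkthrough):

import numpy as np
import tensorflow as tf

GRAPH = '/tf_files/retrained_graph.pb'
LABELS = '/tf_files/retrained_labels.txt'
IMAGE = '/tf_files/flower_photos/daisy/5547758_eea9edfd54_n.jpg'

# Load the retrained GraphDef and import it into the default graph
graph_def = tf.GraphDef()
with tf.gfile.GFile(GRAPH, 'rb') as f:
    graph_def.ParseFromString(f.read())
tf.import_graph_def(graph_def, name='')

labels = [line.strip() for line in tf.gfile.GFile(LABELS).readlines()]

with tf.Session() as sess:
    # Decode, resize, and normalize the image the way the 'Mul' input expects
    jpeg = tf.gfile.GFile(IMAGE, 'rb').read()
    decoded = tf.cast(tf.image.decode_jpeg(jpeg, channels=3), tf.float32)
    resized = tf.image.resize_images(tf.expand_dims(decoded, 0), [299, 299])
    normalized = sess.run((resized - 128.0) / 128.0)
    # 'final_result:0' is the output layer name created by retrain.py
    preds = sess.run('final_result:0', feed_dict={'Mul:0': normalized})
    for i in np.argsort(preds[0])[::-1][:5]:
        print(labels[i], preds[0][i])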
Test it out, before proceeding forward. This is a sanity check:
----------------------------------------------------------------
bazel-bin/tensorflow/examples/label_image/label_image \
--output_layer=final_result \
--labels=/tf_files/retrained_labels.txt \
--graph=/tf_files/retrained_graph.pb \
--input_layer=Mul \
--image=/tf_files/flower_photos/daisy/5547758_eea9edfd54_n.jpg
In my case:
-----------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels_11.txt --graph=/tf_files/retrained_graph_11.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/retrained_graph.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
Using a test image from the internet:
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/retrained_graph.pb --input_layer=Mul --image=/tf_files/test_images_internet/strawberry_m3.jpg
Tip: The --input_layer=Mul flag is needed to avoid the "FeedInputs: unable to find feed output input" error
This also prints some 'W' (warning) messages. The community advice for suppressing these warning
messages is to set: export TF_CPP_MIN_LOG_LEVEL=2. But this did not work here, since it also suppressed the
prediction output on screen
Freeze Graph step: (NOT NEEDED)
------------------
bazel build --jobs=1 tensorflow/python/tools:freeze_graph
Generate the frozen graph model as:
-------------------------------------
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/voice/graph.pb --input_checkpoint=/tmp/voice/model \
--output_node_names=model/y_pred,inference/inference --input_binary \
--output_graph=/tmp/voice/frozen.pb
In my case:
-------------------
bazel-bin/tensorflow/python/tools/freeze_graph --input_graph=/tf_files/retrained_graph.pb --input_checkpoint=/tmp/checkpoint_amit/model --input_binary --output_graph=/tf_files/frozen_graph.pb --output_node_names=final_result
Optimization step:
------------------
This removes operations that are not supported on iOS and also helps reduce the memory
footprint. For example, it removes ops like DecodeJpeg, which uses the memory-intensive 'libjpeg' library.
It also performs other optimizations, such as folding explicit batch normalization steps
into the convolutional weights to minimize calculations
bazel build --jobs=1 tensorflow/python/tools:optimize_for_inference
Generate the optimized model as:
---------------------------------
bazel-bin/tensorflow/python/tools/optimize_for_inference \
--input=/tf_files/retrained_graph.pb \
--output=/tf_files/optimized_graph.pb \
--input_names=Mul \
--output_names=final_result
In my case:
-------------------
bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/tf_files/retrained_graph_11.pb --output=/tf_files/optimized_graph_11.pb --input_names=Mul --output_names=final_result
bazel-bin/tensorflow/python/tools/optimize_for_inference --input=/tf_files/retrained_graph.pb --output=/tf_files/optimized_graph.pb --input_names=Mul --output_names=final_result
To check that it hasn't altered the output of the network, run label_image again on the
updated model (optimized_graph_11.pb):
Test as:
-------------------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels_11.txt --graph=/tf_files/optimized_graph_11.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
In my case:
-----------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/optimized_graph.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
Using a test image from the internet:
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/optimized_graph.pb --input_layer=Mul --image=/tf_files/test_images_internet/strawberry_m3.jpg
The optimized model is still ~87MB
root@15a68539db77:/tf_files# du -sk optimized_graph_11.pb
85124 optimized_graph_11.pb
Quantization step:
--------------------
Since ~87MB is still a huge size to include in the .ipa Apple package, it's important to reduce the
size of the model/graph with minimal impact on accuracy. A simple approach is to quantize the floating
point weights by rounding them to 256 discrete levels. The repeated values this creates help the compression
algorithm work much better
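To see why this helps, here is a small numpy sketch of the idea behind --mode=weights_rounded (an illustration of the effect, not quantize_graph's actual implementation): each weight is snapped to one of 256 evenly spaced levels between the tensor's min and max, but is still stored as float32, so the file fills with repeated values that compress well:

import numpy as np

def round_to_256_levels(w):
    # Snap each float to one of 256 evenly spaced levels in [min, max]
    w_min, w_max = w.min(), w.max()
    if w_max == w_min:
        return w
    levels = np.round((w - w_min) / (w_max - w_min) * 255.0)
    return (w_min + levels / 255.0 * (w_max - w_min)).astype(np.float32)

w = np.random.randn(100000).astype(np.float32)
print(np.unique(w).size, '->', np.unique(round_to_256_levels(w)).size)  # ~100000 -> at most 256
print(np.abs(w - round_to_256_levels(w)).max())  # small per-weight rounding error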
bazel build --jobs=1 tensorflow/tools/quantization:quantize_graph
Generate the quantized/rounded model (graph) as:
-----------------------------------------
bazel-bin/tensorflow/tools/quantization/quantize_graph \
--input=/tf_files/optimized_graph.pb \
--output=/tf_files/rounded_graph.pb \
--output_node_names=final_result \
--mode=weights_rounded
In my case:
bazel-bin/tensorflow/tools/quantization/quantize_graph --input=/tf_files/optimized_graph_11.pb --output=/tf_files/quantized_rounded_graph_11.pb --output_node_names=final_result --mode=weights_rounded
bazel-bin/tensorflow/tools/quantization/quantize_graph --input=/tf_files/optimized_graph.pb --output=/tf_files/quantized_rounded_graph.pb --output_node_names=final_result --mode=weights_rounded
Note:
------
If you look on disk, the raw size of the quantized_rounded_graph_11.pb file is the same at ~87MB, but if you
right-click on it in the Finder and choose "Compress", you should see it results in a file that's only
about 24MB or so. That reflects the size increase you'd actually see in a compressed .ipa on iOS,
or an .apk on Android.
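The same check can be scripted instead of using the Finder (a sketch; the path is the filename produced above):

import gzip, os, shutil

src = '/tf_files/quantized_rounded_graph_11.pb'
with open(src, 'rb') as f_in, gzip.open(src + '.gz', 'wb') as f_out:
    shutil.copyfileobj(f_in, f_out)  # gzip approximates .ipa/.apk compression
print('raw : %.1f MB' % (os.path.getsize(src) / 1024.0 / 1024.0))
print('gzip: %.1f MB' % (os.path.getsize(src + '.gz') / 1024.0 / 1024.0))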
Test as:
--------
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels_11.txt --graph=/tf_files/quantized_rounded_graph_11.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/quantized_rounded_graph.pb --input_layer=Mul --image=/tf_files/test_images_1/moldyusdareject/20170420_134921.jpg
Using a test image from the internet:
bazel-bin/tensorflow/examples/label_image/label_image --output_layer=final_result --labels=/tf_files/retrained_labels.txt --graph=/tf_files/quantized_rounded_graph.pb --input_layer=Mul --image=/tf_files/test_images_internet/strawberry_m3.jpg
We see that it suffers a slight loss in accuracy due to the quantization step
Memory mapping step:
---------------------
To avoid memory pressure on the iPhone when all the weights are loaded, this memory mapping step will help.
Memory mapping rearranges the model so that the weights are held in sections that can be
easily loaded separately from the main GraphDef, though they're all still in one file.
The weight buffers are read-only, so it's possible to map them into memory in a way that lets the OS
discard them behind the scenes when there's memory pressure, avoiding the possibility of those crashes.
bazel build --jobs=1 tensorflow/contrib/util:convert_graphdef_memmapped_format
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format \
--in_graph=/tf_files/rounded_graph.pb \
--out_graph=/tf_files/mmapped_graph.pb
In my case:
-----------
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=/tf_files/quantized_rounded_graph_11.pb --out_graph=/tf_files/mmapped_graph_11.pb
bazel-bin/tensorflow/contrib/util/convert_graphdef_memmapped_format --in_graph=/tf_files/quantized_rounded_graph.pb --out_graph=/tf_files/mmapped_graph.pb
Note:
--------
One thing to watch out for is that the file on disk is no longer a plain GraphDef protobuf,
so if you try loading it into a program like label_image that expects one, you’ll see errors.
You need to load the model file slightly differently.
Compile tensorFlow for iOS
1. Switch to your Mac machine
2. Make sure you have Xcode 7.3 or later, automake, and the Xcode command-line tools installed
brew install automake
3. Make a new directory and 'cd' to the directory
4. Clone the latest tensorFlow from git:
git clone https://github.com/tensorflow/tensorflow
5. cd tensorflow
6. Run iOS compile script:
tensorflow/contrib/makefile/build_all_ios.sh
The iOS part in the video runs from 11:28 to 15:40
Time: 13:50
Link against:
tensorflow/contrib/makefile/gen/lib/libtensorflow-core.a
Link against:
the generated protobuf libraries
Add include paths
Compile the app with -force_load
(or --whole-archive on Linux), otherwise a 'no session found' error will occur
Simple example:
================
Load Xcode project from:
tensorflow/contrib/ios_examples/simple/simple.xcodeproj
This is a minimal app (code and UI) that runs an Inception model on images
Inception-v3 is 94MB. The following will help reduce the model size:
With quantization this goes down to ~24MB
Using an older version like quantized Inception-v1, we can get down to ~7MB
While exporting models, do the following:
1. freeze_graph
tensorflow/python/tools:freeze_graph
This converts the existing model GraphDef (architecture) and checkpoints (trained weights) into a single large file,
which is also a simpler format for loading and further processing
2. Graph transform tool
Useful toolbox for rewriting graphs. The options that are helpful in this toolbox (see the Python sketch after this list):
-> "strip_unused_nodes" : gets rid of "OpKernel not found" errors
-> "remove_nodes(op=Identity, op=CheckNumerics)" : gets rid of debug nodes
-> "fold_batch_norms" : optimizes away some ops
3. Quantize weights
Shrinks file size by 75%. Can use "round_weights" to shrink the compressed bundle size instead, but still use float
Helps convert the default 32-bit float weights to 8-bit values
Penalty: Small accuracy loss
Gain: Big reduction in size
4. Quantize calculations
This takes it even further: the calculations themselves are also quantized
-> "quantize_nodes" transform switches math to 8-bit
-> Subset of ops supported
-> Only optimized for ARM
-> Works on inception model
5. Memory mapping
To avoid a huge memory footprint when the app is run on the device. If it is huge, the app might be killed
by iOS
tensorflow/contrib/util/convert_graphdef_memmapped_format.cc
This will:
-> speed up loading
-> save memory usage
Requires altered loading code, as in: tensorflow/contrib/ios_examples/camera/CameraExampleViewController.mm
To reduce the executable size:
==============================
i.e., to reduce the binary size, since TensorFlow can increase the binary size by 12MB before tuning.
But there are optimization techniques to reduce the executable size as well.
This is needed since we do not want to increase the app download size; most space is taken up by op kernel
implementations. To keep the size under control, only common ops and datatypes are included by default
For "No OpKernel found" errors
Add:
tensorflow/core/kernels/*.cc to the
tensorflow/contrib/makefile/tf_op_files.txt if using the makefile
Also, use selective registration by only including the ops you need:
-> Run: python/tools/print_selective_registration_header.py on the model
-> Put: ops_to_register.h file in the root of your tensorflow source tree
-> Build with: -DSELECTIVE_REGISTRATION
After all these tunings, the executable size increases only by 2MB
Ref: