Human Pose Estimation and Quantization of PyTorch to ONNX Models — A Detailed Guide
The story begins with an assignment that required me to deploy a Monocular Single Human Pose Estimation model on AWS Lambda. Being a student, I prefer to stay in the free tier of Lambda, where we get about 3GB of RAM and 500MB of storage. The storage is quite limited, and I had trouble fitting everything in one Lambda, so I thought of trying out ONNX instead of using PyTorch. So let's see how HPE works and how I converted a PyTorch model to ONNX and then quantized it.
Buckle up, this is going to be a long story!
If TL;DR, then just see the Colab notebook HumanPoseEstimation-PyTorch-ONNX-Quant in the satyajitghana/TSAI-DeepVision-EVA4.0-Phase-2 repo (github.com, colab.research.google.com).
Monocular Human Pose Estimation
Human pose estimation is the process of estimating the configuration of the body (pose) from a single, typically monocular, image. It can be applied to many applications such as action/activity recognition, action detection, human tracking, in movies and animation, virtual reality, human-computer interaction, video surveillance, medical assistance, self-driving, sports motion analysis, etc.
Broadly, there are 4 categories of HPE methods:
- Generative and Discriminative (3D Single Person)
- Top Down and Bottom Up (Multi-Person)
- Regression and Detection Based (Single Person)
- One-Stage and Multi-Stage
But in this story we will be using the Bottom Up Approach, i.e. we will be detecting the body parts (joints, limbs, or small template patches) and then joining them to create our human body.
The model I am referring to here is from the paper Simple Baselines for Human Pose Estimation and Tracking.
The paper describes and compares their model to SOTA HPE models that use the Hourglass structure, and shows how even a very simple model, built by adding deconvolution layers to a ResNet backbone, can give pretty good results. The code for this ResNet-backbone PoseNet can be found here.
Simple Pose Benchmark
As you can see, a simple conv network gets pretty good accuracy.
Enough of the model talk! (I recommend reading the beautiful paper I referred to above.) Now let's do some inferencing with the model and see what it outputs. Throughout this story I will be using Google Colab for running everything.
Start by cloning the human-pose-estimation.pytorch repository
! git clone https://github.com/microsoft/human-pose-estimation.pytorch && cd human-pose-estimation.pytorch && git checkout 18f1d0fa5b5db7fe08de640610f3fdbdbed8fb2f
Add it to sys.path so Colab knows where the library is
import sys
if "/content/human-pose-estimation.pytorch/lib/" not in sys.path:
sys.path.insert(0, "/content/human-pose-estimation.pytorch/lib/")
Import everything!
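The exact import cell isn't reproduced here; a minimal sketch, assuming the repo's lib/ layout (core/config for the config helpers, models for the pose networks), would look something like this:

import torch
import torchvision.transforms as transforms
import numpy as np
import cv2
import matplotlib.pyplot as plt

# helpers from the cloned repo's lib/ directory (now on sys.path)
from core.config import config, update_config
import models  # exposes models.pose_resnet.get_pose_net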
For this story we will use the ResNet50 model trained on 256x256 images of the MPII Dataset; it predicts 16 human body keypoints. All of the MPII models can be found in the pose_mpii folder on Google Drive (drive.google.com).
set the CONFIG_FILE and MODEL_PATH variables appropriately
CONFIG_FILE = '/content/human-pose-estimation.pytorch/experiments/mpii/resnet50/256x256_d256x3_adam_lr1e-3.yaml'
MODEL_PATH = '/content/pose_resnet_50_256x256.pth.tar'
update the config file
update_config(CONFIG_FILE)
config.GPUS = '' # we are running on CPU
Now we will load the model
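Roughly, the loading cell is a sketch like the one below, using the repo's get_pose_net factory and assuming the released .pth.tar file is a plain state dict (if yours is a full checkpoint, load checkpoint['state_dict'] instead):

# build the PoseResNet architecture described by the config
model = models.pose_resnet.get_pose_net(config, is_train=False)

# load the pretrained MPII weights onto the CPU
state_dict = torch.load(MODEL_PATH, map_location='cpu')
model.load_state_dict(state_dict)
model.eval()  # inference mode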
We'll now be using this guy's image to detect a pose. I wonder who this might be 🤔
Time to finally run the model on the image! (of course, doing some image transformations first). You'll notice something called JOINTS in the code below; we'll use those later! They come from the MPII dataset, and our model will output those 16 human keypoints.
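I won't reproduce the cell verbatim, but the gist of it is below. The JOINTS list follows the standard MPII 16-keypoint ordering, and the normalization values are the usual ImageNet ones (both are my assumptions; the notebook has the exact values):

# standard MPII 16-keypoint ordering (assumed)
JOINTS = ['r-ankle', 'r-knee', 'r-hip', 'l-hip', 'l-knee', 'l-ankle',
          'pelvis', 'thorax', 'upper-neck', 'head-top',
          'r-wrist', 'r-elbow', 'r-shoulder', 'l-shoulder', 'l-elbow', 'l-wrist']

transform = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((256, 256)),  # the model was trained on 256x256 crops
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

image = cv2.cvtColor(cv2.imread('me.jpg'), cv2.COLOR_BGR2RGB)  # hypothetical filename
with torch.no_grad():
    output = model(transform(image).unsqueeze(0))  # (1, 16, 64, 64): one heatmap per joint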
What now? Let's do some visualizations!
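A minimal way to visualize the output, assuming the sketch above: take the argmax of each of the 16 heatmaps, scale the coordinates back up to the 256x256 image, and scatter-plot them:

heatmaps = output.squeeze(0).numpy()       # (16, 64, 64)
num_joints, hm_h, hm_w = heatmaps.shape
scale_x, scale_y = 256 / hm_w, 256 / hm_h  # heatmap -> image coordinates

keypoints = []
for hm in heatmaps:
    y, x = np.unravel_index(np.argmax(hm), hm.shape)
    keypoints.append((x * scale_x, y * scale_y, hm.max()))  # (x, y, confidence)

plt.imshow(cv2.resize(image, (256, 256)))
for x, y, conf in keypoints:
    plt.scatter(x, y, c='red', s=20)
plt.axis('off')
plt.show()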
Looks amazing, right! That's the Bottom Up HPE approach! We never detected a bbox for my body, just the 16 parts!
What next? Just connect the dots!
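"Connecting the dots" just means drawing a line segment between every pair of joints that forms a limb, skipping any keypoint whose confidence is below a THRESHOLD. A sketch (the joint pairs are my reading of the MPII skeleton, not copied from the notebook):

THRESHOLD = 0.5

# limb connections between JOINTS indices (assumed MPII skeleton)
SKELETON = [(0, 1), (1, 2), (2, 6), (6, 3), (3, 4), (4, 5),           # legs + pelvis
            (6, 7), (7, 8), (8, 9),                                    # spine + head
            (10, 11), (11, 12), (12, 7), (7, 13), (13, 14), (14, 15)]  # arms

plt.imshow(cv2.resize(image, (256, 256)))
for a, b in SKELETON:
    xa, ya, ca = keypoints[a]
    xb, yb, cb = keypoints[b]
    if ca > THRESHOLD and cb > THRESHOLD:
        plt.plot([xa, xb], [ya, yb], c='lime', linewidth=2)
plt.axis('off')
plt.show()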
It got all 16 points! 😲 (you can reduce the THRESHOLD if it didn't)
But did you notice that the lady in the back isn't detected? That's because our model was trained on large human images only! If we were to use a Hourglass-style architecture, or something like what YOLO does to create representations of the image at different resolutions (scales), then we would have detected the pose for that lady as well.
Converting to ONNX and Quantizing
What is ONNX ?
ONNX is an open format built to represent machine learning models. ONNX defines a common set of operators — the building blocks of machine learning and deep learning models - and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers.
Install onnx and onnxruntime, we’ll need these
! pip install onnx onnxruntime
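The print_size_of_model helper used below isn't part of onnx or onnxruntime; it's the usual little utility from the PyTorch quantization tutorials that saves the state dict to a temporary file and reports its size on disk. A minimal version:

import os

def print_size_of_model(model):
    # save the weights to a temp file and report the file size in MB
    torch.save(model.state_dict(), 'temp.p')
    print('Size (MB):', os.path.getsize('temp.p') / 1e6)
    os.remove('temp.p')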
print_size_of_model(model)
Size (MB): 136.326509
Convert it to ONNX !
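The conversion itself is a single torch.onnx.export call on a dummy input of the right shape; roughly (the filename and opset are my own choices):

dummy_input = torch.randn(1, 3, 256, 256)  # one 256x256 RGB image

torch.onnx.export(
    model,                          # the loaded PoseResNet
    dummy_input,
    'pose_resnet_50_256x256.onnx',  # hypothetical output filename
    input_names=['input'],
    output_names=['heatmaps'],
    opset_version=11,
)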
Size (MB): 136.247923
Now we’ve successfully converted our model to ONNX
At this point I tried to simply deploy the model to AWS Lambda, but the ~136MB model (on top of the PyTorch dependencies) was too much; it didn't fit in the 500MB provided.
Quantize it all !
A question you might have in mind is: why not use PyTorch's quantization?
Well well well, I did take a look at that; TL;DR: the model we have right now can't be quantized out of the box. PyTorch's dynamic quantization only covers a few special layer types (which is why it works so well for models like BERT and LSTMs), and otherwise you have to modify your model and add special quantization layers.
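For the ONNX model I used onnxruntime's own quantization tooling instead. The API has changed a bit across versions, but with a recent onnxruntime a dynamic (weight-only) quantization to 8-bit looks roughly like this:

from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    'pose_resnet_50_256x256.onnx',        # the exported fp32 model
    'pose_resnet_50_256x256.quant.onnx',  # hypothetical output filename
    weight_type=QuantType.QUInt8,         # store weights as uint8
)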
Size (MB): 65.933789
Did you see that? The model is half the size now! Although this comes with a caveat: the accuracy is reduced.
Running the model on ONNX Runtime
Now we will run the model on the Python ONNX Runtime
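Running it is just creating an InferenceSession and feeding it the same preprocessed input as a numpy array; a minimal sketch, reusing the filename and transform from above:

import onnxruntime as ort

session = ort.InferenceSession('pose_resnet_50_256x256.quant.onnx')

input_name = session.get_inputs()[0].name
input_data = transform(image).unsqueeze(0).numpy()  # same preprocessing as before

heatmaps = session.run(None, {input_name: input_data})[0]  # (1, 16, 64, 64)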
In the quantized model I lost a hand 😟 maybe reducing the threshold might bring it back
But what did I gain from doing all this?
onnxruntime for CPU is really small, and now I am not dependent on the PyTorch libraries!
Look at the size! It's teeny-tiny for CPU. For my current deployment I was using torch-1.6.0 and torchvision-0.7.0, which took over 500MB uncompressed, something I can't afford in the AWS Lambda free tier. Now that I have the ONNX model and a really small runtime, everything will fit in a single free Lambda!
Plus, I have a plan to use onnx.js and run the model on the client side itself! It'll save the round trips made to Lambda.
Checkout the deployment at: https://thetensorclan-web.herokuapp.com/
That's it, folks! You now know how simple HPE works, how we can convert our PyTorch/TensorFlow/Caffe2 models to ONNX, and how we can then quantize the model.
Below is the link to the Colab Notebook where all this code lives; you can play with the data, modify stuff and rerun the notebook: satyajitghana/TSAI-DeepVision-EVA4.0-Phase-2 (github.com)