
In this post, I implement object detection inference running in AWS Lambda in a Python environment without a Docker image. For this, we use IceVision (PyTorch) as the deep learning framework.
Most deployment guides for deep learning with AWS Lambda focus on large-scale applications with a lot of traffic for your algorithm. However, if you work on a hobby project or in a business environment with long idle times and infrequent access, you usually do not want to run a server 24/7 waiting for an inference request. The best fit for these types of deployments is serverless.
When you google “inference with AWS Lambda”, you will find some guides, but you will be a bit disappointed to see that they cover classic ML inference with scikit-learn. Everyone who works with DL frameworks knows that the setup is quite heavy, and given the quotas for Lambda functions, the solution is not entirely obvious.
I got the idea for this post from Francesco Pochetti's blog post about deploying object detection with AWS Lambda. I recommend reading this article because it shows you how to integrate workloads with significant dependencies into Lambda.
TL;DR
The tutorial is relatively long, so I created a repo, which you can download and run in 3 simple steps. All you need to do is change the ./lambda_inference/src/lambda/.taskcat.yml and add an S3 bucket of yours in lines 5, 9, and 11. Furthermore, you may need to change the region in line 4. Then run these steps:
```bash
# Setup virtual environment and install dependencies
./setup.sh

# [Optional] Train icevision tutorial model
./train.sh

# Test and deploy to your AWS account
./deploy.sh
```
If you want to deploy your own model, add a class_map.json to the ./lambda_inference/src/lambda/lambda_functions/source/object_detection/ folder and a model.onnx to the ./lambda_inference/src/model folder. Then you can run the above commands without the training command, and it will deploy your model into your specified bucket. Now you have a running object detection inference with AWS Lambda.
Object detection inference in AWS Lambda
The naive approach for inference with AWS Lambda would be to add your favorite vision framework to the Lambda's requirements.txt along with some inference code and your exported model. However, you will quickly notice that these frameworks are too large for a single Lambda function. The quotas for a single Lambda function are listed in the table below.
For example, TensorFlow has a package size of 1.4 GB, PyTorch of 4.3 GB, and even albumentations, which uses cv2 under the hood, leads to an unzipped size of roughly 500 MB. Thus, our main target should be to reduce the package size to under 50 MB zipped and under 250 MB unzipped.
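If you want to check whether your own dependency set fits these limits, a rough local check (assuming a requirements.txt with your dependencies) looks like this:

```bash
# Install the dependencies into a local folder and compare against the Lambda quotas
pip install -r requirements.txt -t package/
du -sh package/       # unzipped size, must stay below 250 MB
cd package && zip -r ../package.zip . && cd ..
du -sh package.zip    # zipped size, must stay below 50 MB for direct upload
```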
Although AWS now supports packaging Lambda functions as Docker containers, which is a great feature, the container size is quite large (>10 GB with a full IceVision install), leading to cold starts of roughly 60 seconds. Another way of using these frameworks in a Lambda function is to attach an EFS (Elastic File System) to the Lambda. However, this is more complex than using just a single Lambda function.
| Resource | Quota |
|---|---|
| Invocation payload (request and response) | 6 MB (synchronous) |
| Deployment package (.zip file archive) size | 50 MB (zipped, for direct upload), 250 MB (unzipped) |
| Container image code package size | 10 GB |
| /tmp directory storage | 512 MB |
Another challenge can be the payload size when working with large images. In this tutorial, we send images by converting the bytes to base64 (which increases the size by around 30%) and then putting the string in the payload. If your image is too large, you can instead have the Lambda retrieve it directly from S3.
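To give an idea of the client side, a minimal invocation sketch could look like the following (the image path is a placeholder, and the function name assumes the FunctionName pattern from the CloudFormation template shown later):

```python
import base64
import json

import boto3

# Read an image and base64-encode it for the Lambda payload
with open("fridge.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode()}

# Invoke the deployed Lambda synchronously and print the returned detections
client = boto3.client("lambda")
response = client.invoke(
    FunctionName="ObjectDetectionFridgeObjectDetectors",
    Payload=json.dumps(payload).encode(),
)
print(response["Payload"].read())
```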
Our solution to the large package size is to use the IceVision model adapter to create a PyTorch Lightning model. From that, we can easily export our object detection model to ONNX, which can then run inference in a lightweight environment with ONNX Runtime in AWS Lambda. It is as easy as that!
The setup
For this tutorial, I expect you to have a working Python 3 installation on your system; I tested the code with the python:3.8 Docker image. As the infrastructure-as-code (IaC) tool, I use AWS CloudFormation, as it has the best state management and lets you deploy the tutorial more safely on your account. To quickly test the IaC, I use TaskCat, which expects your IAM access key and secret key in ~/.aws/credentials. Just call aws configure in your shell and add the credentials (this requires the AWS CLI).
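For reference, after running aws configure the ~/.aws/credentials file looks roughly like this (the values are placeholders):

```ini
[default]
aws_access_key_id = <YOUR_ACCESS_KEY_ID>
aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>
```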
To avoid any impact on your current Python installation, you should install all dependencies in a virtual environment. To do so, execute ./setup.sh from the repository. It creates a virtual environment called tutorial_venv in the repository folder and installs the requirements for the Lambda function and the training script. The code looks as follows:
```bash
#!/bin/bash
set -e

# Get dir of script and move to this dir
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
cd $SCRIPT_DIR

# Install venv for lambda test and neural network training
python3 -m venv tutorial_venv

install () {
    source tutorial_venv/bin/activate
    pip install -r src/lambda/lambda_functions/source/object_detection/requirements.txt
    pip install -r src/model/requirements.txt
    pip3 install taskcat
}

install
```
After the setup, you can activate the virtual environment by executing source tutorial_venv/bin/activate
in your shell.
Creating an ONNX object detection model
First, let us train an object detection model. For that, we follow the basic IceVision tutorial. I will post the complete training script here, but only explain the important parts that differ from the main tutorial:
```python
import json

from icevision.all import *
from icevision.core import class_map
import torch

# Params
image_size = 384
dest_dir = "fridge"
num_epochs = 20
dl_worker = 0

# Download the dataset
url = "https://cvbp-secondary.z19.web.core.windows.net/datasets/object_detection/odFridgeObjects.zip"
data_dir = icedata.load_data(url, dest_dir)

# Create the parser
parser = parsers.VOCBBoxParser(
    annotations_dir=data_dir / "odFridgeObjects/annotations",
    images_dir=data_dir / "odFridgeObjects/images",
)

# Parse annotations to create records
train_records, valid_records = parser.parse()

# Transforms
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

# Show an element of the train_ds with augmentation transformations applied
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)

model_type = models.retinanet
backbone = model_type.backbones.resnet50_fpn(pretrained=True)
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map))

# Data Loaders
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=dl_worker, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=dl_worker, shuffle=False)

# Show batch
model_type.show_batch(first(valid_dl), ncols=4)

metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]

# Model adapter
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return SGD(self.parameters(), lr=1e-4)

light_model = LightModel(model, metrics=metrics)

# Train
if torch.cuda.is_available():
    trainer = pl.Trainer(max_epochs=num_epochs, gpus=1)
else:
    trainer = pl.Trainer(max_epochs=num_epochs)
trainer.fit(light_model, train_dl, valid_dl)

# Export to ONNX
light_model.to_onnx(
    "model.onnx",
    input_sample=torch.randn((1, 3, image_size, image_size)),
    opset_version=11,
)

with open("class_map.json", "w") as file:
    # In icevision 0.8 the fn get_classes does not exist. Need to call the private fn
    json.dump(parser.class_map._class2id, file)
```
IceVision provides an awesome model adapter to convert the model into a PyTorch Lightning model, and PyTorch Lightning has a function called .to_onnx(). To convert the IceVision model, you instantiate a PyTorch Lightning model by subclassing the IceVision model adapter for PyTorch Lightning. Then you define the optimizer in the configure_optimizers() function and instantiate the model:
```python
# Model adapter
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return SGD(self.parameters(), lr=1e-4)

light_model = LightModel(model, metrics=metrics)
```
After successful training, you can call the .to_onnx() function of your PyTorch Lightning model. The function takes as its first argument a path as a string, where the ONNX binary is stored. The second argument is an input example, which can be a simple random torch tensor. The last argument defines the opset version that should be used. The default value is 9, which is too low for some operations used in object detection models. However, if you use the default, ONNX raises a meaningful exception that tells you to change the opset. The last thing we need to do is export the class map to a JSON file so that we can later map the index returned by the network to the class name.
```python
# Export to ONNX
light_model.to_onnx(
    "model.onnx",
    input_sample=torch.randn((1, 3, image_size, image_size)),
    opset_version=11,
)

with open("class_map.json", "w") as file:
    # In icevision 0.8 the fn get_classes does not exist. Need to call the private fn
    json.dump(parser.class_map._class2id, file)
```
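Before moving the model into a Lambda, it can be worth sanity-checking the export locally with ONNX Runtime. This is a minimal sketch (not part of the original repo) that loads the freshly exported model.onnx and prints the output node names and shapes for a random input:

```python
import numpy as np
from onnxruntime import InferenceSession

# Load the exported model and run a random input through it
session = InferenceSession("model.onnx")
input_name = session.get_inputs()[0].name
dummy = np.random.randn(1, 3, 384, 384).astype(np.float32)

outputs = session.run(None, {input_name: dummy})  # None = return all outputs
for node, arr in zip(session.get_outputs(), outputs):
    print(node.name, arr.shape)
```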
Writing the AWS Lambda
When you write a Lambda function, you basically need only one file, which can have an arbitrary name. This file has to define a handler function, here called event_handler(). This function is called when the Lambda is invoked. Any code that is executed outside this function runs during the cold start phase. After the event_handler is called, the execution environment stays warm for roughly 15 minutes if no other invocation occurs.
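Schematically, a handler file therefore looks like this (a minimal sketch with illustrative names, not the repository code):

```python
import json

# Module-level code runs once, during the cold start, and is reused by warm invocations
EXPENSIVE_SETUP = {"model": "loaded once per container"}  # placeholder for e.g. loading a model

def event_handler(event, context):
    # Runs on every invocation; `event` carries the request payload
    return json.dumps({"received_keys": list(event.keys())})
```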
Imports
The imports are pretty much standard, except for the aforementioned ONNX Runtime.
```python
from typing import List, Tuple
import boto3
from onnxruntime import InferenceSession
from PIL import Image
import base64
import numpy as np
import os
import json
from io import BytesIO
```
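For reference, the Lambda's requirements.txt only needs a handful of lightweight packages; the exact pins live in the repository, but conceptually it boils down to something like this (boto3 is already bundled with the Lambda Python runtime):

```
onnxruntime
Pillow
numpy
```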
Preprocessing
As mentioned before, even albumentations is too large for use in AWS Lambda. Thus, we need to implement the preprocessing in plain NumPy. Fortunately, we do not need any complex transformations, and ONNX Runtime expects a float32 NumPy array as input. These preprocessing steps are necessary:
- Resize and pad
- Normalize
- Convert to expected numpy array and dtype
Our resize and pad function mirrors the one from IceVision, which is based on albumentations, and is straightforward. It takes a PIL image and an integer target image size as input. First, we calculate the resize factor. Then we determine the longest side of the image (either height or width) and resize it to the target image size; with this ratio, the image is resized accordingly using the built-in PIL.Image.resize function. After the resize, we pad zeros on the shorter side of the image to match the quadratic shape defined by the target image size:
```python
def resize_and_pad(img: Image.Image, image_size: int) -> Tuple[np.ndarray, float, List[int]]:
    resize_factor = min(image_size / img.width, image_size / img.height)

    # Resize so that the longest side matches the target image size
    if img.width > img.height:
        resize_target = (image_size, int(img.height * resize_factor))
    else:
        resize_target = (int(img.width * resize_factor), image_size)
    img = img.resize(resize_target, resample=0)

    # Total number of pixels to pad on the shorter side
    if img.width == image_size:
        padded_pixel = image_size - img.height
    else:
        padded_pixel = image_size - img.width

    if padded_pixel % 2 != 0:
        pad_1 = (padded_pixel - 1) // 2
        pad_2 = pad_1 + 1
    else:
        pad_1 = padded_pixel // 2
        pad_2 = pad_1

    # paddings is returned as [x_padding, y_padding] to match the bbox order in resize_bboxes
    if img.width == image_size:
        # Width already matches the target, pad the height (axis 0)
        img = np.array(img)
        img = np.pad(img, [[pad_1, pad_2], [0, 0], [0, 0]], constant_values=0)
        paddings = [0, pad_1 + pad_2]
    else:
        # Height already matches the target, pad the width (axis 1)
        img = np.array(img)
        img = np.pad(img, [[0, 0], [pad_1, pad_2], [0, 0]], constant_values=0)
        paddings = [pad_1 + pad_2, 0]

    return img, resize_factor, paddings
```
The function returns a tuple containing the resized and padded image as a NumPy array, the resize factor, and the padding applied in x and y. The latter two return values are used later to correct the bounding boxes, as those will be based on the resized and padded image.
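As a quick illustration (a synthetic 640x480 image, purely to show the returned values):

```python
import numpy as np
from PIL import Image

# Synthetic 640x480 RGB image, just to inspect what resize_and_pad returns
img = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))
padded, factor, paddings = resize_and_pad(img, 384)
print(padded.shape)  # (384, 384, 3) -> quadratic model input
print(factor)        # 0.6 -> the longer 640 px side is scaled down to 384 px
print(paddings)      # 96 px of zero padding in total, added along the shorter (height) side
```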
The normalization is fairly easy, too. The function takes a NumPy array as input and normalizes the image with the de facto standard ImageNet mean and std values.
```python
def normalize(
    img: np.ndarray,
    mean: List[float] = [0.485, 0.456, 0.406],
    std: List[float] = [0.229, 0.224, 0.225],
    max_pixel_value: float = 255.,
) -> np.ndarray:
    # Based on albumentations normalize
    img = np.stack(
        [
            (img[:, :, 0] - mean[0] * max_pixel_value) / (std[0] * max_pixel_value),
            (img[:, :, 1] - mean[1] * max_pixel_value) / (std[1] * max_pixel_value),
            (img[:, :, 2] - mean[2] * max_pixel_value) / (std[2] * max_pixel_value),
        ],
        axis=-1,
    )
    return img
```
The last preprocessing function prepares the image for the model input. First, a new batch dimension is added to the tensor. After that, the tensor is transposed to comply with the torch image layout of BCHW. Finally, the image is converted to float32, as expected by our model input.
```python
def convert_to_pytorch_input(img: np.ndarray) -> np.ndarray:
    # Add batch dimension
    img = np.expand_dims(img, 0)
    # Transpose to pytorch format (BCHW)
    img = np.transpose(img, [0, 3, 1, 2])
    return img.astype(np.float32)
```
Downloading the model
During the cold start phase of the Lambda function, we want to perform all time-consuming operations up front. For small models, it is possible to ship the model inside the zipped Lambda package. In my opinion, it is preferable to store the model in an S3 bucket, because most models are too large anyway and it is easier to update an S3 object than a Lambda function.
The bucket name and the key of the model.onnx are stored in the environment variables of the Lambda. We use boto3's client.download_fileobj() method to fetch it. Lambda functions have a read-only file system, where only the /tmp directory is writable. Thus, we save the model under /tmp/model.onnx. Besides downloading the model, we load the class map into RAM with the JSON library provided by Python.
```python
def download_model() -> Tuple[dict, os.PathLike]:
    s3_model_path = os.environ["BLOG_TUTORIAL_INFERENCE_BUCKET"]
    model_path = os.path.join("/tmp", os.path.basename(s3_model_path))
    s3 = boto3.client('s3')
    with open(model_path, 'wb') as f:
        s3.download_fileobj(os.environ["BLOG_TUTORIAL_INFERENCE_MODEL"], s3_model_path, f)
    class_map = json.load(open("class_map.json"))
    return class_map, model_path
```
The function returns the class map, which is a dictionary, and the model path.
Post processing
The last function we define is the resize_bboxes function, which calculates the bounding boxes for the original input image. Since it receives the resize factor and padding, it is fairly easy: just remove the padding and rescale the pixel positions by the resize factor from the initial resize_and_pad() function:
```python
def resize_bboxes(resize_factor: float, paddings: List[int], bbox: List[int]) -> List[int]:
    # Expects bbox in [x1, y1, x2, y2]
    # Outputs bbox in [x1, y1, x2, y2]
    bbox = [
        (bbox[0] - paddings[0] / 2) / resize_factor,
        (bbox[1] - paddings[1] / 2) / resize_factor,
        (bbox[2] - paddings[0] / 2) / resize_factor,
        (bbox[3] - paddings[1] / 2) / resize_factor,
    ]
    return bbox
```
Cold start setup
Having discussed every function in detail, we can now have a look at the cold start initialization. It is done by the following code:
```python
# Setup model
class_map, model_path = download_model()
session_instance = InferenceSession(model_path)
input_name = session_instance.get_inputs()[0].name
label_names = [el.name for el in session_instance.get_outputs()]
detection_threshold = (
    float(os.environ.get("BLOG_TUTORIAL_DETECTION_THRESHOLD"))
    if os.environ.get("BLOG_TUTORIAL_DETECTION_THRESHOLD")
    else .5
)
input_image_size = int(os.environ["BLOG_TUTORIAL_INPUT_IMAGE_SIZE"])
```
Before we can execute inference, we need to download the object detection model to our AWS Lambda, which is the most time-consuming part of the cold start. Then we create an InferenceSession object from the ONNX Runtime library. It receives a path to an ONNX model and loads that model. From the instantiated InferenceSession object, the input and output names can be retrieved by calling session_instance.get_inputs() and session_instance.get_outputs(), respectively. These functions return lists of onnxruntime.NodeArg objects, which have an attribute called name. We need to provide these names when executing inference with this InferenceSession.
The last two assignments retrieve the detection threshold and the image size from the environment variables.
The event handler
In this section, we go through the actual processing of the inference request. First, we need to convert the image from base64 to bytes and read these bytes with PIL to decode the image:
```python
# Load image
img = base64.decodebytes(event["image"].encode())
img = Image.open(BytesIO(img))
print("Loading image finished")
```
After we create a PIL image, we can feed the data into our preprocessing functions. Remember, we keep the resize_factor and padded_pixel for the later bounding box correction.
```python
# Preprocess
img, resize_factor, padded_pixel = resize_and_pad(img, input_image_size)
img = normalize(img)
img = convert_to_pytorch_input(img)
print("Preprocessing finished")
```
With the preprocessed image, we can now trigger the inference with InferenceSession.run(). The first argument takes a list of strings with the desired output names. The second argument receives a dictionary, where the key is the name of the input node and the value is the actual NumPy array, i.e. the image used for inference.
```python
result = session_instance.run(label_names, {input_name: img})
```
The InferenceSession.run() method returns a tuple, where each element corresponds to one of the specified output nodes. Our model returns three different arrays: the first contains the bounding boxes, the second the scores, and the third the class ids. As you can see below, we use our resize_bboxes function to correct each bounding box and collect each detection into a dictionary.
```python
result_payload = []
for idx in range(len(result[0])):
    if result[1][idx] >= detection_threshold:
        result_payload.append(
            {
                # Cast NumPy types to plain Python floats so json.dumps can serialize them
                "bbox": [float(coord) for coord in resize_bboxes(resize_factor, padded_pixel, result[0][idx])],
                "score": float(result[1][idx]),
                "label": class_map[result[2][idx]],
            }
        )
return json.dumps(result_payload).encode()
```
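Putting it all together, the handler file roughly looks like this (a condensed sketch of how the snippets above combine; the repository version may differ in details):

```python
# inference.py (condensed): module-level code runs during the cold start,
# event_handler runs on every invocation.

class_map, model_path = download_model()
session_instance = InferenceSession(model_path)
input_name = session_instance.get_inputs()[0].name
label_names = [el.name for el in session_instance.get_outputs()]
detection_threshold = (
    float(os.environ.get("BLOG_TUTORIAL_DETECTION_THRESHOLD"))
    if os.environ.get("BLOG_TUTORIAL_DETECTION_THRESHOLD")
    else .5
)
input_image_size = int(os.environ["BLOG_TUTORIAL_INPUT_IMAGE_SIZE"])


def event_handler(event, context):
    # Decode the base64 payload into a PIL image
    img = Image.open(BytesIO(base64.decodebytes(event["image"].encode())))

    # Preprocess
    img, resize_factor, padded_pixel = resize_and_pad(img, input_image_size)
    img = convert_to_pytorch_input(normalize(img))

    # Inference
    result = session_instance.run(label_names, {input_name: img})

    # Postprocess
    result_payload = []
    for idx in range(len(result[0])):
        if result[1][idx] >= detection_threshold:
            result_payload.append({
                "bbox": [float(c) for c in resize_bboxes(resize_factor, padded_pixel, result[0][idx])],
                "score": float(result[1][idx]),
                "label": class_map[result[2][idx]],
            })
    return json.dumps(result_payload).encode()
```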
Deployment to AWS
Although the exciting part of this post might be the object detection inference, we will also have a quick look at the deployment. As mentioned earlier, I use CloudFormation templates, tested and deployed with TaskCat. We only need two resources to run the Lambda: the Lambda function itself, and an IAM role that allows the Lambda a) to execute and b) to access the model in the S3 bucket.
We have these parameters in the template:
```yaml
Parameters:
  ServiceName:
    Type: String
    Default: FridgeObjectDetectors
  S3BucketLambda: # S3 bucket where the lambda function is stored
    Type: String
  S3PathLambda: # S3 path where the lambda function is stored
    Type: String
  S3BucketModel: # S3 bucket where the onnx model is stored
    Type: String
  S3PathModel: # S3 path where the onnx model is stored
    Type: String
```
The ServiceName is used as a prefix for every human-readable name you provide, which is useful for quickly relating resources to a project while browsing your AWS console. The four other parameters describe the bucket and key where the Lambda function code is stored, and the bucket and key that tell the Lambda where your model is stored. Hypothetically, you could also hard-code the S3 path to the model, but passing it as parameters lets the template automatically build the ARN and grant the Lambda execution role access to the model.
The Lambda template looks as follows:
```yaml
objectDetectionLambda:
  Type: AWS::Lambda::Function
  Properties:
    Code:
      S3Bucket: !Ref S3BucketLambda
      S3Key: !Sub ${S3PathLambda}
    Description: Inference of onnx models
    Environment:
      Variables:
        BLOG_TUTORIAL_INFERENCE_MODEL: !Ref S3BucketModel
        BLOG_TUTORIAL_INFERENCE_BUCKET: !Ref S3PathModel
        BLOG_TUTORIAL_DETECTION_THRESHOLD: 0.5
        BLOG_TUTORIAL_INPUT_IMAGE_SIZE: 384
    FunctionName: !Sub ObjectDetection${ServiceName}
    Handler: inference.event_handler
    MemorySize: 10240
    PackageType: Zip
    Role: !GetAtt lambdaRole.Arn
    Runtime: python3.8
    Timeout: 900
    Tags:
      - Key: Name
        Value: !Sub ${ServiceName}ObjectDetectionLambda
```
The parameters for the Lambda are nothing special. We provide the model's S3 path, detection threshold, and input image size via environment variables. The handler is simply the name of the Python file plus the name of the event handler function. We reference the execution role from the same template, which we will see shortly. We set the timeout of a single Lambda execution to 15 minutes, which should be lower in production. One crucial thing in this template is the memory size: the number of vCPUs available to the Lambda function depends on the memory allocated to it. With the maximum of 10 GB memory, we get six vCPUs, the current maximum. Depending on your model, you could tune this parameter; with the current setup, an inference takes around 2 seconds.
The second resource is the Lambda execution role. Your AWS Lambda needs to download the object detection model before executing inference. Again, this is standard, and I added only one thing to the primary example:
```yaml
lambdaRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Action:
            - sts:AssumeRole
          Effect: Allow
          Principal:
            Service:
              - lambda.amazonaws.com
      Version: 2012-10-17
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/AWSLambdaExecute
    RoleName: !Sub LambdaRole${ServiceName}
    Policies:
      - PolicyDocument:
          Version: "2012-10-17"
          Statement:
            - Effect: Allow
              Action: "s3:GetObject"
              Resource: !Sub arn:aws:s3:::${S3BucketModel}/${S3PathModel}
        PolicyName: !Sub ${ServiceName}AccessToModel
    Tags:
      - Key: Name
        Value: !Sub ${ServiceName}ObjectDetectionIamRole
```
I added a policy to this role that allows the s3:GetObject action on the model object in your S3 bucket. This complies with the grant-least-privilege principle of the IAM best practices. Besides that, the role needs the AWSLambdaExecute managed policy. If you want to run your Lambda inside a VPC, you need to add AWSLambdaVPCAccessExecutionRole as well.
Outlook
I hope this tutorial was informative, or at least that you could copy and paste some useful code samples out of it. The inference speed for object detection models can be improved with different infrastructure, but for an AWS Lambda it is relatively good. Even without a GPU or more CPU cores, it is a cost-effective alternative for tasks where you can afford to wait around 2 seconds for a response. Furthermore, it does not require specific knowledge of compilation frameworks like TVM: just convert the model to ONNX, which is built into most DL frameworks, and execute it with ONNX Runtime.