
Object detection inference with AWS Lambda and IceVision (PyTorch)

Architecture for this post. We use CloudFormation and TaskCat to create our AWS resources. The deep learning part comes from IceVision (PyTorch Lightning model adapter), and everything is in pure Python! With ONNX Runtime, we can execute object detection inference inside the Python environment of AWS Lambda.

In this post, I implement object detection inference that runs in AWS Lambda's Python environment without a Docker image. For the deep learning part, we use IceVision (PyTorch).

Most deployment guides for deep learning with AWS Lambda focus on large-scale applications with a lot of traffic for your algorithm. However, if you work on a hobby project or in a business environment with long idle times and infrequent access, you usually do not want to run a server 24/7 that waits for inference requests. Serverless is the best fit for these kinds of deployments.

When you google “inference with AWS Lambda”, you will find some guides, but you will be a bit disappointed to see that they cover classic ML inference with scikit-learn. Everyone who works with DL frameworks knows that their setup is quite heavy, and given the quotas for Lambda functions, the solution is not obvious.

I got the idea for this post from Francesco Pochetti’s blog post about deploying object detection with AWS Lambda. I recommend reading it because it shows how to integrate workloads with significant dependencies into Lambda.


The tutorial is relatively long, so I created a repository, which you can download and run in 3 simple steps. All you need to do is edit ./lambda_inference/src/lambda/.taskcat.yml and add one of your S3 buckets in lines 5, 9, and 11. You may also need to change the region in line 4. Then run these steps:

# Setup virtual environment and install dependencies

# [Optional] Train icevision tutorial model

# Test and deploy to your AWS account

If you want to deploy your own model, add a class_map.json to the ./lambda_inference/src/lambda/lambda_functions/source/object_detection/ folder and a model.onnx to the ./lambda_inference/src/model folder. Then run the above commands without the training command, and your model will be deployed to your specified bucket. Now you have a running object detection inference with AWS Lambda.

Object detection inference in AWS Lambda

The naive approach to inference with AWS Lambda would be to add your favorite vision framework to the requirements.txt of the Lambda, along with some inference code and your exported model. However, you will quickly notice that these frameworks are too large for a single Lambda function. The quotas for a single Lambda function are listed in the table below.

For example, TensorFlow has a package size of 1.4 GB, PyTorch of 4.3 GB, and even albumentations, which uses cv2 under the hood, leads to an unzipped size of roughly 500 MB. Thus, our main goal is to reduce the package size to under 50 MB zipped and under 250 MB unzipped.

Although AWS supports packaging Lambda functions as Docker container images, which is a great feature, the image is quite large (>10 GB with a full IceVision install), leading to cold starts of roughly 60 seconds. Another way of using these frameworks in Lambda is to attach an EFS (Elastic File System) to the function. However, this is more complex than using a single Lambda function alone.

Resource | Quota
Invocation payload (request and response) | 6 MB (synchronous)
Deployment package (.zip file archive) size | 50 MB (zipped, for direct upload); 250 MB (unzipped)
Container image code package size | 10 GB
/tmp directory storage | 512 MB

AWS Lambda quotas for a single function

Another challenge can be the payload size when working with large images. In this tutorial, we send images by encoding their bytes as base64 (which increases the size by about 33%) and passing the resulting string in the payload. If your image is too large, your Lambda can instead retrieve it directly from an S3 bucket.
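A minimal client-side sketch of this encoding, assuming the event layout used by the handler later in this post (the base64 string under an "image" key). The helper name build_payload and the size check against the 6 MB synchronous limit from the table above are illustrative assumptions, not code from the repository.

```python
import base64
import json

# Synchronous invocation payload quota (6 MB); counting 1 MB as
# 1024 * 1024 bytes is an assumption made for this sketch
MAX_SYNC_PAYLOAD_BYTES = 6 * 1024 * 1024


def build_payload(image_bytes: bytes) -> str:
    """Hypothetical helper: wrap raw image bytes in a JSON event for the Lambda."""
    # base64 turns every 3 input bytes into 4 ASCII characters (~33% larger)
    encoded = base64.b64encode(image_bytes).decode("ascii")
    payload = json.dumps({"image": encoded})
    if len(payload.encode("utf-8")) > MAX_SYNC_PAYLOAD_BYTES:
        raise ValueError("Payload exceeds the synchronous Lambda limit; "
                         "consider fetching the image from S3 instead")
    return payload


if __name__ == "__main__":
    fake_image = bytes(range(256)) * 12  # 3072 bytes of dummy data
    payload = build_payload(fake_image)
    print(len(json.loads(payload)["image"]))  # 4096 = 3072 * 4 / 3
```

The returned string can then be sent as the Payload of a synchronous boto3 Lambda invoke() call.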

Our solution to the large package size is to use the IceVision model adapter to create a PyTorch Lightning model. From there, we can easily export our object detection model to ONNX and run inference in a lightweight environment with ONNX Runtime in AWS Lambda. It is as easy as that!

The setup

For this tutorial, I assume you have a working Python 3 installation on your system. I tested the code with the python:3.8 Docker image. As the infrastructure-as-code (IaC) tool, I use AWS CloudFormation, as it has the best state management, and you can deploy the tutorial more safely to your account. To quickly test the IaC, I use TaskCat, which expects the access key and secret key of your IAM user in ~/.aws/credentials. Just run aws configure in your shell and add the credentials (this requires the AWS CLI).

To avoid any impact on your current Python installation, you should install all dependencies in a virtual environment. For this, you can execute the ./ script from the repository. It creates a virtual environment called tutorial_venv in the repository folder and installs the requirements of the Lambda function and the training script. The code looks as follows:

set -e

# Get dir of script and move to this dir
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
cd "$SCRIPT_DIR"

# Install venvs for lambda test and neural network training
python3 -m venv tutorial_venv

install () {
    source tutorial_venv/bin/activate
    pip install -r src/lambda/lambda_functions/source/object_detection/requirements.txt
    pip install -r src/model/requirements.txt
    pip3 install taskcat
}

install


After the setup, you can activate the virtual environment by executing source tutorial_venv/bin/activate in your shell.

Creating an ONNX object detection model

First, let us train an object detection model. For this, we follow the basic IceVision tutorial. I post the complete training script here but only explain the important parts that differ from the main tutorial:

import json
from icevision.all import *
from icevision.core import class_map
import torch
import pytorch_lightning as pl
from torch.optim import SGD

# Params
image_size = 384
dest_dir = "fridge"
num_epochs = 20
dl_worker = 0

# Download the dataset
url = ""
data_dir = icedata.load_data(url, dest_dir)

# Create the parser
parser = parsers.VOCBBoxParser(annotations_dir=data_dir / "odFridgeObjects/annotations", images_dir=data_dir / "odFridgeObjects/images")
# Parse annotations to create records
train_records, valid_records = parser.parse()

# Transforms
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

# Show an element of the train_ds with augmentation transformations applied
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)

model_type = models.retinanet
# The backbone config is already created with pretrained weights here
backbone = model_type.backbones.resnet50_fpn(pretrained=True)

model = model_type.model(backbone=backbone, num_classes=len(parser.class_map))

# Data Loaders
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=dl_worker, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=dl_worker, shuffle=False)

# show batch
model_type.show_batch(first(valid_dl), ncols=4)

metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]

# Model adapter
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return SGD(self.parameters(), lr=1e-4)
light_model = LightModel(model, metrics=metrics)

# Train (on one GPU if available, otherwise on the CPU)
if torch.cuda.is_available():
    trainer = pl.Trainer(max_epochs=num_epochs, gpus=1)
else:
    trainer = pl.Trainer(max_epochs=num_epochs)
trainer.fit(light_model, train_dl, valid_dl)

# Export to ONNX
light_model.to_onnx("model.onnx", input_sample=torch.randn((1, 3, image_size, image_size)), opset_version=11)
with open("class_map.json", "w") as file:
    json.dump(parser.class_map._class2id, file)  # In icevision 0.8 the fn get_classes does not exist. Need to call the private fn

IceVision provides an awesome model adapter that turns the model into a PyTorch Lightning model, and PyTorch Lightning models have a method called .to_onnx(). To convert the IceVision model, you subclass the IceVision model adapter for PyTorch Lightning, define the optimizer in the configure_optimizers() method, and instantiate the model:

# Model adapter
class LightModel(model_type.lightning.ModelAdapter):
    def configure_optimizers(self):
        return SGD(self.parameters(), lr=1e-4)

light_model = LightModel(model, metrics=metrics)

After successful training, you can call the .to_onnx() method of your PyTorch Lightning model. Its first argument is the file path (as a string) where the ONNX binary is stored. The second argument, input_sample, is an example input, which can be a simple random torch tensor of the expected shape. The last argument defines the ONNX opset version. The default value in older PyTorch releases is 9, which is too low for some operations used in object detection models; if you keep the default, the exporter raises a meaningful exception that tells you to increase the opset. Finally, we export the class map to a JSON file so that we can later map the class index returned by the network to the class name.

# Export to ONNX
light_model.to_onnx("model.onnx", input_sample=torch.randn((1, 3, image_size, image_size)), opset_version=11)
with open("class_map.json", "w") as file:
    json.dump(parser.class_map._class2id, file)  # In icevision 0.8 the fn get_classes does not exist. Need to call the private fn

Writing the AWS Lambda

When you write a Lambda function, you basically need only one file, which can have an arbitrary name. This file has to define a handler function, called event_handler() in this tutorial, which is executed whenever the Lambda is invoked. Any code executed outside this function, at module level, runs during the cold start phase. After an invocation, the execution environment typically stays warm for roughly 15 minutes if no other invocation occurs; AWS does not guarantee this duration.
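To make the split between cold start and invocation concrete, here is a minimal, self-contained handler module. The counter and timestamp are purely illustrative and not part of the tutorial's inference code; they only show that module-level state is created once and reused while the execution environment stays warm.

```python
import json
import time

# Module-level code runs once per execution environment (the cold start).
# Expensive setup such as downloading the model or creating the ONNX Runtime
# session belongs here so that warm invocations can reuse it.
INIT_TIMESTAMP = time.time()
invocation_count = 0


def event_handler(event, context):
    """Called by Lambda on every invocation (configured as <file>.event_handler)."""
    global invocation_count
    invocation_count += 1
    return json.dumps({
        "invocation_count": invocation_count,
        "init_timestamp": INIT_TIMESTAMP,
    })


if __name__ == "__main__":
    # Simulate two invocations that hit the same warm execution environment
    print(event_handler({}, None))
    print(event_handler({}, None))
```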


The imports are pretty much standard except the aforementioned ONNX Runtime.

from typing import List, Tuple
import boto3
from onnxruntime import InferenceSession
from PIL import Image
import base64 
import numpy as np
import os
import json
from io import BytesIO


As mentioned before, even albumentations is too large for use in AWS Lambda. Thus, we need to implement the preprocessing in plain NumPy. Fortunately, we do not need any complex transformations, and the ONNX model expects a float32 NumPy array as input. These preprocessing steps are necessary:

  1. Resize and pad
  2. Normalize
  3. Convert to expected numpy array and dtype

IceVision's resize-and-pad transform is based on albumentations, but reimplementing it is straightforward. The function takes a PIL image and an integer target image size as input. First, we calculate the resize factor that scales the longest side of the image (either height or width) to the target image size. With this factor, we resize the image using the built-in PIL.Image.resize method. After the resize, we pad the shorter side with zeros to obtain the square shape defined by the target image size:

def resize_and_pad(img: Image.Image, image_size: int) -> Tuple[np.ndarray, float, List[int]]:
    # Scale factor that fits the longest side into image_size
    resize_factor = min(image_size/img.width, image_size/img.height)
    if img.width > img.height:
        resize_target = (image_size, int(img.height*resize_factor))
    else:
        resize_target = (int(img.width*resize_factor), image_size)
    # PIL expects (width, height); resample=0 is nearest-neighbour
    img = img.resize(resize_target, resample=0)

    # Number of pixels missing on the shorter side
    if img.width == image_size:
        padded_pixel = image_size - img.height
    else:
        padded_pixel = image_size - img.width
    # Split the padding; an odd remainder goes to the bottom/right side
    if padded_pixel % 2 != 0:
        pad_1 = (padded_pixel-1)//2
        pad_2 = pad_1+1
    else:
        pad_1 = padded_pixel//2
        pad_2 = pad_1
    if img.width == image_size:
        # Pad rows (height axis)
        img = np.array(img)
        img = np.pad(img, [[pad_1, pad_2], [0, 0], [0, 0]], constant_values=0)
        paddings = [pad_1+pad_2, 0]
    else:
        # Pad columns (width axis)
        img = np.array(img)
        img = np.pad(img, [[0, 0], [pad_1, pad_2], [0, 0]], constant_values=0)
        paddings = [0, pad_1+pad_2]

    return img, resize_factor, paddings

The function returns a tuple containing the resized and padded image as NumPy array, the resize factor and the padding applied to the image. The latter two return values are used later for correcting the bounding boxes as they will be based on the resized and padded image.
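To make the arithmetic concrete, the following sketch reproduces only the geometry of resize_and_pad() for a given input size, without touching any pixels (the helper name resize_and_pad_geometry is hypothetical). For a 768 × 510 image and a target size of 384, the resize factor is 0.5, the image becomes 384 × 255, and the 129 missing rows are split into 64 above and 65 below.

```python
from typing import Tuple


def resize_and_pad_geometry(width: int, height: int,
                            image_size: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Hypothetical helper mirroring the arithmetic of resize_and_pad().

    Returns the resized (width, height) and the (before, after) padding
    applied to the shorter side.
    """
    resize_factor = min(image_size / width, image_size / height)
    if width > height:
        resized = (image_size, int(height * resize_factor))
        padded_pixel = image_size - resized[1]
    else:
        resized = (int(width * resize_factor), image_size)
        padded_pixel = image_size - resized[0]
    # An odd padding puts the extra pixel after the image, as in resize_and_pad()
    pad_before = padded_pixel // 2
    pad_after = padded_pixel - pad_before
    return resized, (pad_before, pad_after)


if __name__ == "__main__":
    print(resize_and_pad_geometry(768, 512, 384))  # ((384, 256), (64, 64))
    print(resize_and_pad_geometry(768, 510, 384))  # ((384, 255), (64, 65))
```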

The normalization is fairly easy, too. The function takes a NumPy array as input and normalizes the image with the de facto standard ImageNet mean and std values.

def normalize(img: np.ndarray, mean:List[float]=[0.485, 0.456, 0.406],
                std: List[float]=[0.229, 0.224, 0.225],
                max_pixel_value:float = 255.)->np.ndarray:
    # Based on albumentations normalize
    img = np.stack([
                (img[:, :, 0]-mean[0]* max_pixel_value) /(std[0]* max_pixel_value),
                (img[:, :, 1]-mean[1]* max_pixel_value) /(std[1]* max_pixel_value),
                (img[:, :, 2]-mean[2]* max_pixel_value) /(std[2]* max_pixel_value),
            ], axis=-1)
    return img
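The three per-channel expressions can also be written as a single broadcast operation, because a (3,)-shaped mean and std broadcast over the last (channel) axis of an HWC image. The sketch below is an alternative formulation with the hypothetical name normalize_broadcast, not the function used in the repository; it also returns float32 directly.

```python
import numpy as np


def normalize_broadcast(img: np.ndarray,
                        mean=(0.485, 0.456, 0.406),
                        std=(0.229, 0.224, 0.225),
                        max_pixel_value: float = 255.0) -> np.ndarray:
    """Per-channel normalization of an HWC image via NumPy broadcasting.

    Computes (img - mean * max_pixel_value) / (std * max_pixel_value)
    for all channels at once.
    """
    mean = np.asarray(mean, dtype=np.float32) * max_pixel_value
    std = np.asarray(std, dtype=np.float32) * max_pixel_value
    return (img.astype(np.float32) - mean) / std


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)
    # Compare with the explicit per-channel formula of normalize()
    expected = np.stack([(img[:, :, c] - m * 255.0) / (s * 255.0)
                         for c, (m, s) in enumerate(zip((0.485, 0.456, 0.406),
                                                        (0.229, 0.224, 0.225)))],
                        axis=-1)
    print(np.allclose(normalize_broadcast(img), expected, atol=1e-5))  # True
```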

The last preprocessing function prepares the image for the model input. At first, a new batch dimension is added to the tensor. After that, the tensor is transposed to comply with the torch image structure of BCHW. Finally, the image is converted to float32 as it is expected in our model input.

def convert_to_pytorch_input(img: np.ndarray)->np.ndarray:
    # Add batch dimension
    img = np.expand_dims(img, 0)
    # Transpose to PyTorch format (BHWC -> BCHW)
    img = np.transpose(img, [0, 3, 1, 2])
    return img.astype(np.float32)
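A small sketch to check the layout change described above, written as a variant of convert_to_pytorch_input() with the hypothetical name to_bchw_float32. Unlike the original, it returns a C-contiguous copy as a defensive choice; ONNX Runtime may well accept the transposed view as it is.

```python
import numpy as np


def to_bchw_float32(img: np.ndarray) -> np.ndarray:
    """Same steps as convert_to_pytorch_input(): HWC image -> 1 x C x H x W float32."""
    # img[None] adds the batch axis, equivalent to np.expand_dims(img, 0)
    batched = img[None]
    # Move the channel axis from last to second position: BHWC -> BCHW
    bchw = np.transpose(batched, (0, 3, 1, 2))
    # transpose only returns a strided view; ascontiguousarray copies it
    # into a C-contiguous float32 buffer
    return np.ascontiguousarray(bchw, dtype=np.float32)


if __name__ == "__main__":
    hwc = np.zeros((384, 384, 3), dtype=np.float64)
    hwc[..., 1] = 1.0  # mark the second channel
    bchw = to_bchw_float32(hwc)
    print(bchw.shape, bchw.dtype)  # (1, 3, 384, 384) float32
    print(bchw[0, 1].min())        # 1.0: channel 1 ended up at index 1
```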

Downloading the model

During the cold start phase of the Lambda function, we want to perform all time-consuming operations. For small models, it is possible to include the model in the zipped Lambda package. In my opinion, it is preferable to store the model in an S3 bucket, because most models are too large anyway, and it is easier to update an S3 object than a Lambda function.

The bucket name and the key of the model.onnx are stored in the environment variables of the Lambda, and we download the file with the download_fileobj() method of a boto3 S3 client. Lambda functions have a read-only file system in which only the /tmp directory is writable. Thus, we save the model under /tmp/model.onnx. Besides downloading the model, we load the class map into memory with Python's built-in json module.

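def download_model() -> Tuple[dict, str]:
    # Bucket and key of the model come from the Lambda's environment variables;
    # BLOG_TUTORIAL_MODEL_PATH is a hypothetical name for the key variable
    bucket_name = os.environ["BLOG_TUTORIAL_INFERENCE_BUCKET"]
    s3_model_path = os.environ["BLOG_TUTORIAL_MODEL_PATH"]
    # /tmp is the only writable directory in a Lambda function
    model_path = os.path.join("/tmp", os.path.basename(s3_model_path))

    s3 = boto3.client('s3')
    with open(model_path, 'wb') as f:
        s3.download_fileobj(bucket_name, s3_model_path, f)
    # class_map.json ({class_name: id}) is shipped inside the Lambda package
    with open("class_map.json") as file:
        class_map = json.load(file)

    return class_map, model_path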
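The exported class_map.json maps class names to ids, while the model predicts ids. A small sketch (hypothetical helper names) that inverts the mapping into an {id: class_name} lookup; since the ids are stored as JSON values rather than keys, they keep their integer type after loading.

```python
import json
from typing import Dict


def invert_class_map(class2id: Dict[str, int]) -> Dict[int, str]:
    """Turn {class_name: id} into {id: class_name}."""
    return {int(class_id): name for name, class_id in class2id.items()}


def load_id_to_class(path: str) -> Dict[int, str]:
    """Read class_map.json as written by the training script and invert it."""
    with open(path) as file:
        return invert_class_map(json.load(file))


if __name__ == "__main__":
    # Example values only; the real names come from the training dataset
    print(invert_class_map({"background": 0, "can": 1}))
```

Predicted labels from ONNX Runtime are NumPy integers, so converting them with int() before the lookup keeps the dictionary access predictable.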

The function returns the class map, which is a dictionary, and the model path.

Post processing

The last function is resize_bboxes, which maps the bounding boxes back to the original input image. Since it receives the resize factor and the padding, this is fairly easy: remove the padding and rescale the pixel positions by the resize factor of the initial resize_and_pad() function:

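def resize_bboxes(resize_factor: float, paddings: List[int], bbox: List[float]) -> List[float]:
    # Expects bbox in [x1, y1, x2, y2] on the resized and padded image
    # Outputs bbox in [x1, y1, x2, y2] on the original image
    # paddings = [total vertical padding, total horizontal padding]; resize_and_pad()
    # puts the smaller half (total // 2) on the top/left side
    pad_top = paddings[0] // 2
    pad_left = paddings[1] // 2
    bbox = [
        (bbox[0] - pad_left) / resize_factor,
        (bbox[1] - pad_top) / resize_factor,
        (bbox[2] - pad_left) / resize_factor,
        (bbox[3] - pad_top) / resize_factor,
    ]
    return bbox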
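A numerical check of this mapping, as a minimal sketch: it assumes paddings holds the total vertical and horizontal padding as returned by resize_and_pad(), with the smaller half (total // 2) placed on the top/left side. The names to_model_coords and to_original_coords are hypothetical; the forward direction is what resize_and_pad() does to a box, the inverse is the correction the handler needs.

```python
from typing import List, Sequence


def to_model_coords(bbox: Sequence[float], resize_factor: float,
                    paddings: Sequence[int]) -> List[float]:
    """Original image -> resized-and-padded image (forward direction)."""
    pad_top, pad_left = paddings[0] // 2, paddings[1] // 2
    x1, y1, x2, y2 = bbox
    return [x1 * resize_factor + pad_left, y1 * resize_factor + pad_top,
            x2 * resize_factor + pad_left, y2 * resize_factor + pad_top]


def to_original_coords(bbox: Sequence[float], resize_factor: float,
                       paddings: Sequence[int]) -> List[float]:
    """Inverse mapping: remove the padding offset, then undo the resize."""
    pad_top, pad_left = paddings[0] // 2, paddings[1] // 2
    x1, y1, x2, y2 = bbox
    return [(x1 - pad_left) / resize_factor, (y1 - pad_top) / resize_factor,
            (x2 - pad_left) / resize_factor, (y2 - pad_top) / resize_factor]


if __name__ == "__main__":
    # 768 x 510 image resized with factor 0.5 to 384 x 255, padded by 129 rows
    factor, paddings = 0.5, [129, 0]
    model_box = to_model_coords([200, 72, 400, 272], factor, paddings)
    print(model_box)                                        # [100.0, 100.0, 200.0, 200.0]
    print(to_original_coords(model_box, factor, paddings))  # [200.0, 72.0, 400.0, 272.0]
```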

Cold start setup

After we discussed every function in detail, we can now have a look at the cold start initialization. It is done by the following code:

# Setup model
class_map, model_path = download_model()

session_instance = InferenceSession(model_path)
input_name = session_instance.get_inputs()[0].name
label_names = [el.name for el in session_instance.get_outputs()]
detection_threshold = float(os.environ.get("BLOG_TUTORIAL_DETECTION_THRESHOLD")) if os.environ.get("BLOG_TUTORIAL_DETECTION_THRESHOLD") else .5
input_image_size = int(os.environ["BLOG_TUTORIAL_INPUT_IMAGE_SIZE"])

Before we can execute inference, we need to download the object detection model to our AWS Lambda, which will be the most time-consuming part in the cold start. Then we create an InferenceSession object from the ONNX Runtime library. It receives a path to an ONNX model and loads that model.

Based on the instantiated InferenceSession object, the input and output names can be retrieved by calling session_instance.get_inputs() and session_instance.get_outputs(), respectively. Both methods return a list of onnxruntime.NodeArg objects, which have an attribute called name. We need to provide these names when executing inference with this InferenceSession.

The last two lines retrieve the detection threshold and the image size from the environment variables.
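If you prefer to keep the configuration parsing in one place, a small helper can replace the inline conditional used for the threshold. This is a sketch; read_env is a hypothetical name, and the fallback of 0.5 mirrors the default in the cold start code.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def read_env(name: str, default: T, cast: Callable[[str], T]) -> T:
    """Hypothetical helper: read and convert an environment variable."""
    value = os.environ.get(name)
    # Missing or empty variables fall back to the default, like the
    # conditional expression for BLOG_TUTORIAL_DETECTION_THRESHOLD
    return cast(value) if value else default


if __name__ == "__main__":
    detection_threshold = read_env("BLOG_TUTORIAL_DETECTION_THRESHOLD", 0.5, float)
    print(detection_threshold)
```

For BLOG_TUTORIAL_INPUT_IMAGE_SIZE, failing loudly with os.environ[...] as in the original is arguably the better choice, because the value must match the size the model was exported with.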

The event handler

In this section, we go through the actual processing of the inference request. At first, we need to convert the image from base64 to bytes and read these bytes with PIL to decode the image:

# Load image
img = base64.decodebytes(event["image"].encode())
img = Image.open(BytesIO(img)).convert("RGB")  # ensure 3 channels for normalize()
print("Loading image finished")

After creating the PIL image, we feed it into our preprocessing functions. Remember that we keep the resize_factor and padded_pixel values for the later bounding box correction.

# Preprocess
img, resize_factor, padded_pixel = resize_and_pad(img, input_image_size)
img = normalize(img)
img = convert_to_pytorch_input(img)
print("Preprocessing finished")

With the preprocessed image, we can now trigger the inference with the session_instance.run() method. The first argument takes a list of strings with the desired output names. The second argument is a dictionary whose key is the name of the input node and whose value is the actual NumPy array, i.e., the image used for inference.

result = session_instance.run(label_names, {input_name: img})

The method returns a list with one array per requested output node. Our RetinaNet model returns three arrays: the first contains all bounding boxes, the second the scores, and the third the class ids. For every detection that reaches the detection threshold, we correct the bounding box with our resize_bboxes function and store the detection as a dictionary.
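The same filtering can be expressed with a boolean NumPy mask. This sketch assumes the output order used in the loop (result[0] boxes, result[1] scores, result[2] class ids) and an {id: class_name} dictionary; it omits the resize_bboxes correction for brevity, and filter_detections is a hypothetical name.

```python
import json
from typing import Dict, List

import numpy as np


def filter_detections(boxes: np.ndarray, scores: np.ndarray, labels: np.ndarray,
                      id2class: Dict[int, str], threshold: float) -> List[dict]:
    """Keep detections with score >= threshold as JSON-friendly dicts."""
    keep = scores >= threshold  # boolean mask over all detections
    return [
        {
            # NumPy scalars are not JSON serializable, so convert to Python types
            "bbox": [float(c) for c in box],
            "score": float(score),
            "label": id2class[int(label)],
        }
        for box, score, label in zip(boxes[keep], scores[keep], labels[keep])
    ]


if __name__ == "__main__":
    boxes = np.array([[0, 0, 10, 10], [5, 5, 20, 20]], dtype=np.float32)
    scores = np.array([0.75, 0.25], dtype=np.float32)
    labels = np.array([1, 1], dtype=np.int64)
    detections = filter_detections(boxes, scores, labels, {0: "background", 1: "can"}, 0.5)
    print(json.dumps(detections))
```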

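result_payload = []
# class_map maps class name -> id; invert it to look up names by predicted id
id2class = {class_id: name for name, class_id in class_map.items()}

for idx in range(len(result[0])):
    # result[0]: boxes, result[1]: scores, result[2]: class ids
    if result[1][idx] >= detection_threshold:
        result_payload.append({
            "bbox": [float(c) for c in resize_bboxes(resize_factor, padded_pixel, result[0][idx])],
            "score": float(result[1][idx]),
            "label": id2class[int(result[2][idx])],
        })
return json.dumps(result_payload).encode()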

Deployment to AWS

Although the exciting part of this post is probably the object detection inference, let us also take a quick look at the deployment. As mentioned earlier, I use CloudFormation as the IaC tool and TaskCat to test and deploy it. We only need two resources to run the Lambda: the Lambda function itself and an IAM role that allows the Lambda a) to execute and b) to access the model in the S3 bucket.

We have these parameters in the template:

Parameters:
  ServiceName:  # Prefix for human-readable resource names
    Type: String
    Default: FridgeObjectDetectors
  S3BucketLambda:  # S3 bucket where the Lambda function is stored
    Type: String
  S3PathLambda:  # S3 path where the Lambda function is stored
    Type: String
  S3BucketModel:  # S3 bucket where the ONNX model is stored
    Type: String
  S3PathModel:  # S3 path where the ONNX model is stored
    Type: String

The ServiceName is used as a prefix for every human-readable resource name. This makes it easy to relate your resources to a project while browsing the AWS console. The four other parameters describe the bucket and key where the Lambda function code is stored, and the bucket and key that tell the Lambda where your model is stored. You could also hard-code the model's S3 path; passing it as parameters, however, lets the template build the ARN automatically and grant the Lambda execution role access to the model.

The Lambda template looks as follows:

  lambdaFunction:  # hypothetical logical ID
    Type: AWS::Lambda::Function
    Properties:
      Code:
        S3Bucket: !Ref S3BucketLambda
        S3Key: !Sub ${S3PathLambda}
      Description: Inference of onnx models
      FunctionName: !Sub ObjectDetection${ServiceName}
      Handler: inference.event_handler
      MemorySize: 10240
      PackageType: Zip
      Role: !GetAtt lambdaRole.Arn
      Runtime: python3.8
      Timeout: 900
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}ObjectDetectionLambda

These parameters for the Lambda are nothing special. We provide the model’s S3 path, the detection threshold, and the input image size via environment variables. The handler is simply the name of the Python file plus the name of the event handler function. We reference the execution role from the same template, which we will see shortly. We set the timeout of a single Lambda execution to 900 seconds (15 minutes, the maximum), which should be lower in production. One crucial setting in this template is the memory size: the number of vCPUs available to the Lambda function scales with the memory allocated to it. With the maximum of 10,240 MB of memory, we get six vCPUs, the current maximum. Depending on your model, you could tune this parameter; with the current setup, a single inference takes around 2 seconds.

The second resource is the Lambda execution role, which is needed because your AWS Lambda has to download the object detection model before executing inference. Again, this is standard, and I added only one thing to the basic example:

  lambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Action:
            - sts:AssumeRole
            Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
        Version: 2012-10-17
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AWSLambdaExecute
      RoleName: !Sub LambdaRole${ServiceName}
      Policies:
        - PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action: "s3:GetObject"
                Resource: !Sub arn:aws:s3:::${S3BucketModel}/${S3PathModel}
          PolicyName: !Sub ${ServiceName}AccessToModel
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}ObjectDetectionIamRole

I added a policy to this role that allows the s3:GetObject action on the model object in your S3 bucket. This follows the IAM best practice of granting least privilege. Besides that, the role needs the AWSLambdaExecute managed policy. If you want to run your Lambda in your VPC, you also need to attach the AWSLambdaVPCAccessExecutionRole managed policy.
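To see what the !Sub expression in the inline policy expands to, here is the same least-privilege policy document built in Python. The function name and the example bucket and key are placeholders; the structure follows the IAM JSON policy grammar used in the template.

```python
import json


def model_access_policy(bucket: str, key: str) -> dict:
    """Least-privilege policy: allow s3:GetObject on exactly one object.

    Equivalent to !Sub arn:aws:s3:::${S3BucketModel}/${S3PathModel}
    in the CloudFormation template.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{key}",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket and key
    print(json.dumps(model_access_policy("my-model-bucket", "models/model.onnx"), indent=2))
```

If you manage roles outside CloudFormation, a document like this could be attached with the IAM put_role_policy API.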


I hope this tutorial was informative, or that you could at least copy and paste some meaningful code samples out of it. The inference speed for object detection models could be improved with other infrastructure, but for AWS Lambda it is relatively good. While GPUs or more CPU cores would improve performance, Lambda is a cost-effective alternative for tasks where you can afford to wait 2 seconds for a response. Furthermore, it does not require specific knowledge of compilation frameworks like TVM: just export to ONNX, which is built into most DL frameworks, and execute the model with ONNX Runtime.
