
React for computer vision

Sometimes I like to tinker a bit with the frontend part of development, so I decided to write a computer vision app in React. Although there are great tools like Gradio or FiftyOne to visualize your work, there are no plug-and-play systems that you can present to a customer. The app will be far from perfect, but it may give you some hints on how such an app can be designed, or you can use it as a starting point and adapt the code to your needs.

This post is a bit longer, as it lays the foundation of a React app for computer vision. You can find the complete code in this Repo, and you can try the app here.

Introduction

I will explain the main building blocks of the app, but not every detail, so this post is not beginner-friendly. It is more of a personal documentation of my side project and presents one of many possible solutions to the problem at hand. If there are any questions about details or suggestions for improvement, please let me know.

Before we start, let me first introduce our frontend tech stack. My background is in Data Science, and I only learned some frontend later. Thus, I don’t like to operate on the DOM directly, but thankfully React abstracted web development into a declarative programming style. We will only use functional components and ES6 syntax. Furthermore, I am not a CSS wizard, so I’m happy with the pre-styled components from Material-UI (MUI). We will also use TypeScript, which adds some complexity initially but is handy when the application grows. Lastly, we will use Redux Toolkit as our state management tool. Once you learn it, it makes things a lot easier.

Although most mobile deep learning applications use TensorFlow.js, training computer vision models with TensorFlow is always a bit odd. For PyTorch, there are many great libraries, like IceVision, that speed up the process enormously. Thus, we use ONNXruntime to execute our models; ONNX models can be exported from all state-of-the-art deep learning frameworks.

Setup

I will list all the steps necessary for an Ubuntu installation, but you will find plenty of material online for other operating systems. If you do not already have Node.js with Yarn as a package manager installed, you can set them up in three simple steps:

# 1. Install node
cd ~
curl -sL https://deb.nodesource.com/setup_16.x -o nodesource_setup.sh
sudo bash nodesource_setup.sh
sudo apt install nodejs
# 2. Install npm (bundled with the NodeSource nodejs package; only needed if it is missing)
sudo apt-get install npm
# 3. Install yarn
npm install --global yarn

Now that you have installed the necessary environment, we can use create-react-app to generate a good starting point for our React app for computer vision models. We will use a create-react-app template with Redux and TypeScript:

npx create-react-app cv-app --template redux-typescript

And that’s it! It is unbelievably easy to create a React app environment these days. Now, we need to install MUI, MUI icons, and uuid:

yarn add @mui/material @emotion/react @emotion/styled @mui/icons-material uuid
yarn add --dev @types/uuid

uuid is just a tiny helper to create unique ids. The Material stack provides pre-styled and Material-opinionated components, saving a lot of time.
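As a quick illustration of the uuid package (we will use it exactly like this later in the post):

import { v4 as uuidv4 } from "uuid";

// Generates a random id such as "110ec58a-a0f2-4ac4-8393-c866d813b8d1"
const id: string = uuidv4();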

Ok, now we have a working React app called cv-app on your computer. Open the top-level folder in your favorite IDE and let’s make some adjustments to the template. As I said, I’m happy with a completely declarative frontend, so we remove all CSS files; we will add CSS with the style prop of JSX (I know, not best practice, but it is fine for an app of this size) or with the sx prop of MUI components. Furthermore, we can delete all the template code in App.tsx, the feature folder, and the references to the CSS files.

Then we add the <ThemeProvider> component from MUI as the top-level component of our <App>. You can use the MUI theme builder to create your own theme with a good overview of the result. I came up with this design:

import React from "react";
import { createTheme, ThemeProvider } from "@mui/material/styles";

import CustomImageList from "./components/CustomImageList";
import CustomAppbar from "./components/CustomAppbar";
import AddContentFab from "./components/AddContentFab";

export const theme = createTheme({
  palette: {
    mode: "dark",
    primary: {
      main: "#8a43c1",
    },
    secondary: {
      main: "#375ee8",
    },
    background: {
      default: "#353434",
      paper: "#525252",
    },
  },
  spacing: 8,
});

function App() {
  return (
    <ThemeProvider theme={theme}>
      <CustomAppbar />
      <CustomImageList />
      <AddContentFab />
    </ThemeProvider>
  );
}

export default App;

Inside the <ThemeProvider> component, we add our top-level components for our app. You can already see that we only use three components for our React app for computer vision. We will go into detail about these later. First, we need to have an idea about the visual appearance and the data types we will use.

Main Layout of our React app for computer vision

Main functionality for our React app for computer vision

Before we start writing code, we need an idea of what our app should look like. The main feature is an image view, which shows an image and its labels (classification, bounding boxes, or polygons). We display this main component inside a Modal for a larger view and in a list as an overview. Furthermore, we need a button or FAB (Floating Action Button) to add an image. Lastly, we create an App bar to improve the visual appearance. You can see the app in the video above. Thus, we have four components in total:

  1. <ImageCard>: Displays an image and the main actions for it
  2. <ImageList>: Incorporates several instances of <ImageCard> in a list
  3. <AddContentFab>: Fixed button to add content
  4. <Appbar>: To make it look more Material-ish

Defining types

One major advantage of TypeScript is that you define your data types beforehand and stick to them throughout the project; violations throw errors directly at compile time. Since we focus on images only with this app, our central object is called imageCardObject:

export interface imageCardObject {
  id: string;
  src: string;
  title: string;
  dateCreated: string;
  highlighted: boolean;
  annotations: Array<
    bboxAnnotationObject | classAnnotationObject | polygonAnnotationObject
  >;
}

It contains all relevant information about an image, like the source, the title, and the creation date. The src key stores a data URL (base64). On top of that, it contains a key called annotations that holds class, bounding box, and polygon annotations.

The different annotation types are very similar. All contain className, id, score, model, and color keys, and each is identified by a type key. Each annotation type also has a unique key, like the box key, which should be an array of [x1, y1, width, height]. So, typical data science stuff until here.

export interface baseAnnotationObject {
  className: string;
  color: string;
  id: string;
  score: number;
  model: string;
}

export interface classAnnotationObject extends baseAnnotationObject {
  type: "class";
}

export interface bboxAnnotationObject extends baseAnnotationObject {
  box: Array<number>;
  type: "bbox";
}

export interface polygonAnnotationObject extends baseAnnotationObject {
  polygon: Array<Array<number>>; // Array, that consists of Arrays with x, y points
  type: "polygon";
}

export type anyAnnoationObject =
  | classAnnotationObject
  | bboxAnnotationObject
  | polygonAnnotationObject;

Besides the different annotation types, we also create anyAnnoationObject, a small wrapper that saves some lines of code whenever we define a type where any of the three annotation types is acceptable.
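To make the types more tangible, here is a purely illustrative imageCardObject with a single bounding box annotation (all values are made up):

// Illustrative only; ids would normally come from uuidv4()
const exampleImage: imageCardObject = {
  id: "example-image-id",
  src: "data:image/jpeg;base64,/9j/...", // truncated data URL
  title: "cat.jpg",
  dateCreated: String(new Date()),
  highlighted: false,
  annotations: [
    {
      type: "bbox",
      id: "example-annotation-id",
      className: "cat",
      color: "#8a43c1",
      score: 0.92,
      model: "demo-model",
      box: [0.1, 0.8, 0.3, 0.4], // relative [x1, y1, width, height], see below
    },
  ],
};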

Set up redux for our data types

We define the redux state for images in imageState.ts. Our main object is an array that contains instances of the imageCardObject. Our reducer includes three methods to add, update, and remove a given imageCardObject. Here is the code:

import { createSlice } from "@reduxjs/toolkit";
import { imageCardObject } from "../util/types";
import {
  saveImageState,
  removeImageState,
  loadImageState,
  saveStateIds,
  loadStateIds,
} from "../util/localStateHandler";

export const imageSlice = createSlice({
  name: "images",
  initialState: {
    images: loadStateIds()
      .map((id: string) => loadImageState(id))
      .filter((el: imageCardObject | null) => el !== null),
  },
  reducers: {
    updateImage: (state, action) => {
      state.images = state.images.map((el: imageCardObject) =>
        el.id === action.payload.id ? action.payload : el
      );
      saveImageState(action.payload);
    },
    addImage: (state, action) => {
      state.images = [...state.images, action.payload];
      saveImageState(action.payload);
      saveStateIds(state.images.map((el: imageCardObject) => el.id));
    },
    removeImage: (state, action) => {
      state.images = [
        ...state.images.filter(
          (el: imageCardObject) => el.id !== action.payload.id
        ),
      ];
      removeImageState(action.payload.id);
      saveStateIds(state.images.map((el: imageCardObject) => el.id));
    },
  },
});

// Action creators are generated for each case reducer function
export const { addImage, updateImage, removeImage } = imageSlice.actions;

export default imageSlice.reducer;

For the add and update reducers, we expect an imageCardObject as the payload. When updating an object, we use the unique id to compare each object with our payload object and exchange the old object for the new one. Adding images is easier: we simply append the payload to the old state. To remove an image, we only need the id and the filter method, keeping only the objects that do not match the id.
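For completeness, the slice still has to be registered in the store. The redux-typescript template generates a src/app/store.ts that we only need to adapt slightly; a minimal sketch could look like this:

import { configureStore } from "@reduxjs/toolkit";
import imageReducer from "./imageState";

export const store = configureStore({
  reducer: {
    images: imageReducer, // exposes the slice as state.images
  },
});

// Inferred types, used by the typed hooks we import later
export type RootState = ReturnType<typeof store.getState>;
export type AppDispatch = typeof store.dispatch;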

In the slice above, we import several functions from /util/localStateHandler.tsx and use them after each state update accordingly. This small util file persists the redux state to your local storage. However, this is not optimal: the state can become very large because whole images are stored in it, and local storage enforces quotas on the maximum item size. Thus, we will implement a function later that resizes images that are too large. Uploading the images to a server would be preferable, but this is a topic for another blog post.

import { imageCardObject } from "./types";

const idName = "Ids";

export const saveStateIds = (event: string[]) => {
  localStorage.setItem(idName, JSON.stringify(event));
};

export const loadStateIds = (): string[] => {
  const ids = localStorage.getItem(idName);
  return ids !== null ? JSON.parse(ids) : [];
};

export const saveImageState = (event: imageCardObject) => {
  localStorage.setItem(event.id, JSON.stringify(event));
};

export const loadImageState = (id: string): imageCardObject | null => {
  // JSON.parse("null") yields null for ids missing from local storage
  return JSON.parse(String(localStorage.getItem(id)));
};

export const removeImageState = (id: string) => {
  localStorage.removeItem(id);
};

We use the local storage for two things:

  1. Save a list with all images to create the initial state
  2. Save/load/remove each imageCardObject to/from local storage

The first part is done by the functions loadStateIds and saveStateIds. For loading, we parse the stringified list and return it; for saving, we stringify the list of IDs and save it to local storage.

For loading images, we parse the loaded imageCardObject and for saving, we stringify it. Furthermore, we write another function for deletion, which is also straightforward. Based on these functions, we can retrieve our initial state by loading all image IDs and mapping over them to load each imageCardObject.

images: loadStateIds()
      .map((id: string) => loadImageState(id))
      .filter((el: imageCardObject | null) => el !== null)

The Appbar

The Appbar of our React app for computer vision

Let’s start with the Appbar, the simplest component. We can directly use MUI’s AppBar and add additional CSS with the sx prop. Let’s make the Appbar sticky: it follows you as you scroll down and always stays at the top of the visible screen. To pin it to the top, we set the bottom prop to auto and the top prop to zero. Then, we add some padding to enlarge the Appbar slightly for a nicer look. Lastly, we import the theme from our App.tsx to obtain our secondary color.

import AppBar from "@mui/material/AppBar";
import Typography from "@mui/material/Typography";
import { theme } from "../App";

export default function CustomAppbar() {
  return (
    <AppBar
      sx={{
        position: "sticky",
        bottom: "auto",
        top: 0,
        padding: 2,
        background: theme.palette.secondary.main,
      }}
    >
      <Typography variant="h5" component="div" sx={{ flexGrow: 1 }}>
        Welcome!
      </Typography>
    </AppBar>
  );
}

The Image card object for our React app for computer vision

Image of the <ImageCard> component
The <ImageCard> component, which displays images and their labels.

This is our main component, with roughly 260 lines of code. We need this component to understand the other two components (AddContentFab and CustomImageList), as they are only simple wrappers around it.

Imports

The first import lines are the basic MUI components we use throughout the component. The second part imports the specific functions and interfaces of our project. We import all annotation types as well as the imageCardObject type. Then we import a custom function called drawAnnotations, which we use to draw annotations on an HTML canvas. The theme is needed to access the app’s color palette. useAppDispatch is our TypeScript wrapper around redux’s useDispatch, which is used to update a redux state. Lastly, we import some basic React hooks.

// MUI
import Card from "@mui/material/Card";
import CardHeader from "@mui/material/CardHeader";
import CardMedia from "@mui/material/CardMedia";
import CardContent from "@mui/material/CardContent";
import CardActions from "@mui/material/CardActions";
import IconButton from "@mui/material/IconButton";
import DeleteIcon from "@mui/icons-material/Delete";
import StarBorderIcon from "@mui/icons-material/StarBorder";
import StarIcon from "@mui/icons-material/Star";
import Chip from "@mui/material/Chip";
import CheckBoxOutlineBlankIcon from "@mui/icons-material/CheckBoxOutlineBlank";
import HexagonOutlinedIcon from "@mui/icons-material/HexagonOutlined";
import ImageOutlinedIcon from "@mui/icons-material/ImageOutlined";

// Own
import {
  anyAnnoationObject,
  bboxAnnotationObject,
  classAnnotationObject,
  imageCardObject,
  polygonAnnotationObject,
} from "../util/types";
import { drawAnnotations } from "../util/drawAnnotations";
import { theme } from "../App";
import { useAppDispatch } from "../app/hooks";
import { updateImage, removeImage } from "../app/imageState";
import InferenceMenu from "./InferenceMenu";

// React
import { useEffect, useRef, useState } from "react";
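In case you wonder where useAppDispatch comes from: it is the typed wrapper that the redux-typescript template ships in src/app/hooks.ts, roughly:

import { TypedUseSelectorHook, useDispatch, useSelector } from "react-redux";
import type { RootState, AppDispatch } from "./store";

// Use these throughout the app instead of plain useDispatch/useSelector
export const useAppDispatch = () => useDispatch<AppDispatch>();
export const useAppSelector: TypedUseSelectorHook<RootState> = useSelector;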

Input props

To mount our component, we need more information than just the basic props described by our imageCardObject. Therefore, we extend the type with width, height, showActions, and onClick attributes:

export interface imageCardProps extends imageCardObject {
  width: number;
  height: number;
  showActions?: boolean;
  onClick?: () => void;
}

export default function ImageCard({
  id,
  src,
  title,
  dateCreated,
  highlighted,
  annotations,
  width,
  height,
  showActions = true,
  onClick,
}: imageCardProps) {

The width and height props are used to create our image card with the correct dimensions. onClick is optional and expects a function that we execute when a user clicks on the image; we use it later to enlarge the card. The prop showActions is enabled by default and determines whether the inference button is visible to the user.

Hooks and variables

First, we initialize our dispatch function. Then we define three different states for our component. The first one is called resizeSize and contains the size of the displayed image according to the defined height and width of the image card. Images come in many different sizes, and to fit them to the screen, we need to resize them. Therefore, we use an array with two entries, where the first one is the width and the second one the height.
The next one (showLabel) is an array of strings. We later implement a function to toggle the visibility of each label on the image, and we initialize this state with all annotation ids. If an annotation id is not present in this array, we will not display it on the image later.
The highlightedLabel state is either a string (an id) or null. We use this state to highlight a label when the user hovers the mouse over an annotation.
Lastly, we initialize the refs we use in this component. We split the image and the annotations into two HTML elements to avoid massive re-renders. This way, we only need to load the image once and can redraw annotation updates and highlights separately.

  // States and refs
  const dispatch = useAppDispatch();
  const [resizeSize, setResizeSize] = useState([0, 0]);
  const [showLabel, setShowLabel] = useState<string[]>(
    annotations?.map((el) => {
      return el.id;
    })
  );
  const [highlightedLabel, setHighlightedLabel] = useState<string | null>(null);

  const annotationCanvasRef = useRef<HTMLCanvasElement>(
    document.createElement("canvas")
  );
  const imageRef = useRef<HTMLCanvasElement>(document.createElement("canvas"));

Redux update

The state update is straightforward for the <ImageCard> component. We wrap the update in the function updateInferenceResult(), which takes two arguments:

  1. inferenceResult: an array of annotations, which we add to the image
  2. annotationIdToRemove: the id of an annotation that we delete from the image

const updateInferenceResult = (
    inferenceResult: anyAnnoationObject[],
    annotationIdToRemove: string | null = null
  ) => {
    let processedAnnotation = annotations;
    if (annotationIdToRemove) {
      processedAnnotation = annotations.filter(
        (el) => el.id !== annotationIdToRemove
      );
    }
    dispatch(
      updateImage({
        id: id,
        src: src,
        title: title,
        dateCreated: dateCreated,
        highlighted: highlighted,
        annotations: [...processedAnnotation, ...inferenceResult],
      })
    );
    setShowLabel([
      ...showLabel,
      ...inferenceResult.map((el: anyAnnoationObject) => el.id),
    ]);
  };

The first part checks whether an annotation should be deleted and, if so, removes it by filtering out the annotation by its id. Afterward, we dispatch the updateImage reducer with all the state inside the component. The only difference is that we merge the existing list of annotations with the new one: [...processedAnnotation, ...inferenceResult]. Finally, we update the showLabel state so that new annotations are rendered immediately.

The rendering

return (
    <Card sx={{ margin: "auto", marginTop: 3 }}>
      <CardHeader
        sx={{ background: theme.palette.secondary.main }}
        action={
          <IconButton
            aria-label="settings"
            onClick={() => {
              console.log(`${id} is deleted`);
              dispatch(removeImage({ id: id }));
            }}
          >
            <DeleteIcon />
          </IconButton>
        }
        title={title}
        subheader={new Date(dateCreated).toDateString()}
      />
      <CardMedia>
        <div
          style={{
            width: resizeSize[0],
            height: resizeSize[1],
            position: "relative",
            marginRight: "auto",
            marginLeft: "auto",
            cursor: "pointer",
          }}
          onClick={onClick}
        >
          <canvas
            ref={imageRef}
            width={resizeSize[0]}
            height={resizeSize[1]}
            style={{ position: "absolute" }}
          />
          <canvas
            ref={annotationCanvasRef}
            style={{ position: "absolute" }}
            width={resizeSize[0]}
            height={resizeSize[1]}
          />
        </div>
      </CardMedia>
      <CardContent
        sx={{
          background: theme.palette.background.paper,
          flexWrap: "wrap",
          overflow: "auto",
          width: resizeSize[0],
        }}
      >
        {renderAnnotationChips(annotations)}
      </CardContent>
     {showActions && (
        <CardActions
          disableSpacing
          sx={{
            position: "relative",
            background: theme.palette.secondary.main,
          }}
        >
          <IconButton
            aria-label="add to favorites"
            sx={{ marginRight: "auto" }}
            onClick={() => {
              dispatch(
                updateImage({
                  id: id,
                  src: src,
                  title: title,
                  dateCreated: dateCreated,
                  highlighted: !highlighted,
                  annotations: annotations,
                })
              );
            }}
          >
            {highlighted ? <StarIcon /> : <StarBorderIcon />}
          </IconButton>

          <InferenceMenu src={src} updateAnnotation={updateInferenceResult} />
        </CardActions>
      )}
    </Card>
  );

Our <ImageCard> returns an MUI Card, and most of it follows the tutorial from the official MUI page. The card header displays the name of the image as the title and dateCreated as the subtitle. Furthermore, the action prop of the <CardHeader> is used to display a <DeleteIcon>, which dispatches the removeImage function.

An image and its annotations are displayed inside the <CardMedia> component. I mentioned earlier that we want to split the image and label canvas to redraw the annotations without touching the image. Furthermore, we want to resize the image, which we do by drawing it onto an HTML canvas. Thus, both canvases (image and annotation) should overlay exactly. To achieve this, we use a simple <div> as the parent container with the expected width and height of the image, which we get by setting width and height to resizeSize[0] and resizeSize[1], respectively. Then we set the position of both canvases to absolute to anchor each canvas at the top of the parent <div>.

Our <CardContent> is used to display the annotations. Furthermore, we want to highlight the bounding box when the user hovers over the corresponding annotation chip. We wrap the described logic in a single function:

const renderAnnotationChips = (annotations: anyAnnoationObject[]) => {
    return annotations?.map(
      (
        el:
          | bboxAnnotationObject
          | polygonAnnotationObject
          | classAnnotationObject
      ) => {
        return (
          <Chip
            label={el.className}
            key={el.id}
            icon={
              el.type === "bbox" ? (
                <CheckBoxOutlineBlankIcon />
              ) : el.type === "polygon" ? (
                <HexagonOutlinedIcon />
              ) : (
                <ImageOutlinedIcon />
              )
            }
            sx={{
              marginLeft: "5px",
              marginRight: "5px",
              backgroundColor: showLabel.includes(el.id) ? el.color : null,
            }}
            variant={showLabel.includes(el.id) ? "filled" : "outlined"}
            // Functions
            onClick={() => {
              if (showLabel?.includes(el.id)) {
                setShowLabel(showLabel.filter((id) => el.id !== id));
              } else {
                setShowLabel([...showLabel, el.id]);
              }
            }}
            onDelete={() => updateInferenceResult([], el.id)}
            deleteIcon={<DeleteIcon />}
            onMouseOver={() => {
              setHighlightedLabel(el.id);
            }}
            onMouseOut={() => {
              setHighlightedLabel(null);
            }}
          />
        );
      }
    );
  };

The function expects an array of annotations and renders a <Chip> for each one. Based on the annotation type, it renders a different icon; the label of the <Chip> is the class name. If you click on a <Chip>, the annotation disappears/reappears on the canvas. Therefore, we remove/add the specific annotation id from/to the showLabel state. To highlight the label on hover, we set the highlightedLabel state to the specific annotation id and reset it when the cursor moves out. With this <Chip>, it is also possible to delete an annotation by clicking on the delete icon; for that, we set the onDelete and deleteIcon props of the <Chip>. The delete handler is the updateInferenceResult function we already discussed, called with the specific annotation id.

We use the <CardActions> element to display an add-to-favorites button and the inference button. To align them left and right, we set marginRight: "auto" on the favorites button, which pushes the inference button to the right edge. When clicking the add-to-favorites button, we invert the highlighted state and dispatch the whole imageCardObject. Regarding the inference button, we will go into implementation details in the next post about inference.

Draw computer vision annotations in React

Until now, we discussed all JSX components and HTML tags used for the <ImageCard> as well as the internal state and variables of the component. The only thing left is our interaction with the canvas. As mentioned before, we have one canvas for the image and a second canvas for the annotations.

Image canvas

Drawing images on a canvas is very basic. However, we want to resize the image to fit the card’s dimensions. A canvas expects you to provide a starting point (in our case [0, 0]) and the maximum x and y values. To keep the number of redraws of the image as low as possible, we implement a useEffect hook that listens to the height, width, and src props of the component. Thus, the image is only redrawn if the width, the height, or the image itself changes. Here is the code:

// Effects
useEffect(() => {
    const img = new Image();
    img.onload = (el: any) => {
      // Not optimal, however the solution does not work: https://www.kindacode.com/article/react-typescript-image-onload-onerror-events/
      let newResizeSize;

      newResizeSize = [
        el?.currentTarget?.width * ((height * 0.6) / el?.currentTarget?.height),
        height * 0.6,
      ];
      if (newResizeSize[0] > width) {
        newResizeSize = [
          width,
          el?.currentTarget?.height * (width / el?.currentTarget?.width),
        ];
      }

      setResizeSize(newResizeSize);
      // Draw on image canvas
      imageRef?.current
        .getContext("2d")
        ?.drawImage(img, 0, 0, newResizeSize[0], newResizeSize[1]);
    };
    img.src = src;
  }, [height, width, src]);

Regarding the workflow of the hook: we create a new HTMLImageElement, define an onload function that draws on the canvas, and then trigger loading by setting img.src. The onload function defines a variable called newResizeSize, which contains the maximum x and y values for the resized image. We calculate them by defining a target size (in our case height * 0.6) and computing the corresponding rescale factor for the other side (the width). We do this twice in the process. First, we scale the image to 60% of the <Card> height. Second, we check whether the rescaled width is larger than the Card’s width and, if so, resize the image to the Card’s width instead. After determining the new image size, we update the state (resizeSize) accordingly and use drawImage() from the canvas context to draw the image. This procedure finds the maximum size of the drawn image while preserving the image’s aspect ratio.
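To make the two-step logic easier to follow, here is the same calculation as a hypothetical standalone helper (not part of the app code) with a worked example:

// Sketch of the fitting logic from the hook above
const fitToCard = (
  imgW: number,
  imgH: number,
  cardW: number,
  cardH: number
): [number, number] => {
  // Step 1: scale the image to 60% of the card height
  let w = imgW * ((cardH * 0.6) / imgH);
  let h = cardH * 0.6;
  // Step 2: if the result is too wide, scale to the card width instead
  if (w > cardW) {
    h = imgH * (cardW / imgW);
    w = cardW;
  }
  return [w, h];
};

// Example: a 4000x3000 photo in an 800x600 card.
// Step 1 yields [480, 360]; 480 <= 800, so [480, 360] is the final size.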

Drawing annotations

We handle every redraw of annotations with another useEffect hook. It watches the annotations prop, the showLabel and highlightedLabel states, as well as the resizeSize state. We need to observe resizeSize because the dimensions of annotationCanvasRef depend on it. We separated the drawing logic into the ./util/drawAnnotations file, so the code in the useEffect hook is small. It only checks whether annotationCanvasRef is not null and whether resizeSize has already been calculated, and triggers the redraw accordingly:

useEffect(() => {
    if (resizeSize[0] !== 0 && annotationCanvasRef !== null) {
      drawAnnotations(
        annotations,
        annotationCanvasRef,
        resizeSize[0],
        resizeSize[1],
        highlightedLabel,
        showLabel
      );
    }
  }, [annotations, resizeSize, showLabel, highlightedLabel]);

So let us have a look at the drawAnnotations function:

import { RefObject } from "react";
import {
  anyAnnoationObject,
  bboxAnnotationObject,
  polygonAnnotationObject,
} from "./types";

export const drawAnnotations = (
  annotations: anyAnnoationObject[],
  canvas: RefObject<HTMLCanvasElement>,
  imgW: number,
  imgH: number,
  highlightAnnotationId: string | null,
  filter: string[]
) => {
  // Get context
  const ctx = canvas.current?.getContext("2d");
  if (ctx) {
    ctx.clearRect(0, 0, imgW, imgH);
    ctx.lineWidth = 2;

    annotations.forEach((el) => {
      if (filter.includes(el.id)) {
        ctx.strokeStyle = el.color;
        if (el.type === "bbox") {
          drawBbox(ctx, el, imgW, imgH, highlightAnnotationId);
        }
        if (el.type === "polygon") {
          drawPolygon(ctx, el, imgW, imgH, highlightAnnotationId);
        }
      }
    });
  }
};

Since we use plain HTML and JS in this function, we only need to import some types to comply with TypeScript. The main function drawAnnotations takes an array of annotations, the canvas ref, the image dimensions, a highlighted-annotation id, and a filter as input. At the start of the function, we retrieve the canvas’s context and clear the canvas. Afterward, we iterate over each annotation and, if the annotation’s id is in the filter array, execute the specific draw function for its type (bbox or polygon). The draw functions are pretty similar and differ only in the context methods they use:

const drawBbox = (
  ctx: CanvasRenderingContext2D,
  annotation: bboxAnnotationObject,
  imgW: number,
  imgH: number,
  highlightAnnotationId: string | null
) => {
  ctx.beginPath();

  ctx.rect(
    annotation.box[0] * imgW,
    annotation.box[1] * imgH,
    annotation.box[2] * imgW,
    -annotation.box[3] * imgH
  );
  if (highlightAnnotationId === annotation.id) {
    ctx.fillStyle = annotation.color + "4D";
    ctx.fill();
  }
  ctx.stroke();
};

const drawPolygon = (
  ctx: CanvasRenderingContext2D,
  annotation: polygonAnnotationObject,
  imgW: number,
  imgH: number,
  highlightAnnotationId: string | null
) => {
  ctx.beginPath();
  // The starting point is set by the idx === 0 branch below
  annotation.polygon.forEach((el: number[], idx: number) => {
    if (idx === 0) {
      ctx.moveTo(el[0] * imgW, (1 - el[1]) * imgH);
    } else {
      ctx.lineTo(el[0] * imgW, (1 - el[1]) * imgH);
    }
  });
  if (highlightAnnotationId === annotation.id) {
    ctx.fillStyle = annotation.color + "4D";
    ctx.fill();
  }
  ctx.closePath();
  ctx.stroke();
};

First, we call ctx.beginPath to ensure that we start drawing a new object. Afterward, we draw the shape. For bounding boxes, we can use ctx.rect to draw a rectangle by defining x1, y1, width, and height. The ctx.rect function expects the starting point to be the top-left corner, followed by the width and height of the rectangle. If you are familiar with Python frameworks for computer vision, most of them expect you to specify the bottom-left point as the starting point, and we stick to that convention for storing annotations. By passing a negative height, JS lets us reverse the direction and draw from bottom to top; thus, we can specify the bottom-left point as the starting point.
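A quick numeric illustration with made-up values:

// Made-up numbers to illustrate the coordinate handling
const ctx = document.createElement("canvas").getContext("2d")!;
const box = [0.2, 0.8, 0.3, 0.4]; // relative [x1, y1, width, height]
const [imgW, imgH] = [400, 300];

// Expands to ctx.rect(80, 240, 120, -120): the path starts at the
// bottom-left point (80, 240), and the negative height draws 120px upwards.
ctx.rect(box[0] * imgW, box[1] * imgH, box[2] * imgW, -box[3] * imgH);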

I like to store points the same way FiftyOne does: as relative values from 0 to 1. This makes annotations independent of the image size, so they are not affected by resizing as long as you keep the aspect ratio.

To draw a polygon, we iterate over each (x, y) coordinate and call ctx.lineTo, except for the first point, where we call ctx.moveTo to define a starting point.

After drawing the shape (bbox or polygon), we fill it if the annotation.id equals the highlightAnnotationId. For the fill color, we append “4D” to the hex value of annotation.color, which gives the color an alpha value of roughly 30%, and call ctx.fill. Lastly, we call ctx.stroke to draw the outline onto the canvas.

An image list for our computer vision React app

Our React app for computer vision should display a list of <ImageCard>s on the home screen. At first, I wanted to use <ImageList> from MUI. It is perfect for plain images but does not integrate well with custom components like <ImageCard>. Thus, we create a list with good old divs and CSS. The component is called <CustomImageList>, and you can find it under ./components/CustomImageList.tsx:

import Modal from "@mui/material/Modal";
import Container from "@mui/material/Container";
import ImageCard, { imageCardProps } from "./ImageCard";
import { useAppSelector } from "../app/hooks";
import { imageCardObject } from "../util/types";
import { useState } from "react";

export default function CustomImageList() {
  const [modalCardId, setModalCardId] = useState<null | string>(null);
  const images = useAppSelector((state) => state.images.images);

  return (
    <>
      <div
        style={{
          display: "flex",
          flexWrap: "wrap",
          justifyContent: "center",
          justifyItems: "center",
          alignContent: "center",
        }}
      >
        {images.map((image: imageCardObject) => {
          const props: imageCardProps = {
            ...image,
            width: window.innerWidth * 0.49,
            height: window.innerHeight * 0.49,
            onClick: () => {
              setModalCardId(image.id);
            },
          };
          return (
            <ImageCard {...props} key={`${props.id}-image-card-in-list`} />
          );
        })}
      </div>
      <Modal open={modalCardId !== null} onClose={() => setModalCardId(null)}>
        <Container>
          {modalCardId !== null &&
            images
              .filter((el: imageCardObject) => el.id === modalCardId)
              .map((el: imageCardObject) => {
                return (
                  <ImageCard
                    {...{
                      ...el,
                      width: window.innerWidth * 0.8,
                      height: window.innerHeight * 0.8,
                    }}
                    key={`${el.id}-modal-card`}
                  />
                );
              })}
        </Container>
      </Modal>
    </>
  );
}

The component serves two purposes: it displays all the <ImageCard>s, and it enlarges a specific <ImageCard> when someone clicks on its image. We use a <div> and set its display to flex and flexWrap to wrap to create a list. Then we center all items, and our <ImageCard>s appear in a list. We fetch the redux state for the images and iterate over it to create an <ImageCard> component for each one.

To enlarge the image, we use a <Modal> component from MUI. To trigger the modal, we use the modalCardId state, which contains either null (modal not open) or a string that should be the id of an imageCardObject. If it is set to a card’s id, the modal opens and shows the enlarged card. The card is wrapped inside an MUI <Container> to center it.

Create an entry point for images

Until now, we assumed that we already have images in our React app for computer vision. However, a user needs some way to insert images. Luckily, we can reuse our <ImageCard> component to visualize an image. We then add a save and a discard button at the bottom, and we have a nice interface for inserting images. We achieve this with the following component under src/components/AddContentFab.tsx:

// MUI
import Fab from "@mui/material/Fab";
import CameraIcon from "@mui/icons-material/Camera";
import Modal from "@mui/material/Modal";
import Button from "@mui/material/Button";
import Grid from "@mui/material/Grid";

// Other
import { useRef, useState } from "react";
import { v4 as uuidv4 } from "uuid";
import ImageCard from "./ImageCard";
import { useAppDispatch } from "../app/hooks";
import { addImage } from "../app/imageState";
import { imageCardObject } from "../util/types";

export default function AddContentFab() {
  // States, Refs & vars
  const [takenImage, setTakenImage] = useState<null | string>(null);
  const [imageName, setImageName] = useState<null | string>(null);
  const [dateCreated, setDateCreated] = useState<null | string>(null);
  const inputFileRef = useRef<null | HTMLInputElement>(null);
  const dispatch = useAppDispatch();

  // Click functions
  const handleModalClose = () => {
    setImageName(null);
    setTakenImage(null);
  };

  const resizeImageToMaximum2Mb = (img: string) => {
    if (img.length > 1_000_000) {
      const rescaleRatio = 1_000_000 / img.length;
      console.log(`Rescale ratio: ${rescaleRatio}`);
      const tmp_img = new Image();
      tmp_img.onload = () => {
        const rescaledSize = [
          Math.floor(tmp_img.width * rescaleRatio),
          Math.floor(tmp_img.height * rescaleRatio),
        ];

        const canvas = document.createElement("canvas");
        canvas.width = rescaledSize[0];
        canvas.height = rescaledSize[1];
        const ctx = canvas.getContext("2d");
        ctx?.drawImage(tmp_img, 0, 0, rescaledSize[0], rescaledSize[1]);

        setTakenImage(canvas.toDataURL("image/jpeg", 1));
      };
      tmp_img.src = img;
    } else {
      setTakenImage(img);
    }
  };

  const saveImage = () => {
    const payload: imageCardObject = {
      id: uuidv4(),
      src: String(takenImage),
      title: String(imageName),
      dateCreated: String(dateCreated),
      highlighted: false,
      annotations: [],
    };
    dispatch(addImage(payload));
    handleModalClose();
  };

  return (
    <>
      <Fab
        sx={{ position: "fixed", bottom: 16, right: 16 }}
        onClick={() => {
          if (inputFileRef.current !== null) {
            inputFileRef.current.click();
          }
        }}
      >
        <CameraIcon />
      </Fab>
      <Modal open={takenImage !== null} onClose={handleModalClose}>
        <Grid container spacing={2}>
          <Grid item xs={12}>
            <ImageCard
              id={uuidv4()}
              src={takenImage !== null ? takenImage : ""}
              annotations={[]}
              width={window.innerWidth * 0.8}
              height={window.innerHeight * 0.8}
              title={imageName ? imageName : "Your new image"}
              dateCreated={String(dateCreated)}
              highlighted={false}
              showActions={false}
            />
          </Grid>
          <Grid
            item
            xs={6}
            sx={{
              display: "flex",
              justifyContent: "center",
              alignItems: "center",
            }}
          >
            <Button variant="contained" size="large" onClick={handleModalClose}>
              Discard
            </Button>
          </Grid>
          <Grid
            item
            xs={6}
            sx={{
              display: "flex",
              justifyContent: "center",
              alignItems: "center",
            }}
          >
            <Button variant="contained" size="large" onClick={saveImage}>
              Save
            </Button>
          </Grid>
        </Grid>
      </Modal>
      <input
        ref={inputFileRef}
        type="file"
        style={{ display: "none" }}
        accept="image/*"
        onChange={(inp) => {
          if (inp.currentTarget.files !== null) {
            const reader = new FileReader();
            reader.onload = (e) => {
              resizeImageToMaximum2Mb(String(e.target?.result));
            };
            setImageName(inp?.currentTarget.files[0].name);
            setDateCreated(String(new Date()));
            reader.readAsDataURL(inp.currentTarget.files[0]);
          }
        }}
      />
    </>
  );
}

As imports, we need some MUI components, our redux state, and our <ImageCard> component. Our states and variables describe, for the most part, the new image: dateCreated, imageName, and the image itself (takenImage). These are all set when a user selects an image. Then we have a ref that holds an HTMLInputElement. We need it later to open the browser’s file input for images when the user clicks on the floating action button.

Before we get to the return statement, we define three functions. The first one, handleModalClose, resets the state of the image name and the image itself. We wrap this in a function because we need to call it from several points in the component. We use the second function (saveImage) to save the image: it constructs an imageCardObject and dispatches it to the redux state. We need the last function to resize images that are too large. Remember that we save all images to local storage. However, browsers enforce a maximum quota for local storage, mostly ranging between 2 and 10 MB. Since local storage stores only UTF-16 encoded strings, each character needs two bytes, so we set the maximum number of characters to one million, which equals 2 MB. The function resizeImageToMaximum2Mb checks whether the image source is longer than one million characters and, if so, calculates the resize ratio by dividing one million by the current length of the image source. With this rescaleRatio, the image is loaded and resized by drawing it on a canvas that is not rendered in the DOM. Afterward, the canvas is converted to a data URL and saved to the internal takenImage state. This function allows users to take photos with their smartphone (usually more than 8 MB) and still use them inside the app.

Now let’s look at the rendered elements of the component. First, we define our floating action button. We want to pin the button to the bottom-right side of the application; therefore, we set its position to fixed and add some margin to the bottom and right. As an icon, we use the official CameraIcon from MUI. As the onClick handler, we trigger the click method of the inputRef mentioned earlier. The corresponding HTMLInputElement is defined at the end of the component, and its display CSS property is set to none to make it invisible. The type is set to file and accept to image/*. With this setting, the user can only select a single image from their device, or take a picture on a smartphone. Then we define an onChange handler for the input element that creates a new FileReader, which executes the resizeImageToMaximum2Mb function once the file has been read into the corresponding data URL (base64 encoded). We also set the imageName and dateCreated states before triggering the read.

If the takenImage state is not null, the <Modal> opens and the image is displayed. The <Modal> incorporates an <ImageCard> and our save and discard buttons. To arrange these components, we use MUI’s <Grid> component. First, we create a grid with the container prop; then we create an item <Grid> to hold the <ImageCard>, with the xs prop set to 12 to stretch it over the whole width. The buttons are each encapsulated in a <Grid> (with the item prop set) with an xs value of 6 to place them in one row.

Outlook on our computer vision app in React

This concludes the first of two posts in which we build an app that executes computer vision models entirely on the client side. Again, you can find all the code here and visit the app under this link. In the next post, I will add an inference option with ONNXruntime for JS. Stay tuned.

If you are interested in ONNXruntime in Python, check out my other post about object detection in AWS Lambda.
