ML Assisted Annotation Powered by Sama's MICROMODEL Technology

With 80% of AI project time being spent preparing the large volume of data necessary to train a model, efficiency improvements early on in the process are sure to have compounding effects.

Your machine learning model is only as good as the data it’s trained on, according to Sama, our partner for the Deep Learning Hybrid Summit.

At Sama, there is a dedicated Machine Learning team working at the forefront of AI research to identify optimization opportunities just like this, "so that we can develop advanced annotation tools to smooth the path to production for our clients". One of the papers the Sama team presented at last year’s CVPR—Human-Centric Efficiency Improvements in Image Annotation for Autonomous Driving—shared an approach to speeding up polygonal instance segmentation using ML.

Today, this technology has been incorporated into Sama's platform to make clients’ labeling process more efficient.

They call it ML Assisted Annotation powered by MICROMODEL technology, and it’s already helping Sama's clients predictably get higher quality training data in half the time.

Read on for an overview of ML Assisted Annotation powered by MICROMODEL technology and how it can help you develop models that are more scalable, robust and accurate, and that can be brought into production more quickly.

What is ML Assisted Annotation powered by MICROMODEL technology?

ML Assisted Annotation (MAA) powered by MICROMODEL technology is an architecture that allows Sama to expedite the labeling process by drawing from a library of models trained on specific use cases. MAA can be used to generate high-quality pre-labeled annotations, which annotators then validate; their corrections help the models continuously improve over time.

This powerful combination of skilled annotators and an AI-powered platform allows Sama to deliver a high standard of label quality to its customers every time, along with efficiency improvements and quicker time to market.

How it works

In order to understand how MAA works, we need to first discuss the DEXTR model. DEXTR, or “Deep Extreme Cut,” is a publicly available object segmentation model for images and videos.

Discover the DEXTR model and Sama's approach in detail in this post.

Many ML methods like DEXTR have been suggested to speed up the process of instance segmentation, but these are not typically tested in a high-scale production environment, nor are ML outputs easily edited by human annotators. This makes it difficult to confidently reach the label quality standards required to run a model in production.

MAA combines the well-known DEXTR approach with a raster-to-polygon algorithm to make results easily editable by a human in the loop. Sama found that this approach, which pairs skilled annotators with ML-powered automation, significantly increases labeling efficiency and quality.
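To give a feel for what a raster-to-polygon step involves, here is a minimal sketch in plain Python. It is not Sama's algorithm, which the article does not describe; the function name `mask_to_polygon` and the boundary-tracing approach are assumptions chosen for illustration. It walks the outer boundary of a single filled region in a binary mask and keeps only the corner vertices, which is what makes the result easy for a person to drag and adjust.

```python
def mask_to_polygon(mask):
    """Trace the outer boundary of one filled region in a binary mask.

    Hypothetical helper for illustration only. Assumes a single connected
    region with no holes and no pixels that touch only diagonally.
    Returns corner vertices as (x, y) tuples, with y pointing down.
    """
    h, w = len(mask), len(mask[0])

    def filled(r, c):
        return 0 <= r < h and 0 <= c < w and bool(mask[r][c])

    # Each filled pixel contributes a directed edge on every side that
    # faces an empty neighbour; edges run clockwise around the region.
    nxt = {}
    for r in range(h):
        for c in range(w):
            if not filled(r, c):
                continue
            if not filled(r - 1, c):
                nxt[(c, r)] = (c + 1, r)          # top side
            if not filled(r, c + 1):
                nxt[(c + 1, r)] = (c + 1, r + 1)  # right side
            if not filled(r + 1, c):
                nxt[(c + 1, r + 1)] = (c, r + 1)  # bottom side
            if not filled(r, c - 1):
                nxt[(c, r + 1)] = (c, r)          # left side
    if not nxt:
        return []

    # Chain the edges into one closed loop (bounded, so it always stops).
    start = min(nxt)
    loop, cur = [start], nxt[start]
    for _ in range(len(nxt)):
        if cur == start:
            break
        loop.append(cur)
        cur = nxt[cur]

    # Drop vertices that lie on a straight line between their neighbours.
    poly = []
    n = len(loop)
    for i in range(n):
        px, py = loop[i - 1]
        x, y = loop[i]
        qx, qy = loop[(i + 1) % n]
        if (x - px) * (qy - y) - (y - py) * (qx - x) != 0:
            poly.append((x, y))
    return poly


square = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_to_polygon(square))
```

A production tool would also need to handle holes, multiple regions and curved outlines (usually by simplifying the contour to a tolerance), but the idea is the same: turn thousands of mask pixels into a handful of editable vertices.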

Let’s see what that looks like in practice, using an example from the Autonomous Vehicle industry.

Machine-Assisted Polygon Annotation

When an annotator logs into the Sama annotation platform, they are presented with this workspace. In this example, the workspace is customized to allow the annotator to draw instance segmentation polygons around each of these vehicles:

You’ll notice that there are several vehicles in this image. In a manual context, it could take a human several hours to deliver high-quality annotations of every single vehicle:

What the manual annotation process would look like (sped up significantly): several clicks are required to draw a polygon around each of the vehicles.

This process is significantly accelerated with Machine-Assisted Polygon Annotation.

The tool allows the annotator to use a crosshair to mark only four extreme points: the left, right, top and bottom boundaries of the object. These four clicks are the only inputs needed to create a heat map, which is then sent to the inference server; the server returns an accurate raster mask prediction.

With Machine-Assisted Polygon Annotation, annotators only need to perform four clicks to produce an accurate raster mask prediction.
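The heat map built from the four clicks can be pictured with a small sketch. In the published DEXTR approach, a 2D Gaussian is centred on each extreme point and the resulting map is given to the network alongside the image. The code below is an illustration under that assumption, in plain Python; the function name `extreme_point_heatmap`, the `sigma` value and the example click coordinates are hypothetical and do not come from Sama's platform.

```python
import math


def extreme_point_heatmap(height, width, points, sigma=10.0):
    """Build a heat map with one Gaussian bump per clicked point.

    points: iterable of (x, y) pixel coordinates (hypothetical clicks).
    Each cell holds the strongest response from any point, so values
    stay in the range (0, 1] and peak at exactly 1.0 on each click.
    """
    heat = [[0.0] * width for _ in range(height)]
    for px, py in points:
        for y in range(height):
            for x in range(width):
                d2 = (x - px) ** 2 + (y - py) ** 2
                value = math.exp(-d2 / (2.0 * sigma * sigma))
                if value > heat[y][x]:
                    heat[y][x] = value
    return heat


# Four assumed clicks: left, right, top and bottom of an object.
clicks = [(2, 10), (27, 10), (15, 1), (15, 18)]
heat = extreme_point_heatmap(20, 30, clicks, sigma=3.0)
print(round(heat[10][2], 3), round(heat[0][0], 3))
```

Because the function accepts any number of points, adding a fifth click simply adds another bump to the same map before it is sent for inference again.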

Machine-Assisted Polygon Editing

A polygon prediction can then be further refined by an annotator by switching into editing mode. This enables annotators to label precisely and ensure that high-quality requirements are met without compromise.

In this example, the raster mask prediction is edited by the annotator to ensure precise and high-quality labels.

This mode also enables annotators to use more than four extreme points in order to produce even more accurate predictions. A fifth user input point can easily be added, with the model immediately incorporating the new input to update its prediction.

If an ML model struggles to identify specific shapes, annotators can add a few more inference points to get a more accurate prediction, and then refine that prediction manually to ensure high-quality labels.

Results from ML Assisted Annotation powered by MICROMODEL technology

Sama's clients are already seeing impressive results from MAA powered by MICROMODEL technology:

Predictably producing 94-98% IOU (Intersection over Union) accuracy
Because the models are pre-trained on specific use cases for better performance out of the gate, clients are seeing a quicker time to accuracy.
2-4x more efficient annotation process
You can clearly see above that using MAA over a more manual polygon labeling approach results in significant time savings. But it’s also an iterative process with a human annotator in the loop; modifications to the predictions get fed back into the training data pipeline to retrain the model, enabling it to make better predictions over time.
Quicker time to market
The end result for Sama's clients is faster iterations and a quicker time to market. A more efficient annotation process results in more data returned quickly, and ultimately a significantly shorter path to production.
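
For readers less familiar with the IOU metric quoted in these results: Intersection over Union is the area shared by the predicted and the reference mask, divided by the total area covered by either. Here is a minimal sketch in plain Python; the function name `iou` and the tiny example masks are made up for illustration and are not Sama data.

```python
def iou(pred, truth):
    """Intersection over Union of two same-sized binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly; treat that case as 1.0.
    return inter / union if union else 1.0


pred = [
    [1, 1, 0],
    [1, 1, 0],
]
truth = [
    [0, 1, 1],
    [0, 1, 1],
]
score = iou(pred, truth)
print(score)
```

In this toy case the masks share 2 pixels out of 6 covered in total, so the score is one third. A score of 94-98% means the predicted and reference shapes overlap almost completely.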

What’s more: increasing the efficiency of this labor-intensive manual data annotation process reduces the barrier to entry for more ML teams... and not just those with large R&D budgets. Technology like this can also help democratize data labeling by driving down cost, so we can see even more deserving companies leverage AI to drive value for their business.

Small teams who are getting started with labeling may not have yet defined what type of annotations they need, or how much data they need to be successful. MAA can help them iterate more quickly, developing models in short increments rather than in large, cumbersome workstreams. The end result is a quicker time to value, and ultimately to market, for organizations of all shapes and sizes.

Want to hear more from Sama about optimizing your machine learning model, or ready to get started?

Join us virtually or in-person in San Francisco on 17-18 February.