Data Labelling: The Power Behind Artificial Intelligence

AI and machine learning are changing the world, but they can only do that with large amounts of data they can understand. For that you need effective data labelling.

Much has been written about the exciting potential of AI, but beneath the surface is the more mundane but equally important world of data labelling and annotation. Data labellers are the worker bees of the AI revolution, laboriously labelling data to help AI reach its full potential.

The AI revolution

Today, AI is being used to power self-driving vehicles, detect tumours, review legal contracts and extract trends from annual accounts. Beneath all that is a hidden workforce of data labellers whose work trains the models that allow AI to do its job.

At its simplest, data labelling involves annotating shapes in an image or entries in a data set so AI algorithms can make sense of them. It’s something many of us do when tagging friends in images on Facebook. The label makes that image readable by an AI system.

Recently data labelling has become much more sophisticated. Today it is helping computers understand complex shapes and patterns, which in turn drives ever-increasing AI capabilities.

Here are some of the ways it may be used:

  • 2D bounding boxes: A bounding box is placed around an object in an image and labelled. For example, if you’re using a street view image, you might want to label vehicles, people and individual buildings.
  • Traffic sign annotation: Labelling the signs in an image takes what would otherwise be a meaningless shape to a machine and allows it to attribute a value to that shape.
  • Image classification: Shapes in images need to be classified. This helps the algorithms understand if a shape is human or not. For example, it might be used in car cameras to issue an alert to a driver if a person strays into the road.
  • Polygon annotation: Complex shapes can’t be annotated with simple boxes. Polygon annotation allows annotators to plot points at the vertices of each target object.
  • Semantic segmentation: This is much more precise and allows annotators to classify everything within an image, pixel by pixel.
  • Lines and splines: As the name implies, annotators simply draw lines along the objects that need to be recognised.
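
To make the annotation types listed above more concrete, here is a minimal Python sketch of how such labels might be represented as data. It is an illustration only: the class names (BoundingBox, Polygon, Polyline), the pixel-coordinate convention and the integer class ids in the segmentation mask are all assumptions made for this example, not the format of any particular labelling tool.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical convention: points are (x, y) pixel coordinates.
Point = Tuple[float, float]


@dataclass
class BoundingBox:
    """2D box around one object, e.g. a vehicle in a street view image."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class Polygon:
    """Closed outline for shapes a simple box cannot capture."""
    label: str
    vertices: List[Point]

    def area(self) -> float:
        # Shoelace formula over consecutive vertex pairs.
        total = 0.0
        n = len(self.vertices)
        for i in range(n):
            x1, y1 = self.vertices[i]
            x2, y2 = self.vertices[(i + 1) % n]
            total += x1 * y2 - x2 * y1
        return abs(total) / 2.0


@dataclass
class Polyline:
    """Open line drawn along an object."""
    label: str
    points: List[Point]

    def length(self) -> float:
        return sum(
            math.hypot(bx - ax, by - ay)
            for (ax, ay), (bx, by) in zip(self.points, self.points[1:])
        )


def class_pixel_counts(mask: List[List[int]]) -> Dict[int, int]:
    """Semantic segmentation: every pixel carries a class id; count pixels per class."""
    return dict(Counter(pixel for row in mask for pixel in row))


# Example annotations with made-up coordinates.
car = BoundingBox("vehicle", 10, 20, 110, 70)
roof = Polygon("building", [(0, 0), (4, 0), (0, 3)])
kerb = Polyline("kerb", [(0, 0), (3, 4), (3, 10)])
mask = [[0, 0, 1],
        [0, 2, 1]]  # assumed ids: 0 = road, 1 = vehicle, 2 = person

print(car.area(), roof.area(), kerb.length(), class_pixel_counts(mask))
```

The point of the sketch is the difference in detail: a box needs four numbers, a polygon needs one point per vertex, and a segmentation mask needs a label for every pixel, which is part of why the more precise methods take more annotator time.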

Demand is surging. In a few years data labelling has gone from a relatively small operational requirement to a thriving billion-dollar industry. It’s one in which accuracy and reliability are paramount.

One of the most challenging areas is self-driving. This requires vast amounts of image annotation, but also an extremely high degree of accuracy. A small error here or there when processing text messages might lead to embarrassment, but a single error with an image used for a self-driving system could be catastrophic.

From crowdsourcing to specialist outsourcing

Getting that accuracy is extremely laborious and time-consuming, and requires high levels of expertise. The challenge is twofold: first, large amounts of data have to be processed quickly, and second, the resulting labels need to be highly accurate.

Labellers will need to be able to process tens of thousands of images and data sets to a degree of accuracy greater than 95%.
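
As a rough illustration of what an accuracy target like this could mean in practice, the Python sketch below compares a labeller's output against a reference set of trusted labels and checks whether the share of matches is greater than 95%. The idea of a reference ("gold") set, the function names and the sample labels are assumptions for this example; real projects may measure accuracy quite differently, for instance per object or per pixel.

```python
from typing import Sequence


def label_accuracy(submitted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of submitted labels that match the reference labels."""
    if len(submitted) != len(gold):
        raise ValueError("submitted and gold must have the same length")
    if not gold:
        raise ValueError("cannot measure accuracy on an empty set")
    matches = sum(s == g for s, g in zip(submitted, gold))
    return matches / len(gold)


def meets_target(submitted: Sequence[str], gold: Sequence[str],
                 threshold: float = 0.95) -> bool:
    """True only if accuracy is strictly greater than the threshold."""
    return label_accuracy(submitted, gold) > threshold


# Made-up spot check: 100 reference images, all containing a vehicle.
gold = ["vehicle"] * 100
careful = ["vehicle"] * 97 + ["person"] * 3   # 97 of 100 correct
rushed = ["vehicle"] * 90 + ["person"] * 10   # 90 of 100 correct

print(meets_target(careful, gold), meets_target(rushed, gold))
```

A check like this only samples the work, so its usefulness depends on how representative the reference set is of the full data.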

One option has been crowdsourced data labelling operations, which use data labellers from all around the world to process images. Some claim to have thousands of crowdsourced specialists capable of handling millions of images in a day.

This approach has a problem. Although these companies make bold claims about quality, that quality is difficult to guarantee. Quality control is vital, which is why a growing number of projects are turning to specialist outsourcing companies.

These third parties offer the quality and reliability crowdsourcing cannot. They provide the expertise and scale needed to deliver highly accurate data labelling and annotation in the volumes AI requires.

Quantinite, for example, uses a dedicated team of annotation professionals who can either work as an extension of a company’s in-house team or provide the entire service. They can plug directly into a client’s back office, use existing software or customise their own to deliver a complete tailored solution.

This is the future for data annotation. It offers greater scale than in-house operations and greater accuracy than crowdsourcing. It’s fast, accurate and provides 24/7 coverage, 365 days of the year. For many companies, services such as this open the door to the world of possibilities AI can bring.
