IBM Developing Huge Public Dataset to Help Eliminate Bias from Facial Recognition

“IBM is going to release a dataset of 36,000 facial images that will be equally distributed across a range of ethnicities, genders, and ages.”

As the competition to develop sophisticated facial recognition systems continues to escalate, IBM is preparing to make a decidedly counterintuitive move, with plans to release a huge dataset for training facial recognition systems – even those of its competitors.

While IBM has a business interest in promoting its own Watson Visual Recognition service – an AI platform sporting its own facial recognition capability – the company wants to raise the state of the art more broadly. To that end, it’s building a face database of over a million images for training purposes, sourced from Flickr and annotated with attribute and identity tags by IBM Research scientists, available for any third party to use in the training and development of its own facial recognition system.

Perhaps more importantly, IBM is going to release a dataset of 36,000 facial images that will be equally distributed across a range of ethnicities, genders, and ages. This will primarily be for evaluation purposes, a tool to help developers eliminate bias from their facial recognition systems.

An AI system trained primarily on images of one particular ethnic group or gender will tend to be less accurate in identifying individuals from other groups, and indeed some research has found such biases in current facial recognition systems. This has proven to be a real sticking point in ongoing controversies over public deployments of the technology. Privacy advocates protesting Amazon’s sale of its own facial recognition technology to police agencies and other government services, for example, have specifically argued that such deployments could result in real-world racial discrimination in policing, among other issues.
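To illustrate how a demographically balanced evaluation set can expose this kind of bias, here is a minimal Python sketch that compares a system's accuracy across groups. It is not IBM's method or tooling; the function names, group labels, and sample records are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return identification accuracy per demographic group.

    `records` is an iterable of (group, predicted_id, true_id) tuples.
    This record layout is an assumption made for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best and worst per-group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical results on a balanced evaluation set (four faces per group).
sample = [
    ("group_a", "id1", "id1"),
    ("group_a", "id2", "id2"),
    ("group_a", "id3", "id3"),
    ("group_a", "id4", "id9"),
    ("group_b", "id5", "id5"),
    ("group_b", "id6", "id0"),
    ("group_b", "id7", "id0"),
    ("group_b", "id8", "id8"),
]

per_group = accuracy_by_group(sample)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

Because each group contributes the same number of test images, a large gap between groups points to uneven performance rather than to an uneven test set.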

In a blog post announcing its plans, IBM explained that “AI holds significant power to improve the way we live and work, but only if AI systems are developed and trained responsibly, and produce outcomes we trust,” adding, “Making sure that the system is trained on balanced data, and rid of biases is critical to achieving such trust.”

In other words, the company believes that it’s to everyone’s advantage – including its own – that demographic bias be eliminated from the field of AI and facial recognition as a whole, and that’s why it’s planning to make these datasets public.

IBM says it will make these datasets available this autumn.

Sources: IBM, Axios

June 27, 2018 – by Alex Perala