A team of Facebook AI researchers has released a new dataset, Casual Conversations, that was put together to reduce bias in face and voice recognition systems. The dataset contains video recordings of 3,011 unique individuals, with an even distribution across gender, skin tone, age, and lighting categories. The full dataset has 45,186 videos, which works out to roughly 15 one-minute videos per subject.
According to the researchers, the dataset was built primarily to evaluate the robustness of facial recognition systems. They noted that an algorithm that performs well on a biased set of images may not be as accurate when applied to the general population in a real-world setting. Casual Conversations is meant to expose that kind of gap before deployment, and has been made available to the public for testing purposes.
To ensure accuracy, the paid actors in the Casual Conversations videos were asked to apply their own age and gender tags, while skin tone classifications were based on the Fitzpatrick scale. Self-labeling distinguishes Casual Conversations from other datasets that apply age and gender tags after the fact, either manually or with a machine learning application. Labels applied after the fact can be problematic because they can recreate and codify pre-existing biases if applied improperly.
Facebook is hoping that Casual Conversations will help guard against the growing threat of deepfakes. When the company tested the five winners of its Deepfake Detection Challenge against Casual Conversations, it found that all five were better at spotting fakes of people with lighter skin than fakes of people with darker skin tones. That gap inevitably limits the utility of fraud prevention systems that rely on such technology.
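The kind of comparison described above can be illustrated with a minimal sketch. It is not Facebook's evaluation code: the function name `accuracy_by_group`, the two group labels, and every record in `sample` are made up for illustration. It simply measures how often a detector's verdict matches the true label within each skin tone group, then reports the difference.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return the fraction of correct predictions for each group.

    `records` is an iterable of (group, true_label, predicted_label) tuples.
    This helper is hypothetical and not part of any published toolkit.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, predicted in records:
        total[group] += 1
        if truth == predicted:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Made-up records for illustration only; these are not results from the dataset.
sample = [
    ("lighter", "fake", "fake"),
    ("lighter", "real", "real"),
    ("lighter", "fake", "fake"),
    ("lighter", "real", "fake"),
    ("darker", "fake", "real"),
    ("darker", "real", "real"),
    ("darker", "fake", "fake"),
    ("darker", "fake", "real"),
]

scores = accuracy_by_group(sample)
# A positive gap means the detector is more accurate on the "lighter" group.
gap = scores["lighter"] - scores["darker"]
print(scores, gap)
```

A detector that scores well overall can still show a large gap here, which is why a dataset with balanced, self-reported labels is useful for this kind of audit.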
Of course, Facebook is not the first company to emphasize the importance of representative datasets in AI development. AnyVision and Jumio have both warned about the dangers of algorithmic bias, while Onfido has already taken steps to retrain its own facial recognition tech. For their part, the Facebook researchers stressed that there is still room for improvement, and indicated that they are planning to expand Casual Conversations to include more categories (and in particular more gender categories) in the future.
April 26, 2021 – by Eric Weiss