Meta’s Open Source AI Training Dataset Features 26,000 Videos

Meta has announced a new, open-source dataset for AI training that the company is hoping will reduce the kind of demographic bias that has been documented by researchers at the National Institute of Standards and Technology (NIST) and elsewhere.

The “Casual Conversations v2” dataset comprises over 26,000 video monologues depicting individuals from seven countries: Brazil, India, Indonesia, Mexico, Vietnam, the Philippines, and the United States. In the videos, the participants describe some of their own demographic attributes (such as race, gender, and age), which can help AI systems to properly tag and interpret demographic data.

Importantly, in recruiting the 5,567 paid participants who recorded the videos, Meta asked them to explicitly consent to having their data collected and used for AI training, which should help the company to steer clear of lawsuits under some of the disparate biometric privacy regulations around the world.

Illinois’s Biometric Information Privacy Act (BIPA), for example, is notorious for its wide scope and harsh penalties, and led to legal trouble for Amazon and Microsoft over their use of IBM’s “Diversity in Faces Dataset”, which they leveraged in an effort to reduce the demographic bias of their own facial recognition systems. Although the Diversity in Faces Dataset comprised images collected through the public photo-sharing platform Flickr, the companies nevertheless faced BIPA lawsuits thanks in part to their failure to obtain consent for the use of the photos’ biometric data.

Clearly establishing a consensual basis for the collection of subjects’ faces in the Casual Conversations v2 dataset is therefore an important step in helping third-party organizations feel safe using the dataset for AI training. That, in turn, could lead to positive outcomes down the line in terms of reducing or eliminating the demographic bias that has tarnished the reputation of facial recognition technology in recent years.

The dataset’s inclusion of vocal samples could also help to alleviate the related but much less discussed issue of demographic bias in voice recognition. While the real-world outcomes of this algorithmic bias are likely less impactful than those of facial recognition, which tends to be used in consequential law enforcement and government security applications, demographic disparities in voice recognition could potentially affect huge numbers of consumers interacting with digital devices’ voice interfaces.

The Casual Conversations v2 dataset is available through the Meta AI website.

Sources: Axios, Popular Science

–

March 14, 2023 – by Alex Perala
