OpenAI wants to work with organizations to build new AI training data sets

by WeeklyAINews
It’s an open secret that the data sets used to train AI models are deeply flawed.

Image corpora tend to be U.S.- and Western-centric, partly because Western images dominated the internet when the data sets were compiled. And as most recently highlighted by a study out of the Allen Institute for AI, the data used to train large language models like Meta’s Llama 2 contains toxic language and biases.

Models amplify these flaws in harmful ways. Now, OpenAI says that it wants to combat them by partnering with outside institutions to create new, hopefully improved data sets.

OpenAI today announced Data Partnerships, an effort to collaborate with third-party organizations to build public and private data sets for AI model training. In a blog post, OpenAI says Data Partnerships is intended to “enable more organizations to help steer the future of AI” and “benefit from models that are more useful.”

“To ultimately make [AI] that’s safe and beneficial to all of humanity, we’d like AI models to deeply understand all subject matters, industries, cultures and languages, which requires as broad a training data set as possible,” OpenAI writes. “Including your content can make AI models more helpful to you by increasing their understanding of your domain.”

As part of the Data Partnerships program, OpenAI says that it’ll collect “large-scale” data sets that “reflect human society” and that aren’t easily accessible online today. While the company plans to work across a wide range of modalities, including images, audio and video, it’s particularly seeking data that “expresses human intention” (e.g. long-form writing or conversations) across different languages, topics and formats.

OpenAI says it’ll work with organizations to digitize training data if necessary, using a combination of optical character recognition and automatic speech recognition tools, and removing sensitive or personal information where needed.

At first, OpenAI’s looking to create two types of data sets: an open source data set that’d be public for anyone to use in AI model training and a set of private data sets for training proprietary AI models. The private sets are intended for organizations that wish to keep their data private but want OpenAI’s models to have a better understanding of their domain, OpenAI says; so far, OpenAI’s worked with the Icelandic Government and Miðeind ehf to improve GPT-4’s ability to speak Icelandic and with the Free Law Project to improve its models’ understanding of legal documents.

“Overall, we’re seeking partners who want to help us teach AI to understand our world in order to be maximally helpful to everyone,” OpenAI writes.

So, can OpenAI do better than the many data-set-building efforts that’ve come before it? I’m not so sure; minimizing data set bias is a problem that’s stumped many of the world’s experts. At the very least, I’d hope that the company’s transparent about the process, and about the challenges it inevitably encounters in creating these data sets.

Despite the blog post’s grandiose language, there also seems to be a clear commercial motivation here to improve the performance of OpenAI’s models at the expense of others, and without compensation to the data owners to speak of. I suppose that’s well within OpenAI’s right. But it seems a bit tone-deaf in light of open letters and lawsuits from creatives alleging that OpenAI’s trained many of its models on their work without their permission or payment.


