Microsoft is launching a research project to estimate the influence of specific training examples on the text, images, and other types of media that generative AI models create.
That’s according to a job listing dating back to December that was recently recirculated on LinkedIn.
According to the listing, which seeks a research intern, the project will attempt to demonstrate that models can be trained in such a way that the impact of particular data, such as photos and books, on their outputs can be estimated.
The listing describes current neural network architectures as opaque when it comes to providing sources for what they generate, and it gives reasons to change that. One is “incentives, recognition, and potentially pay for people who contribute certain valuable data to unforeseen kinds of models we will want in the future, assuming the future will surprise us fundamentally.”
AI-powered text, code, image, video, and song generators are at the center of a number of IP lawsuits against AI companies. These companies frequently train their models on massive amounts of data from public websites, some of which is copyrighted. Many of them argue that the fair use doctrine shields their data-scraping and training practices. But creatives, from artists to programmers to authors, largely disagree.
Microsoft itself faces at least two legal challenges from copyright holders.
The New York Times sued the tech giant and its sometime collaborator, OpenAI, in December, accusing the two companies of infringing on The Times’ copyright by deploying models trained on millions of its articles. Several software developers have also filed suit against Microsoft, claiming that the company’s GitHub Copilot AI coding assistant was unlawfully trained on their work.
Microsoft’s new research effort, which the listing describes as “training-time provenance,” reportedly involves Jaron Lanier, the accomplished technologist and interdisciplinary scientist at Microsoft Research. Writing in The New Yorker in April 2023, Lanier described the concept of “data dignity,” which to him means connecting “digital stuff” with “the humans who want to be known for having made it.”
“A data-dignity approach would trace the most unique and influential contributors when a big model provides a valuable output,” Lanier wrote. “For instance, if you ask a model for ‘an animated movie of my kids in an oil-painting world of talking cats on an adventure,’ then certain key oil painters, cat portraitists, voice actors, and writers (or their estates) might be calculated to have been uniquely essential to the creation of the new masterpiece.”
Several companies are already attempting this. AI model developer Bria, which recently raised $40 million in venture capital, claims to “programmatically” compensate data owners according to their “overall influence.” Adobe and Shutterstock also award regular payouts to dataset contributors, although the exact payout amounts tend to be opaque.
Few large labs have established individual contributor payout programs beyond signing licensing agreements with publishers, platforms, and data brokers. Instead, they have given copyright holders ways to “opt out” of training. But some of these opt-out processes are onerous, and they apply only to future models, not previously trained ones.
Of course, Microsoft’s project may amount to little more than a proof of concept. There is precedent for that. Back in May, OpenAI said it was developing similar technology that would let creators specify how they want their works to be included in, or excluded from, training data. But nearly a year later, the tool has yet to see the light of day, and it often hasn’t been viewed as a priority internally.
Microsoft may also be trying to “ethics wash” here, or to head off regulatory and/or court decisions that could disrupt its AI business.
But the fact that the company is investigating ways to trace training data is notable in light of other AI labs’ recently expressed stances on fair use. Several top labs, including Google and OpenAI, have published policy documents recommending that the Trump administration weaken copyright protections as they relate to AI development. OpenAI has explicitly called on the U.S. government to codify fair use for model training, arguing that this would free developers from burdensome restrictions.
Microsoft did not immediately respond to a request for comment.