AI-Fueled Disruption Inbound
Here's how to turn the AI revolution into a major win, for all of us.
A little over a year ago, I testified before the US Senate. Although I had testified in front of Congress before, I was unprepared for the circus and showmanship of the Senate. Purportedly, I was invited to the hearing to discuss ways to curb the growing power of networked corporations. Instead, the hearing was set up as a stage for well-known Senators to publicly lambast Facebook and Google, and most of it was spent on an aggressive takedown of the two other witnesses. The cross-examination, particularly by Senator Ted Cruz, of a visibly perspiring Facebook executive sent to testify on the company’s behalf, was brutal.
Despite the circus, I took the opportunity to insert new ideas and concepts into the public conversation. One of the ideas I presented was data ownership. Here’s a summary of the thinking behind it:
Data (from pictures to videos to writing to location info) is being collected on all of us and stored online. Some of it is posted voluntarily by individuals (on Instagram, YouTube, Twitter, and the like). Some of it is collected by systems that monitor our online and offline behavior.
Big tech companies are using this data, unbeknownst to the people providing it, to create immense value by training AIs. Over the next decade, these AIs will become the most valuable technological systems we have ever built (worth countless trillions in the future). They also may become, if misapplied, the most dangerous technology we have ever built (tyranny in a box).
To ensure people have an ownership stake in this AI-fueled economy, and to prevent tyranny, we must make it possible for the people who contributed the data needed to train these AIs (which is nearly everybody online) to participate in the immense wealth those AIs generate. The best way to accomplish this is to enable people to exercise ownership rights over their online data (both the data they post and the data collected on them).
It was clear, at the time, that the Senators found this idea impossible to understand. The impasse was understandable. The concepts involved were too new, and they had little reason to understand them since AIs were still science fiction and worth very little economically.
A couple of months ago, that all changed. AIs became accurate, useful, and valuable.
Let’s dive into three consequences of this event:
Data Ownership?
The public release of ChatGPT has ignited an AI gold rush: a bubble of investment and activity that could match or exceed the Web bubble of the late nineties. Suddenly, AIs were real, commanding billions in valuation and investment.
This upsurge in value sparked sudden interest in AIs, particularly among companies with some claim on the data being used to train them. We saw this immediately when Elon Musk tweeted: "I just learned that OpenAI (the company behind ChatGPT) had access to a Twitter database for training. I put that on pause for now." Not long after, Musk shut down access to the Twitter API (a way for machines to directly access Twitter's database).
These actions make it clear that Twitter’s data and interaction stream (where bots interact with human beings to generate training data) played a significant role in training these new AIs. They also suggest that Twitter’s existing and future role as a training resource for these AIs is worth more than the advertising platform Musk paid for, but only if Twitter can find a way to exercise an ownership claim over the data.
To understand why that’s going to be hard, let’s dive into Getty Images’ civil suit against Stability AI (the firm behind Stable Diffusion, the image-generating AI). Getty Images, a firm that aggregates copyrights on imagery, claims Stability AI used 12 million of its copyrighted images to train Stable Diffusion. Stability AI’s defense is that this was Fair Use: the AI only looked at the images, and the data from the images themselves isn’t in the end product.
This defense gets to the heart of the problem and why we need to enable data ownership for everyone. These firms have found a way to extract immense value from simply:
Reading books (Google’s book-scanning project, begun in 2002, digitized some 25 million university library books), Twitter and Reddit posts, websites, scientific papers, etc.
Seeing pictures and videos on Instagram, in search engine image results, on CCTV, etc.
Hearing people talk on YouTube, TikTok, podcasts, etc.
You can see the disconnect here from traditional notions of ownership. These AIs interact with copyrighted material the way a human being would: they observe, listen, and analyze it to learn. As a result, without added protections like data ownership, that human-like use likely qualifies as Fair Use.
However, this problem is bigger than the worries of firms with copyrights and other claims to data. It impacts all of us. These AIs won’t stop learning. Eventually, nearly everything we do (online or off) will be translated into digital form and used to improve them. Furthermore, these AIs will actively compete with us, undercutting us for many of the jobs we currently do.
Hopefully, you can see the problem with this situation and why finding a way for people to exert ownership rights over their digital data is a good step toward an economy that makes it possible for everyone to prosper.
Mimic
Another problem is that these AIs simulate what they learn and can build on this mimicry in new ways. For example: