A bill introduced in the United States Senate looks, on the surface, like something positive. It is designed to protect artists and musicians from having their works used to train AI systems. Called the Content Origin Protection and Integrity from Edited and Deepfaked Media Act (COPIED Act), it is also designed to help combat deepfakes.
The problem is we are dealing with the impotency of government. That will be the subject of an upcoming article. For now, we will focus on the aim of the bill and how things will unfold if it becomes law.
We are also going to show how this makes decentralized databases even more crucial.
Image generated by Ideogram
Big Tech Domination
The uproar over copyrighted material being used for AI training is just getting going. Most of the focus is on OpenAI.
Have you noticed that Google, xAI, and Meta have not been involved in the discussion? Did you see how they are not being sued like OpenAI? This is a very important point.
Whatever one's feelings about OpenAI, it is another competitor in the expanding race. The problem with this bill is that, while it seeks to protect artists, it really provides cover for the major technology companies I just listed.
To be more specific, social media platforms stand to benefit. There is a reason the companies behind Facebook, X, and YouTube are not being targeted: they have a ton of data.
As is often the case, regulation protects the incumbents. This is a bill that could be called the Big Tech Database Protection Act. Those companies with huge databases under their control benefit.
Flipping it around, anyone without years of user-provided data is out of luck. We know startups do not have the resources to buy the data. Even OpenAI would struggle to pay everyone whose work was used in its model.
Where Is It Located?
One of the issues is who owns the content. This is something the users of Web 2.0 still do not understand. Perhaps this will really drive home the point.
Before getting to that, here is what the bill seeks to accomplish:
The bill would require companies that develop AI tools to allow users to attach content provenance information to their content within two years. Content provenance information refers to machine-readable information that documents the origin of digital content, such as photos and news articles. According to the bill, works with content provenance information could not be used to train AI models or generate AI content.
The bill is designed to give content owners, such as journalists, newspapers, artists, songwriters and others the ability to protect their work, while also setting the terms of use for their content, including compensation. It also gives them the right to sue platforms that use their content without their permission or have tampered with content provenance information.
Again, on the surface it sounds good. There is a problem though.
Where are these songs, as an example, located?
Type the name of your favorite artist into YouTube. It is likely you will find their music. Certainly, there are many accounts that post music that isn't theirs. However, there is a very good chance that the artist (or band) has an account that uploads the music.
Generally, the terms of service mean that the company gets broad rights to use anything uploaded to its servers. I am no lawyer, but it seems the protection provided by copyright is, in this instance, perhaps broken. The artist gave permission to the company which, once again, is Big Tech.
Another problem is the Internet is the world's biggest copy machine. Content is reproduced all over the place. We see sites such as MSN that simply post the content under a different URL. While the source is clear, it is still a different site. Of course, copyright could still apply but we suddenly add another layer.
Things get murkier when we consider that a lot of articles are summarized. There is software that will take an article and produce a summary of a few hundred words. We also have to consider that "fair use" means many parts of an article can, over time, be posted.
The point here is we are dealing with a lot of questions that are less clear than when we started this exploration.
The Need For Databases Like Hive
Hive is a decentralized text database.
It is built on a blockchain, which means it is transparent and immutable. It is also permissionless, a characteristic that applies not only to those writing to the database but also to anyone who seeks to use the data.
In other words, this is completely outside the realm of this bill.
Anyone is free to set up an API and pull whatever data is required.
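To make that concrete, here is a minimal sketch of pulling recent posts from Hive over JSON-RPC. It assumes the public node at `https://api.hive.blog` is reachable and uses the `condenser_api.get_discussions_by_created` method; the helper names `build_request` and `fetch_posts` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Assumption: a public Hive JSON-RPC node. Any other public node could be used.
HIVE_NODE = "https://api.hive.blog"


def build_request(tag: str, limit: int = 5) -> dict:
    """Build a JSON-RPC payload asking for the newest posts under a tag."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_discussions_by_created",
        "params": [{"tag": tag, "limit": limit}],
        "id": 1,
    }


def fetch_posts(tag: str, limit: int = 5, node: str = HIVE_NODE) -> list:
    """Send the request to a node and return the list of posts (no key needed)."""
    payload = json.dumps(build_request(tag, limit)).encode("utf-8")
    request = urllib.request.Request(
        node,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.load(response)
    # A JSON-RPC error response carries no "result" key, so fall back to [].
    return body.get("result", [])
```

Calling `fetch_posts("hive")` should return a list of post dictionaries, each with fields such as `author`, `title`, and `body`. No account, API key, or permission is involved, which is the point being made here.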
If we step back, we can see how this can help to offset Big Tech domination. Startups are not going to be able to enter this arena if this bill becomes law. Actually, this is already the case for technical reasons, as sites lock down the amount of data that can be pulled.
What does the world look like when our generative AI systems are basically built by Google, X, and Meta? Forget OpenAI, what about the hundreds of other companies that seek to enter the field? They are left out.
The democratization of data is one of the biggest benefits that a blockchain like Hive provides. In fact, the way things are going, it might be its most important use case. AI training is only going to become more prevalent. Big Tech has the advantage in that a large amount of data is being added to their databases each day.
In other words, the rich get richer.
Going back to YouTube, we see this:
As of June 2022, more than 500 hours of video were uploaded to YouTube every minute. This equates to approximately 30,000 hours of newly uploaded content per hour.
Even though the stats are old, that is roughly 720K hours of video uploaded per day. We see the same with text when we consider all the posting on X and Facebook. The same is true for photos that are uploaded to those companies' servers.
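For anyone who wants to check the daily figure, the arithmetic is short. This sketch starts from the quoted 500 hours per minute; the last line, converting one day of uploads into years of nonstop viewing, is my own illustration and not part of the quoted stat.

```python
HOURS_PER_MINUTE = 500  # figure quoted above, as of June 2022

hours_per_hour = HOURS_PER_MINUTE * 60   # minutes in an hour
hours_per_day = hours_per_hour * 24      # hours in a day

# How long one day of uploads would take to watch nonstop, in years.
years_to_watch_one_day = hours_per_day / (24 * 365)

print(hours_per_hour)                    # 30000
print(hours_per_day)                     # 720000
print(round(years_to_watch_one_day, 1))  # 82.2
```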
There is a data race taking place and this fight is serious. This is at the core of much that will happen in the next decade.
Here is where something like Hive makes a difference. As more data is placed in the database, its usefulness grows. Companies can use that data to build the vector databases that generative AI relies on.
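For readers unfamiliar with the term, a vector database stores each piece of text as a list of numbers and finds the entries closest to a query. The toy sketch below shows the idea only. It assumes a simple word-count "embedding" in place of the learned embeddings a real system would use, and `TinyVectorStore` and the sample posts are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. Real systems use learned model embeddings."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """In-memory store that ranks saved texts by similarity to a query."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Hypothetical posts standing in for text pulled from a public database.
store = TinyVectorStore()
store.add("the senate bill on ai training data")
store.add("how to bake sourdough bread at home")
print(store.search("ai training bill"))
```

The query shares three words with the first post and none with the second, so the first post is returned. The more open text there is to index, the more useful this kind of lookup becomes.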
As we progress along this path and society struggles with the effects of this technological progression, something like Hive only becomes more important.
What Is Hive