It is amazing how things progress when one is following a story.
On a number of occasions we discussed the concept of democratized data. In fact, this is what I view to be Hive's number 1 use case at the moment.
People often say find a need and fill it.
The reality is that, in this era of AI and training large models, data is crucial. Naturally, people are quickly realizing the value of the data sitting on their servers. For those who operate on the Internet, the ability to control that data is taking on added meaning.
With so much money on the line, these entities are taking strides to lock things down.
While technological steps are one thing, it is something else when the law gets involved.
Image generated by Ideogram
Illegal Web Scraping Screams For Data Democratization
Here is a short video that discusses a case that went in front of a United States court that dealt with data scraping. The accused was found to be guilty. This was a corporate case but does bring up some interesting questions.
https://inleo.io/threads/view/taskmaster4450le/re-leothreads-2mo8o3c7p
The one element here is that we are looking at data extraction tied to fraud. Of course, in many parts of the world, there are laws against fraud, regardless of how the information used was acquired.
Leaving that aspect aside, what about the act of designing an automated agent to pull data off websites? What happens if that becomes illegal?
Certainly, this is something that will have to filter through the courts, and every country will be different. However, we have already seen a move toward holding developers responsible for what their software does.
Most will remember the case of the Tornado Cash developer who got 64 months for money laundering. Basically, he designed a privacy application that mixed cryptocurrency transactions to obscure where the funds came from.
Thus, we cannot call it unreasonable to think that some governments will take such action. If that is the case, could developers be held responsible?
Democratized Data
The democratization of data solves this problem.
What this means is generating data that is placed in public databases, such as the Hive blockchain, where anyone is free to utilize it. Since nobody owns it, startups can gather the data to train their models.
This is not the case with entities such as Reddit and X, which are locking down their sites. The ability to scrape the Internet is diminishing.
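To make "anyone is free to utilize it" concrete, here is a minimal sketch of pulling public posts from Hive over its JSON-RPC interface, with no login or API key. The node URL, the choice of the `condenser_api.get_discussions_by_created` method, and the helper names `build_request`, `fetch_posts`, and `to_training_text` are assumptions made for illustration, not something taken from the post above; a real pipeline would likely need paging, retries, and filtering.

```python
import json
import urllib.request

# Assumption: a public Hive API node; any other public node could be substituted.
HIVE_NODE = "https://api.hive.blog"


def build_request(tag: str, limit: int = 5) -> dict:
    """Build a JSON-RPC payload asking for the newest posts under a tag."""
    return {
        "jsonrpc": "2.0",
        "method": "condenser_api.get_discussions_by_created",
        "params": [{"tag": tag, "limit": limit}],
        "id": 1,
    }


def fetch_posts(tag: str, limit: int = 5, node: str = HIVE_NODE) -> list:
    """Send the request to a node and return the list of posts it reports."""
    body = json.dumps(build_request(tag, limit)).encode("utf-8")
    req = urllib.request.Request(
        node, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read()).get("result", [])


def to_training_text(posts: list) -> list:
    """Keep only the title and body text of each post."""
    return [f"{p.get('title', '')}\n{p.get('body', '')}".strip() for p in posts]
```

Something like `to_training_text(fetch_posts("ai"))` would then return plain text pulled straight from the chain, assuming the node is reachable and still serves that method.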
We also have to factor in lawsuits.
OpenAI has been sued by a number of entities for training its models on data claimed to be under copyright. This is going to have to make it through the court system before we know where the rulings stand. Nevertheless, the company faces the potential of billions in verdicts.
It is obvious startups cannot withstand this.
So what are they to do?
Actually, a better question is what are we going to do? Do we want a future where Big Tech is the only one with access to data? Is the idea of a handful of mega-corporations being the developers of these models appealing to people?
The answer to this question should dictate future behavior.
If one has no problem with this future, then feeding the massive beasts is no problem. Google, Amazon, X, and Meta will see their databases grow on a daily basis, allowing them to feed the increasing compute they acquire.
On the other hand, if one stands for decentralization and distribution, then these centralized entities are even less appealing.
Web 3.0 = Decentralization
It is no secret that a core tenet of Web 3.0 is the idea of decentralization.
Actually, we are looking at a technology that was brought about with the idea of democratized data from the start. The breakthrough of Bitcoin came from the ability to arrive at consensus without a centralized third party. This means that the ledger, i.e. database, was not under the control of a single entity.
Bitcoin's data, for the most part, is limited to financial transactions. Over the years, other databases have shown up that expand upon this concept. Hive is an example of a permissionless text database.
We are now seeing this grow in importance. Some like to cite how "data is the new oil". If that is the case, the question is who is getting more of the oil.
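For readers who want to see why a ledger like this does not need a trusted keeper, here is a toy sketch of a hash-linked, append-only log. It is an illustration only: it leaves out the consensus mechanism entirely, and the names `block_hash`, `append`, and `verify` are made up for this example rather than taken from Bitcoin or Hive. The point is simply that anyone holding a copy can check it for tampering on their own.

```python
import hashlib
import json


def block_hash(index: int, prev_hash: str, data: str) -> str:
    """Hash an entry together with the hash of the entry before it."""
    payload = json.dumps(
        {"index": index, "prev": prev_hash, "data": data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, data: str) -> None:
    """Add an entry that is linked to the current tip of the ledger."""
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    index = len(ledger)
    ledger.append({
        "index": index,
        "prev": prev_hash,
        "data": data,
        "hash": block_hash(index, prev_hash, data),
    })


def verify(ledger: list) -> bool:
    """Recompute every link; any edited entry breaks the chain."""
    prev_hash = "0" * 64
    for i, entry in enumerate(ledger):
        if entry["index"] != i or entry["prev"] != prev_hash:
            return False
        if entry["hash"] != block_hash(i, prev_hash, entry["data"]):
            return False
        prev_hash = entry["hash"]
    return True
```

The data field can hold a payment or a piece of text equally well, which is the step from a financial ledger to a text database.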
Is humanity well served by creating another cartel like we see with the physical commodity, only this time in the digital world?
Our success with cartels seems rather clear.
The foundation of the Internet is the database. Everything we do is tied to it. Without databases, we would have nothing on our screen. This applies whether we are discussing Web 2.0 or Web 3.0.
AI training is taking this to another level. We see the value grow, meaning the lead these large entities have keeps growing.
Permissionless databases hold the key to combating this. Even if the law starts to swing in the direction of holding developers responsible, democratized data makes it a meaningless point.
Posted Using InLeo Alpha