A lot of the Web2 data (in the databases) is total trash, given the high volume of worthless opinions, memes, shitposts and such. Web2 companies focus instead on storing users' behavioral, shopping and location data, because that is what they can either sell directly or use for advertising purposes.
The focus should definitely be on creating useful data and useful databases... ones that contain actually useful information. As far as training data for AI goes, most of the Web2 databases have nothing to offer.