The Data Bottleneck Is Holding Back AI – Blockchain Can Break It


Google’s recent decision to remove the num=100 search parameter from Google Search may seem like a fairly innocuous change, but for AI developers it was a crippling step. It severely hampered their ability to use the public internet as a source of training data.

AI systems rely heavily on indexed search results, and by reducing visibility to just 10 results per query instead of 100, Google has made it almost impossible for them to perform a deeper analysis of the web. It’s a decision that worsens an already crippling shortage of high-quality data for AI training.

Developers are struggling with limits on the publicly available content they can access. This was highlighted in a 2024 research paper by the Data Provenance Initiative, which analyzed 14,000 websites to gain insight into the restrictions imposed to limit access by AI models. The authors concluded that the preceding 12 months had seen a rapid escalation in data restrictions, with content creators effectively limiting the ability of algorithms to access their web pages.

Much of what remains of the public internet has already been scraped repeatedly and fed into today’s best AI models anyway, and the lack of new data is forcing developers to rethink their AI training strategies. Some have turned to narrow, domain-specific datasets and a more focused approach that involves training models to do one thing well, such as mathematics or image generation, rather than building large, general-purpose models. It’s a logical solution. If the models are smaller, developers won’t need nearly as much data.

Data tokens are the solution

The question is, how do we get these narrow datasets and put them in the hands of developers? This is where blockchain technology comes in. It allows anyone to upload their data to a distributed network and create digital tokens that represent their ownership. Blockchain technology facilitates the seamless transfer of those tokens from creators to developers, and also supports “fractional” ownership, where dozens of researchers can each purchase a portion of a token and access the data collectively, reducing their individual costs.
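To make the idea of fractional ownership concrete, here is a minimal in-memory sketch. It is not a smart contract and does not touch any real blockchain. The class name DatasetToken, the 100-share split, and the integer price unit are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetToken:
    """Hypothetical stand-in for a tokenized dataset (illustration only)."""
    owner: str
    price: int               # full price in the smallest currency unit (assumed)
    total_shares: int = 100  # assumed number of fractions per token
    holdings: dict = field(default_factory=dict)

    def __post_init__(self):
        # The creator starts out holding every share.
        self.holdings = {self.owner: self.total_shares}

    def buy_shares(self, buyer: str, shares: int) -> int:
        """Move shares from the creator to a buyer and return the cost."""
        if shares <= 0 or shares > self.holdings.get(self.owner, 0):
            raise ValueError("not enough shares available")
        self.holdings[self.owner] -= shares
        self.holdings[buyer] = self.holdings.get(buyer, 0) + shares
        # Each buyer pays only for the fraction they hold.
        return self.price * shares // self.total_shares

    def can_access(self, user: str) -> bool:
        """Anyone holding at least one share may read the data."""
        return self.holdings.get(user, 0) > 0


token = DatasetToken(owner="creator", price=10_000)
cost = token.buy_shares("lab_a", 5)  # lab_a pays for 5% of the token
```

In this sketch a research lab that buys 5 of the 100 shares pays 5% of the full price and still gains access, which is the cost-sharing effect described above.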

Tokenized datasets offer significant advantages. Not only are they easily divisible and tradable, they are transparent and publicly verifiable as well. Smart contracts can be used to create and enforce revenue streams for data creators, ensuring they receive fair compensation. Blockchain-based markets are driven by supply and demand, which means that richer and more unique datasets will have more value. In other words, tokenized data can become a new investable asset class that drives a new wave of innovation in AI.

The beauty of this concept is that everyone can contribute. For example, an agricultural model designed to detect crop diseases could be trained on thousands of images provided by farmers taking pictures of diseased crops with their smartphones. Alternatively, healthcare organizations can donate anonymised images of medical examinations to support the development of diagnostic AI models.

Using blockchain, we can support two key functions that are essential for decentralized data markets to thrive. The first is verification. The value of data depends on its accuracy and reliability, and the transparency of blockchain technology enables anyone to verify its origins and quality. Users who provide a lot of high-quality data will build their reputation over time because every dataset they create can be traced back to them. Communities can play a role too, with individuals reviewing datasets for quality and receiving rewards based on the honesty of their reviews.
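The traceability and reputation ideas above can be sketched in a few lines. This is an assumption-laden illustration: the ProvenanceLedger class is hypothetical, a plain dictionary stands in for the on-chain record, a SHA-256 fingerprint stands in for whatever content identifier a real network would use, and reputation is simply the mean of community review scores between 0 and 1.

```python
import hashlib


class ProvenanceLedger:
    """Hypothetical in-memory ledger; a dict stands in for the blockchain."""

    def __init__(self):
        self.records = {}  # fingerprint -> contributor
        self.reviews = {}  # contributor -> list of review scores

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # A content hash changes if even one byte of the data changes.
        return hashlib.sha256(data).hexdigest()

    def register(self, contributor: str, data: bytes) -> str:
        digest = self.fingerprint(data)
        if digest in self.records:
            raise ValueError("dataset already registered")
        self.records[digest] = contributor
        return digest

    def origin_of(self, data: bytes):
        """Return the registered contributor, or None if unknown or altered."""
        return self.records.get(self.fingerprint(data))

    def review(self, data: bytes, score: float) -> None:
        """Record a community review score (0 to 1) against the contributor."""
        contributor = self.origin_of(data)
        if contributor is None:
            raise KeyError("unknown dataset")
        if not 0 <= score <= 1:
            raise ValueError("score must be between 0 and 1")
        self.reviews.setdefault(contributor, []).append(score)

    def reputation(self, contributor: str) -> float:
        scores = self.reviews.get(contributor, [])
        return sum(scores) / len(scores) if scores else 0.0


ledger = ProvenanceLedger()
ledger.register("farmer", b"maize leaf photo")
ledger.review(b"maize leaf photo", 1.0)
```

Because the lookup key is a hash of the content itself, an altered copy of the data no longer matches any registered origin, which is one simple way the verification described above could work.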

The second is borderless payment. Because cryptocurrency works across borders, tokenized data markets can include contributors from anywhere in the world. This is not possible with fiat currencies, where high fees and a lack of banking infrastructure make it impossible for many to participate. As long as someone has a smartphone connected to the internet, they can send and receive micropayments instantly, without a bank account, giving everyone the opportunity to participate in the data economy. This means more diverse datasets originating from every corner of the world, reducing bias in AI outputs.

Rewarding the biggest contributors

Through cryptocurrency-based payments and guaranteed verification, we have the foundation to create thriving decentralized data markets that operate according to standard principles of supply and demand.

Consider the example of agricultural AI models that can detect crop diseases. A farmer from Malawi could play a role in developing such a model by uploading photos of an infected maize crop. The farm images will be tokenized and verified, contributing to a global AI data supply chain coordinated by cryptographic and community governance protocols. The quality of those images will determine their value, and therefore the size of the rewards the farmer will receive. When an AI model accesses those images to process a request, the interaction will be recorded on the blockchain and a smart contract will automatically send a small payment to the farmer. The more the model is used, the more queries the dataset will receive, increasing the rewards the farmer can earn.
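The per-query payout described above can be illustrated with a short sketch. Everything here is assumed for the sake of the example: the function name split_query_fee, the integer quality scores, and the rule that any rounding remainder goes to the highest-quality contributor. A real smart contract would define its own rules.

```python
def split_query_fee(fee: int, quality: dict) -> dict:
    """Split one query fee among contributors in proportion to quality.

    fee     -- amount paid for a single query, in the smallest currency unit
    quality -- hypothetical mapping of contributor name to integer quality score
    """
    total = sum(quality.values())
    if total <= 0:
        raise ValueError("at least one contributor needs a positive score")
    # Integer division keeps payouts in whole units.
    payouts = {name: fee * score // total for name, score in quality.items()}
    # Whatever is lost to rounding goes to the highest-quality contributor.
    remainder = fee - sum(payouts.values())
    top = max(quality, key=quality.get)
    payouts[top] += remainder
    return payouts


payouts = split_query_fee(100, {"malawi_farmer": 3, "other_contributor": 1})
```

With these assumed scores, the farmer whose images are rated three times higher receives three quarters of each query fee, so earnings grow both with image quality and with how often the model is used.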

For AI startups, this is beneficial because they won’t have to pay huge sums of cash upfront to access training data. Instead, they will pay as they grow, once revenues start flowing in. It is easy to envision how this ecosystem could expand organically over time. Rewards for submitting data will attract contributors looking for income. They will compete to provide higher-quality data, increasing the available volume. This will attract more data-hungry developers looking to increase the sophistication of their models. As adoption of these models increases, the value flowing to data providers also increases, making contributions more profitable.

The data economy is for everyone

The AI industry is growing like wildfire and the effects of the data shortage are already being felt, as websites limit access and content creators launch lawsuits against infringers as if there is no tomorrow. Developers are desperately looking for an alternative to web scraping, and decentralized data marketplaces are an attractive solution.

Blockchain technology likely won’t be the only solution to the AI data dilemma. There is merit to other ideas around synthetic data and the creation of data federations, where companies share private data with peers under strict limits on how it can be used. But decentralized data economies are perhaps the most appealing and practical, with their ability to leverage existing infrastructure and incentivize everyone to participate in the AI revolution.
