Is Data the New Holy Grail for Platforms?

Aurore Falque-Pierrotin
April 21, 2021

The more people search online, the more data they generate, and the more Google can improve its algorithms' performance and personalise the search experience, which in turn makes its engine more valuable to each user. Pretty amazing, huh?

This is called “data network effects” (data nfx). In theory, they occur within a data network when every node feeds useful data back to a central database. As the aggregated data accumulates, the value of the service provided to each individual node grows accordingly. Think of Waze, where users of the app feed back real-time data (e.g. roadworks alerts, traffic bottlenecks), thereby increasing the app's utility for all other users.
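To illustrate the mechanism, here is a minimal sketch of a Waze-like central database, assuming each driver's report on a road segment is simply averaged into a shared estimate. The class `SegmentTravelTimes`, its methods and the segment name are hypothetical, and this is not how Waze actually computes its estimates.

```python
from collections import defaultdict
from typing import Dict, Optional


class SegmentTravelTimes:
    """Hypothetical central database averaging crowd-sourced travel times per road segment.

    A toy illustration of a data network effect, not Waze's actual method.
    """

    def __init__(self) -> None:
        self._count: Dict[str, int] = defaultdict(int)     # reports received per segment
        self._mean: Dict[str, float] = defaultdict(float)  # running mean travel time (minutes)

    def report(self, segment: str, minutes: float) -> None:
        """A node (driver) feeds one observation back into the shared estimate."""
        self._count[segment] += 1
        n = self._count[segment]
        # Incremental mean update: the very next query already reflects this report.
        self._mean[segment] += (minutes - self._mean[segment]) / n

    def estimate(self, segment: str) -> Optional[float]:
        """What every other user sees: the aggregate of all past reports, or None if none yet."""
        if self._count.get(segment, 0) == 0:
            return None
        return self._mean[segment]

    def reports(self, segment: str) -> int:
        return self._count.get(segment, 0)


if __name__ == "__main__":
    db = SegmentTravelTimes()
    for minutes in (12.0, 14.0, 10.0):  # three drivers report on the same segment
        db.report("ring-road-north", minutes)
    print(db.estimate("ring-road-north"), db.reports("ring-road-north"))
```

Each extra report refines the estimate that every other driver sees. Note that in a pure averaging model like this one, each new report also moves the estimate less than the one before (its weight is 1/n), a first hint of the diminishing returns discussed below.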

That being said, we have previously argued that data network effects, although a powerful potential source of defensibility, tend to be weaker than many people want to believe and can take time to materialise. Indeed, their elegant theory comes with a few shortcomings.

Are data network effects a myth?

First, the value of incremental data typically goes down. Early on, your machine learning algorithm learns very quickly, but it can soon hit an asymptote, where new data added to the model no longer translates into the same level of accuracy improvement. Second, the “cost of data” can rise significantly: collecting useful data and maintaining the data infrastructure can gradually become more expensive and time-consuming (e.g. growing computational costs, extensive data-point labelling, increasing security costs). Third, the relationship between product usage and the amount of useful new data gathered is often asymmetrical: on TripAdvisor, not everyone is a contributor, and most users use the platform passively. Finally, data network effects are only one type of defensibility that can be built with data: they are often confused with data advantages that come from mere scale, and data “embedding” can also ensure product stickiness through high switching costs.
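To make the first shortcoming concrete, the sketch below assumes that model error follows a power-law learning curve, error(n) = floor + a·n^(−b), a shape commonly used to model how error falls with training-set size. The functions `error_rate` and `marginal_gain` and all parameter values are made up for illustration, not measured on any real product.

```python
def error_rate(n: int, floor: float = 0.05, a: float = 2.0, b: float = 0.5) -> float:
    """Hypothetical power-law learning curve: error falls toward an irreducible floor as n grows."""
    return floor + a * n ** (-b)


def marginal_gain(n: int, step: int = 10_000) -> float:
    """Error reduction obtained by adding `step` more data points to a model trained on `n`."""
    return error_rate(n) - error_rate(n + step)


if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        print(f"n={n:>10,}  error={error_rate(n):.4f}  "
              f"gain from next 10,000 points={marginal_gain(n):.6f}")
```

Under these assumed parameters, each extra batch of 10,000 points buys less error reduction than the previous one, and the error never drops below the 5% floor. If collecting and labelling each point also gets more expensive (the second shortcoming), the return on new data shrinks from both sides.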

In the end, can data really generate a strong moat? Are data nfx a sustainable source of competitive advantage for your company?

It appears that they can, but only under very specific preconditions, which we define below in a 7-point framework.

The 7-point data nfx framework
  • 1. The product/service value should increase with more incremental data.

Short, positive feedback loops should be built into the product so that newly generated data is quickly incorporated back into the model, and the value of the product or service increases automatically as more data is added.

  • 2. Higher product/service usage generates more useful data.

Ideally, data production is native, with the product generating proprietary data (vs. feeding third-party data into the model), and the new data produced is of high quality.

  • 3. The value created by the data is core to the model’s value proposition & to the way the product/service benefits the user.

Data needs to be central to the way the product benefits its users. For instance, recommendation engines that use data to push personalised content are typically only a feature of a product or service, not its core: the data nfx of a Netflix or a Spotify tend to be weaker than advertised. At the end of the day, the main value of such platforms lies in the content itself.

  • 4. There is no asymptotic value of additional data.

This is typically the case when the service provided is real-time or otherwise dynamic: highly dynamic datasets entail perpetual learning curves. An asymptote can be acceptable, but if there is one, the threshold at which it occurs needs to be very high.

  • 5. There are no asymmetries in data contribution by members of the network.

All nodes need to contribute to the data network and produce useful data. For instance, the ML model behind Gboard learns from anonymised typing data coming from the Android devices that use the keyboard.

  • 6. The data production/collection process is scalable and cost effective.

Ideally, data capture and labelling can be automated to keep the cost of producing and incorporating new data low.

  • 7. The “minimally viable corpus” (i.e. the minimum amount of data required to start training the model) needs to be high.

The threshold for the amount of data required before the product starts providing value needs to be high, so as to provide scale defensibility against competitors entering the market. For instance, one of the key strengths of the research platform Owkin is its proprietary network of hospitals and clinics, which provides it with large, real-world health datasets to fuel its disease detection models. This is closely linked to the infamous chicken-and-egg problem of marketplaces.
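As a screening aid, the seven conditions can be written down as a simple checklist. The sketch below is hypothetical: the `DataNfxAssessment` class and its field names are our own, each boolean is itself a judgment call, and passing every check only suggests that data nfx could be a source of defensibility; it does not prove a moat.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataNfxAssessment:
    """Hypothetical checklist mirroring the 7-point framework; one boolean per condition."""

    value_grows_with_data: bool        # 1. product value increases with incremental data
    usage_generates_useful_data: bool  # 2. more usage yields more native, high-quality data
    data_core_to_value_prop: bool      # 3. data is core to how the product benefits users
    no_low_asymptote: bool             # 4. no asymptote, or only a very high one
    symmetric_contribution: bool       # 5. all nodes contribute useful data
    scalable_cheap_collection: bool    # 6. capture and labelling are scalable and cheap
    high_min_viable_corpus: bool       # 7. high minimally viable corpus deters entrants

    def unmet_conditions(self) -> List[str]:
        """Names of the conditions that do not hold, in framework order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def all_preconditions_met(self) -> bool:
        return not self.unmet_conditions()


if __name__ == "__main__":
    # Hypothetical content platform whose recommendations are a feature rather than
    # the core product (condition 3); the other values are placeholders.
    platform = DataNfxAssessment(True, True, False, True, True, True, True)
    print(platform.unmet_conditions())
```

Keeping one field per condition, in the framework's order, makes it easy to see at a glance which preconditions a given business fails.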

Final considerations

Under these 7 conditions, data network effects can potentially be an interesting source of defensibility. At Samaipata, we invest in digital platforms displaying increasing returns, and we believe that data nfx are becoming possible for a much broader group of companies, even at early stages. This is enabled by recent technological advances such as i) Big Data tools that make the infrastructure for processing large amounts of data cheaper and faster, ii) advances in Machine Learning / Deep Learning and a growing number of off-the-shelf tools and algorithms to automatically analyse and learn from such large amounts of data, and finally iii) the rise of cloud computing to provide the capacity to process it. The next generation of engineers, increasingly moving from R&D-type projects to business cases, is also contributing to the market momentum as tech talent flows into the startup scene.

**

At Samaipata, we are always looking for ways to improve, so do not hesitate to send us your thoughts. We strive to partner with early-stage founders and to support them in taking their business to the next level. Check out more ways in which we can help here, or all our other content here.

And as always, if you’re a European digital business founder looking for Seed funding, please send us your deck here or subscribe to our Quarterly updates here.
