An AI product has two components:
- Algorithms/ML models
- Data
The algorithms are not a differentiating point. With the rise of AutoML and similar tools, algorithms are being commoditized.
The defensible moat for an AI startup is the data.
There are two ways to think of data:
- The static data you train your models on
- The data collected from users as they use your product, which in turn is used to further refine your model
Static data is mostly collected by web scraping, for example scraping Wikipedia, Amazon product reviews, or job postings. You train your ML model on it and sell the model to customers for their use case. That is not enough, because this data is not proprietary: anyone else can scrape the same sources. If you are counting on a first-mover advantage, that’s not a winning strategy, especially in B2B.
Static data can also be proprietary, for example Tesla collecting road data to train its self-driving cars. But even that data is not completely static: it is updated with usage.
What truly builds a defensible moat is the data collected on your product that only you have access to. For example, Tinder has a lot of data on which kinds of profiles are liked or disliked. It can launch a Tinder ‘Platinum’ service that recommends the profiles best suited to you, using the data generated across its entire platform to personalize the experience for each user. That’s the power of a product that has a data loop. Spotify, Netflix, and TikTok are doing the same.
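To make the idea of a data loop concrete, here is a minimal Python sketch. It is only an illustration, not how Tinder or any of these companies actually work: the `FeedbackLoop` class, the tag-based profiles, and the +1/-1 scoring are all assumptions made up for this example. It also personalizes from each user's own history only, whereas a real system would pool signals across the whole platform.

```python
from collections import defaultdict


class FeedbackLoop:
    """Toy data loop (hypothetical): each interaction is stored and
    immediately changes what the product recommends next."""

    def __init__(self):
        # user -> tag -> running score (a like adds 1, a dislike subtracts 1)
        self.affinity = defaultdict(lambda: defaultdict(float))

    def record(self, user, item_tags, liked):
        """Log one like/dislike generated by normal product usage."""
        delta = 1.0 if liked else -1.0
        for tag in item_tags:
            self.affinity[user][tag] += delta

    def score(self, user, item_tags):
        """Sum the user's learned affinity over an item's tags."""
        prefs = self.affinity[user]
        return sum(prefs.get(tag, 0.0) for tag in item_tags)

    def recommend(self, user, catalog, k=3):
        """Return the k item names with the highest score for this user."""
        ranked = sorted(
            catalog,
            key=lambda name: self.score(user, catalog[name]),
            reverse=True,
        )
        return ranked[:k]


# Made-up catalog: item name -> descriptive tags
catalog = {"a": ["hiking", "dogs"], "b": ["gaming"], "c": ["hiking"]}

loop = FeedbackLoop()
loop.record("u1", ["hiking", "dogs"], liked=True)   # usage produces data
loop.record("u1", ["gaming"], liked=False)
print(loop.recommend("u1", catalog, k=2))           # data shapes the product
```

The point of the sketch is the shape of the loop: usage produces data, the data changes the recommendations, and better recommendations drive more usage. A competitor can copy the code, but not the interaction history stored in `affinity`.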
So when you plan your AI product, the first task is to map out how you will add data loops into your product, so that it gets better with more usage.