Linkup connects LLMs with premium content sources (legally)

If you’ve used ChatGPT Search or Perplexity, you know that the ability to search the web and cite sources inline greatly improves these AI chatbots. Answers are better, especially for questions that involve timely information, and web search may reduce so-called hallucinations (i.e., when a generative AI outputs incorrect information).

That’s why French startup Linkup is building an API that lets developers access web content from premium, trusted sources and hand the results to a large language model (LLM) to enrich its answers. Many AI developers call this workflow Retrieval-Augmented Generation (or RAG).
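To make the RAG workflow concrete, here is a minimal Python sketch of its retrieval and augmentation steps. It does not use Linkup’s API: the in-memory corpus, the word-overlap `retrieve` function, and the example.com URLs are hypothetical stand-ins for a real search backend, and the final call to an LLM is left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Document:
    url: str   # where the content came from, used for inline citations
    text: str


def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Toy retriever (hypothetical): rank by word overlap with the query, keep the top k that overlap at all."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokens(d.text)), reverse=True)
    return [d for d in ranked[:k] if q & tokens(d.text)]


def build_prompt(question: str, docs: list[Document]) -> str:
    """Augment the question with numbered sources the model is asked to cite as [1], [2], ..."""
    sources = "\n".join(f"[{i}] {d.url}\n{d.text}" for i, d in enumerate(docs, start=1))
    return (
        "Answer using only the sources below and cite them inline like [1].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical corpus standing in for licensed publisher content.
corpus = [
    Document("https://example.com/ev-sales", "Electric vehicle sales in Europe rose this quarter."),
    Document("https://example.com/paris-weather", "Rain is expected in Paris on Friday."),
    Document("https://example.com/ev-batteries", "Electric vehicle battery prices keep falling."),
]
question = "How are electric vehicle sales doing in Europe?"
prompt = build_prompt(question, retrieve(question, corpus))
print(prompt)  # this string would then be sent to the LLM of your choice
```

The weather document shares almost no words with the question, so it is left out of the prompt; a production system would use a real search index rather than word overlap.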

More importantly, the future of scraping bots is uncertain. Unless there’s a pre-existing financial agreement between content publishers and the entities scraping their web pages, these bots are lifting content from the open web without paying for it. Many people aren’t happy about that deal, and it is increasing regulatory scrutiny around AI training.

There are also now high-profile legal cases in the frame, such as the ongoing lawsuit between OpenAI, the maker of ChatGPT, and the New York Times, so the situation around web scraping could change in the near future. That helps explain why OpenAI has signed multi-year content licensing deals with major publishers such as AP, Axel Springer, Condé Nast, El País, the Financial Times, Le Monde, and others.

“We set up the company around the time when OpenAI was making deals with news sources… for training or inference purposes, to augment the answers from OpenAI models and their products. And we thought: ‘OK, this is great because we finally have AI companies that pay their sources,’” Linkup co-founder and CEO Philippe Mizrahi told TechCrunch, laying out what propelled the founders to set up a business to connect AI devs with content providers for — hopefully — their mutual benefit.

Currently, content publishers face a difficult decision over what to do about GenAI’s thirst for data. They can block web scrapers using robots.txt, a (non-legally binding) text file that tells crawlers, including those run by AI companies, which parts of a website they may access. They can also sue AI companies that they believe have breached their copyright. Alternatively, they could let bots index their content freely (er, YOLO?). Or they may be able to license content to AI devs to get some recompense for their intellectual property.
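Because robots.txt is just a list of rules, honoring it is up to the crawler. The minimal sketch below uses Python’s standard-library `urllib.robotparser` to show how a well-behaved crawler would check a hypothetical robots.txt that blocks OpenAI’s `GPTBot` user agent while allowing everyone else. The file contents, the example.com URL, and the `ExampleSearchBot` agent name are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher that opts out of OpenAI's GPTBot crawler
# but lets every other crawler access the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

article = "https://example.com/news/some-article"  # hypothetical URL
for agent in ("GPTBot", "ExampleSearchBot"):
    # can_fetch only reports what the file asks for; nothing forces a crawler to call it.
    print(agent, "allowed:", parser.can_fetch(agent, article))
```

Here `GPTBot` is refused and `ExampleSearchBot` is allowed, but only because this crawler chose to check; that voluntary compliance is why robots.txt carries no legal weight on its own.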

But there are thousands of AI companies (or tech companies using AI) that don’t have the scale and reach of OpenAI. At the same time, what’s great about the web is that there’s a long tail of content publishers. The flip side is that a small content publisher usually doesn’t have the financial resources to file a lawsuit. It also means that switching millions of websites from a scraping model to a licensing model will be difficult.

That’s why Linkup isn’t just a technical solution. It’s a marketplace: an intermediary between content publishers and companies that want to augment their LLM answers with web content.

Linkup signs content licensing deals with publishers and integrates with their CMS so that it can fetch content from publishers without any scraping. Linkup then pays content partners based on how often their content is accessed by Linkup clients.
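The article doesn’t say how Linkup computes these payments. Purely as an illustration, here is a minimal Python sketch of one usage-based scheme, assuming a fixed revenue pool split pro rata by how many times each publisher’s content was accessed. The function name, publisher names, and pool size are hypothetical.

```python
from collections import Counter


def pro_rata_payouts(access_log: list[str], pool_cents: int) -> dict[str, int]:
    """Split pool_cents among publishers in proportion to how often their content was accessed.

    Hypothetical scheme for illustration; amounts stay in integer cents to avoid float rounding.
    """
    counts = Counter(access_log)  # publisher -> number of accesses
    total = sum(counts.values())
    if total == 0:
        return {}
    # Give each publisher the floor of its exact proportional share.
    payouts = {pub: pool_cents * n // total for pub, n in counts.items()}
    # Hand the few leftover cents to the largest fractional remainders
    # (the "largest remainder" method), so payouts always sum to pool_cents.
    leftover = pool_cents - sum(payouts.values())
    by_remainder = sorted(counts, key=lambda pub: pool_cents * counts[pub] % total, reverse=True)
    for pub in by_remainder[:leftover]:
        payouts[pub] += 1
    return payouts


# Each entry records which publisher served one piece of content to a client (hypothetical data).
log = ["news-site"] * 6 + ["stats-db"] * 3 + ["blog"]
print(pro_rata_payouts(log, 1_000))  # {'news-site': 600, 'stats-db': 300, 'blog': 100}
```

A per-access fee would be an equally plausible design; the pro-rata split is shown only because it keeps total payouts fixed regardless of traffic.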

Linkup’s founding team. Image Credits: Linkup

“We’re really targeting applications that are implementing AI in their own products,” said Mizrahi. “So, the typical use case is that I create an AI application using a model from Mistral or OpenAI. I build my own pipeline, but I need to enrich this pipeline with external information.”

As a side note, while ChatGPT can browse the web, GPT models can’t. OpenAI provides both a massively popular application (ChatGPT) and LLMs that developers can access through an API (GPT). Web search, however, is a feature of the ChatGPT app, not of the underlying models.

“There’s an example I like, which is one of our customers… built an internal application for their sales people,” Mizrahi also told us. “On the one hand, they have listed all the advantages of their own products. And thanks to us, they get fresh, quality information on their prospects and put it into a Mistral LLM. And Mistral’s LLM is going to generate a sort of sales pitch for the sales reps, which they’ll have in front of them when they make the calls with the customer leads.”

At first, Linkup decided to focus on corporate and business information. In addition to news websites, the startup works with knowledge databases — think Statista, Xerfi or other resources in the same vein.

It isn’t the only startup working on bringing premium content to LLMs with licensing contracts behind the scenes. The most visible competitor is ScalePost, a startup that works with Perplexity to speed up its licensing deals with publishers.

Linkup raised a €3 million seed round ($3.2 million at current exchange rates) a few months ago from Axeleo Capital, Motier Ventures, Seedcamp, and a hundred business angels. There are around 10 people working for the startup right now, and it plans to hire another 10 staff over the next year.