Converge Bio’s ‘everything store’ for biotech LLMs brings in $5.5M seed


AI is finding its way into every corner of biotech and pharmaceutical research, but, as in other industries, it’s never quite as straightforward to implement as one would like. Converge Bio has built a tool for companies to make their biology-focused LLMs actually work, from “enriching” their data to explaining their answers. The company has raised $5.5 million in a seed round to scale its product.

“A model is just a model. It’s not enough,” said CEO and co-founder Dov Gertz. “A pipeline has to be made so companies can actually use the model in their own R&D process. The market is very fragmented, but pharma and biotech want to consume this technology in a consolidated way, in one place. We want to be that place.”

If you’re not a machine learning engineer working in drug discovery, this may not be a familiar problem to you. But basically, there are powerful foundational models out there, large language models trained not on books and the internet but on huge databases of DNA, protein structures, and genomics.

These are powerful and versatile models, but like the LLMs used in products like ChatGPT and Cursor, they require a lot of work to hammer into a shape that people can actually use day to day. That work is especially difficult in specialized domains like microbiology or immunology. Taking a “raw” LLM trained on billions of protein sequences and making it something a lab tech can use as part of their normal research is a non-trivial problem.

As an example, Gertz suggested antibody research. An LLM trained on antibody-specific biology exists, but it’s very general. Converge Bio offers a series of improvements that can be done securely and using a company’s own IP.

From left: Converge Bio’s Iddo Weiner, Chief Scientific Officer; Dov Gertz, CEO; Oded Kalev, CTO. Image Credits: Omer Hacohen / Converge Bio

First is “data enrichment,” augmenting the antibody LLM with important related data like antigen-antibody and protein-protein interactions. Then, loaded with more specific knowledge, it can be fine-tuned on the specific antigen the team is looking to target, and which they may have proprietary in-dish data on.

“Now we have an application: The input is a sequence, the output is binding affinity,” Gertz said. Then the platform provides another important layer: explainability. Researchers can drill down on the output to find out not just that “this sequence works better than this” but also which part of the sequence, down to the amino acid or base pair level, seems to be making it work better.
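To make the idea of residue-level explainability concrete, here is a minimal sketch of one generic attribution technique, an in-silico alanine scan: swap each residue for alanine and see how much the predicted score drops. This is not Converge Bio’s proprietary method, which the article does not describe. The `toy_affinity` function is a hypothetical stand-in for a fine-tuned model, and its weights are invented purely so the example has some signal.

```python
from typing import Callable, List


def toy_affinity(seq: str) -> float:
    """Hypothetical stand-in for a fine-tuned sequence-to-affinity model.

    NOT a real predictor: it simply rewards aromatic residues (W, Y, F)
    with made-up weights so the attribution below has something to find.
    """
    weights = {"W": 3.0, "Y": 2.0, "F": 1.5}
    return sum(weights.get(aa, 0.1) for aa in seq)


def alanine_scan(seq: str, score: Callable[[str], float]) -> List[float]:
    """Return, per position, the score drop when that residue becomes alanine."""
    base = score(seq)
    drops = []
    for i in range(len(seq)):
        mutant = seq[:i] + "A" + seq[i + 1:]  # single-point substitution
        drops.append(base - score(mutant))
    return drops


if __name__ == "__main__":
    seq = "GSWYA"  # arbitrary illustrative sequence
    for pos, (aa, drop) in enumerate(zip(seq, alanine_scan(seq, toy_affinity)), start=1):
        print(f"{pos}{aa}: {drop:+.2f}")
```

Positions with the largest drops are the ones the model leans on most, which is the kind of output a domain expert can sanity-check against what they know about the protein.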

Lastly, it generates new sequences that provide improved outcomes, likewise with explainability. Gertz noted that the explainability has surprised them with its popularity among customers. That makes sense, since it allows experts to apply their domain expertise (say, protein interactions) to this newer and more obscure region of bioinformatics and machine learning.


Converge uses the many open source and free foundation models out there, but is also working on making its own. It already has a proprietary process, Gertz said, for the explainability part. And the data enrichment “curriculum” is entirely theirs as well, and not a trivial process. Training methodologies, he pointed out, are among the few secrets closely guarded by the most successful AI companies.

That’s part of the moat they’re hoping to build. As Gertz put it, “This is probably the biggest opportunity in biotech in five decades.”

Yet many, perhaps most, biotech companies don’t have a dedicated solution for doing LLM-related work in their field, and are actively pursuing niches that generalist solutions don’t apply to.

“The idea is to be the everything store for genAI in biotech, then use that as a wedge to offer more over time,” Gertz said. “The behavior in pharma and bio is, once they have ties to a vendor that they trust, they want to use them in other use cases, be it antibody design or vaccine design. That’s why I think this positioning is best for this moment in the market.”

Investors seem to agree, putting $5.5 million into a seed round led by TLV Partners.

The company will be using the money to hire up and acquire customers, as startups often do at this stage, but will also be publishing a scientific paper on antibody design (using its own systems, of course) and training “a proper foundation model.”
