Transformers: The age of large-language-models

Sahil Patwa
Published in The Thesis · Jan 15, 2023


I have been meaning to write a thesis on exciting areas that transformers/large language models (LLMs) could disrupt, but this space is evolving so fast that by the time I finish a draft, it is already outdated. So I have stuck to posting my thoughts on Twitter instead.

But after several conversations (and time to think deeply about this space over the holidays), I have arrived at 5 areas/themes that excite me at the moment:

  1. Emergence of hybrid models: I am excited about startups building their own models by leveraging open-source foundational models, but significantly repurposing them for specific use-cases like drug design, neopeptide screening, or creating synthetic labelled images for computer-vision training. Such companies will be hard to replicate, as their models won’t simply be fine-tuned via a GPT-3 API and a few sample data points, but will potentially also use different techniques (e.g. Diffusion-GAN).
  2. Serious companies will start training their own models: While GPT-3 and other pretrained transformers might be a great way to test a new product idea or business model, most serious businesses that have LLMs as a core part of their offering (e.g. LLM-based search, Copilot for Legal) will train their own models. Using GPT-3 at scale will likely create constraints around latency and cost. E.g. Jasper started training its own model using Cohere, and Codeium is building a Copilot for coding.
  3. ‘Picks’ and ‘Shovels’ for LLMs will be an interesting business: As companies move towards training their own models, they will likely leverage a few infra services (e.g. labelling, training & inference infrastructure). This might be an interesting space to keep track of. It will also be interesting to see how the dynamic between closed- and open-source infra providers plays out. E.g. SnorkelAI, MosaicML.
  4. AI and the “Infinite Interns” analogy: I loved Ben Evans’s articulation that “One of the ways I used to describe machine learning was that it gives you infinite interns”. Use-cases where generative AI is a natural fit, at least initially, will be ones where output accuracy is not binary, and where producing several “good-enough” options quickly and cheaply is more valuable, e.g. creating molecule candidates for drug discovery. However, it is critical to note that in most cases, defensibility will be built by also capturing the downstream value chain (e.g. testing molecules in high-throughput wet labs, developing them into drugs, launching drugs, …).
  5. Will there be a middle layer?: An interesting hypothesis is that infra/tooling companies will go up the value chain and offer middle-layer LLMs out of the box. These will potentially be verticalized, and will help solve some of the latency and cost issues of existing foundational models. They will make a more compelling case for ‘buy’ vs ‘build’ than the foundational models do.

If you know any founders building on any of these themes, or if there is a 6th exciting theme I have missed, please give me a shout at @sahilpatwa.

Sahil Patwa

Investor @ un-bound.com // previously @ Moonfire Ventures, Swiggy, BCG // IIT Bombay, LBS, IIM Ahmedabad