More Data For Better Medicine
The more we learn about biology, the more we realize all we do not know. It started with the genomic revolution and the first human genome sequenced in the early 2000s.
Genomics has now been joined by other datasets like transcriptomics, proteomics, metabolomics, microbiome, etc., forming a new “multiomics” science. We discussed this evolution in further detail in “Multiomics Are The Next Step In Biotechnology”.
These new tools have created a flood of data, bringing detailed information about the inner activities of cells, sometimes down to the atomic level. A key driver of this data growth has been the collapse in the price of sequencing genes and other biological materials like proteins.
This has created enthusiasm about “Big Data” potential in biotech, mimicking the concept of big data from other, more IT-driven fields.
Already in 2018, the magazine Barron’s was asking “Will Big Data Lead to Big Biotech Returns?”, and the industry had started discussing “Implementing Large-Scale Data Processing and Analysis for Bioprocessing”.
Quite a few companies are well-positioned to benefit from the drive to create and analyze at-scale biological data.
AI Merging With Big Data?
A new development in the last few years has been the emergence of AI. While AI made its way into public consciousness mostly in 2023, with LLMs (Large Language Models) like ChatGPT, the biotech industry started to embrace AI many years before that.
And it makes sense because data and AI have a somewhat symbiotic relationship:
- Training AI models requires a lot of high-quality, annotated data.
- AIs can help sort out massive datasets without direct human intervention and connect the dots where manual analysis would not be possible.
The result is that today, a lot of the previously big data-focused companies in the biotech industry are also turning into AI companies.
Contrary to some AI applications still looking for a business model (like image generation), drug discovery and medical research have a pretty straightforward path from AI model to monetization.
Top 10 Big Data Biotech Stocks
1. Illumina
Illumina is the leading genomics company, by far the largest and most established in the industry, with $1.2B in revenues, which have grown at an 11% CAGR over the last 5 years.
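For readers less familiar with the term, CAGR (compound annual growth rate) is the constant yearly growth rate that links a starting value to an ending value. The short sketch below shows the arithmetic. The function names are illustrative, and the revenue base indexed to 100 is a hypothetical input, not a figure reported by Illumina.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (0.11 means 11%)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value reached after compounding `rate` once per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Hypothetical illustration: a revenue base indexed to 100, growing 11% a year for 5 years.
indexed_end = project(100.0, 0.11, 5)
growth_multiple = indexed_end / 100.0

print(f"Growth multiple after 5 years at 11%: {growth_multiple:.2f}x")
print(f"CAGR recovered from the endpoints: {cagr(100.0, indexed_end, 5):.1%}")
```

In other words, an 11% CAGR sustained for 5 years corresponds to revenues roughly 1.69 times the starting level.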
This also makes it the prime provider of genomic data to the entirety of the biotech industry.
Like most genome sequencing companies, Illumina makes money when selling sequencers, but mostly when selling the consumables used by the sequencers. Revenue per machine usually grows over time as the machine is progressively used closer to full capacity.
The company’s new genome sequencer model, NovaSeqX, is a hit, with 352 units shipped in 2023. This has accelerated the adoption of mass genome sequencing among Illumina’s clients, with more multi-omics analyses and a larger scale for single-cell and spatial analyses.
NovaSeqX sales come on top of a very large genome sequencer segment, with more than 25,000 systems installed.
Grail Troubles
When discussing Illumina, a longer explanation is required for one new genomics application: cancer detection from a blood sample, called liquid biopsy.
Illumina worked on developing this technology and then spun it off into a company called Grail.
Grail is very successful from a technical and commercial standpoint. In Q2 2023, 7,500 providers prescribed Grail’s tests, and the company passed the milestone of 100,000 tests performed. Its test also detected 92% of cancer relapses across 6 different blood cancers.
Several years later, Illumina would reacquire this company at a much higher price.
This caused several problems. First, regulatory authorities in both the USA and the EU raised concerns about monopoly risk, since Illumina supplies genome sequencing machines to many of Grail’s competitors. This resulted in a €432M fine from the EU.
Another set of problems came from the conditions of the costly Grail spin-off, money raising, and re-absorption into Illumina.
Activist investor Carl Icahn has attacked the company’s board and implied that dishonest or malicious dealings may have favored insiders against the interests of the company’s shareholders. The SEC was also investigating the question. You can read more about these suspicions and accusations in this series of articles by Non-GAAP investing.
Ultimately, the decision was made to divest Grail again, with the board approving it on June 4th, 2024.
The Grail saga has caused a lot of trouble for Illumina and its shareholders. This, however, did not impact the company’s position in genome sequencing.
Ultimately, it is likely that Grail cancer detection can grow into a massive business, and make doctors use a lot of Illumina genome sequencers and consumables.
In 2023, Illumina also acquired the bioinformatics software company Partek, expanding the company’s offering beyond sequencers and their consumables.
2. Schrödinger, Inc.
The company specializes in physics-based models to find the best possible molecule for a given goal, balancing out conflicting metrics like potency, solubility, half-life, synthesizability, etc.
It also uses machine learning, but the addition of a physics-based model allows its platform to be applied in entirely novel fields for which no data set exists to “train” the AI. This allows Schrödinger to go from 1 billion potential molecules to just 8 solid candidates in a matter of days, exclusively through digital calculation.
In 2020, Schrödinger signed a 5-year collaboration agreement with Bayer for revenue of $10M. The idea of the agreement is to use Schrödinger’s technology together with Bayer’s in-silico prediction models.
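To make the idea of balancing conflicting metrics more concrete, here is a toy sketch of a screening funnel. It is not Schrödinger’s method, which relies on physics-based simulation. It only illustrates the general principle of filtering out molecules that fail a minimum requirement, then ranking the rest by a weighted score. The molecule names, scores, weights, and threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    # Hypothetical scores, each normalized to 0..1 (higher is better).
    potency: float
    solubility: float
    half_life: float
    synthesizability: float


# Hypothetical weights expressing how much each metric matters.
WEIGHTS = {
    "potency": 0.4,
    "solubility": 0.2,
    "half_life": 0.2,
    "synthesizability": 0.2,
}
MIN_SCORE = 0.3  # Hypothetical floor: a molecule failing any single metric is dropped.


def passes_floor(c: Candidate) -> bool:
    """True if every metric reaches the minimum acceptable level."""
    return all(getattr(c, metric) >= MIN_SCORE for metric in WEIGHTS)


def score(c: Candidate) -> float:
    """Weighted sum of the metrics."""
    return sum(weight * getattr(c, metric) for metric, weight in WEIGHTS.items())


def shortlist(candidates: list[Candidate], keep: int) -> list[Candidate]:
    """Drop non-viable molecules, then keep the `keep` best-scoring ones."""
    viable = [c for c in candidates if passes_floor(c)]
    return sorted(viable, key=score, reverse=True)[:keep]


# Made-up library of four molecules, for illustration only.
library = [
    Candidate("mol_A", potency=0.9, solubility=0.2, half_life=0.8, synthesizability=0.7),
    Candidate("mol_B", potency=0.7, solubility=0.6, half_life=0.6, synthesizability=0.8),
    Candidate("mol_C", potency=0.5, solubility=0.9, half_life=0.7, synthesizability=0.9),
    Candidate("mol_D", potency=0.8, solubility=0.5, half_life=0.4, synthesizability=0.5),
]

best = shortlist(library, keep=2)
print([c.name for c in best])
```

Note that mol_A, the most potent molecule, is eliminated because of its poor solubility. This is the kind of trade-off that makes picking a handful of candidates from a very large library difficult.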
Another recent partnership is with Lilly, which has up to $425M in total milestone payments for successful discovery.
Past collaborations included Takeda, Sanofi, Bristol Myers Squibb, and other smaller pharmaceutical companies.
Overall, Schrödinger is building a growing portfolio, including more and more proprietary and fully-owned molecules. While not pre-revenue, the company is still not profitable, focusing on expansion and R&D spending to improve its technology.
The company is also looking at expanding toward new segments beyond drug discovery, like complex biopharmaceuticals or even materials like chemicals, batteries, or polymers.
Investors will want to keep an eye on the new collaborations, as they will reflect the advances of Schrödinger’s technology, as assessed by the leaders in the industry, as well as possible success in expanding the core technology to new markets.
3. Exscientia
The company is using AI to develop precision therapies. It runs a “full stack” AI drug discovery technology with dedicated software at every stage of the drug discovery process.
Exscientia’s technology reduces by 70% the time required to go from a biological target to a corresponding drug candidate, and makes the process 80% more capital-efficient.
This resulted in 4 compounds in early clinical stages, 30 programs in total, and up to $6.5B in potential milestone payments from partners. The main focus has been oncology (cancer) and inflammatory diseases.
This might be an interesting option for investors looking at a well-established AI drug discovery company with a very large cash runway and multiple ongoing partnerships for extra safety.
4. 10x Genomics, Inc.
10x Genomics is a leader in spatial biology, which studies the genome and transcriptome in 3D, allowing visualization of the activity of genes at the cellular or even intracellular level.
The company was founded in 2012. Its founders include Serge Saxonov, the former director of R&D of the personalized genome testing company 23andMe.
10x Genomics grew using a mix of R&D ($1B+ invested in R&D so far) and acquisitions. Notably, its Visium platform was obtained through the acquisition of Spatial Transcriptomics in 2018.
This is also how 10x Genomics obtained its Xenium platform, by acquiring ReadCoor and Cartana in 2020.
In 2020, it would also launch the Chromium platform, which was updated the year after to Chromium X.
Through the acquisition of Tetramer Shop in 2021, 10x Genomics would also launch BEAM (Barcode Enabled Antigen Mapping) in 2022. It allows researchers to identify components of the immune system in detail. This could be very impactful in research on immunity and new diseases.
Revenues grew by 17% year-over-year in Q2 2023, driven by Xenium sales, with the milestone of 100 units sold passed in August 2023.
In September 2023, the company also earned a critical victory against its main rival, NanoString. NanoString is for now banned from selling its CosMx Spatial Molecular Imager (SMI) instruments in most of the EU for infringing on 10x Genomics patents.
The company is still at an early stage, somewhat similar to the early days of Illumina. For now, spatial biology is confined to the world of academic and fundamental research. But like many biotechnologies, it might one day become widespread, slowly turning into a medical tool and then into a “routine” test. In any case, the growing pool of installed machines should drive sales of consumables and revenue growth.
5. Oxford Nanopore Technologies plc (ONT.L)
Oxford Nanopore uses a unique genome sequencing technology relying on flow cells containing nanopores. DNA is “read” as it crosses the nanopores, not through chemical means but directly by measuring an electric current. So, in a way, this is the first time a computer can read a genetic sequence (DNA & RNA) in real-time.
Another unique advantage of the company’s technology is that it can read longer genetic sequences than conventional sequencing methods. Long sequences and real-time reading can help to get better and quicker results, which is important for cancer analysis or infectious diseases like antibiotic-resistant bacteria.
Lastly, electrical measurement allows for smaller and more portable sequencers, an improvement over the massive machines used until now. This allows the company to produce a wide array of sequencers, including slower, smaller, and much cheaper machines, starting at $1,000. This could radically expand the sequencing market, since mobile or low-cost sequencing was not an option previously.
Because of its radically new technology, it is unclear where Oxford will fit in a more mature genome sequencing ecosystem.
It could fully replace the incumbent technology of chemical/optical reading of genomes.
Or it could become a successful but niche application for low-volume or mobile sequencing or for sequencing requiring a high-accuracy reading of long genetic sequences.
The company also plans to expand into reading proteins, post-translational modifications of proteins, small molecules, and other measurements at the very edge of life sciences.
6. Ginkgo Bioworks Holdings, Inc.
The company produces on-demand organisms for specific applications. It has widely diversified its applications, with many research programs and partnerships.
Many of these modifications rely on CRISPR or similar gene editing technologies, notably its CAR-T cancer cell therapies.
By providing a ready platform for cell engineering, Ginkgo is becoming a key service provider in the biotech industry, going beyond the pharmaceutical industry and into agriculture, biosecurity, and industrial chemical processes.
It provides expertise and speed and can help reduce fixed costs and the quantity of capex needed for a research project.
This is demonstrated by the very diverse array of clients and partners the company has had over the last few years.
What makes Ginkgo a big data company is the unique breadth of its cell banks, datasets, and experiments, spanning countless applications and organism types.
It is an attractive stock for investors looking to bet on gene editing and cell engineering technologies, but not one application in particular. This is also typically more interesting for growth-focused investors.
The large majority of CRISPR companies are focused on human medicine and genetic diseases, leaving opportunities open for Ginkgo in agriculture, bioengineering, energy, and bio-products (including cannabinoids).
Together with the quick expansion of genetic datasets, gene editing tools, and AI (including open source), this could prove a massive opportunity for Ginkgo Bioworks.
7. BenevolentAI SA (BAI.AS)
BenevolentAI uses AI-enabled drug discovery to develop treatments for atopic dermatitis as well as potential treatments for chronic diseases and cancer.
Where other companies use AI to predict cell activity or protein 3D configuration, Benevolent’s BenAI engine investigates the existing database of scientific papers (35+ million) to unlock new insights.
It then integrates these potential findings into a process including experimental validation of the idea, in-silico analysis, and indication expansion/drug repurposing.
The idea is that many existing drugs or known biological mechanisms could be repurposed for new treatments. Overall, such a strategy should yield new therapies quicker, as a lot of the regulatory work is already done (for example, phase I of clinical trials demonstrated the safety of the drug).
The company has an ongoing collaboration with AstraZeneca to develop drugs for fibrosis and chronic kidney disease (initial deal from 2019), expanded to include heart failure and Systemic Lupus Erythematosus (SLE) in 2022.
It also partnered with Merck KGaA to leverage its expertise in oncology and neuroinflammation and support the company’s AI-driven drug discovery plans by focusing on finding viable small molecule candidates.
Previously, it achieved a novel indication expansion leading to FDA approval with Eli Lilly for baricitinib, as a potential COVID-19 treatment.
8. AbCellera
AbCellera specializes in developing new categories of antibody-based medicines.
Notably, it is working on a GPCR & Ion Channel Platform, addressing therapeutic targets for which antibodies could not be developed before. Its other platform is T-Cell Engagers, which boosts the efficiency and reduces the toxicity of antibody-based cancer treatments.
Over 10 years, the company has developed 100+ therapeutic programs with a large array of partners, with 50% in oncology. 13 molecules have already reached the clinical trial stage, with 2 already authorized for treatment.
A key part of AbCellera’s process is access to a large selection of possible antibodies, followed by picking the right ones with high-throughput single-cell screening powered by machine vision.
9. BioXcel Therapeutics
BioXcel is focused on a concept it calls “drug re-innovation”. Drug re-innovation leverages AI to analyze drugs that have already been proven to be safe but have been abandoned by their developer for various reasons.
It also investigates approved products for new applications.
The concept generation using big data and AI only takes 6 months (instead of several years for novel molecules), followed by 12 months of validation of the hypothesis leveraging computer vision, deep learning, decision matrix, and in silico validation.
Re-innovation has seen notable successes recently, especially when combined with reformulation to remove the side effects or improve the low efficacy that had led to the drug candidates being abandoned in the first place.
This model already bore fruit, with the approval of IGALMI (for treatment of agitation associated with schizophrenia or bipolar disorders) in less than 4 years from project start to approval.
In the case of IGALMI, the previous poor bioavailability was solved by changing the method of administration of the drug and combining it with a metabolic stabilizer.
The company already has two advanced programs in phase 3 of clinical trials, as well as 5 other programs in the pipeline.
The first program targets agitation associated with Alzheimer’s dementia (AAD) with a novel agent, a new formulation of latrepirdine, an antihistamine drug (used against allergies).
The second is an extension of IGALMI’s application, for agitation associated with bipolar disorders or schizophrenia in an at-home setting.
BioXcel’s success with IGALMI shows that the potential of big data extends beyond new drug discovery and into improving the existing arsenal of drugs, either through reformulation or by finding new applications for known safe drugs.
10. Recursion Pharmaceuticals
Recursion Pharmaceuticals leverages AI in drug discovery.
The company’s approach aims to significantly reduce the time and cost associated with bringing new drugs to market.
Creating solid datasets has been the focus of the company since its inception, looking to solve several problems with biological data:
- Analog data, from faxes to PDFs or scanned printouts.
- Siloed data, with little to no annotations.
- Hard to replicate research.
To solve these problems, Recursion created one of the world’s largest automated wet labs and digitized millions of its own experiments (2.2 million experiments per week).
It also owns one of the world’s fastest supercomputers to train its LLMs and AIs for drug discovery. Models were trained on a library of more than 2 billion images and inferred 6 trillion relationships between all possible combinations of genes and compounds.
Recursion established a partnership with AI leader NVIDIA and might release some of its AI models to commercial partners via NVIDIA’s new BioNeMo platform. The partnership also gives Recursion priority access to NVIDIA’s latest GPUs through NVIDIA DGX™ Cloud.
Recursion’s R&D proprietary pipeline is mostly focused on rare diseases and oncology, with 3 candidate drugs in phase 2 of clinical trials.
For more complex sectors, like neuroscience or undruggable oncology targets, the company prefers to establish partnerships with established companies in these sectors.
Examples include Roche in neuroscience and Bayer in undruggable oncology targets.
Lastly, the company has established relations to license out its technology and data, especially when data exchange can be negotiated to boost the information both companies can use in the future.