There has been substantial work in the computational protein space since 2021. We are starting from first principles to build towards our vision.
We might independently arrive at conclusions already drawn by other teams; we consider that strong positive reinforcement rather than a negative data point.
We need to figure out a few key components:
-
🎯One representation learning method to rule them all 💡:
Current SOTA methods still can't produce inclusive protein representations: ones that incorporate every facet of a protein (sequence, structure, temporal dynamics, post-translational modifications, single-point mutations, bound ligands/substrates, protein complexes, etc.) into one representation learning scheme.
This limits us to extrapolating knowledge from one domain, or a small set of domains, at a time.
We are investigating whether the Joint Embedding Architecture (popularized by Yann LeCun) is the way forward.
Our first experiment is simply to integrate sequence and structure, since both modalities are quite standardized at this point.
Our work: https://github.com/atom-51/jespr.
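To make the idea concrete, here is a minimal PyTorch sketch of what a joint sequence-structure embedder could look like. The encoder modules, dimensions, and names are hypothetical placeholders rather than the actual components in the jespr repo: in practice the sequence encoder might be a protein language model and the structure encoder a graph network over backbone coordinates.

```python
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedder(nn.Module):
    """Minimal joint-embedding sketch: two modality encoders, one shared space.

    `seq_encoder` and `struct_encoder` are stand-ins, e.g. a protein
    language model for sequence and a GNN over backbone coordinates for
    structure, each emitting one fixed-size vector per protein.
    """

    def __init__(self, seq_encoder, struct_encoder,
                 seq_dim=1280, struct_dim=512, joint_dim=512):
        super().__init__()
        self.seq_encoder = seq_encoder
        self.struct_encoder = struct_encoder
        # Linear projections map each modality into the shared latent space.
        self.seq_proj = nn.Linear(seq_dim, joint_dim)
        self.struct_proj = nn.Linear(struct_dim, joint_dim)

    def forward(self, seq_batch, struct_batch):
        z_seq = self.seq_proj(self.seq_encoder(seq_batch))
        z_struct = self.struct_proj(self.struct_encoder(struct_batch))
        # L2-normalize so agreement between modalities is a simple dot product.
        return F.normalize(z_seq, dim=-1), F.normalize(z_struct, dim=-1)
```

A contrastive objective (next point) can then pull the two views of the same protein together in that shared space.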
-
One pre-training objective to bring them all 🤝:
We need to find a pre-training objective for protein learning that is as powerful as next-word prediction is for language.
Our current bet is on a contrastive loss between protein representations in different modalities.
This also ties into our first point on how to learn more inclusive representations.
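As one illustration of that bet (not a claim about our final objective), the standard symmetric InfoNCE loss used in CLIP-style training could look like this, assuming the L2-normalized per-modality embeddings from the sketch above:

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(z_seq, z_struct, temperature=0.07):
    """CLIP-style contrastive loss between two modality embeddings.

    z_seq, z_struct: (batch, dim) L2-normalized tensors, where row i of
    each tensor comes from the same protein, so the positive pairs sit on
    the diagonal of the similarity matrix.
    """
    logits = z_seq @ z_struct.t() / temperature  # (batch, batch) cosine sims
    targets = torch.arange(z_seq.size(0), device=z_seq.device)
    # Each sequence must pick out its own structure, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```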
-
And in the light🔥, one protein-landscape mapping to bind them⚔️:
Target discovery is absolutely crucial.
We can now sequence nearly any protein and predict its 3D structure fairly reliably.
By adding a few key pieces, such as spatial transcriptomics and proteomics data to infer cellular localization, together with interaction prediction, we should be able to start accurately mapping the Gene Ontology landscape.
We're still exploring how best to approach this problem.
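Purely as an assumption-laden sketch (not a settled approach), one toy formulation to start from is multi-label GO term prediction on top of learned protein embeddings. Since each protein carries many GO annotations at once, independent sigmoid outputs fit better than a softmax over terms; the dimensions and names below are placeholders.

```python
import torch.nn as nn
import torch.nn.functional as F

class GOTermHead(nn.Module):
    """Hypothetical multi-label head: GO-term logits from a protein embedding.

    `num_go_terms` would come from whatever GO subset is being mapped;
    the value here is a placeholder.
    """

    def __init__(self, embed_dim=512, num_go_terms=10_000):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_go_terms)

    def forward(self, z):
        return self.classifier(z)  # raw logits, one per GO term

def go_loss(logits, go_labels):
    # Each protein can carry many annotations simultaneously, so treat every
    # GO term as an independent binary decision rather than a softmax class.
    return F.binary_cross_entropy_with_logits(logits, go_labels.float())
```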