Atom51

Hello! πŸ‘‹

What are we doing differently?

There has been substantial work in the computational protein space since 2021. We are starting from first principles and building towards our vision. We might independently arrive at conclusions other teams have already drawn - we consider that strong positive reinforcement rather than a negative data point. We need to figure out a few key components:

  • πŸ“― One representation learning method to rule them all πŸ—‘: Current SOTA methods still can't produce inclusive protein representations that incorporate all facets of proteins - sequence, structure, temporal dynamics, post-translational modifications, single point mutations, bound ligands/substrates, protein complexes, etc. - into one representation learning scheme. This limits us to extrapolating knowledge from only one domain or a limited set of them. We are investigating whether Joint Embedding Architectures (popularized by Yann LeCun) are the way forward. Our first experiment is to simply integrate sequence and structure, since structure data is quite standardized at this point (a generic two-tower sketch follows this list). Our work: https://github.com/atom-51/jespr.
  • One pre-training objective to bring them all 🀝: We need to find a pre-training objective for protein learning that is as powerful as next-word prediction is for language. Our current bet is on contrastive loss over protein representations in different modalities (see the loss sketch after this list). This also ties into our first point on learning more inclusive representations.
  • And in the light πŸ”₯, one protein-landscape mapping to bind them ⛓️: Target discovery is absolutely crucial. We can now almost reliably sequence proteins and predict their 3-D structures. By adding a few key pieces - such as information from spatial transcriptomics β†’ proteomics to extrapolate cellular localization, and interaction prediction - we should be able to start accurately mapping the Gene Ontology landscape. We're still exploring how best to approach this problem.
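
To make the joint-embedding idea concrete, here's a minimal two-tower sketch in PyTorch. This is not how jespr is implemented - it's just the general pattern we're describing: one encoder per modality, projected into a shared embedding space. All class names, dimensions, and the stand-in structure features (9 geometric values per residue) are illustrative assumptions.

```python
import torch
import torch.nn as nn


class JointEmbedder(nn.Module):
    """Two-tower sketch: one encoder per modality, one shared space."""

    def __init__(self, seq_vocab=33, d_model=256, d_joint=128, n_layers=4):
        super().__init__()
        # Sequence tower: token embedding + a small transformer encoder.
        self.seq_embed = nn.Embedding(seq_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.seq_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Structure tower: stand-in MLP over per-residue geometric features
        # (e.g. backbone coordinates/torsions flattened to 9 dims here).
        self.struct_encoder = nn.Sequential(
            nn.Linear(9, d_model), nn.ReLU(), nn.Linear(d_model, d_model)
        )
        # Separate projection heads into one joint embedding space.
        self.seq_proj = nn.Linear(d_model, d_joint)
        self.struct_proj = nn.Linear(d_model, d_joint)

    def forward(self, tokens, struct_feats):
        # Mean-pool per-residue states into one vector per protein.
        h_seq = self.seq_encoder(self.seq_embed(tokens)).mean(dim=1)
        h_struct = self.struct_encoder(struct_feats).mean(dim=1)
        # L2-normalize so both modalities live on the same unit sphere.
        z_seq = nn.functional.normalize(self.seq_proj(h_seq), dim=-1)
        z_struct = nn.functional.normalize(self.struct_proj(h_struct), dim=-1)
        return z_seq, z_struct


if __name__ == "__main__":
    # Toy batch: 2 proteins, 50 residues each; all shapes are illustrative.
    tokens = torch.randint(0, 33, (2, 50))
    struct_feats = torch.randn(2, 50, 9)
    z_seq, z_struct = JointEmbedder()(tokens, struct_feats)
    print(z_seq.shape, z_struct.shape)  # torch.Size([2, 128]) for both
```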
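
And here's a sketch of the contrastive objective from the second bullet: a symmetric InfoNCE (CLIP-style) loss where matched sequence/structure embeddings of the same protein attract and all other in-batch pairings repel. The function name and temperature are illustrative, not the exact loss in jespr.

```python
import torch
import torch.nn.functional as F


def symmetric_info_nce(z_seq, z_struct, temperature=0.07):
    """CLIP-style symmetric InfoNCE over a batch of matched pairs.

    Assumes both inputs are already L2-normalized, shape (batch, d_joint).
    """
    logits = z_seq @ z_struct.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(z_seq.size(0), device=z_seq.device)
    # Row i's positive is column i: same protein, other modality.
    loss_s2t = F.cross_entropy(logits, targets)      # sequence -> structure
    loss_t2s = F.cross_entropy(logits.t(), targets)  # structure -> sequence
    return 0.5 * (loss_s2t + loss_t2s)
```

During pre-training, the z_seq/z_struct pairs from the encoder sketch above would feed straight into this loss; the diagonal of the similarity matrix holds the positive pairs.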

Are we funded or incorporated?

  • We are currently neither funded nor incorporated and have bootstrapped ourselves. We are still figuring out the best path forward; until then we'll work off non-dilutive grants and open-source our work. We are also supported by our university's (Imperial College London) compute resources, with no IP constraints attached.
  • We are not currently looking to raise equity investment and will continue developing through grants. Our goal is to build fast without distraction.
  • Contrary to popular belief, capital (except for compute) is not yet a bottleneck, and neither is human capital. If you have a GPU cluster we can SSH into (anything better than 4x RTX 6000s), we'd wish for god πŸ‘Ό to bless your soul πŸ™Œ. You can also support us by reviewing our work and letting us know what you think.
  • Our belief in protein therapeutics + deep learning is unwavering. We believe these problems are solvable.
  • We're a team of three: harsh, aditya, and digvijay.