Deep Semantic Protein Representation for Annotation, Discovery, and Engineering
Abstract
Computational assignment of function to proteins with no known homologs remains an unsolved problem. We have created a novel, function-based approach to protein annotation and discovery called D-SPACE (Deep Semantic Protein Annotation Classification and Exploration), composed of a multi-task, multi-label deep neural network trained on over 70 million proteins. Distinct from homology- and motif-based methods, D-SPACE encodes proteins in high-dimensional representations (embeddings), allowing the accurate assignment of over 180,000 labels for 13 distinct tasks. The embedding representation enables fast searches for functionally related proteins, including homologs undetectable by traditional approaches. D-SPACE annotates all 109 million proteins in UniProt in under 35 hours on a single computer and searches the entirety of these in seconds. D-SPACE further quantifies the relative functional effect of mutations, facilitating rapid in silico mutagenesis for protein engineering applications. D-SPACE unifies protein annotation, search, and other exploratory tasks in a single cohesive model.
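The abstract does not say how the embedding search or the mutation scoring is implemented, so the following Python sketch only illustrates the general idea under stated assumptions. `toy_embed` is a hypothetical placeholder (an L2-normalized amino-acid composition vector), not D-SPACE's trained network. `search` ranks stored proteins by cosine similarity to a query embedding, and `mutational_scan` scores every single-point substitution by its embedding distance (1 minus cosine similarity) from the wild type, which is one plausible proxy for "relative functional effect" rather than the paper's stated method. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Vector = List[float]


def toy_embed(seq: str) -> Vector:
    """Hypothetical stand-in for a learned encoder: L2-normalized residue composition."""
    counts = [float(seq.count(aa)) for aa in AMINO_ACIDS]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def search(query: str, index: Dict[str, Vector],
           embed: Callable[[str], Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force k-nearest-neighbor search over precomputed embeddings."""
    q = embed(query)
    scored = [(name, cosine(q, vec)) for name, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def mutational_scan(wild_type: str,
                    embed: Callable[[str], Vector]) -> List[Tuple[str, float]]:
    """Score each single-point substitution by 1 - cosine(wild type, mutant)."""
    wt_vec = embed(wild_type)
    results = []
    for pos, original in enumerate(wild_type):
        for aa in AMINO_ACIDS:
            if aa == original:
                continue  # skip the synonymous "mutation"
            mutant = wild_type[:pos] + aa + wild_type[pos + 1:]
            label = f"{original}{pos + 1}{aa}"  # e.g. "M1A" (1-based position)
            results.append((label, 1.0 - cosine(wt_vec, embed(mutant))))
    # Largest embedding shift first.
    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    db = {"P1": "MKTAYIAKQR", "P2": "GGGGSGGGGS", "P3": "MKTAYIAKQL"}
    index = {name: toy_embed(seq) for name, seq in db.items()}
    # P1 (identical to the query) ranks first, then P3 (one substitution).
    print(search("MKTAYIAKQR", index, toy_embed, k=2))
    print(mutational_scan("MKTAYIAKQR", toy_embed)[:5])
```

Because the toy embedding ignores residue order, substitutions that replace the same residue type with the same new residue receive identical scores regardless of position; a sequence-aware learned encoder would be needed for position-specific effects. The linear scan is O(N) per query, so searching on the order of 10^8 proteins in seconds would likely require vectorized or approximate nearest-neighbor indexing.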