The goal of the proposed project is to develop ultra-fast and highly precise algorithms for predicting protein interactions. Our ultimate goal is to provide a new set of tools for structural biology, medicine, and pharmaceutics. Our research program draws on our knowledge and experience in the theoretical, computational, and experimental aspects of protein interactions. Our approach will involve theoretical studies, the development of new algorithms, their validation, and their subsequent implementation in software.

Our project is divided into eight tasks, which fall into two main groups. This division is motivated by our experience with the problem and by the complementarity of the individual tasks. The first group aims at extending current exhaustive search algorithms toward larger systems and faster searches. The second group aims at using the current knowledge base in structural biology and structural genomics to develop knowledge-driven methods and algorithms that will assist in predicting protein interactions. In all of the proposed tasks, GPU acceleration will be incorporated for the computationally critical operations.

Task 1: Polar/Cartesian FFT Search

The overall goal of this task is to use high-order Hermite polynomial expansions to represent steep short-range protein-protein interaction (PPI) potentials, and to incorporate this approach into the SPF (spherical polar Fourier) FFT engine in order to allow fast rotational and translational correlation searches.
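To illustrate why the expansion order matters for steep functions, the following is a minimal one-dimensional sketch, not the SPF engine itself. It projects a step-like "hard core" profile onto orthonormal Hermite functions (Hermite polynomials with a Gaussian weight) and reports the truncation error for several expansion orders. The profile, the grid, and the names `hermite_functions` and `truncation_error` are illustrative assumptions.

```python
import numpy as np

def hermite_functions(x, n_max):
    """Orthonormal Hermite functions psi_0 .. psi_{n_max-1} sampled on x."""
    psi = np.zeros((n_max, x.size))
    psi[0] = np.pi ** -0.25 * np.exp(-0.5 * x ** 2)
    if n_max > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    # Stable three-term recurrence for the normalized functions.
    for n in range(1, n_max - 1):
        psi[n + 1] = (np.sqrt(2.0 / (n + 1)) * x * psi[n]
                      - np.sqrt(n / (n + 1)) * psi[n - 1])
    return psi

def truncation_error(f, x, n_max):
    """L2 error of the n_max-term Hermite-function expansion of f."""
    dx = x[1] - x[0]
    psi = hermite_functions(x, n_max)
    coeffs = psi @ f * dx      # projection by simple quadrature
    approx = coeffs @ psi      # truncated reconstruction
    return np.sqrt(np.sum((f - approx) ** 2) * dx)

# Hypothetical steep profile: 1 inside the core, 0 outside.
x = np.linspace(-12.0, 12.0, 4801)
steep = (np.abs(x) < 1.0).astype(float)

errors = {n: truncation_error(steep, x, n) for n in (4, 8, 16, 32)}
for n, err in errors.items():
    print(f"order {n:2d}: L2 error = {err:.4f}")
```

The error shrinks only slowly as the order grows, which is the usual behavior for discontinuous or very steep functions and is one reason high-order expansions are needed for short-range terms.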

Task 2: Exhaustive FFT Search With Symmetries

Currently, nearly all existing search algorithms treat symmetry constraints a posteriori: they perform the search without symmetry and only afterwards filter the results according to the applied symmetry constraints. We will instead develop an FFT-based search algorithm that treats symmetry constraints implicitly and thus performs the search much faster and samples the search space more regularly.
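The idea of satisfying symmetry by construction, rather than filtering afterwards, can be sketched for the simplest case of cyclic (C_n) symmetry. In this toy example, which is an assumption for illustration and not the proposed FFT algorithm, a single monomer pose fully determines the assembly, so every candidate generated this way is symmetric and nothing needs to be discarded later. The function names and the choice of the z axis as the symmetry axis are hypothetical.

```python
import numpy as np

def rotation_z(angle):
    """Rotation matrix for a rotation by `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def build_cyclic_assembly(monomer, n):
    """Return n copies of `monomer` (shape (atoms, 3)) related by C_n about z."""
    return np.stack([monomer @ rotation_z(2.0 * np.pi * k / n).T
                     for k in range(n)])

# Hypothetical monomer: a few points placed off the symmetry axis.
rng = np.random.default_rng(1)
monomer = rng.normal(size=(5, 3)) + np.array([4.0, 0.0, 0.0])

assembly = build_cyclic_assembly(monomer, 6)

# Rotating the whole assembly by one symmetry step only permutes the copies.
rotated = assembly @ rotation_z(2.0 * np.pi / 6).T
print(np.allclose(rotated, np.roll(assembly, -1, axis=0)))
```

In such a scheme only the pose of one monomer relative to the axis is searched, which is a smaller space than the poses of all subunits taken independently.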

Task 3: Rigid-Body Modeling for Cryo Electron Microscopy

Given an electron density map of limited resolution, one recently highlighted challenge is the rigid-body docking of a protein model into the density (Lawson et al. 2011). This problem is very similar to the rigid-body protein-protein docking problem, and we recently solved it in the context of shape recognition (the 3D-Blast algorithm) (Mavridis and Ritchie 2010). We will adapt the 3D-Blast algorithm to cryoEM density maps and will:

  • Extend it to Cartesian basis functions (Task 1), which seem to encode cubic densities better than angular basis functions.
  • Add symmetry constraints (Task 2).
  • Eventually provide a combined score that balances the best fit to the density map with the best complementarity between symmetry-related units.

Task 4: PEPSI-Dock – Knowledge-Driven Macromolecular Docking

Deciphering the complete network of PPIs in a genome using experimental and computational techniques is one of the main goals in Systems Biology (Wass et al. 2011). Thus, there is currently considerable interest and debate as to whether it is possible to predict PPI networks using knowledge of their 3D structures. The main goal of this task is to couple the advanced docking tools that will be developed in this proposal with our recently developed KB-Dock database in order to explore this question much more thoroughly than has previously been possible.

Task 5: Exhaustive Search with Pairwise Knowledge-Based Potentials

We will develop an exhaustive-search FFT-based method that incorporates a very detailed knowledge-based interaction potential. This will combine the speed of FFT-accelerated techniques (which on their own are rather inaccurate) with the precision of state-of-the-art scoring functions (which are otherwise too slow for exhaustive search).
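The speed of FFT-accelerated searches comes from the correlation theorem: the scores of all translations of one grid against another can be obtained from a few FFTs instead of an explicit loop over translations. The sketch below is a toy example with random periodic grids, not the proposed method; it recovers a known shift from the correlation peak. The grid contents and the name `fft_translational_scan` are assumptions made for illustration.

```python
import numpy as np

def fft_translational_scan(grid_a, grid_b):
    """Score every cyclic translation t at once:
    score[t] = sum_x grid_a[x] * grid_b[x + t]."""
    spectrum = np.conj(np.fft.fftn(grid_a)) * np.fft.fftn(grid_b)
    return np.fft.ifftn(spectrum).real

# Hypothetical data: grid_b is grid_a moved by a known cyclic shift.
rng = np.random.default_rng(0)
grid_a = rng.random((16, 16, 16))
true_shift = (3, 5, 7)
grid_b = np.roll(grid_a, true_shift, axis=(0, 1, 2))

scores = fft_translational_scan(grid_a, grid_b)
best_shift = np.unravel_index(np.argmax(scores), scores.shape)
print(tuple(int(i) for i in best_shift), true_shift)
```

A single correlation of this kind scores only one pair of grids. One plausible way to bring in a pairwise potential, stated here as an assumption rather than the authors' design, is to sum several such correlations, one per pair of interaction types.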

Task 6: PEPSI-Blast – Pose-Invariant 3D Protein Shape Recognition

The aim of this task is to define pose-invariant shape signatures of all known protein structures (i.e. some 12,000 distinct domains) in the CATH database. 
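As a minimal illustration of what "pose-invariant" means, the sketch below uses a histogram of pairwise distances as a stand-in signature. This is a deliberately simple substitute, not the signature proposed in this task: it only shows that a descriptor built from rotation- and translation-invariant quantities is unchanged when the structure is moved. The point cloud, bin settings, and the name `shape_signature` are illustrative assumptions.

```python
import numpy as np

def shape_signature(points, bins=20, r_max=10.0):
    """Normalized histogram of all pairwise distances (pose-invariant)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = dists[np.triu_indices(len(points), k=1)]  # each pair once
    hist, _ = np.histogram(upper, bins=bins, range=(0.0, r_max))
    return hist / hist.sum()

# Hypothetical "structure": a random point cloud.
rng = np.random.default_rng(2)
points = rng.normal(size=(50, 3))

# Random proper rotation from a QR decomposition, plus a translation.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(q) < 0:
    q[:, 0] *= -1.0  # flip one axis so that det(q) = +1
moved = points @ q.T + np.array([5.0, -3.0, 2.0])

same = np.allclose(shape_signature(points), shape_signature(moved), atol=1e-2)
print(same)
```

Because the signature does not depend on pose, two structures can be compared directly without first superposing them, which is what makes precomputing signatures for a whole database attractive.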

Task 7: Annotation of CryoEM Density Maps

Medium-resolution cryoEM density maps pose another recently highlighted challenge (Lawson et al. 2011). At sub-nanometer resolution, α-helices become resolvable, and as the resolution improves further, β-sheets become discernible, eventually showing strand separation. In this intermediate (~5-10 Å) resolution range, tools for the automatic identification and localisation of secondary structure elements become quite valuable. We will use supervised learning and exhaustive search techniques to automatically identify regions inside the map that belong to different secondary structure elements, namely α-helices or β-sheets.

Task 8: Validation: Modeling Protein–Protein Complexes

The aim of this task is to validate the methods and tools developed in the preceding tasks by modelling different types of multi-domain systems, such as cyclic nucleotide–gated (CNG) channels, chemoreceptors, and gap junctions.

  • Lawson, CL, ML Baker, C Best, C Bi, M Dougherty, P Feng, G van Ginkel, B Devkota, I Lagerstedt, SJ Ludtke, RH Newman, TJ Oldfield, I Rees, G Sahni, R Sala, S Velankar, J Warren, JD Westbrook, K Henrick, GJ Kleywegt, HM Berman, and W Chiu. 2011. “EMDataBank.org: Unified Data Resource for CryoEM.” Nucleic Acids Res 39 (Database issue): D456–64.
  • Mavridis, L, and DW Ritchie. 2010. “3D-Blast: 3D Protein Structure Alignment, Comparison, and Classification Using Spherical Polar Fourier Correlations.” Pac Symp Biocomput: 281–92.
  • Wass, MN, G Fuentes, C Pons, F Pazos, and A Valencia. 2011. “Towards the Prediction of Protein Interaction Partners Using Physical Docking.” Mol Syst Biol 7: 469.