Eligible for INRA funding
Date : 16/3/2009
Thesis subject : "Development of a de novo method for predicting the 3D structure of proteins"
Laboratory 1
INRA
Département MIA-PHASE-MICA
Centre Jouy en Josas
Unité MIG 1077
Domaine de Vilvert
78 Jouy en Josas
Thesis supervisor 1 : Jean-François Gibrat, DR2 INRA, HDR
Laboratory 2
INRA
Département MIA
Centre Jouy en Josas
Unité MIA 341 MathCell
Domaine de Vilvert
78 Jouy en Josas
Thesis supervisor 2 : Antoine Vigneron, CR1 INRA
Before applying, please contact the supervisors :
Antoine Vigneron 01 34 65 22 19
Jean-François Gibrat 01 34 65 28 97
Profile and subjects :
Bioinformatics, Biophysics, Computer Science
Summary of project
This thesis proposal addresses an important problem in molecular biology: understanding the connection between the amino-acid sequence of a protein, its 3D structure, and its function. More precisely, we want to study how the amino-acid sequence determines the unique native 3D structure of a protein, and how this 3D structure influences its function. Beyond its fundamental interest, an answer to this question would matter greatly for practical problems in biotechnology (protein engineering) and in the analysis of genomics and metagenomics data.
This thesis aims to develop a new method able to handle difficult cases that are out of reach of the current methods, namely homology modeling and fold recognition. The new method is a de novo modeling technique: unlike the methods mentioned above, it does not require knowing the 3D structure of a similar protein.
Scientific stakes
This thesis addresses a fundamental question that is important for applications in two fields:
- biotechnology, and in particular protein engineering. We want to understand, at the atomic level, the connection between the sequence, the 3D structure, and the function of proteins. This understanding could help us modify the properties of these proteins, for instance to design enzymes that operate at a higher temperature, to modify the specificity of a protein, to design new drugs, or to find aromatic molecules that match a given olfactory receptor...
- in silico analysis of genomics and metagenomics data. The goal of genome and metagenome sequencing is to identify all the genes and to determine the function of the corresponding proteins. The fast progress of sequencing techniques will, in the very near future, generate tens of millions of protein sequences that we will need to analyze. Initially, only the primary sequence of these proteins is known. Determining the function of all these proteins experimentally would require a considerable amount of work. However, this amount of work can be significantly reduced with bioinformatics techniques, which extract important information on the function of proteins from the analysis of their amino-acid sequences.
Scientific context
De novo approaches try to build the 3D structure of a protein by assembling fragments of amino-acid sequences taken from protein databases. A number of North American research teams have demonstrated the efficiency of these methods by finding the 3D structure of small proteins as accurately as experimental methods. This type of technique requires the following building blocks:
- a model of the polypeptide chain.
- an empirical "energy function" that accounts for the main properties of the 3D structure of proteins, and such that the native structure corresponds to an optimum of this function.
- an empirical method to find the optimum of the energy function in the conformation space.
The main difficulty is to find the right balance between a detailed model using an accurate energy function, and a simplified model. In the first case, we have a good description of the 3D structure, but the surface defined by the energy function is hard to explore. In the second case, it is easier to explore, but we may lose some important features of the 3D structure.
We started to work on these problems in the MIG team with a PhD student, J. Martin, who focussed on the first step of these methods: predicting the local structure of a protein. Then we worked with the MIA team (K. Kieu and A. Vigneron) and a postdoctoral researcher, A. Elfallahi. We developed a method that includes the three building blocks above. We used a detailed model of the 3D structure and we explored the conformation space with various heuristics. A. Elfallahi obtained a permanent position before the end of his contract, and so did not have enough time to test the model thoroughly.
However, the preliminary results show that our energy function needs some tuning. Thus, we may need to use a hierarchical approach, that is, start with a simpler model and an appropriate, simpler energy function. Ideally, this simpler model should allow us to reduce the conformation space to a few regions, and one of these regions should contain at least one structure within the "convergence radius" around the native structure. Then, using a more detailed model, we should be able to get closer to the native structure. This thesis will therefore consist of developing and testing various pairs of 3D models and energy functions of increasing complexity.
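To make the three building blocks concrete, here is a minimal toy sketch in Python. It is not the method developed by the teams; every name and parameter in it is hypothetical. It assumes a deliberately crude model (a 2D chain of unit-length bonds described by turn angles), a made-up energy function (a reward for contacts between hydrophobic "H" residues and a penalty for steric clashes), and simulated annealing as the search heuristic.

```python
import math
import random

def build_chain(angles):
    """Building block 1 (toy model): 2D chain with unit-length bonds.
    angles[i] is the change of direction before bond i + 2."""
    x, y, heading = 1.0, 0.0, 0.0
    coords = [(0.0, 0.0), (x, y)]          # first bond lies along the x axis
    for turn in angles:
        heading += turn
        x += math.cos(heading)
        y += math.sin(heading)
        coords.append((x, y))
    return coords

def energy(sequence, angles):
    """Building block 2 (toy energy): lower is better.
    Assumed terms: -1 per H-H contact, linear penalty for clashes."""
    coords = build_chain(angles)
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip bonded neighbours
            d = math.dist(coords[i], coords[j])
            if d < 0.8:
                total += 10.0 * (0.8 - d)    # steric clash penalty
            elif d < 1.5 and sequence[i] == "H" and sequence[j] == "H":
                total -= 1.0                 # hydrophobic contact reward
    return total

def anneal(sequence, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Building block 3 (toy search): simulated annealing over turn angles."""
    rng = random.Random(seed)
    angles = [0.0] * (len(sequence) - 2)     # start from the extended chain
    current = energy(sequence, angles)
    best_angles, best = list(angles), current
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        k = rng.randrange(len(angles))
        old = angles[k]
        angles[k] = old + rng.gauss(0.0, 0.5)  # perturb one angle
        trial = energy(sequence, angles)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        if trial <= current or rng.random() < math.exp((current - trial) / t):
            current = trial
            if current < best:
                best_angles, best = list(angles), current
        else:
            angles[k] = old                    # reject the move
    return best_angles, best

if __name__ == "__main__":
    seq = "HPHPPHHPHH"                         # hypothetical H/P sequence
    best_angles, best_energy = anneal(seq)
    print("best energy found:", best_energy)
```

In this sketch, the trade-off discussed above appears in `energy`: a richer function would describe real structures better but would make the search in `anneal` harder. A hierarchical scheme could run a search of this kind with a simple model first, then refine the most promising conformations with a more detailed model and energy function.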