IGBMC is one of the leading European centres in biomedical research. It is devoted to the study of the genomes of higher eukaryotes, the control of gene expression, and the functional analysis of genes and proteins. This knowledge is applied to the study of human pathologies.

Machine Learning Developments for the Analysis of High-Resolution Biophysical Data-Sets

Reference : PhD Bruno Kieffer

Nuclear Magnetic Resonance (NMR) spectroscopy and mass spectrometry (MS) are two powerful biophysical methods used intensively in biological laboratories. Both methods make it possible to study proteins in solution in their native form, to measure their interactions with different molecular partners, and to gain structural information on the different species present in solution. With the sensitivity improvements achieved in recent years, many new experimental approaches are now being developed, with gains in throughput and in analytical power. Both techniques also produce large amounts of high-resolution data, which exceed what a scientist can analyze directly and require automatic extraction methods. The laboratory has long been involved in the development of analytical methods, covering new experimental set-ups, innovative numerical techniques, and automatic analysis of spectroscopic data. We maintain a large body of software for handling and analyzing spectroscopic data in a big-data framework.


The present project aims to develop machine learning approaches for the analysis of NMR and MS spectroscopic data-sets. The work will involve distinguishing signals from artefacts and rejecting noise using dimension reduction, Markov state models or Bayesian analysis. New approaches developed in the field of Artificial Intelligence, such as deep neural networks, random forests or other classifiers, will be used to develop higher-level analysis, with the aim of detecting biologically relevant information in a large corpus of high-resolution spectra, jointly analyzed with phylogenetic, genetic and mutation data. Modeling of the measurement processes and of the molecular events taking place in solution will also be performed.


The methods developed in the course of this work will be used to study protein-ligand, protein-protein and protein-nucleic acid systems, with application to the regulation of gene expression by the nuclear receptor (NR) family. The androgen receptor, and in particular its long disordered N-terminal domain, will be the primary target of the biological application.


Skills acquired by the end of the PhD thesis: machine learning, mass spectrometry, NMR spectroscopy, handling of big-data projects.

Candidate’s background: Proteomics; Biophysics; Algorithmics

Your application

Application Deadline : Nov. 1, 2018