Key words: de novo genes, non-exonic translations, proteogenomics, phylogenomics, systems biology, evolution, sORFs, protein signaling, and big data
My lab is interested in interpreting the functional effects of disease variants in (a) protein signaling, and (b) the non-coding regions of the genome. We combine ‘Big Data’ from whole-genome sequencing, transcriptomics, and proteomics, and analyze them together in a cloud-based framework that we developed, using machine-learning and mathematical-modeling approaches. We call this approach ‘systems proteogenomics’. We use it primarily to interpret the functional consequences of disease-associated genetic variants in cancer and in psychiatric diseases such as schizophrenia.
Cloud computing provides us with a reliable, scalable, high-performance computing infrastructure, without requiring us to purchase and maintain complex hardware. Moreover, it speeds up our development of bioinformatics workflows, because software tools are deployed as container images (e.g. using Docker) and then linked to form a pipeline using, for example, the Common Workflow Language (CWL).
We are also developing experimental and computational strategies to perform ‘systems proteogenomics’ at the level of single cells.
A brief sketch of our research is shown below.