Our projects fall into four broad categories:

Big Data and Machine Learning

We develop computational pipelines for the reusable analysis and deployment of bioinformatics algorithms on large-scale genomic, transcriptomic, and proteomic data in an OpenStack environment using Docker images.

We develop machine-learning-based algorithms to identify and prioritize disease-associated variants in non-genic regions.

Systems Proteogenomics

We develop and employ a ‘systems proteogenomics’ approach for the systematic characterization of functional elements in non-genic regions, such as de novo genes and small open reading frames (sORFs).

We develop Python-based frameworks for the systematic characterization of the effects of variants on protein signaling.


We perform comparative genomics and develop machine-learning-based phylogenomic frameworks to understand the evolutionary processes that lead to the retention of non-genic functional elements.

We investigate the migration and evolution of living organisms, such as African cichlid fishes, using systems proteogenomic approaches.

We are members of the Cambridge Evolutionary Genetics Group.

Single-Cell Proteogenomics and Behavior

We develop computational strategies for proteogenomic analysis at the single-cell level.

We have a long-term interest in understanding the learning behavior exhibited by single-celled ciliates. Ciliates also interest us because of their scrambled genomes.