Research interests


Bioinformatics and machine learning, with a focus on algorithms for characterizing biological systems.

Current projects:

  1. Predicting protein-protein interactions with machine learning methods
    Proteins in a cell seldom act alone. To better understand a biological system, a comprehensive set of protein interactions is indispensable. High-throughput experimental approaches such as the yeast two-hybrid system have brought us an unprecedented opportunity to decipher protein interaction networks. However, because high-throughput data are noisy, a large portion of the true interactions are absent from the data. In this project, we apply machine learning methods to high-throughput protein interaction data to predict interactions and reconstruct protein interaction networks.

  2. Protein complex prediction from affinity purification-mass spectrometry experiments
    Protein complexes are the molecular machines of life. Knowledge of protein complexes and their constituent proteins is essential for characterizing biological systems. Recent technical advances such as affinity purification-mass spectrometry experiments allow systematic identification of protein complexes. However, these bait-prey experiments only provide the complex co-membership of proteins (i.e., the bait protein shares at least one complex with each hit protein), not the complexes themselves. Furthermore, the experiments usually have high false-positive and false-negative rates. In this project, we develop automated methods to predict complex membership from affinity purification-mass spectrometry experiments.

  3. Domain functional module discovery from protein complexes
    Protein complexes are involved in many important cellular processes such as transcription and translation. Our study attempts to reveal the functional linkages among complexes by analyzing their domains. The relationship between protein complexes and their constituent domains is represented as a bipartite graph, from which a special type of biclique, the max-edge biclique, is discovered. Each biclique simultaneously groups a set of protein complexes and a set of domains, and protein complexes with similar functions are related through the bicliques. By working at the protein domain level, we are able to reveal many functional linkages among protein complexes that would otherwise be missed at the protein level. At the same time, domains are grouped into functional modules, revealing a modular organization of protein complexes.

  4. Cancer tissue classification with microarray data
    One important application of gene expression analysis is to classify tissue samples according to their gene expression levels. Gene expression data are typically characterized by high dimensionality and small sample size, which makes the classification task quite challenging. In this project, we propose a data-dependent kernel for cancer classification with microarray data. This kernel function is engineered so that the class separability of the training data is maximized.
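For the domain functional module project (project 3), the biclique step can be illustrated with a minimal sketch. It assumes that a "max-edge biclique" is a complete bipartite subgraph (a set of complexes that all contain a set of domains) with the largest number of edges, and it uses brute-force search that is only feasible on tiny graphs, since finding a maximum-edge biclique is NP-hard in general. It is not the project's actual algorithm; the function name `max_edge_biclique` and the complexes C1 to C4 and domains D1 to D5 are hypothetical.

```python
from itertools import combinations

def max_edge_biclique(adj):
    """Return ((complexes, domains), n_edges) for a maximum-edge biclique.

    adj maps each protein complex to the set of domains it contains. For any
    maximal biclique (L, R), R is the set of domains shared by all of L and L
    is every complex containing all of R, so closing each subset of complexes
    reaches every maximal biclique. Exponential time: toy inputs only.
    """
    complexes = sorted(adj)
    best, best_edges = (frozenset(), frozenset()), 0
    for k in range(1, len(complexes) + 1):
        for subset in combinations(complexes, k):
            # Domains present in every complex of the subset.
            shared = frozenset.intersection(*(frozenset(adj[c]) for c in subset))
            if not shared:
                continue
            # Close the complex side: all complexes containing every shared domain.
            members = frozenset(c for c in complexes if shared <= adj[c])
            n_edges = len(members) * len(shared)
            if n_edges > best_edges:
                best, best_edges = (members, shared), n_edges
    return best, best_edges

# Hypothetical toy data: complexes C1..C4 and the domains they contain.
adj = {
    "C1": {"D1", "D2", "D3"},
    "C2": {"D1", "D2"},
    "C3": {"D1", "D2", "D4"},
    "C4": {"D5"},
}
(members, domains), n_edges = max_edge_biclique(adj)
print(sorted(members), sorted(domains), n_edges)
# prints: ['C1', 'C2', 'C3'] ['D1', 'D2'] 6
```

Here the biclique groups C1, C2 and C3 through their shared domains D1 and D2, even though the complexes need not share any protein; that is the kind of domain-level linkage the project looks for. A realistic implementation would need a scalable biclique mining algorithm rather than subset enumeration.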
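For the cancer classification project (project 4), one well-known way to build a data-dependent kernel is a conformal transformation of a base kernel, k(x, z) = q(x) q(z) k0(x, z), with q(x) = α0 + Σj αj k1(x, aj), where the coefficients α are chosen to maximize a Fisher-style class separability ratio tr(S_b) / tr(S_w) in the kernel feature space. The sketch below assumes this construction, Gaussian k0 and k1, a subset of training samples as the anchor points aj, and the hyperparameters `gamma0`, `gamma1` and `ridge`; none of these come from the project description, and all helper names are hypothetical. Because both traces are quadratic in α, the maximization reduces to a generalized eigenproblem.

```python
import numpy as np

def rbf(X, Y, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def design(X, anchors, gamma1):
    """Rows [1, k1(x, a_1), ..., k1(x, a_m)], so that q(x) = design @ alpha."""
    return np.hstack([np.ones((len(X), 1)), rbf(X, anchors, gamma1)])

def scatter_weights(y):
    """Matrices B, W with tr(S_b) = sum(K * B) and tr(S_w) = sum(K * W)."""
    n = len(y)
    same = (y[:, None] == y[None, :]).astype(float)
    sizes = same.sum(1)                  # n_c for the class of each sample
    P = same / sizes[:, None]            # 1/n_c inside each class block
    return P - 1.0 / n, np.eye(n) - P

def separability(K, y):
    """Fisher-style ratio tr(S_b) / tr(S_w) in the feature space of K."""
    B, W = scatter_weights(y)
    return np.sum(K * B) / np.sum(K * W)

def fit_alpha(X, y, anchors, gamma0=0.1, gamma1=0.1, ridge=1e-3):
    """Pick alpha maximizing the separability of q(x) q(z) k0(x, z) on (X, y)."""
    K0 = rbf(X, X, gamma0)
    E = design(X, anchors, gamma1)
    B, W = scatter_weights(y)
    M = E.T @ (K0 * B) @ E               # numerator quadratic form in alpha
    N = E.T @ (K0 * W) @ E + ridge * np.eye(E.shape[1])
    # Maximize a^T M a / a^T N a: whiten with the Cholesky factor of N,
    # take the top eigenvector, and map it back to alpha.
    L = np.linalg.cholesky((N + N.T) / 2)
    Linv = np.linalg.inv(L)
    C = Linv @ M @ Linv.T
    _, vecs = np.linalg.eigh((C + C.T) / 2)
    return Linv.T @ vecs[:, -1]

def conformal_kernel(X, Z, alpha, anchors, gamma0=0.1, gamma1=0.1):
    """Data-dependent kernel q(x) q(z) k0(x, z)."""
    qx = design(X, anchors, gamma1) @ alpha
    qz = design(Z, anchors, gamma1) @ alpha
    return qx[:, None] * qz[None, :] * rbf(X, Z, gamma0)

# Toy usage on synthetic two-class data (not microarray data).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(1.0, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
anchors = X[::4]
alpha = fit_alpha(X, y, anchors)
j_base = separability(rbf(X, X, 0.1), y)
j_opt = separability(conformal_kernel(X, X, alpha, anchors), y)
print(j_base, j_opt)
```

Setting α = (1, 0, ..., 0) gives q(x) = 1 and recovers the base Gaussian kernel, so the fitted kernel should score at least as high on the training data, up to the small ridge term. Training-set separability does not guarantee test accuracy, so in practice the kernel and its hyperparameters would still be validated, for example by cross-validation.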