EECS800 Special Topics in Mining Biological Data

Fall 2006
Instructor:

Jun (Luke) Huan
Assistant Professor
Department of Electrical Engineering and Computer Science 
University of Kansas 
Lawrence, KS, 66047, USA 

Class Meeting: M/W: 9:00-10:15 at Eaton Hall 2001

Syllabus of the Class

Announcement:
Presentation papers are available here
Some of the presentation papers can be downloaded only from the KU campus network because of licensing restrictions.
 



Schedules:


Date  Topic Slides Related Papers Notes
Aug. 21, 2006 Introduction intro.ppt
open for registration
Aug. 23, 2006 Association rules asso.ppt R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD, 207-216, 1993. 
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB, 487-499, 1994. 
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD, 1-12, 2000. 

last day for registration
Aug. 28, 2006 Association rules 2 asso2.ppt
R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99.

Aug 30, 2006 Association rules 3 asso3.ppt
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97. 
R. Ng, L.V.S. Lakshmanan, J. Han & A. Pang. Exploratory mining and pruning optimizations of constrained association rules. SIGMOD'98. 
Presentation papers are posted. Topic selection is due Sept. 11; topics are assigned first come, first served.
Sept 4, 2006
No class.
Happy Labor Day!



Sept 6, 2006 Microarray I microarrayI.ppt Chapter 14 of Bioinformatics: Genes, Proteins, and Computers, Christine Orengo, David Jones, and Janet Thornton (eds.), BIOS Scientific Publishers, 2003 (ISBN: 1-85996-0545).
Gao Cong, Anthony K. H. Tung, Xin Xu, Feng Pan, Jiong Yang, FARMER: Finding Interesting Rule Groups in Microarray Datasets, SIGKDD'04 
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed J. Zaki, Carpenter: finding closed patterns in long biological datasets, SIGMOD'03
Presentation paper selection is due this Friday. 
Sept 11, 2006 Sequential pattern spatt.ppt R. Agrawal and R. Srikant. "Mining sequential patterns". ICDE'95, 3-14, Taipei, Taiwan. 
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. "PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth". ICDE'01, Heidelberg, Germany.

Sept 13, 2006 Protein sequence motifs motif.ppt Sigrist C.J.A., Cerutti L., Hulo N., Gattiker A., Falquet L., Pagni M., Bairoch A., Bucher P. PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 3:265-274 (2002).
Hulo N., Bairoch A., Bulliard V., Cerutti L., De Castro E., Langendijk-Genevaux P.S., Pagni M., Sigrist C.J.A. The PROSITE database. Nucleic Acids Res. 34:D227-D230 (2006).

Sept 18, 2006 Pattern mining in trees and graphs tree_graph.ppt Michihiro Kuramochi, George Karypis, Frequent Subgraph Discovery. In: Proceedings of the 2001 International Conference on Data Mining (ICDM2001), 2001. 
Xifeng Yan, Jiawei Han. gSpan: Graph-Based Substructure Pattern Mining. In: Proceedings of the 2002 International Conference on Data Mining (ICDM2002), 2002. 

Sept 20, 2006 Mining molecular structures protein.ppt Christian Borgelt, Michael R. Berthold. Mining Molecular Fragments: Finding Relevant Substructures of Molecules. In: Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM2002), 2002.
Srinivasan Parthasarathy, Matt Coatney: Efficient Discovery of Common Substructures in Macromolecules. ICDM 2002: 362-369
Project assignment distributed
Sept 25, 2006 Data types data.ppt Chapter 2 of Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley (ISBN-10: 0321321367).
Sept 27, 2006 Clustering clustering.ppt
P. Berkhin. Survey of clustering data mining techniques, 2002.
P. Arabie, L. J. Hubert, and G. De Soete. Clustering and Classification. World Scientific, 1996 

Oct 2, 2006 Subspace clustering clusteringII.ppt R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD'98
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases. KDD, 226-231, 1996.

Oct 4, 2006 Model-based clustering biclusterin.ppt Yizong Cheng, George M. Church. Biclustering of Expression Data. ISMB'00.
J. Yang, W. Wang, H. Wang, and P. Yu. Delta-cluster: capturing subspace correlation in a large data set. ICDE, 517-528, 2002. 

Oct 9, 2006 PCA pca.ppt
K. Y. Yeung and W. L. Ruzzo. Principal component analysis for clustering gene expression data. Bioinformatics, 17(9):763-774, Sep. 2001.
William Dillon and Matthew Goldstein, Multivariate analysis, 1984

Oct 11, 2006
Graph mining for gene expression analysis
mgraph.ppt H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. Mining coherent dense subgraphs across massive biological networks for functional discovery. ISMB'05.
X. Yan, X. Jasmine Zhou, and J. Han. Mining closed relational graphs with connectivity constraints. SIGKDD'05.

Oct 16, 2006
Text mining and gene ontology ontology.ppt
A Road Map to Text Mining and Web Mining, University of Texas resource page. http://www.cs.utexas.edu/users/pebronia/text-mining/
Computational Linguistics and Text Mining Group, IBM Research. http://www.research.ibm.com/dssgrp/

Oct 18, 2006 Classification overview classification.ppt
S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4), 345-389, 1998.
Oct 23, 2006 Classification 2
FLD, SVM
classificationII.ppt
C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 121-168, 1998.
Oct 25, 2006 Classification 3
Rule-based classifiers
classificationIII.ppt B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. KDD, 1998
Gao Cong, Kian-Lee Tan, Anthony K. H. Tung, Xin Xu. "Mining Top-k Covering Rule Groups for Gene Expression Data". SIGMOD'05, 2005.

Oct 30, 2006 Graph models graphModels.ppt
Dana Pe'er. Bayesian Network Analysis of Signaling Networks: A Primer. Sci. STKE, Vol. 2005, Issue 281, p. pl4, 26 April 2005.
A brief introduction to Bayesian networks, http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html.

Nov 1, 2006
Semi-supervised learning
semiLearning.ppt
K. Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained K-means Clustering with Background Knowledge. ICML'01.

Nov 6, 2006
Kernel classifier
(by Bin Han)
    
Darrin P. Lewis, Tony Jebara, and William Stafford Noble. "Nonstationary kernel combination." Proceedings of the International Conference on Machine Learning.
Jason Weston, Christina Leslie, Eugene Ie, Dengyong Zhou, Andre Elisseeff, and William Stafford Noble. "Semi-supervised protein classification using cluster kernels." Bioinformatics, 21(15):3241-3247, 2005.
Nov 8, 2006
Systems biology
(by Brett Barker)

Hiroaki Kitano. Systems Biology: A Brief Overview. Science, Vol. 295, no. 5560, pp. 1662-1664, 2002.
S. Mangan and U. Alon. Structure and function of the feed-forward loop network motif. Proceedings of the National Academy of Sciences, vol. 100, no. 21, pp. 11980-11985, 2003.
N. Przulj. Biological Network Comparison Using Graphlet Degree Distribution. Proceedings of the 2006 European Conference on Computational Biology (ECCB'06), Eilat, Israel, September 10-13, 2006; Bioinformatics, in press, 2006.

Nov 13, 2006
High-performance data mining
(by Daniel Leung)

Rakesh Agrawal and John C. Shafer. "Parallel Mining of Association Rules". IEEE Transactions on Knowledge and Data Engineering, 1996.
Ruoming Jin, Ge Yang, Gagan Agrawal. "Shared Memory Parallelization of Data Mining Algorithms: Techniques, Programming Interface, and Performance". IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 1, pp. 71-89, Jan. 2005.

Nov 15, 2006
Proteomics
(by Mathew Ku)

Nature Reviews focus on proteomics: a collection of review papers published by Nature on the frontiers of proteomics. http://www.nature.com/reviews/focus/proteomics/index.html
Mustafa Kirac, Gultekin Ozsoyoglu, Jiong Yang, “Annotating proteins by mining protein interaction networks”, ISMB’06

Nov 20, 2006
Data Integration
(by Cindy Lin)

Imran Mansuri, Sunita Sarawagi, Integrating unstructured data into relational databases, ICDE’06
Daehee Hwang, Alistair G. Rust, Stephen Ramsey, Jennifer J. Smith, Deena M. Leslie, Andrea D. Weston, Pedro de Atauri, John D. Aitchison, Leroy Hood, Andrew F. Siegel, and Hamid Bolouri. A data integration methodology for systems biology. Proceedings of the National Academy of Sciences, vol. 102, no. 48, pp. 17296-17301.

Nov 22, 2006
Happy Thanksgiving



Nov 27, 2006
Bionetworks
(by Yi Jia)

Jure Leskovec, Jon Kleinberg, and Christos Faloutsos, “Graphs Over Time: Densification Laws, Shrinking Diameters, and Possible Explanations”, KDD’05
H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou. “Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery”. In Proc. 2005 Int. Conf. on Intelligent Systems for Molecular Biology (ISMB 2005).
Ye, Osterman, Overbeek, and Godzik. Automatic Detection of Subsystem/Pathway Variants in Genome Analysis. ISMB 2005.