Schedules:
| Date |
Topic |
Slides |
Related Papers |
Notes |
| Aug.
21, 2006 |
Introduction |
intro.ppt |
|
open
for registration |
| Aug.
23, 2006 |
Association
rules |
asso.ppt |
R.
Agrawal, T. Imielinski, and A. Swami. Mining
association rules between sets of items in large databases. SIGMOD,
207-216, 1993.
R. Agrawal and R. Srikant. Fast
algorithms for mining association rules. VLDB, 487-499, 1994.
J. Han, J. Pei, and Y. Yin. Mining
frequent patterns without candidate generation. SIGMOD, 1-12,
2000. |
last day for registration |
| Aug. 28, 2006 |
Association rules 2 |
asso2.ppt |
|
|
| Aug 30, 2006 |
Association rules 3 |
asso3.ppt |
|
Presentation papers are posted.
Topic selection is due Sept. 11; topics are assigned first come, first served. |
Sept
4, 2006
|
No
class,
Happy Labor Day
|
|
|
|
| Sept
6, 2006 |
Microarray
I |
microarrayI.ppt |
Chapter 14 of the book Bioinformatics: Genes, Proteins, and Computers,
Christine Orengo, David Jones, Janet Thornton (eds.), Bios Scientific
Publishers, 2003. (ISBN: 1-85996-054-5)
Gao Cong, Anthony K. H. Tung, Xin Xu, Feng Pan, Jiong Yang, FARMER:
Finding Interesting Rule Groups in Microarray Datasets,
SIGMOD'04
Feng Pan, Gao Cong, Anthony K. H. Tung, Jiong Yang, Mohammed J. Zaki, Carpenter:
Finding Closed Patterns in Long Biological Datasets, SIGKDD'03 |
Presentation
paper selection is due this Friday. |
| Sept 11, 2006 |
Sequential pattern |
spatt.ppt |
R. Agrawal and R. Srikant.
"Mining
sequential patterns". ICDE'95, 3-14, Taipei, Taiwan.
J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, "PrefixSpan:
Mining Sequential Patterns Efficiently by Prefix-Projected Pattern
Growth",
ICDE'01 Heidelberg, Germany. |
|
| Sept 13, 2006 |
Protein sequence
motifs |
motif.ppt |
Sigrist C.J.A., Cerutti
L., Hulo N., Gattiker
A., Falquet L., Pagni M., Bairoch A., Bucher P. PROSITE:
a documented database using patterns and profiles as motif descriptors.
Brief Bioinform. 3:265-274(2002).
Hulo N., Bairoch A., Bulliard V., Cerutti L., De Castro
E., Langendijk-Genevaux
P.S., Pagni M., Sigrist C.J.A. The
Prosite database, Nucleic Acids Res. 34:D227-D230(2006). |
|
| Sept
18, 2006 |
Pattern
mining in trees and graphs |
tree_graph.ppt |
Michihiro
Kuramochi, George Karypis, Frequent
Subgraph Discovery. In: Proceedings of the 2001 International
Conference
on Data Mining (ICDM2001), 2001.
Xifeng Yan, Jiawei Han. gSpan:
Graph-Based Substructure Pattern Mining. In: Proceedings of the
2002
International Conference on Data Mining (ICDM2002), 2002. |
|
| Sept
20, 2006 |
Mining
molecular structures |
protein.ppt |
Christian
Borgelt, Michael R. Berthold. Mining
Molecular Fragments: Finding Relevant Substructures of Molecules.
In:
Proceedings of the 2002 IEEE International Conference on Data Mining
(ICDM2002),
2002.
Srinivasan Parthasarathy, Matt Coatney: Efficient
Discovery of Common Substructures in Macromolecules. ICDM 2002:
362-369 |
Project
assignment distributed |
| Sept 25, 2006 |
Data types |
data.ppt |
Chapter 2, Introduction to
Data Mining,
by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Publisher:
Addison-Wesley,
ISBN-10: 0321321367 |
| Sept 27, 2006 |
Clustering |
clustering.ppt |
|
|
| Oct
2, 2006 |
Subspace
clustering |
clusteringII.ppt |
R.
Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan. Automatic
subspace clustering of high dimensional data for data mining
applications.
SIGMOD'98
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A
density-based algorithm for discovering clusters in large spatial
databases.
KDD, 226-231, 1996. |
|
| Oct
4, 2006 |
Model-based
clustering |
biclusterin.ppt |
Yizong
Cheng, George M. Church, Biclustering
of Expression Data, ISMB'00
J. Yang, W. Wang, H. Wang, and P. Yu. Delta-cluster:
capturing subspace correlation in a large data set. ICDE, 517-528,
2002. |
|
| Oct
9, 2006 |
PCA |
pca.ppt
|
Yeung
KY, Ruzzo WL, Principal
component analysis for clustering gene expression data,
Bioinformatics.
2001 Sep;17(9):763-74.
William Dillon and Matthew Goldstein, Multivariate analysis, 1984 |
|
Oct 11, 2006
|
Graph mining in analyzing
gene
expression
|
mgraph.ppt |
H. Hu, X.
Yan, Y. Huang, J. Han and X. J. Zhou, Mining
coherent dense subgraphs across massive biological networks for
functional
discovery, ISMB'05.
X. Yan, X.
Jasmine Zhou, and J. Han, Mining
closed
relational graphs with connectivity constraints, SIGKDD'05.
|
|
Oct
16, 2006
|
Text
mining and gene
ontology |
ontology.ppt
|
|
|
| Oct
18, 2006 |
Classification
overview |
classification.ppt
|
S.
K. Murthy. Automatic
construction of decision trees from data: A multi-disciplinary survey.
Data Mining and Knowledge Discovery, 2(4), 345-389,
1998. |
|
| Oct 23, 2006 |
Classification 2
FLD, SVM |
classificationII.ppt
|
C. J. C. Burges. A
Tutorial on Support Vector Machines for Pattern Recognition. Data
Mining and Knowledge Discovery, 2(2), 121-168, 1998. |
|
| Oct 25, 2006 |
Classification 3
Rule based classifier |
classificationIII.ppt |
B. Liu, W. Hsu, and Y.
Ma. Integrating
classification and association rule mining. KDD, 1998
Gao Cong, Kian-Lee Tan, Anthony K. H. Tung, Xin Xu. "Mining
Top-k Covering Rule Groups for Gene Expression Data". SIGMOD'05,
2005. |
|
| Oct
30, 2006 |
Graph
models |
graphModels.ppt
|
Dana
Pe'er, Bayesian
Network Analysis of Signaling Networks: A
Primer, Sci. STKE, 26 April 2005,
Vol. 2005, Issue 281, p. pl4
A brief introduction to Bayesian networks,
http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html. |
|
Nov
1, 2006
|
Semi-supervised
learning
|
semiLearning.ppt
|
K.
Wagstaff, C. Cardie, S. Rogers, and S. Schroedl. Constrained
K-means Clustering with
Background Knowledge. In ICML’01
|
|
Nov
6, 2006
|
Kernel
classifier
(by Bin Han)
|
|
Darrin
P. Lewis, Tony
Jebara and William Stafford Noble.
"Nonstationary
kernel combination." Proceedings of the International
Conference on Machine Learning.
Jason Weston, Christina
Leslie, Eugene Ie, Dengyong Zhou, Andre Eliseeff and William Stafford
Noble.
"Semi-supervised
protein classification using cluster kernels." Bioinformatics.
21(15):3241-3247, 2005. |
|
Nov
8, 2006
|
Systems
biology
(by Brett Barker)
|
|
Hiroaki
Kitano, Systems
Biology: A Brief Overview, Science, 2002, Vol.
295, no. 5560, pp. 1662-1664
S. Mangan and U. Alon, Structure
and function of the feed-forward
loop network motif. The
Proceedings of the National Academy of Sciences, vol. 100, no.
21, pp. 11980-11985, 2003
N. Przulj, Biological
Network Comparison Using Graphlet Degree Distribution,
Proceedings of
the 2006 European Conference on
Computational
Biology, ECCB '06, Eilat, Israel, September 10-13, 2006,
Bioinformatics, in
press,
2006. |
|
Nov
13, 2006
|
High
performance data mining
(by Daniel Leung)
|
|
Rakesh
Agrawal and John C.
Shafer, "Parallel
Mining of Association Rules", IEEE Trans. on
Knowledge and Data Engineering, 1996
Ruoming Jin, Ge Yang, Gagan Agrawal, "Shared
Memory Parallelization of Data Mining Algorithms: Techniques,
Programming Interface, and Performance," IEEE Transactions on
Knowledge and Data Engineering, vol.
17, no. 1, pp. 71-89, Jan., 2005.
|
|
Nov
15, 2006
|
Proteomics
(by Mathew Ku)
|
|
Nature
Reviews on
proteomics: a group of papers published in Nature reviewing
the frontiers of proteomics.
http://www.nature.com/reviews/focus/proteomics/index.html
Mustafa Kirac, Gultekin Ozsoyoglu, Jiong Yang, “Annotating
proteins by
mining protein interaction networks”, ISMB’06
|
|
Nov
20, 2006
|
Data
Integration
(by Cindy Lin)
|
|
Imran
Mansuri, Sunita
Sarawagi, Integrating unstructured data into
relational databases, ICDE’06
Daehee Hwang, Alistair
G. Rust, Stephen Ramsey,
Jennifer J. Smith, Deena M. Leslie, Andrea D. Weston, Pedro de Atauri,
John D.
Aitchison, Leroy Hood, Andrew F. Siegel, and Hamid Bolouri, A
data integration methodology for systems
biology, The Proceedings of the
National Academy of Sciences, vol.
102, no. 48, 17296-17301
|
|
Nov 22, 2006
|
Happy Thanksgiving
|
|
|
|
Nov
27, 2006
|
Bionetworks
(by Yi Jia)
|
|
Jure
Leskovec, Jon
Kleinberg, and Christos Faloutsos, “Graphs Over Time:
Densification
Laws, Shrinking Diameters, and Possible Explanations”, KDD’05
H. Hu, X. Yan, Y. Huang, J. Han and X. J. Zhou, “Mining
Coherent Dense
Subgraphs across Massive Biological Networks for Functional Discovery”,
in Proc. 2005 Int. Conf. on Intelligent Systems for Molecular Biology
(ISMB 2005)
Ye, Osterman, Overbeek, Godzik, Automatic
Detection of Subsystem/Pathway
Variants in Genome Analysis,
ISMB 2005
|
|
|