KDD07 Workshop on MMIS

KDD 2007 Workshop on

Mining Multiple Information Sources

In conjunction with
The 13th International Conference on Knowledge Discovery and Data Mining
(KDD 2007)

August 12-15, 2007, San Jose, CA, USA

Mining Multiple Information Sources: Local, Global, and Inter Pattern Discovery

Recent developments in storage technology and network architectures have made it possible and affordable for scientific institutes, commercial enterprises, and government agencies to gather and store data from multiple sources. Increasing globalization also requires many business applications to store information at geographically distributed locations for analysis. Examples include market basket transaction data from different branches of a wholesale store, data collections of a particular branch in different time periods, census data of different states in a particular year, and data of a certain state in different years. For years, knowledge discovery and data mining (also referred to as KDD) has proven crucial for discovering novel and actionable patterns hidden in data. Discovering patterns from multiple information sources provides a unique way to reveal complex relationships, such as correlations, contrasts, and similarities, across multiple collections.

Although distributed data storage brings opportunities to improve the quality of data management and decision making, the nature of these distributed data repositories also creates significant challenges for inter-repository pattern discovery. Here, we list three major ones: (1) how to efficiently identify quality knowledge from a single data source, where patterns reveal local knowledge for each particular data repository, commonly referred to as local patterns; (2) how to integrate and unify multiple information sources into a single view such that previously unseen patterns can be discovered, commonly referred to as global patterns; and (3) how to discover the relationships among patterns hidden across multiple information sources, where the features of the patterns (such as pattern frequencies and their utilities) across different data repositories define inter-repository relationships, which we refer to as inter patterns.
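As an informal illustration of the three notions, the following Python sketch computes them on toy market basket data. It is only one possible reading, not a method prescribed by the workshop: the two branches, the item names, the thresholds, and the helper functions `itemset_support`, `frequent`, and `contrast` are all hypothetical, and "inter pattern" is reduced here to a simple difference in pattern frequency between two sources.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, size=2):
    """Fraction of transactions that contain each itemset of the given size."""
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    total = len(transactions)
    return {itemset: count / total for itemset, count in counts.items()}


def frequent(support, min_support):
    """Itemsets whose support reaches the threshold."""
    return {itemset for itemset, s in support.items() if s >= min_support}


def contrast(support_a, support_b, min_gap):
    """Itemsets whose support differs between two sources by at least min_gap."""
    gaps = {}
    for itemset in set(support_a) | set(support_b):
        gap = support_a.get(itemset, 0.0) - support_b.get(itemset, 0.0)
        if abs(gap) >= min_gap:
            gaps[itemset] = gap
    return gaps


# Hypothetical transactions from two branches of the same store.
branch_a = [["bread", "milk"]] * 3 + [["beer", "chips"]]
branch_b = [["beer", "chips"]] * 3 + [["bread", "milk"]]

MIN_SUPPORT = 0.5  # assumed threshold
MIN_GAP = 0.5      # assumed threshold

support_a = itemset_support(branch_a)
support_b = itemset_support(branch_b)

# (1) Local patterns: mined from each repository on its own.
local_a = frequent(support_a, MIN_SUPPORT)
local_b = frequent(support_b, MIN_SUPPORT)

# (2) Global patterns: mined from a single unified view of all repositories.
global_patterns = frequent(itemset_support(branch_a + branch_b), MIN_SUPPORT)

# (3) Inter patterns: relationships between pattern features across repositories.
inter_patterns = contrast(support_a, support_b, MIN_GAP)

print(local_a, local_b, global_patterns, inter_patterns)
```

In this toy setting each branch has a different local pattern, the unified view treats both itemsets alike, and only the cross-source comparison shows that their frequencies are reversed between the two branches.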

In the past, researchers have proposed many approaches to handling multiple information sources, but solutions have focused mainly on scaling mining algorithms for the discovery of local patterns and global patterns. The aim of this workshop is to bring together data mining experts to revisit the problem of pattern discovery from multiple information sources, and to identify and synthesize the current needs in this area. Representative questions to be addressed include, but are not limited to:

Mining from heterogeneous information sources

Database similarity assessment
Automatic schema mapping and relationship discovery
Data source classification and clustering

Local pattern analysis and fusion

Data cleansing, data preparation, data/pattern selection, conflict and inconsistency resolution
Incremental and scalable data mining algorithms
Stream data mining algorithms

Global pattern synthesizing and assessment

Merging local rules for global pattern discovery
Pattern summarization from multiple datasets
Incremental data mining algorithms for multiple information sources

Inter pattern discovery and comparison

Multi-dimensional pattern search and comparison
Pattern comparison across multiple data sources
Inter pattern discovery from complex data sources

Security and privacy issues in multiple information sources

Interactive data mining systems

Query languages for mining multiple information sources
Query optimization for distributed data mining
Distributed data mining operators in supporting interactive data mining queries

Paper Types

We solicit two types of papers: regular papers and short papers. Short papers are limited to 4 pages and regular papers to about 8 pages, inclusive of all references and figures; however, papers of up to 12 pages will also be reviewed and included in the proceedings.

All papers should be submitted in ACM proceedings format (two columns, 9pt font, approx. 1in margins). Please follow the ACM Proceedings guidelines when preparing your paper; they can be found at: http://www.acm.org/sigs/pubs/proceed/template.html.

We strongly encourage authors to prepare their manuscripts in PDF (preferred) or PostScript format. Please ensure that any special fonts used are embedded in the submitted documents.

The workshop proceedings will be published by the ACM Digital Library and distributed during the workshop.

Submission

Interested authors should submit their paper(s) as an email attachment to the workshop co-chairs. Upon receiving each submission, the workshop co-chairs will immediately organize the peer-review process.

Important Dates

June 4, 2007 (extended): Submission due date
June 20, 2007: Author notification
June 22, 2007: Submission of camera-ready papers
August 12, 2007: Workshop in San Jose, CA