KDD 2007 Workshop on

Mining Multiple Information Sources

In conjunction with
The 13th International Conference on Knowledge Discovery and Data Mining
(KDD 2007)

August 12-15, 2007, San Jose, CA USA



Mining Multiple Information Sources: Local, Global, and Inter Pattern Discovery    

Recent developments in storage technology and network architectures have made it possible and affordable for scientific institutes, commercial enterprises, and government agencies to gather and store data from multiple sources. Increasing globalization has also required many business applications to store information at geographically distributed locations for analysis. Examples include market basket transaction data from different branches of a wholesale store, data collected by a particular branch in different time periods, census data from different states in a particular year, and data from a certain state in different years. Knowledge discovery and data mining (KDD) has long proven crucial for discovering novel and actionable patterns hidden in data. Discovering patterns from multiple information sources offers a unique way to reveal complex relationships, such as correlations, contrasts, and similarities, across multiple collections.

Although distributed data storage brings opportunities to improve the quality of data management and decision making, the nature of these distributed repositories also poses significant challenges for inter-repository pattern discovery. We list three major ones here: (1) how to efficiently identify high-quality knowledge from a single data source, where the patterns reveal local knowledge about that particular repository (commonly referred to as local patterns); (2) how to integrate and unify multiple information sources into a single view so that previously unseen patterns can be discovered (commonly referred to as global patterns); and (3) how to discover relationships among patterns hidden across multiple information sources, where the features of the patterns across different repositories (such as pattern frequencies and utilities) define inter-repository relationships, which we refer to as inter patterns.
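To make the three pattern types concrete, here is a minimal Python sketch built on the wholesale-store example above. It is illustrative only and assumes details the call does not specify. Each branch is a list of transactions (sets of items). Patterns are itemsets of at most two items, found by brute-force enumeration rather than Apriori or any specific published algorithm. "Frequent" means support at or above a hypothetical `min_sup` threshold. An inter pattern is taken to be an itemset whose support differs across branches by at least a hypothetical `min_gap`. All function names and the toy data are made up for illustration.

```python
# Illustrative sketch only: the data layout, thresholds, function names, and the
# "support gap" definition of an inter pattern are assumptions for this example.
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def candidate_itemsets(transactions, max_size=2):
    """Brute-force enumeration of itemsets (up to `max_size` items) seen in the data."""
    found = set()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                found.add(frozenset(combo))
    return found


def merge(sources):
    """Unify all sources into a single transaction list (one global view)."""
    return [t for txns in sources.values() for t in txns]


def local_patterns(sources, min_sup):
    """Frequent itemsets mined separately within each source."""
    return {
        name: {s for s in candidate_itemsets(txns) if support(s, txns) >= min_sup}
        for name, txns in sources.items()
    }


def global_patterns(sources, min_sup):
    """Frequent itemsets mined on the merged view of all sources."""
    merged = merge(sources)
    return {s for s in candidate_itemsets(merged) if support(s, merged) >= min_sup}


def inter_patterns(sources, min_gap):
    """Itemsets whose per-source support varies by at least `min_gap` (a contrast)."""
    result = {}
    for s in candidate_itemsets(merge(sources)):
        per_source = {name: support(s, txns) for name, txns in sources.items()}
        if max(per_source.values()) - min(per_source.values()) >= min_gap:
            result[s] = per_source
    return result


def as_sorted(itemsets):
    """Deterministic, readable ordering of a collection of itemsets."""
    return sorted(tuple(sorted(s)) for s in itemsets)


# Hypothetical toy data: transactions from two branches of a wholesale store.
branches = {
    "branch_A": [{"milk", "bread"}, {"milk", "eggs"}, {"milk", "bread", "eggs"}],
    "branch_B": [{"beer", "chips"}, {"beer", "bread"}, {"milk", "chips"}],
}

if __name__ == "__main__":
    for name, pats in local_patterns(branches, min_sup=0.6).items():
        print("local", name, as_sorted(pats))
    print("global", as_sorted(global_patterns(branches, min_sup=0.5)))
    print("inter", as_sorted(inter_patterns(branches, min_gap=0.5)))
```

On this toy data, `beer` and `chips` are local patterns in `branch_B`, but neither reaches 0.5 support once the branches are merged. The support gap flags items such as `milk` and `eggs`, whose frequency contrasts sharply between branches. Frequency is only one possible pattern feature. Utilities, also mentioned above, would need a different measure.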

In the past, researchers have proposed many approaches to handling multiple information sources, but these solutions have mainly focused on scaling mining algorithms to discover local and global patterns. The aim of this workshop is to bring together data mining experts to revisit the problem of pattern discovery from multiple information sources, and to identify and synthesize current needs in this area. Representative topics include, but are not limited to:

  1. Mining from heterogeneous information sources
    1. Database similarity assessment
    2. Automatic schema mapping and relationship discovery
    3. Data source classification and clustering
  2. Local pattern analysis and fusion
    1. Data cleansing, data preparation, data/pattern selection, and conflict and inconsistency resolution
    2. Incremental and scalable data mining algorithms
    3. Stream data mining algorithms
  3. Global pattern synthesis and assessment
    1. Merging local rules for global pattern discovery
    2. Pattern summarization from multiple datasets
    3. Incremental data mining algorithms for multiple information sources
  4. Inter pattern discovery and comparison
    1. Multi-dimensional pattern search and comparison
    2. Pattern comparison across multiple data sources
    3. Inter pattern discovery from complex data sources
  5. Security and privacy issues in multiple information sources
  6. Interactive data mining systems
    1. Query languages for mining multiple information sources
    2. Query optimization for distributed data mining
    3. Distributed data mining operators for supporting interactive data mining queries


Paper Types

We solicit two types of papers: regular papers (about 8 pages) and short papers (4 pages). Page counts include all references and figures; however, papers of up to 12 pages will also be reviewed and included in the proceedings.

All papers should be submitted in ACM proceedings format (two columns, 9pt font, approximately 1-inch margins). Please follow the ACM Proceedings guidelines when preparing your paper; they can be found at: http://www.acm.org/sigs/pubs/proceed/template.html.

We strongly encourage authors to prepare their manuscripts in PDF (preferred) or PostScript format. Please ensure that any special fonts used are included in the submitted documents.

The workshop proceedings will be published in the ACM Digital Library and distributed at the workshop.


Submission


Interested authors should submit their paper(s) as an email attachment to the workshop co-chairs. Upon receiving each submission, the co-chairs will immediately organize the peer-review process.

Important Dates

  • June 4, 2007 (extended): Submission Due Date
  • June 20, 2007: Author notification
  • June 22, 2007: Camera-ready paper submission
  • August 12, 2007: Workshop in San Jose, CA