The Second ACM International Workshop

Mining Multiple Information Sources

In conjunction with
The 14th International Conference on Knowledge Discovery and Data Mining
(KDD 2008)

August 24-27, 2008, Las Vegas, USA

Mining Multiple Information Sources

As data collection sources and channels continuously evolve, mining and correlating information from multiple information sources has become a crucial step in data mining and knowledge discovery. On the one hand, comparing patterns from different databases and understanding their relationships can be extremely beneficial for applications such as bioinformatics, sensor networking, and business intelligence. In particular, important information such as pattern trends and evolving rules buried in each individual database is very hard to discover by examining a single dataset alone, whereas comparatively mining multiple databases enables users to discover interesting patterns across a set of data collections that could not be found otherwise. On the other hand, many data mining and data analysis tasks, such as classification, regression, and clustering, can significantly improve their performance if information from different sources is properly leveraged and if the mining process can survey all the data sources involved.

Unleashing the full power of multiple information sources is, however, a very challenging problem, considering that the schemas used to represent each data collection might differ (data heterogeneity), the data distributions and patterns underlying different data sources may undergo continuous changes (concept evolution), and the mining tasks for each data source might also differ (mining diversity). Even though existing research has demonstrated several approaches to utilizing multiple information sources, these methods are still rather ad hoc and inadequately address some of the fundamental research issues in this field: (1) Harnessing Complex Data Relationship: Multiple information sources represent a collection of highly correlated data, so issues such as data integration, model integration, and model transfer across different domains play fundamental roles in supporting KDD from multiple information sources; (2) Integrative and Cooperative Mining: For heterogeneous information sources with diverse mining tasks, the mining process should be able to unify all data to generate enhanced global models, as well as help individual data collections cooperatively achieve their respective mining goals; and (3) Differentiation and Correlation: Differentiating and coordinating the differences between data sources at the knowledge level is a crucial step for users to gain a high-level understanding of their data.

The aim of this workshop is to bring together data mining experts to revisit the problem of pattern discovery from multiple information sources, and identify and synthesize current needs for such purposes. Representative questions to be addressed include but are not limited to:  

  1. Harnessing Complex Data Relationship
    1. Database similarity assessment
    2. Automatic schema mapping and relationship discovery
    3. New mapping frameworks for multiple information sources
    4. Data source classification and clustering
    5. Data cleansing, data preparation, data/pattern selection, conflict and inconsistency resolution
  2. Integrative and Cooperative Mining
    1. Model integration for heterogeneous information sources
    2. Model transfer across different data domains
    3. Incremental and scalable data mining algorithms
    4. Multi-task, multi-source co-learning for multiple information sources
  3. Differentiation and Correlation
    1. Local pattern analysis and fusion
    2. Global pattern synthesis and assessment
    3. Merging local rules for global pattern discovery
    4. Pattern summarization from multiple datasets
    5. Multi-dimensional pattern search and comparison
    6. Pattern comparison across multiple data sources
    7. Inter-pattern discovery from complex data sources
  4. Stream data mining algorithms
    1. Clustering and classification of data with changing distributions
    2. Data stream processing, storage, and retrieval systems
    3. Sensor networking
  5. Security and privacy issues in multiple information sources
  6. Interactive data mining systems
    1. Query languages for mining multiple information sources
    2. Query optimization for distributed data mining
    3. Distributed data mining operators supporting interactive data mining queries


Paper Types

We solicit two types of papers: regular papers and short papers. Short papers are limited to 4 pages and regular papers to about 8 pages, inclusive of all references and figures; however, papers of up to 12 pages will also be reviewed and included in the proceedings.

All papers should be submitted in ACM proceedings format (two columns, 9pt font, approx. 1in margins). Please follow the ACM Proceedings guidelines in preparing your paper, which can be found at:

We strongly encourage authors to prepare their manuscripts in PDF (preferred) or PostScript format. Please ensure that any special fonts used are included in the submitted documents.

The workshop proceedings will be published by the ACM Digital Library and distributed during the workshop.

Extended versions of selected workshop papers will be published in an edited book (Springer, pending approval).

Paper Submission

To submit a paper, please use the EasyChair system at

If you have not used EasyChair before, please register first.

If you experience any difficulties, please contact the workshop co-chairs. Upon receiving each submission, the workshop co-chairs will immediately organize the peer-review process.

Important Dates

  • May 30, 2008: Submission due date
  • June 18, 2008: Author notification
  • TBA: Submission of camera-ready papers
  • August 24, 2008: Workshop in Las Vegas, NV