Recent developments in storage technology and network architectures have made it possible and affordable for scientific institutes, commercial enterprises, and government agencies to gather and store data from multiple sources. Increasing globalization has also led many business applications to store information at geographically distributed locations for analysis. Examples include market basket transaction data from different branches of a wholesale store, data collected by a particular branch in different time periods, census data of different states in a particular year, and data of a certain state in different years. For years, knowledge discovery and data mining (also referred to as KDD) has proven crucial for discovering novel and actionable patterns hidden in data. Discovering patterns from multiple information sources provides a unique way to reveal complex relationships, such as correlations, contrasts, and similarities, across multiple collections.
Although distributed data storage brings opportunities to improve the quality of data management and decision making, the nature of these distributed data repositories also creates significant challenges for inter-repository pattern discovery. Here, we list three major ones: (1) how to efficiently identify quality knowledge from a single data source, where patterns reveal local knowledge for each particular data repository, commonly referred to as local patterns; (2) how to integrate and unify multiple information sources into a single view such that previously unseen patterns can be discovered, commonly referred to as global patterns; and (3) how to discover the relationships among patterns hidden across multiple information sources, where the features of the patterns (such as pattern frequencies and their utilities) across different data repositories define inter-repository relationships, which we refer to as inter patterns.
In the past, researchers have proposed many approaches to handling multiple information sources, but solutions have mainly focused on scaling mining algorithms for the discovery of local patterns and global patterns. The aim of this workshop is to bring together data mining experts to revisit the problem of pattern discovery from multiple information sources, and to identify and synthesize current needs in this area. Representative questions to be addressed include, but are not limited to:
We solicit two types of papers: regular papers (about 8 pages) and short papers (4 pages), inclusive of all references and figures. Papers of up to 12 pages will also be reviewed and included in the proceedings.
All papers should be submitted in ACM proceedings format (two columns, 9pt font, approx. 1in margins). Please follow the ACM proceedings guidelines when preparing your paper, which can be found at: http://www.acm.org/sigs/pubs/proceed/template.html.
We strongly encourage authors to prepare their manuscripts in PDF (preferred) or PostScript format. Please ensure that any special fonts used are embedded in the submitted documents.
The workshop proceedings will be published by the ACM Digital Library and distributed during the workshop.
Interested authors should submit their paper(s) as an email attachment to the workshop co-chairs. Upon receiving each submission, the workshop co-chairs will organize the peer-review process immediately.