Multi-Classifier Systems for Effective Mining From Noisy Data Streams

 

Dynamic Classifier Selection for Effective Mining from Noisy Data Streams

Recently, mining from data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networking. One popular solution is to separate stream data into chunks, learn a base classifier from each chunk, and then integrate all base classifiers for effective classification. In this paper, we propose a new dynamic classifier selection (DCS) mechanism to integrate base classifiers for effective mining from data streams. The proposed algorithm dynamically selects a single "best" classifier to classify each test instance at run time. Our scheme uses statistical information from attribute values, and uses each attribute to partition the evaluation set into disjoint subsets, followed by a procedure that evaluates the classification accuracy of each base classifier on these subsets. Given a test instance, its attribute values determine the subsets that the similar instances in the evaluation set have constructed, and the classifier with the highest classification accuracy on those subsets is selected to classify the test instance. Experimental results and comparative studies demonstrate the efficiency and efficacy of our method. Such a DCS scheme appears to be promising in mining data streams with dramatic concept drifting or with a significant amount of noise, where the base classifiers are likely conflictive or have low confidence.

In this section, we present a new dynamic classifier selection scheme, called AO-DCS (Attribute-Oriented Dynamic Classifier Selection). We use attribute values of instances to partition the evaluation set into subsets for evaluation purpose. If the instances in a dataset have only one attribute, and we use this attribute to partition the evaluation set into disjoint subsets with each subset corresponding to one value of this attribute, the classification accuracy of each base classifier with these subsets is the performance of the classifier with the instances characterized by each attribute value. Then each base classifier's performance will reflect its domain of expertise. Our AO-DCS takes the following three steps.
1. Statically partition the evaluation set into subsets by using the attribute values of the instances, as shown in Fig. 1. We denote the aggregation of all constructed subsets by R.
2. Evaluate the classification accuracy of each base classifier on all subsets in R, as shown in Fig. 2. We call this accuracy "attribute-oriented" classification accuracy.
3. For a test instance, use its attribute values to select the corresponding subsets from R, and select the base classifier Cj that has the highest classification accuracy from the selected subsets as the "best" classifier to classify the test instance, as shown in Fig. 3.

 

Fig.2. Evaluating each base classifier on constructed subsets

 

Fig. 3. Dynamic classifier selection of AO-DCS

 

In many applications, data is not static but arrives in data streams, and the stream data is also characterized by drifting concepts. In other words, the underlying data generation models, or the concepts that we try to learn from the data, are constantly evolving, even dramatically. Meanwhile, real-world stream data is never perfect and can often suffer from corruptions (noise) that may impact interpretations of the data, models created from the data and decisions made based on the data. Due to the inherent huge volume of the size, it is actually hard to apply general data cleansing mechanisms to stream data for better data quality. Therefore, the above two facts (concept drifting and noise) imply that the classifiers learned from a small portion of the stream may vary significantly in performance, which makes DCS a promising solution for effective mining from a real-world data stream. In this section, we propose a framework to apply AO-DCS to mine noisy data streams. This framework is general enough to allow any exiting learning algorithms to be incorporated to handle any real-world data streams.

As shown in Fig. 4, we first partition streaming data into a series of chunks, S1, S2, .. Si,.., each of which is small enough to be processed by an induction algorithm at one time. Then we learn a base classifier Ci from each chunk Si. To evaluate all base classifiers (in the case that the number of base classifiers is too large, we can keep only the most recent K classifiers) and determine the "best" one for each test instance, we will dynamically construct an evaluation set Z (using the most recent instances, because they are likely consistent with the current test instances). When classifying a test instance, Ik, we will employ AO-DCS (and the evaluation set Z) to integrate existing base classifiers and select the "best" classifier to classify Ik.

In real-world situations, many data streams contain a certain level of noise. There are two common types of noise: attribute noise and class noise. In this paper, we assume the data stream suffers from a certain level of class noise (which means the errors are introduced in the class labels), and will extensively evaluate the performance of different DCS schemes in handling noisy data, where various levels of class noise are manually introduced (before the data partitioning) to simulate real-world scenarios. This should provide interested readers with valuable knowledge about mining from real-world data streams.

Figure 4. Applying DCS in noisy data streams