Research on video data mining from the association perspective

 

Video Data Mining: Semantic Indexing and Event Detection from the Association Perspective

Advances in the media and entertainment industries, including streaming audio and digital TV, present new challenges for managing and accessing large audio-visual collections. Current content management systems support retrieval using low-level features, such as motion, color, and texture. However, low-level features often have little meaning for naïve users, who much prefer to identify content using high-level semantics or concepts. This creates a gap between systems and their users that must be bridged for these systems to be used effectively. To this end, in this paper we first present a knowledge-based video indexing and content management framework for domain-specific videos (using basketball video as an example). We then provide a solution for exploring video knowledge by mining associations from video data. Explicit definitions and evaluation measures (e.g., temporal support and confidence) for video associations are proposed by integrating the inherent features of video data. Our approach uses video processing techniques to find visual and audio cues (e.g., court field, camera motion activities, and applause), introduces multilevel sequential association mining to explore associations among the audio and visual cues, classifies the associations by assigning each a class label, and uses their appearances in the video to construct video indices. Our experimental results demonstrate the performance of the proposed approach.

While the strategies presented in this paper are specific to basketball videos, the essential idea we want to convey is that mining associations is a viable route to video knowledge exploration. From this point of view, further research could be conducted on the following aspects: (1) Extending the current framework to other domains and evaluating the performance of the video mining algorithm in environments with more events. We believe the most promising domain is surveillance video, where routine vehicles in secured areas normally comply with associations such as enter → stop → drop off → leave, and a vehicle that does not comply with this association might be problematic and deserve further investigation. However, due to the inherent differences between video domains (e.g., the concepts of shot and video text do not exist in surveillance videos), more work may be needed to analyze video content details for association mining, e.g., extracting the trails and status of moving objects to characterize associations. (2) We have adopted various video processing techniques to explore visual and audio cues for association mining, which inevitably incurs information loss when the original video sequences are converted to symbolic streams; more studies are needed to address this issue in the mining activities. (3) The mining algorithms in this paper are mainly derived from existing data mining schemes (with some extensions for video mining scenarios); extensive studies are needed to explore efficient mining algorithms designed specifically for mining knowledge from video data.
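The surveillance idea above can be illustrated with a minimal sketch that checks whether a vehicle's observed event stream follows the routine association (enter, stop, drop off, leave). The function name `complies`, the event labels, and the choice to allow unrelated events between the pattern steps are all assumptions made for illustration; the paper does not specify how compliance would be tested.

```python
def complies(events, pattern):
    """Return True if `pattern` occurs in `events` as an ordered subsequence.

    Assumption: other events may appear between the pattern steps,
    but the steps themselves must appear in the given order.
    """
    remaining = iter(events)
    # `step in remaining` advances the iterator until the step is found,
    # so each later step is searched only after the previous match.
    return all(step in remaining for step in pattern)


# Hypothetical routine association and two hypothetical vehicle traces.
ROUTINE = ["enter", "stop", "drop off", "leave"]
normal_vehicle = ["enter", "slow down", "stop", "drop off", "leave"]
suspect_vehicle = ["enter", "stop", "leave"]

if __name__ == "__main__":
    for name, trace in [("normal", normal_vehicle), ("suspect", suspect_vehicle)]:
        status = "complies" if complies(trace, ROUTINE) else "needs investigation"
        print(f"{name}: {status}")
```

In this sketch the second trace is flagged because it never contains a "drop off" event, which matches the intuition that a vehicle deviating from the routine association deserves a closer look.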


Figure 1. Knowledge-based basketball video database management

 


Figure 2. System framework for association-based video data mining

Mining Video Associations for Efficient Database Management

To support more efficient video database management, this paper explores the concept of video association mining, in which association patterns are characterized by sequentially associated video shots and their cluster information. Given the detected shots of a video V, we first cluster them into visually distinct groups, and then construct a sequence by integrating the temporal order and cluster type of each shot. An association mining scheme is designed to mine sequentially associated clusters from the sequence. The detected associations convey valuable knowledge for video content management. Finally, we discuss potential applications of video associations, and propose an association-based video summarization scheme. The experimental results demonstrate the effectiveness of our strategies.

In this paper, we address the new research area of video association mining. We present a definition of video association, and design a video association mining algorithm. As shown in Fig. 3, to mine associations from a given video V, we first group its shots into different clusters, each of which consists of visually similar shots. These clusters are intended to help us determine the relationships among video shots. Then, we assemble the cluster information of each shot in temporal order to form a shot cluster sequence. The non-relational video database is thereby transformed into a relational dataset. We mine sequential associations from the sequence to find clusters with strong correlations. These strongly associated clusters convey abundant video knowledge and could be easily applied in many potential application systems.
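The pipeline described above (cluster the shots, order the cluster labels by time, then look for strongly associated clusters) can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the shots are assumed to be already clustered, only adjacent pairs of clusters are counted, and the support measure (fraction of adjacent positions in the sequence) is an assumption rather than the temporal support defined by the authors. All function names and sample values are hypothetical.

```python
from collections import Counter


def shot_cluster_sequence(shots):
    """Order (start_time, cluster_label) pairs by time and keep the labels."""
    return [label for _, label in sorted(shots)]


def adjacent_pair_counts(sequence):
    """Count how often each ordered pair of cluster labels occurs back to back."""
    return Counter(zip(sequence, sequence[1:]))


def frequent_pairs(sequence, min_support):
    """Return adjacent pairs whose support meets `min_support`.

    Assumed support: occurrences of the pair divided by the number of
    adjacent positions in the sequence.
    """
    positions = max(len(sequence) - 1, 1)
    return {
        pair: count / positions
        for pair, count in adjacent_pair_counts(sequence).items()
        if count / positions >= min_support
    }


# Hypothetical shots, given out of order: (start time in seconds, cluster label).
shots = [
    (9.1, "A"), (0.0, "A"), (4.2, "B"), (13.5, "B"),
    (18.0, "A"), (31.0, "B"), (22.3, "C"), (27.8, "A"),
]

if __name__ == "__main__":
    sequence = shot_cluster_sequence(shots)
    print("".join(sequence))
    for pair, support in sorted(frequent_pairs(sequence, 0.25).items()):
        print(pair, round(support, 3))
```

Restricting the search to adjacent pairs keeps the sketch short; a fuller miner would also consider longer patterns and items separated by a bounded temporal gap.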

Figure 3. Video association mining - system architecture

The Existence of Video Associations

Generally, there are two kinds of videos in our daily life: videos with some content structure and videos without any content structure. The former are videos such as movies and news where scenarios are used to convey video content. They are usually edited (or postprocessed) by editors (or directors), where various kinds of shots are packed as scenes to convey video scenarios, as shown in Fig. 4. There are two typical video scenes: (1) scenes that consist of visually similar shots (the shots are taken from different viewpoints of the same objects), as demonstrated in Fig. 4(a); and (2) scenes that consist of visually distinct shots (the shots are taken from different objects), as shown in Fig. 4(b)-(c).
In the first type of scene, most video shots are visually similar. Taking Fig. 4(a) as an example, if we denote each shot by "A", all shots form the sequence "AAAAAAA", and the self-coherence of "A" indicates an inherent sequential association of "A" with itself. We call this type of association an intra-association, which means that all items in the association are the same, as demonstrated in Fig. 5(a). In the second type of scene, sequential associations exist too. In the dialog scene of Fig. 4, if we denote the actor by "A", the actress by "B", and the shot with both of them by "C", all shots form the sequence "ABABACAB", and the co-occurrence of "A" and "B" implies a certain association between them. We call this type of association an inter-association, which means that the items in the association are different, as shown in Fig. 5(b). A similar association pattern from transaction databases has been given in [Agrawal and Srikant, 1995], where an example of such a pattern is that a customer typically rents "Star Wars", then "Empire Strikes Back", and then "Return of the Jedi". Even though these transactions may not be consecutive, they usually happen sequentially. This demonstrates the existence of associations in video data, especially for videos with content structure information.
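The distinction between intra- and inter-associations can be made concrete with a short sketch applied to the two example sequences above. The helper names `association_type` and `intra_runs`, and the minimum run length of 2, are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import groupby


def association_type(items):
    """Label an association 'intra' if all its items are identical, else 'inter'."""
    return "intra" if len(set(items)) == 1 else "inter"


def intra_runs(sequence, min_length=2):
    """Return (label, run_length) for each maximal run of identical labels.

    Runs shorter than `min_length` are ignored; a long run of one label is
    the simplest evidence of an intra-association in a shot sequence.
    """
    runs = []
    for label, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            runs.append((label, length))
    return runs


if __name__ == "__main__":
    similar_scene = "AAAAAAA"   # visually similar shots
    dialog_scene = "ABABACAB"   # actor, actress, and two-person shots

    print(intra_runs(similar_scene))        # [('A', 7)]
    print(intra_runs(dialog_scene))         # []
    print(association_type(("A", "A")))     # intra
    print(association_type(("A", "B")))     # inter
```

The dialog sequence contains no repeated adjacent labels, so any association mined from it is necessarily an inter-association, while the first sequence yields a single intra-association of "A" with itself.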