Research on Information Retrieval

InsightVideo: Towards hierarchical video content organization for efficient browsing, summarization and retrieval

Hierarchical video browsing and feature-based video retrieval are two standard methods for accessing video content. Very little research, however, has addressed the benefits of integrating these two methods for more effective and efficient video content access. In this paper we introduce InsightVideo, a video analysis and retrieval system that combines a video content hierarchy with hierarchical browsing and retrieval for efficient video access. We propose several video processing techniques to organize the content hierarchy of the video. We first apply a camera motion classification and key-frame detection strategy that operates in the compressed domain to extract video features. Then, shot grouping, scene detection and pairwise scene clustering strategies are applied to construct the video content hierarchy. We introduce a video similarity evaluation scheme at different levels (key-frame, shot, group, scene, and video). By combining the video content hierarchy with the video similarity evaluation scheme, hierarchical video browsing and retrieval are seamlessly integrated for more efficient video content access. We also construct a progressive video retrieval scheme to refine user queries through the interaction of browsing and retrieval. Experimental results and comparisons of camera motion classification, key-frame extraction, scene detection, and video retrieval are presented to validate the effectiveness and efficiency of the proposed algorithms and the performance of the system.

The process flow for the InsightVideo system is illustrated in Figure 1. The system consists of three parts: (1) video analysis and feature extraction, (2) hierarchical video content organization, and (3) progressive video content access. To extract video features, a shot segmentation algorithm is applied to each input video. Then, for each segmented shot, the camera motion classification strategy is used to qualitatively classify camera motion information. Based on the identified motion information, key-frame extraction is executed to select the key-frame(s) for each shot. The detected camera motion and the low-level features of the key-frames and shots are then used for video similarity evaluation. After the video features have been extracted, the video content table is constructed by shot grouping, scene detection, and scene clustering strategies to generate a three-layer video content hierarchy (group, scene, clustered scene). Based on this video content hierarchy and the extracted video features, we propose a progressive video content access scheme in which we first address the video similarity evaluation scheme at different levels and then integrate hierarchical video browsing and retrieval for video content access and progressive retrieval. Using hierarchical video browsing, a user is provided with an overview of video content from which a query example can be selected. Then, video retrieval is invoked to produce a list of similar units, and the user can browse the content hierarchy of the retrieved results to refine the query. By iteratively executing retrieval and browsing, a user's query can be quickly refined to retrieve the unit of interest.
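The relationship between the content hierarchy and level-wise retrieval described above can be illustrated with a minimal sketch. This is not the InsightVideo implementation: the `Unit` class, the `units_at` and `retrieve` helpers, the two-dimensional feature vectors, and the use of plain Euclidean distance are all assumptions made for illustration, standing in for the system's actual features and similarity measures.

```python
from dataclasses import dataclass, field
from math import dist
from typing import List


@dataclass
class Unit:
    """One node of the content hierarchy (hypothetical structure)."""
    level: str            # "shot", "group", "scene", or "clustered_scene"
    name: str
    feature: List[float]  # placeholder low-level feature vector
    children: List["Unit"] = field(default_factory=list)


def units_at(root: Unit, level: str) -> List[Unit]:
    """Collect every unit of the requested level at or below root."""
    found = [root] if root.level == level else []
    for child in root.children:
        found.extend(units_at(child, level))
    return found


def retrieve(root: Unit, query: Unit, top_k: int = 3) -> List[Unit]:
    """Rank units at the query's level by feature distance to the query."""
    candidates = [u for u in units_at(root, query.level) if u is not query]
    candidates.sort(key=lambda u: dist(u.feature, query.feature))
    return candidates[:top_k]


# Toy hierarchy: four shots, two groups, one scene, one clustered scene.
s1 = Unit("shot", "s1", [0.0, 0.0])
s2 = Unit("shot", "s2", [0.1, 0.0])
s3 = Unit("shot", "s3", [1.0, 1.0])
s4 = Unit("shot", "s4", [0.9, 1.0])
g1 = Unit("group", "g1", [0.05, 0.0], [s1, s2])
g2 = Unit("group", "g2", [0.95, 1.0], [s3, s4])
scene = Unit("scene", "sc1", [0.5, 0.5], [g1, g2])
root = Unit("clustered_scene", "c1", [0.5, 0.5], [scene])

# A user browsing the hierarchy picks shot s1 as the query example.
similar_shots = [u.name for u in retrieve(root, s1, top_k=2)]
```

In a progressive session, the user would browse the hierarchy around the returned units, pick a new query example at any level, and call `retrieve` again.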

Figure 1. InsightVideo system architecture

Figure 2. Video shot segmentation results

Figure 3. Camera motion based video retrieval system

Figure 4. Video group detection result

Figure 5. Video scene detection result

Figure 6. Hierarchical video content browsing

Figure 7. Video retrieval using joint spatial, temporal, and granularity features of the video

Figure 8. Progressive video content access

 

Joint visual feature and semantic in image retrieval system with relevance feedback

Relevance feedback is a powerful and widely used technique in content-based image retrieval (CBIR) systems. However, most relevance feedback approaches use only a weighted feature sum (WFS) of the feedback images to optimize the query for refining image similarity assessment. Such approaches do not work very well in most cases, especially when the user wants to express an "OR" relationship among the queries. We propose three methods, Weighted Distance Sum (WDS), Minimal Distance (MD), and Minimal Distance Rank (MDR), to measure the similarity between images in the database and the feedback images in query refinement. After experimental comparisons, we propose a relevance feedback scheme using the MDR method and the MD method to describe the user's multiple intentions. Based on this scheme, an image retrieval and semi-automatic annotation system, iFind, which integrates query refinement and semantic information, is presented. Experiments show that the proposed methods can result in substantial improvement in retrieval accuracy and can be especially useful for retrieving or annotating images in large image databases.
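The abstract does not give formulas for WFS, WDS, MD, or MDR, so the sketch below is only one plausible reading of the four names, with all function names, the toy feature vectors, and the Euclidean distance chosen here as assumptions. It is meant to show why averaging feedback features can fail for an "OR" query, while a minimum over per-example distances or ranks does not.

```python
from math import dist

# Toy database of 2-D feature vectors (illustrative values only).
database = {"a": [0.5, 0.0], "b": [10.0, 9.0], "c": [5.0, 5.0]}
# Two positive feedback images expressing "like this OR like that".
feedback = [[0.0, 0.0], [10.0, 10.0]]
weights = [0.5, 0.5]


def wfs(image, feedback, weights):
    """Assumed WFS: distance to the weighted average of feedback features."""
    query = [sum(w * f[i] for f, w in zip(feedback, weights))
             for i in range(len(image))]
    return dist(image, query)


def wds(image, feedback, weights):
    """Assumed WDS: weighted sum of distances to each feedback image."""
    return sum(w * dist(image, f) for f, w in zip(feedback, weights))


def md(image, feedback):
    """Assumed MD: distance to the closest feedback image."""
    return min(dist(image, f) for f in feedback)


def mdr(database, feedback):
    """Assumed MDR: best (lowest) rank across per-feedback rankings."""
    best = {name: len(database) for name in database}
    for f in feedback:
        ordered = sorted(database, key=lambda n: dist(database[n], f))
        for rank, name in enumerate(ordered):
            best[name] = min(best[name], rank)
    return best


def ranking(score):
    """Order database image names from most to least similar."""
    return sorted(database, key=lambda n: score(database[n]))


wfs_order = ranking(lambda v: wfs(v, feedback, weights))
md_order = ranking(lambda v: md(v, feedback))
mdr_scores = mdr(database, feedback)
```

In this toy setup, the averaged query lands between the two feedback images, so the WFS-style score puts `c` first even though it resembles neither example. The MD-style score and the MDR-style ranks instead favor `a` and `b`, each of which is close to one of the feedback images.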

Figure 9. iFind system main interface

Figure 10. Relevance feedback interface