Content-based access to video objects: Temporal segmentation, visual summarization, and feature extraction


Gunsel B., Tekalp A., Van Beek P.

SIGNAL PROCESSING, vol.66, pp.261-280, 1998 (Journal indexed in SCI)

  • Volume: 66 Issue: 2
  • Publication Date: 1998
  • DOI: 10.1016/s0165-1684(98)00010-3
  • Journal Name: SIGNAL PROCESSING
  • Pages: pp.261-280

Abstract

The classical approach to content-based video access has been 'frame-based', consisting of shot boundary detection, followed by selection of key frames that characterize the visual content of each shot, and then clustering of the camera shots to form story units. However, in an object-based multimedia environment, content-based random access to individual video objects becomes a desirable feature. To this end, this paper introduces an 'object-based' approach to temporal video partitioning and content-based indexing, where the basic indexing unit is the 'lifespan of a video object', rather than a 'camera shot' or a 'story unit'. We propose to represent each video object by an adaptive 2D triangular mesh. A mesh-based object tracking scheme is then employed to compute the motion trajectories of all mesh node points until the object exits the field of view. A new similarity measure, based on motion discontinuities and shape changes of the tracked object, is defined to detect content changes, resulting in temporal lifespan segments. A set of 'key snapshots' which constitute a visual summary of the lifespan of the object is automatically selected. These key snapshots are then used to animate objects of interest using tracked motion trajectories for a moving visual representation. The proposed scheme provides functionalities such as object-based search/browsing for interactive video retrieval, surveillance video analysis, and object-based content manipulation/editing for studio postprocessing and desktop multimedia authoring. The approach is applicable to any video data where the initial appearance of object(s) can be specified, and the object motion can be modeled by a piecewise affine transformation. The system is demonstrated using different types of video: virtual studio productions (composited video), surveillance video, and TV broadcast video. (C) 1998 Elsevier Science B.V. All rights reserved.
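The piecewise affine motion model underlying the 2D triangular mesh representation can be sketched as follows: each mesh triangle defines its own affine transform, determined exactly by the tracked positions of its three node points in consecutive frames, and any interior point is warped by that transform. This is a minimal illustrative sketch, not the paper's implementation; the function names and the area-change "shape change" proxy are assumptions for illustration.

```python
import numpy as np

def triangle_affine(src, dst):
    # Solve for the 2x3 affine matrix A such that A @ [x, y, 1]^T maps
    # each source triangle node to its tracked destination node.
    # src, dst: (3, 2) arrays of node coordinates in frames t and t+1.
    M = np.hstack([src, np.ones((3, 1))])   # (3, 3) system matrix
    return np.linalg.solve(M, dst).T        # (2, 3) affine matrix

def warp_point(A, p):
    # Warp a 2D point with the triangle's affine transform.
    return A @ np.array([p[0], p[1], 1.0])

def triangle_area(tri):
    # Area of a triangle from its (3, 2) node coordinates.
    a, b, c = tri
    return 0.5 * abs(np.cross(b - a, c - a))

# Hypothetical data: one mesh triangle tracked between two frames.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
dst = np.array([[0.1, 0.1], [1.2, 0.1], [0.1, 1.3]])  # translated + scaled

A = triangle_affine(src, dst)
print(warp_point(A, src.mean(axis=0)))       # warped triangle centroid

# A crude per-triangle shape-change indicator (relative area change);
# large values across the mesh would flag a candidate content change.
shape_change = abs(triangle_area(dst) - triangle_area(src)) / triangle_area(src)
print(shape_change)
```

In a full mesh, one such transform is computed per triangle, so the overall object motion is piecewise affine and continuous across shared triangle edges; the paper's actual similarity measure combines motion-discontinuity and shape-change terms over all tracked nodes.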