Description-driven media resource adaptation

Ghent University (7 February 2007), pp. 1-280.

wmdeneve's tags for this article

adaptation avc bflavor bsd bsdl content_adaptation h264 mpeg-21 stx video xml xslt

Abstract

The last decade has witnessed a significant number of innovative developments in the multimedia ecosystem. Advanced media formats have emerged for the efficient representation, storage, and presentation of digital media resources. New network technologies have been devised, wired and wireless, providing access to audio-visual information services such as online music stores, movie download services, and video blogs. A plethora of networked mobile devices has popped up as well, ranging from cell phones to personal entertainment devices, often having sufficient processing power for the playback of multimedia presentations. From these observations, it is clear that the multimedia landscape is characterized by a vast diversity in terms of media formats, network capabilities, and device properties. The ever-increasing heterogeneity in the multimedia consumption chain poses a number of new challenges. One such challenge is the realization of the Universal Multimedia Access paradigm, which is the notion that multimedia content should be accessible at any place, at any time, and with any device. As acknowledged by the Moving Picture Experts Group (MPEG), the successful realization of ubiquitous and seamless access to multimedia content requires an appropriate reaction from different knowledge domains. The answer of MPEG's coding community consists of the specification of scalable or layered coding schemes. Indeed, the picture rate and spatial resolution of scalable video resources can for instance be adapted in a straightforward way to meet the different constraints that are imposed by a particular usage context (e.g., constraints in terms of available bandwidth, screen resolution, and so on). The answer of MPEG's metadata community consists of the development of a number of description tools. These tools are for example used to describe the properties of media resources and the capabilities or constraints of usage environments. 
The resulting descriptions enable the construction of a format-agnostic content adaptation system that is able to maximize the user experience, for the consumption of a particular multimedia presentation in a well-defined usage environment. In this dissertation, we have first studied a number of concepts and design principles of the state-of-the-art H.264/MPEG-4 Advanced Video Coding standard, typically abbreviated as H.264/AVC. The H.264/AVC specification incorporates the latest advances in standardized video coding technology. As demonstrated by our experiments, as well as by other scientific and technical sources, H.264/AVC provides up to 50% bit rate savings for equivalent perceptual quality relative to the performance of prior video coding standards. The design of the H.264/AVC standard is, besides efficiency, also characterized by a flexibility for use over a broad variety of network types and application domains (efficient and flexible are two terms that are often used in the context of H.264/AVC). In this context, we have reviewed several content adaptation tools that are part of the initial version of the H.264/AVC specification. Examples include the exploitation of multi-layered temporal scalability and the use of Flexible Macroblock Ordering (FMO) for region of interest coding. The emphasis of our review was put on providing a complete and detailed picture regarding H.264/AVC's temporal adaptivity provisions. As such, our overview contains an extensive discussion with respect to the use of sub-sequences and sub-sequence layers, coding patterns based on hierarchical bidirectionally coded pictures (B pictures), and Supplemental Enhancement Information messages (SEI messages) for communicating the bitstream structure to a bitstream extractor or decoder. Sub-sequences are employed to constrain H.264/AVC's coding flexibility in a minimal way. This allows the execution of meaningful adaptations in the temporal dimension. 
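The idea of multi-layered temporal scalability mentioned above can be illustrated with a minimal sketch. It assumes a dyadic hierarchy of B pictures, a power-of-two GOP size, and pictures numbered in display order; these assumptions, and the function names `temporal_layer` and `extract_layers`, are illustrative only and are not taken from the dissertation or from the H.264/AVC syntax, where sub-sequence information is signalled through SEI messages.

```python
def temporal_layer(index: int, gop_size: int) -> int:
    """Return the temporal layer of a picture in a dyadic hierarchical GOP.

    Assumption: gop_size is a power of two and `index` is the picture's
    position in display order. Layer 0 holds the key pictures.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    offset = index % gop_size
    if offset == 0:
        return 0
    depth = gop_size.bit_length() - 1                      # log2(gop_size)
    trailing_zeros = (offset & -offset).bit_length() - 1   # position of lowest set bit
    return depth - trailing_zeros


def extract_layers(indices, gop_size: int, max_layer: int):
    """Keep only the pictures whose temporal layer does not exceed max_layer."""
    return [i for i in indices if temporal_layer(i, gop_size) <= max_layer]


if __name__ == "__main__":
    # Dropping the highest layer of a GOP of 8 pictures halves the picture rate.
    print(extract_layers(range(9), gop_size=8, max_layer=2))
```

In this sketch, every removed layer halves the picture rate, which is the kind of temporal adaptation a bitstream extractor can perform once the layer structure is known.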
More powerful adaptivity features and SEI messages are incorporated in a newly developed amendment to the H.264/AVC specification, which is commonly referred to as H.264/AVC Scalable Video Coding (SVC). This amendment includes explicit support for spatial and quality scalability; temporal adaptivity tools are inherited from the first version of the H.264/AVC standard. Further, the principles of Bitstream Syntax Description-driven (BSD-driven) content adaptation were also discussed in this dissertation. A BSD contains a description of the high-level structure of a binary media resource, typically expressed using the eXtensible Markup Language (XML). This XML-based description, i.e. the BSD, can be transformed to reflect a desired adaptation of the binary media resource. The transformed BSD can subsequently be used to automatically create an adapted media resource by relying on a format-independent content adaptation engine. This adapted media resource is then suited for consumption in a particular usage environment. Two different approaches for BSD-driven content adaptation were studied in more detail: a standardized framework driven by the MPEG-21 Bitstream Syntax Description Language (MPEG-21 BSDL) and a framework based on the use of the Formal Language for Audio-Visual Object Representation, extended with XML features (XFlavor). Both technologies provide different means for the automatic translation of the structure of a binary media resource into an XML-based BSD, and for the subsequent generation of a tailored bitstream using a transformed BSD. The high-level structure of a number of common video coding and container formats was described using MPEG-21 BSDL and XFlavor. Particular attention was paid to the construction of a description in MPEG-21 BSDL for the first version of H.264/AVC, as this effort exposed a few shortcomings in the schema language in question, requiring the development of a number of non-normative extensions to MPEG-21 BSDL. 
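The three steps of BSD-driven adaptation described above (generate a description, transform it, regenerate a bitstream) can be sketched with a toy example. Everything in it is hypothetical: the two-byte unit header, the `bitstream` and `unit` element names, and the function names are invented for illustration and do not follow the MPEG-21 BSDL or XFlavor syntax.

```python
import xml.etree.ElementTree as ET


def make_toy_bitstream(units):
    """Serialize (layer, payload) pairs as: 1 byte layer, 1 byte length, payload."""
    out = bytearray()
    for layer, payload in units:
        out += bytes([layer, len(payload)]) + payload
    return bytes(out)


def generate_bsd(bitstream: bytes) -> ET.Element:
    """Step 1: describe the high-level structure of the bitstream in XML."""
    root = ET.Element("bitstream")
    pos = 0
    while pos < len(bitstream):
        layer = bitstream[pos]
        total = 2 + bitstream[pos + 1]  # header plus payload
        ET.SubElement(root, "unit", layer=str(layer),
                      offset=str(pos), length=str(total))
        pos += total
    return root


def transform_bsd(bsd: ET.Element, max_layer: int) -> ET.Element:
    """Step 2: edit the description only; the media data is not touched."""
    adapted = ET.Element("bitstream")
    for unit in bsd.findall("unit"):
        if int(unit.get("layer")) <= max_layer:
            ET.SubElement(adapted, "unit", dict(unit.attrib))
    return adapted


def generate_bitstream(bsd: ET.Element, original: bytes) -> bytes:
    """Step 3: copy the byte ranges that the transformed description still lists."""
    out = bytearray()
    for unit in bsd.findall("unit"):
        start = int(unit.get("offset"))
        out += original[start:start + int(unit.get("length"))]
    return bytes(out)


if __name__ == "__main__":
    source = make_toy_bitstream([(0, b"I"), (2, b"b"), (1, b"B"), (2, b"b")])
    adapted = generate_bitstream(transform_bsd(generate_bsd(source), 1), source)
    print(adapted == make_toy_bitstream([(0, b"I"), (1, b"B")]))
```

The point of the design is that only step 1 needs knowledge of the media format; steps 2 and 3 operate on the XML description and on byte ranges, which is what makes the adaptation engine format-independent.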
Besides testing the expressive power of MPEG-21 BSDL and XFlavor, we also evaluated their performance in the context of the different media formats described, targeting applications such as BSD-driven temporal adaptation and demultiplexing. Our experiments resulted in the identification of several performance bottlenecks, in particular the slow and memory-consuming generation of BSDs using BSDL's BintoBSD Parser (which allows the format-agnostic and automatic generation of BSDs), the verbose BSDs produced by XFlavor-based parsers, and the memory-consuming transformation of BSDs using eXtensible Stylesheet Language Transformations (XSLT). The performance issues of BSDL's BintoBSD Parser are due to the storage of an entire BSD in the system memory, needed to correctly steer the processing behavior of this parser. To enable a more efficient generation of BSDs, we have proposed BFlavor (BSDL + XFlavor), a new description tool that is the result of a cross-fertilization between MPEG-21 BSDL and XFlavor. Indeed, BFlavor harmonizes BSDL and XFlavor by combining their strengths and by eliminating their weaknesses. In particular, the processing efficiency and expressive power of XFlavor, together with the ability of BSDL to create high-level BSDs, were our key motives for its development. As such, the use of BFlavor-based BSD producers, which are format-specific but generated automatically by a format-independent process, is an efficient alternative to the use of BSDL's format-neutral but inefficient BintoBSD Parser. The development of BFlavor can be considered the main contribution of this research. The expressive power and performance of a hybrid, BFlavor-driven content adaptation chain, compared to tool chains entirely based on either BSDL or XFlavor, were illustrated by several experiments. 
One series of experiments particularly targeted the exploitation of multi-layered temporal scalability in H.264/AVC, paying special attention to the combined use of sub-sequences and SEI messages. BFlavor was the only tool to offer an elegant and practical solution for the BSD-driven adaptation of H.264/AVC bitstreams in the temporal domain. In this dissertation, we have also outlined the BSD-based construction of placeholder slices and pictures for a number of video coding formats. These artificial syntax structures make it possible to eliminate a number of unwanted side effects resulting from a BSD-driven content adaptation step in the compressed domain. The use of placeholder slices and pictures was discussed in more detail in the context of the BSD-based exploitation of temporal and Region Of Interest (ROI) scalability in the first edition of the H.264/AVC standard, respectively providing a solution for synchronization and conformance issues. A final contribution consists of introducing a real-time workflow for the BSD-driven adaptation of H.264/AVC bitstreams in the temporal domain. The key technologies used were BFlavor for the efficient generation of BSDs, Streaming Transformations for XML (STX) for the efficient transformation of BSDs, and BSDL's format-neutral BSDtoBin Parser for the efficient construction of tailored video bitstreams. Extensive performance data were provided for several use cases, involving the exploitation of temporal scalability by dropping slices, the enhanced exploitation of temporal scalability by relying on placeholder slices, and the creation of video skims (i.e., video summaries). The latter application is made possible by enriching a BSD with additional metadata to steer the BSD transformation process.
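Why a streaming transformation of a BSD needs less memory than a tree-based one can be illustrated with a small event-driven filter. The sketch below uses Python's SAX interface rather than STX, and it assumes a hypothetical description format in which each `unit` element carries a `layer` attribute; neither the format nor the names `LayerFilter` and `filter_bsd` come from the dissertation.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class LayerFilter(xml.sax.ContentHandler):
    """Forward parsing events to a writer, skipping units above max_layer.

    Only the current event is held in memory, so the whole description
    never has to be loaded as a tree.
    """

    def __init__(self, out, max_layer: int):
        super().__init__()
        self._gen = XMLGenerator(out, encoding="utf-8")
        self._max_layer = max_layer
        self._skip_depth = 0  # > 0 while inside a dropped element

    def startDocument(self):
        self._gen.startDocument()

    def endDocument(self):
        self._gen.endDocument()

    def startElement(self, name, attrs):
        if self._skip_depth:
            self._skip_depth += 1
            return
        if name == "unit" and int(attrs.get("layer", "0")) > self._max_layer:
            self._skip_depth = 1
            return
        self._gen.startElement(name, attrs)

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
            return
        self._gen.endElement(name)

    def characters(self, content):
        if not self._skip_depth:
            self._gen.characters(content)


def filter_bsd(xml_bytes: bytes, max_layer: int) -> str:
    """Return the filtered description as a string."""
    out = io.StringIO()
    xml.sax.parseString(xml_bytes, LayerFilter(out, max_layer))
    return out.getvalue()


if __name__ == "__main__":
    bsd = (b'<bitstream><unit layer="0"/><unit layer="2"/>'
           b'<unit layer="1"/><unit layer="2"/></bitstream>')
    print(filter_bsd(bsd, max_layer=1))
```

The filter decides whether to keep or drop each element as soon as its start tag is seen, which is the property that makes this style of transformation suitable for a real-time workflow.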
To conclude, we hope we have convinced the reader that this dissertation, although limited in its scope, contributed to bridging the gap between content and context, supporting the vision that in the end, the user, and not the terminal and the network, is to be considered the real point of attention in the multimedia consumption chain.

