CiteULike is a free online bibliography manager. Register and you can start organising your references online.

Semantic Data De-duplication for archival storage systems Export

Computer Systems Architecture Conference, 2008. ACSAC 2008. 13th Asia-Pacific In Computer Systems Architecture Conference, 2008. ACSAC 2008. 13th Asia-Pacific (2008), pp. 1-9.

Citation Format

[Posts]

View FullText article


dmeister's tags for this article

2008 datadeduplication

X Reviews [Write a review of this article]

X Notes for this article

dmeister has 0 private notes and 1 public note for this article.

Extremlly similar to the ADMAD system

dmeister (public note) - 2009-06-23 15:36:58

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

In archival storage systems, there is a huge amount of duplicate data or redundant data, which occupy significant extra equipments and power consumptions, largely lowering down resources utilization (such as the network bandwidth and storage) and imposing extra burden on management as the scale increases. So data de-duplication, the goal of which is to minimize the duplicate data in the inter-file level, has been receiving broad attention both in academic and industry in recent years. In this paper, semantic data de-duplication (SDD) is proposed, which makes use of the semantic information in the I/O path (such as file type, file format, application hints and filesystem metadata) of the archival files to direct the dividing a file into semantic chunks (SC). While the main goal of SDD is to maximally reduce the inter-file level duplications, directly storing variable SCes into disks will result in a lot of fragments and involve a high percentage of random disk accesses, which is very inefficient. So an efficient data storage scheme is also designed and implemented: SCes are further packaged into fixed sized Objects, which are actually the storage units in the storage devices, so as to speed up the I/O performance as well as ease the data management. Primary experiments have demonstrated that SDD can further reduce the storage space compared with current methods (from 20% to near 50% according to different datasets), and largely improves the writing performance (about 50%-70% in average).


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.