CiteULike is a free online bibliography manager. Register and you can start organising your references online.

A Floating-Point Unit for 4D Vector Inner Product with Reduced Latency Export

Computers, IEEE Transactions on, Vol. 58, No. 7. (17 November 2008), pp. 890-901.

Citation Format

[Posts]

View FullText article


X Reviews [Write a review of this article]

X Find related articles from these CiteULike users

X Find related articles with these CiteULike tags

X Posting History

X Abstract

This paper presents the algorithm and implementation of a new high-performance functional unit for floating-point four-dimensional vector inner product (4D dot product; DP4), which is most frequently performed in 3D graphics application. The proposed IEEE-compliant DP4 unit computes Z = AB + CD + EF + GH in one path and keeps the intermediate rounding by IEEE-754 rounding to nearest even. The intermediate rounding is merged with shift alignment, and intermediate carry-propagated addition and normalization are omitted to reduce latency in the proposed architecture. The proposed DP4 unit is implemented with 0.18-mum CMOS technology and has 12.8-ns critical path delay, which is reduced by 45.5 percent compared to a previous DP4 implementation using discrete multipliers and adders. The proposed DP4 unit also reduces the cycle time of 3D graphics applications by 12.4 percent on the average compared to the usual 3D graphics FPU based on four-way multiply-add-fused units.


X BibTeX record

X RIS record


Privacy Statement | Terms & Conditions
CiteULike organises scholarly (or academic) papers or literature and provides bibliographic (which means it makes bibliographies) for universities and higher education establishments. It helps undergraduates and postgraduates. People studying for PhDs or in postdoctoral (postdoc) positions. The service is similar in scope to EndNote or RefWorks or any other reference manager like BibTeX, but it is a social bookmarking service for scientists and humanities researchers.