![]() |
CiteULike | ![]() |
tatemura's CiteULike | ![]() |
![]() |
|
![]() |
Register | ![]() |
Log in | ![]() |
Scalable regular expression matching on data streamsIn SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data (2008), pp. 161-172.
|
Reviews
[Write a review of this article]
Find related articles from these CiteULike users
Find related articles with these CiteULike tags
Posting History
AbstractRegular Expression (RE) matching has important applications in the areas of XML content distribution and network security. In this paper, we present the end-to-end design of a high performance RE matching system. Our system combines the processing efficiency of Deterministic Finite Automata (DFA) with the space efficiency of Non-deterministic Finite Automata (NFA) to scale to hundreds of REs. In experiments with real-life RE data on data streams, we found that a bulk of the DFA transitions are concentrated around a few DFA states. We exploit this fact to cache only the frequent core of each DFA in memory as opposed to the entire DFA (which may be exponential in size). Further, we cluster REs such that REs whose interactions cause an exponential increase in the number of states are assigned to separate groups -- this helps to improve cache hits by controlling the overall DFA size.
BibTeX record
RIS record