Facet discovery for structured web search: a query-log mining approach
In recent years, there has been a strong trend of incorporating results from structured data sources into keyword-based web search systems such as Bing or Amazon. When presenting structured data, facets are a powerful tool for navigating, refining, and grouping the results. For a given structured data source, a fundamental problem in supporting faceted search is finding an ordered selection of attributes and values that will populate the facets. This raises two challenges. First, because screen real estate is limited, it is important that the top facets best match the anticipated user intent. Second, the huge scale of the data available to such engines demands an automated, unsupervised solution. In this paper, we model users' faceted-search behavior using the intersection of web query logs with existing structured data. Since web queries are formulated as free text, a challenge in our approach is the inherent ambiguity in mapping keywords to the different possible attributes of a given entity type. We present an automated solution that elicits user preferences on attributes and values, employing disambiguation techniques that range from simple keyword matching to more sophisticated probabilistic models. We demonstrate the scalability of our solution experimentally by running it on over a thousand categories of diverse entity types, and we measure facet quality with a real-user study.
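To make the idea of intersecting a query log with structured data concrete, the following is a minimal sketch of the simplest end of the spectrum, keyword matching. It is an illustration under stated assumptions, not the method evaluated in this paper: the function name `rank_facets`, the input shapes, and the rule of splitting an ambiguous keyword's weight evenly across its candidate attributes are all hypothetical choices made for the example, and only single-token values are matched.

```python
from collections import defaultdict


def rank_facets(query_log, entities):
    """Rank attributes, and the values within each, by query-log evidence.

    query_log: iterable of (query_text, frequency) pairs (assumed shape).
    entities:  iterable of dicts mapping attribute name -> value (assumed shape).
    """
    # Inverted index: lowercased value -> set of (attribute, value) pairs.
    # Only values that appear as a single query token can match.
    index = defaultdict(set)
    for entity in entities:
        for attribute, value in entity.items():
            index[str(value).lower()].add((attribute, value))

    attribute_score = defaultdict(float)
    value_score = defaultdict(lambda: defaultdict(float))
    for query, frequency in query_log:
        for token in query.lower().split():
            candidates = index.get(token, ())
            # Assumption: an ambiguous token (one that matches several
            # attributes) splits its weight evenly among the candidates.
            for attribute, value in candidates:
                weight = frequency / len(candidates)
                attribute_score[attribute] += weight
                value_score[attribute][value] += weight

    # Order attributes by total score, then order values within each attribute.
    ranked = sorted(attribute_score, key=attribute_score.get, reverse=True)
    return [
        (a, sorted(value_score[a], key=value_score[a].get, reverse=True))
        for a in ranked
    ]


# Toy data, invented purely for illustration.
entities = [
    {"brand": "casio", "color": "black", "material": "steel"},
    {"brand": "seiko", "color": "gold", "material": "gold"},
    {"brand": "fossil", "color": "silver", "material": "leather"},
]
query_log = [
    ("casio watch", 50),
    ("gold watch", 40),
    ("seiko gold", 20),
    ("black watch cheap", 10),
]

for attribute, values in rank_facets(query_log, entities):
    print(attribute, values)
```

In this toy data the keyword "gold" is ambiguous between the color and material attributes, which is the kind of case where even splitting is a crude stand-in for a probabilistic model that would weigh the candidates by context.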