In search of the right literature search engine(s)
Background
Collecting scientific publications related to a specific topic is crucial for different phases of research, health care and ‘effective text mining’. Available bio-literature search engines vary in their ability to scan different sections of articles for user-provided search terms and/or phrases. Since a thorough scientific analysis of all major bibliographic tools has not been done, their selection has often remained subjective. We considered most of the existing bio-literature search engines (http://www.shodhaka.com/startbioinfo/LitSearch.html) and performed an extensive analysis of 18 literature search engines over a period of about 3 years. Eight different topics were chosen and about 50 searches were performed using the selected search engines. The relevance of the retrieved citations was carefully assessed after every search to estimate the citation retrieval efficiency. Other features of the search tools were also compared using a semi-quantitative method.
Results
The study provides the first tangible comparative account of the relative retrieval efficiency, input and output features, resource coverage and a few other utilities of the bio-literature search tools. The results show that using a single search tool can lead to a loss of up to 75% of the relevant citations in some cases. Hence, the use of multiple search tools is recommended. However, it would not be practical to use all, or too many, of the search engines. The detailed observations made in the study can assist researchers and health professionals in making a more objective selection among the search engines. A corollary study revealed the relative advantages and disadvantages of the full-text scanning tools.
Conclusion
While many studies have attempted to compare literature search engines, important questions have remained unanswered to date.
The following are some of those questions, along with the answers provided by the current study:
a) Which tools should be used to get the maximum number of relevant citations with a reasonable effort? ANSWER: Using PubMed, Scopus, Google Scholar and HighWire Press individually, and then compiling the hits into a union list, is the best option. Citation-Compiler (http://www.shodhaka.com/compiler) can help to compile the results from each of the recommended tools.
b) What is the approximate percentage of relevant citations expected to be lost if only one search engine is used? ANSWER: About 39% of the total relevant citations were lost in searches across 4 topics; 49% of the hits were lost while using PubMed or HighWire Press, while losses of 37% and 20% were observed with Google Scholar and Scopus, respectively.
c) Which full-text search engines can be recommended in general? ANSWER: HighWire Press and Google Scholar.
d) Among the most used search engines, which one can be recommended for best precision? ANSWER: EBIMed.
e) Among the most used search engines, which one can be recommended for best recall? ANSWER: Depending on the type of query used, the best recall could be obtained with HighWire Press or Scopus.
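As a rough illustration of answers (a) and (b), the sketch below shows one way a union list could be compiled from per-engine hits, and how the percentage of relevant citations missed by each single engine could then be estimated against that union. It is a minimal sketch under stated assumptions: the engine names and citation identifiers are hypothetical placeholders rather than data from the study, and `union_and_loss` is an illustrative helper, not part of Citation-Compiler.

```python
def union_and_loss(hits_by_engine):
    """Compile a union list and estimate per-engine loss.

    hits_by_engine maps an engine name to the set of relevant
    citation identifiers that engine retrieved for one topic.
    Returns the union of all relevant citations and, for each
    engine, the percentage of that union it failed to retrieve.
    """
    union = set().union(*hits_by_engine.values())
    if not union:
        # No relevant citations at all: nothing can be lost.
        return union, {engine: 0.0 for engine in hits_by_engine}
    loss = {
        engine: 100.0 * len(union - hits) / len(union)
        for engine, hits in hits_by_engine.items()
    }
    return union, loss


# Hypothetical toy data (made-up identifiers, not results from the study).
toy_hits = {
    "engine_A": {"c1", "c2", "c3"},
    "engine_B": {"c2", "c3", "c4", "c5"},
    "engine_C": {"c1", "c5"},
}

union, loss = union_and_loss(toy_hits)
for engine, pct in sorted(loss.items()):
    print(f"{engine}: {pct:.0f}% of relevant citations missed")
```

In this sketch the union list stands in for the full set of relevant citations, so the loss figures are relative to what the chosen engines found together, not to every relevant citation that exists.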