A brave new (virtual) world: distributed searches, relevance scoring and facets
Authors: Todd King, Tom Narock, Raymond Walker, Jan Merka, Steven Joy
Institutions: (1) Institute of Geophysics and Planetary Physics, University of California, Los Angeles, CA, USA; (2) Goddard Earth Science and Technology Center, University of Maryland Baltimore County, Baltimore, MD, USA; (3) NASA/Goddard Space Flight Center, Heliospheric Physics Laboratory, Greenbelt, MD, USA
Abstract: Our ability to deal with complex systems has improved through information-system research, which includes improved modeling
(of both data and systems), the use of semantics, and advances in distributed computing. The past decade has seen an explosion
in the amount and variety of geosciences data and the emergence of true open data repositories through which scientists can
freely access these data. Those data are held in thousands of repositories located around the world. Virtual observatories
have been created to address the challenge of helping scientists search those repositories to find and access the data they
need. This challenge has been addressed by using technologies such as the Internet (with ample connectivity and bandwidth),
the Web, cheap computing power, cheap storage, and standards for critical components. Many scientific disciplines are developing
virtual observatories, yet some of the most compelling science questions cross multiple domains. While semantics can provide
cross-domain reasoning, often the first step in answering a question is determining which available resources may be relevant
to a topic. The topic can be expressed as simple phrases or word sequences. Using a common relevance scoring method at all
locations can enable a federated search across loosely coupled providers, the results of which can be organized into facets
to help the user select the most promising resources for pursuing the scientific investigation. We describe an approach to
developing and deploying relevance scoring methods and faceted results in this brave new (virtual) world. We have found that
a scoring method which considers both the presence of terms and the proximity of these terms relative to the order of the
terms in the query improves the assessment of relevance. We call this Term Presence-Proximity (TPP) scoring and describe a
method for calculating a normalized score. TPP scoring compares favorably with other scoring approaches.
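The abstract names TPP scoring but does not give its formula. The Python sketch below is therefore a hypothetical reading of the idea and not the authors' method. It assumes three things: presence is the fraction of distinct query terms found in a record; proximity is the fraction of adjacent query-term pairs that appear in query order within `max_gap` tokens; and the two are combined as a weighted average, so the score stays in [0, 1]. The names `tokenize`, `tpp_score`, `federated_search`, `max_gap`, `presence_weight`, and the provider/facet data layout are all illustrative. The sketch also shows why a normalized score helps federation. Each provider scores its own records, the merged result list needs only a single sort, and facet counts are taken over the merged results.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tpp_score(query, document, max_gap=3, presence_weight=0.5):
    """Hypothetical Term Presence-Proximity score, normalized to [0, 1].

    Presence: fraction of distinct query terms that occur in the document.
    Proximity: fraction of adjacent query-term pairs (a, b) for which some
    occurrence of b follows some occurrence of a by 1..max_gap tokens,
    i.e. the pair appears in query order and close together.
    The score is a weighted average of the two fractions (an assumption).
    """
    q = tokenize(query)
    d = tokenize(document)
    if not q or not d:
        return 0.0

    # Map each document token to the positions where it occurs.
    positions = {}
    for i, tok in enumerate(d):
        positions.setdefault(tok, []).append(i)

    terms = set(q)
    presence = sum(1 for t in terms if t in positions) / len(terms)

    pairs = list(zip(q, q[1:]))
    if not pairs:
        # A one-word query has no term order to check; reuse presence.
        proximity = presence
    else:
        hits = 0
        for a, b in pairs:
            if any(0 < pb - pa <= max_gap
                   for pa in positions.get(a, [])
                   for pb in positions.get(b, [])):
                hits += 1
        proximity = hits / len(pairs)

    return presence_weight * presence + (1 - presence_weight) * proximity


def federated_search(query, providers, top_k=10):
    """Score each provider's records locally, merge them, and count facets.

    providers maps a provider name to {record_id: (text, facet_value)}.
    Because every score lies in [0, 1], results from different providers
    can be merged with one sort on the score.
    """
    results = []
    for name, records in providers.items():
        for record_id, (text, facet) in records.items():
            score = tpp_score(query, text)
            if score > 0:
                results.append((score, name, record_id, facet))
    results.sort(key=lambda r: r[0], reverse=True)
    facets = Counter(r[3] for r in results)
    return results[:top_k], facets
```

Take the query "solar wind magnetic field". The record "Solar wind magnetic field measurements from ACE" scores 1.0 under these assumptions. The record "Magnetic field of the solar wind" contains every term, but the pair "wind magnetic" is out of order, so it scores 0.5 + 0.5 × 2/3 ≈ 0.83. Order sensitivity is the property the abstract attributes to TPP scoring.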
Keywords: Relevance scoring; Facets; Virtual observatory; Search
This article is indexed in SpringerLink and other databases.