The Nu Expression for Probabilistic Data Integration |
| |
Authors: | Evgenia I Polyakova and Andre G Journel |
| |
Institution: | Department of Geological and Environmental Sciences, Stanford University, Palo Alto, CA 94305, USA |
| |
Abstract: | The general problem of data integration is expressed as that of combining probability distributions conditioned to each individual
datum or data event into a posterior probability for the unknown conditioned jointly to all data. Any such combination of
information requires taking into account data interaction for the specific event being assessed. The nu expression provides
an exact analytical representation of such a combination. This representation allows a clear and useful separation of the
two components of any data integration algorithm: individual data information content and data interaction, the latter being
different from data dependence. Any estimation workflow that fails to address data interaction is not only suboptimal, but
may result in severe bias. The nu expression reduces the possibly very complex joint data interaction to a single multiplicative
correction parameter ν₀, which is difficult to evaluate but whose exact analytical expression is given; the availability of
such an expression provides avenues for its determination or approximation. The case ν₀ = 1 is more comprehensive than data
conditional independence; it delivers a preliminary robust approximation in the presence of actual data interaction. An
experiment where the exact results are known allows the results of the ν₀ = 1 approximation to be checked against the
traditional estimators based on the assumption of data independence. |
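The abstract does not write out the nu expression itself. As a minimal sketch, assuming the form in which the nu model is usually stated, each probability is turned into a distance x = (1 - P) / P, and the joint distance is x = x0 · ν0 · ∏ (xi / x0), where x0 comes from the prior P(A) and xi from each single-datum probability P(A | Di). The function names `odds_distance` and `nu_combine` and the numeric inputs below are hypothetical and chosen only for illustration; ν0 is passed in as a known number, whereas in practice it has to be determined or approximated.

```python
def odds_distance(p):
    """Distance to the event occurring, (1 - p) / p, for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return (1.0 - p) / p


def nu_combine(prior, conditionals, nu0=1.0):
    """Combine single-datum probabilities P(A | Di) into P(A | D1, ..., Dn).

    Assumed form of the nu expression (not quoted from the abstract):
        x / x0 = nu0 * product_i (xi / x0)
    nu0 = 1 is the approximation discussed in the abstract.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    x0 = odds_distance(prior)
    x = x0 * nu0
    for p in conditionals:
        # Each datum rescales the prior distance by its own ratio xi / x0.
        x *= odds_distance(p) / x0
    # Convert the combined distance back to a probability.
    return 1.0 / (1.0 + x)


if __name__ == "__main__":
    # Two data that each raise P(A) from 0.5 to 0.8, combined with nu0 = 1.
    print(nu_combine(0.5, [0.8, 0.8]))
    # A nu0 above 1 increases the combined distance and so lowers the posterior.
    print(nu_combine(0.5, [0.8, 0.8], nu0=2.0))
```

With a single datum and ν0 = 1 the function returns that datum's own probability, which is a quick sanity check on the sketch.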
| |
Keywords: | Data integration; Data interaction vs dependence; Updating probabilities; Conditional independence |
This article is indexed in SpringerLink and other databases. |