Metric hull as similarity-aware operator for representing unstructured data


This publication does not fall under the Faculty of Arts but under the Faculty of Informatics. The official publication page is on the website


ANTOL Matej, JÁNOŠOVÁ Miriama, DOHNAL Vlastislav

Year of publication 2021
Type Article in a scholarly periodical
Journal / Source Pattern Recognition Letters
MU Faculty / Unit

Faculty of Informatics

Keywords Similarity operators; Metric space; Data aggregation
Description Similarity searching has become widely utilized in many online services processing unstructured and complex data, e.g., Google Images. Metric spaces are often applied to model and organize such data by their mutual similarity. As top-k queries provide only a local view on data, a data analyst must pose multiple requests to observe the entire dataset. Thus, group-by operators for metric data have been proposed. These operators identify groups by respecting a given similarity constraint and produce a set of objects per group. The analyst can then tediously browse these sets directly, but representative members may provide better insight. In this paper, we focus on concise representations of metric datasets. We propose a novel concept of a metric hull which encompasses a given set by selecting a few objects. Testing an object to be part of the set is then made much faster. We verify this concept on synthetic Euclidean data and real-life image and text datasets and show its effectiveness and scalability. The metric hulls provide much faster and more compact representations when compared with commonly used ball representations.
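For orientation, the "ball representation" that the description names as the common baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the metric hull construction from the paper, whose details the description does not give. The function names `ball_representation` and `in_ball`, the use of Euclidean distance, and the choice of the center among the set's own members are all assumptions made for this example.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance; any other metric could be passed in instead."""
    return math.dist(a, b)


def ball_representation(
    objects: Sequence[Point], dist: Distance = euclidean
) -> Tuple[Point, float]:
    """Summarize a set by one member (the center) and a covering radius.

    Hypothetical helper: the center is the member whose largest distance
    to any other member is smallest, so the ball covers the whole set.
    """
    if not objects:
        raise ValueError("cannot represent an empty set")
    best_center, best_radius = objects[0], math.inf
    for candidate in objects:
        radius = max(dist(candidate, other) for other in objects)
        if radius < best_radius:
            best_center, best_radius = candidate, radius
    return best_center, best_radius


def in_ball(
    query: Point, center: Point, radius: float, dist: Distance = euclidean
) -> bool:
    """Membership test needing one distance computation per query."""
    return dist(query, center) <= radius


group = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0)]
center, radius = ball_representation(group)
inside = in_ball((1.0, 0.5), center, radius)
outside = in_ball((3.0, 0.0), center, radius)
```

The test is cheap, but a ball can also cover regions that hold no members of the set. The description reports that metric hulls are more compact than such ball representations.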
Related projects:
