Method for Large-Scale Set Similarity Joins

Manuel Widmoser* (Inventor), Nikolaus Augsten (Inventor), Daniel Kocher (Inventor), Willi Mann (Inventor)

*Corresponding author for this work

Research output: Patent

Abstract

Provided is a computer-implemented method for finding sets similar to a selected query set within a collection of sets, wherein each set represents a process. Each set is transformed into a representation in a vector space. Moreover, each set comprises a prefix.

The method comprises creating and storing a data structure representing an inverted metric index in a storage device. The sets similar to the selected query set are identified by a probing step followed by a candidate verifying step. In the probing step, the inverted metric index is filtered by means of a predefined subspace of the vector space. In the candidate verifying step, each set of the filtered inverted metric index is identified as a similar set if its distance value is smaller than or equal to a predefined distance threshold value.
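
The probe-then-verify flow described in the abstract can be illustrated with a small sketch. This is not the patented method: the abstract does not specify the distance function, the vector representation, or the subspace, so the sketch makes its own assumptions. It assumes Jaccard distance, uses the set size as a one-dimensional vector representation, and treats the admissible size range as the subspace used for filtering. The token order (plain sorted order) and all function names (`prefix`, `build_index`, `probe`, `similar_sets`) are hypothetical and chosen only for illustration.

```python
import bisect
import math
from collections import defaultdict

EPS = 1e-9  # guards against floating-point noise in ceil/floor


def jaccard_distance(r, s):
    """Jaccard distance 1 - |r & s| / |r | s| of two non-empty sets."""
    return 1.0 - len(r & s) / len(r | s)


def prefix(tokens, delta):
    """Shortest sorted prefix that must overlap the prefix of any set
    within Jaccard distance delta (assumes 0 <= delta < 1)."""
    tau = 1.0 - delta
    n = len(tokens)
    k = n - math.ceil(tau * n - EPS) + 1
    return sorted(tokens)[:k]


def build_index(sets, delta):
    """Inverted index: prefix token -> postings (set size, set id), sorted by size.
    Assumption: the set size serves as a 1-D vector representation."""
    index = defaultdict(list)
    for set_id, tokens in enumerate(sets):
        for token in prefix(tokens, delta):
            index[token].append((len(tokens), set_id))
    for postings in index.values():
        postings.sort()
    return index


def probe(index, query, delta):
    """Probing step: collect ids from the query's prefix postings, keeping only
    postings whose size lies in the range that Jaccard distance delta allows."""
    tau = 1.0 - delta
    n = len(query)
    lo = math.ceil(tau * n - EPS)
    hi = math.floor(n / tau + EPS)
    candidates = set()
    for token in prefix(query, delta):
        postings = index.get(token, [])
        start = bisect.bisect_left(postings, (lo, -1))
        end = bisect.bisect_right(postings, (hi, math.inf))
        candidates.update(set_id for _, set_id in postings[start:end])
    return candidates


def similar_sets(sets, index, query, delta):
    """Verifying step: keep candidates whose distance is <= delta."""
    return sorted(i for i in probe(index, query, delta)
                  if jaccard_distance(sets[i], query) <= delta)


# Illustrative data only.
sets = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"x", "y", "z"},
    {"a", "b", "c", "d", "e", "f", "g", "h", "i"},
    {"a", "x", "y", "z"},
]
index = build_index(sets, delta=0.5)
print(similar_sets(sets, index, {"a", "b", "c", "d"}, delta=0.5))  # [0, 1]
```

In this example, set 3 shares prefix tokens with the query but is discarded during probing because its size falls outside the admissible range, while set 4 passes probing and is rejected only in the verifying step.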
Original language: English
Patent number: EP4235451
IPC: G06F16/22
Publication status: Published - 30 Aug 2023

Fields of Science and Technology Classification 2012

  • 102 Computer Sciences
