Wednesday, 28 October 2009

More on the Similarity Ontology

I recently gave a talk at ISMIR 2009 about MuSim, the similarity ontology. The talk was well attended and well received, but I wanted to review the core concepts and promote the demo implementation as well.

In a nutshell, similarity is very complex, and when we say two things are similar we really mean that they are similar in some specific sense. So our ontology focuses on reifying similarity - not just saying which things are similar, but also how it was determined that they are similar. As such, we treat similarity as a class rather than a property.

We define a class sim:Similarity for describing similarity statements. We then define the class sim:AssociationMethod for describing a method for determining similarity. By associating a similarity statement of type sim:Similarity with a method of type sim:AssociationMethod we are describing in what sense the elements involved in our statement are similar. We can further reify our method by providing provenance (who made the method) and even fully disclose our method by pointing to a graph describing our workflow. The diagram below illustrates a basic example involving two music tracks.
[Figure: MuSim block diagram]
Here is the same example in Turtle:

@prefix sim: <http://purl.org/ontology/similarity/> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <#> .

:TrackA a mo:Track .
:TrackB a mo:Track .

_:SimilarityStatement a sim:Similarity ;
    sim:element :TrackA ;
    sim:element :TrackB ;
    sim:method :TimbreSim .

:TimbreSim a sim:AssociationMethod ;
    foaf:maker :Me ;
    sim:description <http://some.graph.uri> .

:Me a foaf:Person .

Hopefully this example is somewhat illustrative, although if you're not familiar with RDF or the Turtle syntax for serializing RDF it's probably not terribly useful. The important thing to understand is that :TimbreSim is our method for deriving our similarity statement. We specify the person who made this method (:Me) and we point to a Named Graph that describes our workflow. We could also attach a textual description and other information.
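
For readers more comfortable with code than with Turtle, the same structure can be sketched as plain Python objects. This is only an illustration of the reification idea and not part of MuSim itself; the class and field names are hypothetical stand-ins for sim:Similarity, sim:AssociationMethod and their properties.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AssociationMethod:
    """Hypothetical stand-in for sim:AssociationMethod."""
    name: str
    maker: str                          # plays the role of foaf:maker
    description: Optional[str] = None   # URI of a graph describing the workflow


@dataclass(frozen=True)
class Similarity:
    """Hypothetical stand-in for sim:Similarity: the statement is itself an object."""
    elements: FrozenSet[str]            # plays the role of sim:element
    method: AssociationMethod           # plays the role of sim:method


timbre_sim = AssociationMethod("TimbreSim", maker="Me",
                               description="http://some.graph.uri")
statement = Similarity(frozenset({"TrackA", "TrackB"}), timbre_sim)

# Because similarity is an object rather than a bare "A is similar to B" link,
# we can ask in what sense, and according to whom, the tracks are similar.
print(sorted(statement.elements), "via", statement.method.name,
      "by", statement.method.maker)
```

Had similarity been modelled as a simple property between the two tracks, there would be nowhere to hang the method or its maker.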

Of course this example is only illustrative and not very realistic. Enter our demo implementation, classical.catfishsmooth.net. Here we've taken some public domain classical music found on Musopen as well as some information about classical composers and their network of influence as specified by the Classical Music Navigator - this gives us a source of composer-to-composer similarity. We also use the Sonic-Annotator and some of the QMUL VAMP Plugins to analyze the audio files and get a couple of methods of determining track-to-track similarity.

In this implementation, we have three similarity methods: composer-to-composer influence taken from the Classical Music Navigator, and two track-to-track methods derived from the audio analysis, one based on timbre and one based on key.

Note that MuSim would not be very interesting without the SPARQL query language. We can query a collection of similarity statements and combine similarity methods in interesting ways. In our catfishsmooth implementation we store our similarity triples and related data in 4Store - a scalable open-source RDF store from Garlik. This allows us to include a SPARQL endpoint and, even better, a Snorql SPARQL explorer. In the Snorql explorer, you will find a series of example queries on the right-hand side of the interface. The most advanced query allows us to combine all three similarity methods. We use a recording of Wagner's piece Die Meistersinger von Nürnberg as the seed. We find other tracks that are in the same key, similar by timbre, and composed by composers who had influenced Wagner.
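
The logic of that combined query can be sketched in plain Python over a handful of in-memory similarity statements. This is a rough sketch and not the SPARQL used on the demo site: the track identifiers, the toy data, and the helper names similar_to and combined are all invented for illustration, and the direction of influence is ignored for simplicity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Similarity:
    elements: FrozenSet[str]   # the things declared similar
    method: str                # label of the method that produced the statement


# Toy data, invented for illustration only.
composer_of = {"seed": "Wagner", "t1": "Beethoven",
               "t2": "Beethoven", "t3": "Brahms"}

statements = [
    Similarity(frozenset({"seed", "t1"}), "timbre"),
    Similarity(frozenset({"seed", "t1"}), "key"),
    Similarity(frozenset({"seed", "t2"}), "timbre"),
    Similarity(frozenset({"seed", "t3"}), "key"),
    Similarity(frozenset({"Wagner", "Beethoven"}), "influence"),
]


def similar_to(item: str, method: str) -> Set[str]:
    """Return everything linked to `item` by a statement made with `method`."""
    found: Set[str] = set()
    for s in statements:
        if s.method == method and item in s.elements:
            found |= s.elements - {item}
    return found


def combined(seed: str) -> List[str]:
    """Tracks similar to the seed by timbre AND key, by a related composer."""
    by_timbre = similar_to(seed, "timbre")
    by_key = similar_to(seed, "key")
    composers = similar_to(composer_of[seed], "influence")
    return sorted(t for t in by_timbre & by_key
                  if composer_of[t] in composers)


print(combined("seed"))
```

Each method contributes its own set of statements, and the combination is just an intersection of what those statements say about the seed. In SPARQL the same idea is expressed as a join over several sim:Similarity patterns, each constrained to a different sim:method.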

I've had some exciting (but brief) discussions with Mert Bay and Stephen Downie about integrating MuSim with the MIREX framework, such that participants in the audio similarity task might optionally apply their algorithm to some Creative Commons dataset (probably Jamendo) and publish MuSim data for re-use and additional evaluation.

We haven't really covered the workflow description syntax or concepts here. This will be the focus of much future work. Our design leaves this specification open-ended, but we have used the N3-Tr framework developed by Yves Raimond in his PhD thesis to describe workflows. You can see some examples in our implementation.

For more details on MuSim you can read our ISMIR paper (pdf) or view the ontology specification.
