Friday, 10 July 2009

ISMIR tutorial site is up

In case you missed it, we've launched a site to serve as the resource for our tutorial at ISMIR 2009 in Kobe, Japan. The tutorial is called "Share and Share alike - you can say anything about music with Linked Data". It will focus on applying Semantic Web technologies to music informatics. If you can't make it to Kobe, this site will hopefully still be very useful and interesting. If you would like to suggest any web resources, please tag them on delicious as linked_music_data. Enjoy!

Monday, 22 June 2009

sparql myspace

As part of the dbtune.org myspace service, I've started caching data in a SPARQL endpoint. Any time anyone queries the service, the RDF is dumped into a triple store. When the same query is repeated, the data just comes from the triple store. If the data about a particular artist is more than 30 days old, the page is re-translated from myspace.
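The caching rule described above can be sketched in a few lines of Python. This is only an illustration and not the service's actual code: the `ArtistCache` class, the in-memory dictionary standing in for the triple store, and the `scrape` callable are all hypothetical names. Only the 30-day threshold comes from the post.

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(days=30)  # refresh threshold mentioned in the post


class ArtistCache:
    """Toy in-memory stand-in for the triple store (hypothetical)."""

    def __init__(self, scrape):
        self._scrape = scrape  # hypothetical callable: artist_id -> RDF text
        self._store = {}       # artist_id -> (fetched_at, rdf)

    def get(self, artist_id, now=None):
        now = now or datetime.now()
        entry = self._store.get(artist_id)
        if entry is not None and now - entry[0] <= MAX_AGE:
            return entry[1]            # fresh enough: serve the cached copy
        rdf = self._scrape(artist_id)  # missing or stale: translate the page again
        self._store[artist_id] = (now, rdf)
        return rdf
```

Passing `now` explicitly makes the expiry rule easy to exercise without waiting 30 days.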

Here is an example query that lists all artist users from Hungary by name in order of total friends.

PREFIX mysp:<http://purl.org/ontology/myspace#>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
PREFIX mo:<http://purl.org/ontology/mo/>

SELECT ?name ?friends FROM <http://dbtune.org/myspace/>
WHERE {
  ?artist mysp:country ?country ;
          a mo:MusicArtist ;
          foaf:name ?name ;
          mysp:totalFriends ?friends .
  FILTER ( regex(str(?country), 'Hungary') )
}

You can see the results here or you can try your own queries using the endpoint here.

Due to human error on my part, the first 6.5 million triples do not include the genre tags for music artists. Terribly sorry about that. Also, we do not include the 'total plays' information as we are consistently getting the wrong value for this field for some mysterious reason. In the end, my PhD is not _only_ about reverse engineering the myspace website ;-)

I'll keep this thing running until we're low on disk space or the server catches fire, whichever comes first.

All the code for the service is in the motools project. I'm using CherryPy to handle requests, a strange menagerie of BeautifulSoup, regex, and string matching for the screen scraping (yes, I've heard of XSLT), and the Chris Sutton classic MoPy for the RDF serialization. The backend triple store is Virtuoso, which we connect to using the ODBC interface.

We will talk about uses for this resource and others during our upcoming tutorial at ISMIR 2009 in Kobe, Japan. If you can't make it, don't worry: the website for the tutorial is coming soon and we plan to post _everything_ there.

Thursday, 14 May 2009

Open Hackday London wrap up

Last weekend I participated in the Yahoo!-sponsored Open Hackday London. A good time was had by all. It was your typical hackday setup - loads of free cola and food, presentations about various APIs and such, and then a friendly hack competition.

Our hack was entitled Boss of Myspace. We actually walked away with the BBC Backstage prize and even got some positive reviews as well. I created what was basically a human-readable version of our dbtune.org/myspace service. Yves contributed some SPARQL for converting mo:available_as properties into XSPF playlists. However, in the end we used the Yahoo! Media Player, which was happier just swallowing embedded HTML links to audio files. We also used the super cool Yahoo! BOSS search API on the front end to allow for a more fuzzy search - you can enter something close to the artist name and still get the results you want (hopefully).
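For the curious, here is roughly what building an XSPF playlist involves. This is a minimal sketch in plain Python rather than the SPARQL Yves wrote for the hack. The `build_xspf` function and its `(title, audio_url)` input pairs are hypothetical stand-ins for whatever a query over mo:available_as would return.

```python
import xml.etree.ElementTree as ET

XSPF_NS = "http://xspf.org/ns/0/"  # XSPF version 1 namespace


def build_xspf(tracks):
    """Build an XSPF playlist string from (title, audio_url) pairs."""
    ET.register_namespace("", XSPF_NS)  # emit XSPF as the default namespace
    playlist = ET.Element("{%s}playlist" % XSPF_NS, {"version": "1"})
    track_list = ET.SubElement(playlist, "{%s}trackList" % XSPF_NS)
    for title, url in tracks:
        track = ET.SubElement(track_list, "{%s}track" % XSPF_NS)
        ET.SubElement(track, "{%s}title" % XSPF_NS).text = title
        ET.SubElement(track, "{%s}location" % XSPF_NS).text = url
    return ET.tostring(playlist, encoding="unicode")


# Example with a made-up track:
print(build_xspf([("Demo track", "http://example.org/demo.mp3")]))
```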

Ben Heitman from DERI contributed some minimalist CSS just in time for the deadline. Finally, Ben Fields contributed a really nice feature (post-deadline, actually, working while the other projects at the Hackday were being presented). Ben used the Echonest API to get similar artist recommendations. Just today, I managed to integrate these in the service with some of my first jQuery coding ;-) They are clickable and seem to work really well for head, medium, and hi-tail artists. I'm planning to present recommendations from the Last.fm API side-by-side in the future, as well as recs from other sources.

Please note that I _think_ the streaming audio will not work in the States - strange thing I can't explain but have encountered before when working with Myspace audio streams. Let me know if anybody State-side can test this and confirm.

Here are some of my personal favorite entry points:

Enjoy!!!

And oh yeah! Yves made a nice little RDF-based hack as well that he describes on his blog.

Wednesday, 1 April 2009

WebSci09 report

I recently attended WebSci09 in Athens, Greece. This was the first conference of its kind focusing on web science as a new research discipline. See WSRI's nifty cross-discipline intersection diagram below:

The conference was quite engaging and interesting and rather cross-disciplinary. I met a lot of really cool people and had some good discussions about my new pet project MuSim, but more about that in another post. All (or what seems to be all) of the papers are available through WSRI's cool new experimental on-line journal. A few papers I found particularly interesting include Patricia Victor et al.'s Trust- and Distrust-based recommendations, Denny Vrandecic et al.'s analysis of new features and user responses in Wikipedia, and Yeung and Noll's approach to measuring expertise in collaborative tagging systems.

And let us not forget one of the few music-centric entries in this inaugural WebSci conference, Jeff Pan and Stuart Taylor's very cool but very alpha semantic-web-powered MusicMash. Using their own TrOWL backend, the site aggregates data about a particular music artist - throws it in a triple store as Music Ontology RDF - and serves up a nice human-readable webpage as well. Although I must say, I've had mixed success playing with the website.

Actually, my poster presentation was not strictly music-related. I presented the k-pie graph layout algorithm, which allows for the visualization of a network where the nodes have a set of semantic labels associated with them. However, my main application and demo were very music-centric. The visualization for a sample of Myspace artists and genres looks something like this:


I have created an open source implementation that uses the Jung framework. If you're interested, you can get it from this public svn. I also provided an interactive demo at the conference that allowed one to click through the graph and listen to music. The code for that is _really_ hacky and I'm not releasing it yet, but if you're really keen just contact me. Also, I'm not convinced it was truly useful as a visual music discovery interface - really wish I could attend Paul and Justin's upcoming tutorial at ISMIR 09 but I'm afraid I'll be busy presenting another tutorial ;-)

Just to whet your appetite a bit, here's a screen shot of what the interactive demo looks like - fun to play with, but really needs a lot of work.


In summary, WebSci09 was a success. I met lots of cool people, and I believe there are many more successful WebSci conferences on the horizon.

Thursday, 5 February 2009

Classical Music in the Web of Data

I have finally re-published the classical music composer influence data set I've been working with in a proper Linked Data fashion. You might remember this data set from this relatively recent post or this much older post where I describe some complex network statistics and the original collection of the data set. This rather modestly sized data set contains just under 8,000 triples. Of course you can make SPARQL queries against the data here, or you can browse around with your favorite data browser or even your plain-old HTML web browser. I've implemented content negotiation using the data-resource-page paradigm you might recognize from DBpedia.org.
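If the data-resource-page paradigm is new to you, the idea is that one resource URI redirects either to an RDF document or to an HTML page depending on the Accept header of the request. The sketch below shows the decision logic only. It is not how Virtuoso implements it: the `negotiate` function is hypothetical, the `/resource/`, `/data/` and `/page/` paths are assumed from the DBpedia layout, and the Accept parsing is deliberately crude (it ignores q-values).

```python
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}


def negotiate(resource_path, accept_header):
    """Return (status, location) for a request to /resource/<name>."""
    prefix = "/resource/"
    if not resource_path.startswith(prefix):
        raise ValueError("not a resource URI: %r" % resource_path)
    name = resource_path[len(prefix):]
    # Rough Accept parsing: split on commas and drop any ";q=..." parameters.
    wanted = [part.split(";")[0].strip().lower()
              for part in accept_header.split(",")]
    if any(media_type in RDF_TYPES for media_type in wanted):
        return 303, "/data/" + name  # data browsers get the RDF document
    return 303, "/page/" + name      # everything else gets the HTML page


print(negotiate("/resource/Some_Composer", "application/rdf+xml"))
print(negotiate("/resource/Some_Composer", "text/html"))
```

The 303 See Other redirect is what keeps the thing being described (the composer) distinct from the documents describing it.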

I've done most of this with OpenLink Virtuoso - the all-singing, all-dancing web of data server solution. Although some things in Virtuoso didn't work exactly right at first and it's a bit rough around the edges, it is an awesome piece of software. It's so feature-rich it can be a bit daunting, but the OpenLink guys are quick to help on the mailing list.

In the near future we will be publishing our MySpace data set from ISMIR 2008 in a similar fashion...

Tuesday, 30 December 2008

Last post of the year

I have been slacking on the blog posts and I really have no excuse. I've been working on a GUI for browsing structured data about classical music composers. There is a very alpha version of the software available here.


I've also been busy with my new favorite CMS, Drupal, working on the OMRAS2 website. You can browse a fancy list of OMRAS-related publications here.

Finally, really big news in my personal life - I got married to my lovely wife Larisa!!!

Tuesday, 28 October 2008

Sparqling a funk legend

A while ago I had the privilege of witnessing funk pioneer, JB Horns member, and life-long saxophone badass Maceo Parker perform. It was fantastic. Maceo led the band with swagger and style - using many of the same hand gestures and moves Soul Brother #1 used to lead the Horns back in the day. He finished his encore with the funk classic "Pass the Peas".

Afterwards I was curious - did Maceo write "Pass the Peas" or was it Fred Wesley or maybe Pee Wee Ellis??? So I decided to ask the Semantic Web. We'll use SPARQL (pronounced "sparkle") to make queries. There is a nice gentle tutorial from IBM here which also describes using the Java-based Jena package. If you're a Python type you might try sparql-wrapper. However, to start out and to see what data is where, you'll probably want to use a web-based interface (most SPARQL endpoints will have some web-based interface where you can enter a query). We'll start with the Virtuoso SPARQL endpoint for DBpedia to see what Wikipedia knows about Maceo Parker...

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?Concept
WHERE {<http://dbpedia.org/resource/Maceo_Parker> rdf:type ?Concept}

If you're familiar with SQL you might be able to guess what this means. We're asking DBpedia to tell us all the concepts related to Maceo Parker. The "PREFIX" keyword sets up a namespace. Then we "SELECT" any values of the variable "?Concept" that fit the graph pattern we establish in our "WHERE" clause. That pattern is Maceo_Parker rdf:type ?Concept - or any concept that describes what type of entity Maceo Parker is. You can copy and paste the query into the DBpedia interface and see that Maceo Parker is a Person, an AfricanAmericanMusician, an AmericanJazzSaxophonist, and a number of other things.
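If you'd rather send the query from code than paste it into the web form, here is a minimal sketch using only the Python standard library. It assumes the public DBpedia endpoint at https://dbpedia.org/sparql and the standard SPARQL protocol convention of passing the query text in a `query` parameter. The `build_request` helper is a hypothetical name, and the part that actually goes over the network is left commented out.

```python
from urllib.parse import urlencode
from urllib.request import Request

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # assumed public endpoint

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?Concept
WHERE {<http://dbpedia.org/resource/Maceo_Parker> rdf:type ?Concept}
"""


def build_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(DBPEDIA_ENDPOINT, QUERY)

# To actually run it (needs network access):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rows = json.load(response)["results"]["bindings"]
#   print([row["Concept"]["value"] for row in rows])
```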

Well that's a good amount of information right there. We can infer we've got the right Maceo Parker but unfortunately DBpedia doesn't have a lot of discography data, so we can't really answer our original question. Luckily my good friend, former colleague, and Semantic Web mentor Yves Raimond created a SPARQL endpoint that queries the Musicbrainz database. The service is hosted on DBtune.org/musicbrainz and a web interface can be found here.

So let's construct a new query.

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?maceo ?songs ?title WHERE {
  ?maceo owl:sameAs <http://dbpedia.org/resource/Maceo_Parker> .
  ?songs foaf:maker ?maceo .
  ?songs dc:title ?title
}

Again, we start with a few namespace prefixes. This time we're going to select three different variables - "?maceo", "?songs", and "?title". We're using the same resource URI for Maceo Parker as we did when querying DBpedia. However, the DBtune interface refers to Maceo with a different URI. Luckily, DBtune/musicbrainz is interlinked with DBpedia using the "owl:sameAs" property. So the first line of our WHERE clause will find DBtune/musicbrainz URIs that are the same as the DBpedia URI for Maceo Parker. The second line will fill the variable ?songs with URIs for entities for which Maceo Parker is the "foaf:maker". In DBtune/musicbrainz the foaf:maker property is used to associate a song with the songwriter. Finally, in the third line of our WHERE clause we retrieve the titles of the songs made by Maceo Parker. Again, you can try the query yourself here - you will find that Maceo Parker is indeed credited with writing "Pass the Peas" as well as a quite lengthy list of other funk gems.
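To make the three-line join a bit more concrete, here is a toy graph-pattern matcher in Python. It is only meant to show how shared variables tie the patterns together, and a real SPARQL engine is far more sophisticated. The `match` function is hypothetical, and the identifiers in `TRIPLES` (`mb:artist1`, `mb:track1` and so on) are made-up placeholders rather than real DBtune URIs.

```python
def match(patterns, triples):
    """Return every binding of ?variables that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against its earlier value.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)  # all three positions matched
        solutions = extended
    return solutions


# Made-up sample data standing in for the DBtune/musicbrainz graph.
TRIPLES = [
    ("mb:artist1", "owl:sameAs", "dbpedia:Maceo_Parker"),
    ("mb:track1", "foaf:maker", "mb:artist1"),
    ("mb:track1", "dc:title", "Pass the Peas"),
    ("mb:track2", "foaf:maker", "mb:artist2"),
    ("mb:track2", "dc:title", "Some Other Song"),
]

PATTERNS = [
    ("?maceo", "owl:sameAs", "dbpedia:Maceo_Parker"),
    ("?songs", "foaf:maker", "?maceo"),
    ("?songs", "dc:title", "?title"),
]

titles = [solution["?title"] for solution in match(PATTERNS, TRIPLES)]
print(titles)
```

Because "?maceo" appears in both the first and second patterns, only tracks made by the artist linked to the DBpedia URI survive the join.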