In the previous post, I described how I learned to stop being afraid of Europeana and love SPARQL. As proof, I gathered statistics on how many video resources there are from different Finnish municipalities. Proportionally, taking the number of inhabitants into account, the #1 video corner in Finland is Saarijärvi. My Finnish readers, please note the EDIT section towards the end of that posting. For some strange reason I first claimed it to be Helsinki. Sorry about that, Saarijärvi.
BTW, did you know that there is a connection between Saarijärvi and Pamela Anderson? I certainly did not.
What is in there?
My so-called research problem with Europeana, nicely summarized by Mikko Rinne, was that in most cases the semantic information about the shooting location of the videos was missing. Therefore, I had to search for the name of the municipality across several elements, such as description, title and subject.
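To make the idea concrete, here is a minimal sketch of what searching one municipality name across several elements could look like. It only builds the query text and sends nothing anywhere. The function name, the choice of UNION, and the assumption that edm:type, dc:title, dc:description and dc:subject all hang off the same ?item are mine, not necessarily how the Europeana data is actually laid out or how my R code did it.

```python
# Sketch only: builds a SPARQL SELECT string, does not contact any endpoint.
# Assumption: all properties sit on the same ?item resource.
PREFIXES = (
    "PREFIX dc: <http://purl.org/dc/elements/1.1/>\n"
    "PREFIX edm: <http://www.europeana.eu/schemas/edm/>\n"
)


def build_text_search_query(municipality, media_type="VIDEO",
                            fields=("dc:title", "dc:description", "dc:subject")):
    """Return a query matching the municipality name in any of the fields."""
    # Escape characters that would break the SPARQL string literal.
    # Regex metacharacters in the name are NOT escaped here.
    name = municipality.replace("\\", "\\\\").replace('"', '\\"')
    # One alternative per field, so a hit in any single field is enough.
    alternatives = " UNION ".join("{ ?item %s ?text }" % f for f in fields)
    return (
        PREFIXES
        + "SELECT DISTINCT ?item WHERE {\n"
        + '  ?item edm:type "%s" .\n' % media_type
        + "  %s\n" % alternatives
        + '  FILTER regex(str(?text), "%s", "i")\n' % name
        + "}"
    )


if __name__ == "__main__":
    print(build_text_search_query("Saarijärvi"))
```

The case-insensitive regex is what makes this kind of search both convenient and prone to false positives: it matches the name anywhere inside any of the text fields.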
The main contributor of Finnish videos in Europeana is KAVA, the National Audiovisual Archive of Finland, in cooperation with European Film Gateway. The videos are digitized newsreels from 1943 to 1964, shown at the Elonet site of KAVA. While perusing the site, I noticed that KAVA is currently crowdsourcing metadata about Finnish fiction films. This is a wise move. There are only so many resources that KAVA itself can put into this kind of work. Who knows, maybe my exercise is of some help at some stage, although there are strong caveats, e.g. due to the clumsy search logic that returns false positives here and there.
There does exist some spatial data too, enriched by Europeana itself, as I understand it. The most interesting metadata element for me was edm:hasMet, whose value is the GeoNameID of the municipality. The same element is also used for geolocation coordinates, and Europeana offers a neat interactive map interface built upon them.
How can I find out which GeoNameID belongs to which municipality? Luckily, DBpedia has done the job; see e.g. the resource of Saarijärvi and its list of owl:sameAs properties.
Some 8% of municipalities lack the ID, but that's good enough for my purposes. With the list of municipality names, I gathered the IDs by querying the SPARQL endpoint of DBpedia. The names themselves I had downloaded previously from the National Land Survey of Finland via the indispensable R package soRvi. With the IDs at hand, I turned to Europeana again. This time, I was interested in how many geonamed items there were in different categories.
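As a rough illustration of the ID-gathering step, here is a small sketch of picking a GeoNames ID out of a list of owl:sameAs URIs and counting how many municipalities end up without one. The URI pattern is my assumption about what the GeoNames links look like, the helper names are hypothetical, and the IDs in the sample data are made up.

```python
import re

# Assumed shape of a GeoNames link among the owl:sameAs values.
GEONAMES_URI = re.compile(r"^https?://sws\.geonames\.org/(\d+)/?$")


def geonames_id(same_as_uris):
    """Return the first GeoNames ID found among the URIs, or None."""
    for uri in same_as_uris:
        match = GEONAMES_URI.match(uri)
        if match:
            return match.group(1)
    return None


def missing_share(ids_by_name):
    """Fraction of municipalities for which no ID was found."""
    missing = [name for name, gid in ids_by_name.items() if gid is None]
    return len(missing) / len(ids_by_name)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    same_as = {
        "Saarijärvi": ["http://example.org/other", "http://sws.geonames.org/1234567/"],
        "Laitila": ["http://example.org/other"],
    }
    ids = {name: geonames_id(uris) for name, uris in same_as.items()}
    print(ids, missing_share(ids))
```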
GeoNameID
Europeana resources are divided into four media types: image, sound, text and video. Here I visualize the raw numbers in a few separate graphs, grouped roughly by the number of items. Otherwise it would be difficult to see any nuances between municipalities. The R code for the stacked bars is adapted from the Louhos Datavaalit examples. Note that I have not yet succeeded in sorting the bars by item count; maybe I have misunderstood how factor levels work.
The first thing you notice is that text items outnumber all others. As far as I know, they consist mainly of newspaper articles digitized by the National Library of Finland. This is no news (pun not intended). Of all newspapers published in Finland 1771-1900, the Library has already digitized most.
In the third graph, one municipality stands out: Rauma. Quite a lot of images, even more texts. Interesting. I was born in rural Laitila, located some 30 km SE of Rauma, so of course I was keen on knowing what kind of material Europeana has got in such quantities from such a familiar spot. FYI, Rauma was given town rights in 1442. This small coastal municipality is known for its wooden Old Town, a UNESCO World Heritage site.
Rauma turned out to be two-headed. It was not just my childhood neighbour, Finnish Rauma, but also Norwegian Rauma, established in 1964 and named after the Rauma River. The reason for the false hits was that the GeoNameIDs of both places have been saved in all Rauma instances. By mistake, I guess. Anyway, Finland brings texts and Norway images - which is probably just right, Norway is so much more gorgeous.
Under CONSTRUCTion (couldn't resist)
After all this Europeana SPARQLing, I decided to try the idea that Mikko had thrown in his blog comment: why not offer links to these resources? Yes, there are false hits - beware of e.g. Ii, Salo, Rautavaara, Vaala and Kolari for reasons that relate to the Finnish language and my REGEX FILTER statements - but the majority should be decent.
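One plausible mechanism behind such false hits, sketched with Python's re module rather than SPARQL: a case-insensitive match anywhere in the text will happily find "Ii" inside an unrelated word. The two helper functions and the sample sentences are my own illustration, not the filters I actually used. The stricter variant helps with some names but not all, because a place name can also be a surname or the beginning of one.

```python
import re


def naive_match(name, text):
    """Case-insensitive match anywhere in the text."""
    return re.search(re.escape(name), text, flags=re.IGNORECASE) is not None


def stricter_match(name, text):
    """Case-sensitive match at the start of a word.

    Trailing word characters are allowed so that inflected forms
    such as 'Iissä' still count as a hit.
    """
    pattern = r"\b" + re.escape(name) + r"\w*"
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    samples = [
        ("Ii", "Hiihtokilpailut Lahdessa"),  # 'ii' hides inside 'Hiihto'
        ("Ii", "Markkinat Iissä"),           # a genuine, inflected hit
        ("Salo", "Opettaja Salonen"),        # surname: still a false hit
    ]
    for name, text in samples:
        print(name, "|", text, "|", naive_match(name, text), stricter_match(name, text))
```

The last sample shows the limit of this kind of tightening: telling Salo the town from Salonen the teacher needs more than a regular expression.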
Although I've been practising SPARQL queries for some time now, I am a complete newbie when it comes to linked data modeling, RDF and all that jazz. BTW, the SPARQL package, contributed by a friend of mine, Tomi Kauppinen, et al., has worked like a charm. So, I ventured along with the help (again) of Bob DuCharme's book and blog. It was actually quite exciting to be able to create new RDF triples with the SPARQL CONSTRUCT statement! Then, when I found rrdf, which offers functions to store, combine and save triples out of the box, I was ready to try. While at it, I decided to gather data about all AV resources, not just video.
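For readers who have not met CONSTRUCT before, here is a minimal sketch of the shape such a query can take: the WHERE part finds the items, and the CONSTRUCT part is a template for the new triples. This is not the query I ran. The function name, the use of dcterms:spatial for the link to the place, and the assumption that edm:hasMet, edm:type and dc:title all sit on the same ?item are illustrative choices only. The code just builds the query text.

```python
# Sketch only: returns a SPARQL CONSTRUCT string, sends nothing.
# Property placement and the dcterms:spatial choice are assumptions.
CONSTRUCT_TEMPLATE = """PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>

CONSTRUCT {
  ?item dc:title ?title ;
        edm:type ?type ;
        dcterms:spatial <%(place)s> .
}
WHERE {
  ?item edm:hasMet <%(place)s> ;
        edm:type ?type ;
        dc:title ?title .
  FILTER (?type = "IMAGE" || ?type = "SOUND" || ?type = "VIDEO")
}"""


def build_construct_query(geonames_id):
    """Build a CONSTRUCT query for AV items linked to one GeoNames ID."""
    gid = str(geonames_id)
    if not gid.isdigit():
        raise ValueError("GeoNames ID must consist of digits only")
    place = "http://sws.geonames.org/%s/" % gid
    return CONSTRUCT_TEMPLATE % {"place": place}


if __name__ == "__main__":
    # 1234567 is a made-up ID, for illustration only.
    print(build_construct_query(1234567))
```

The FILTER keeps only the image, sound and video types, in line with gathering all AV resources and leaving texts out.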
Here they are now, my first RDF triples from my very first in-memory triple store, containing data about Europeana Finnish resources featuring image, sound and video media types. The triples are serialized as RDF/XML and Turtle/N3. RDF/XML was done with the rrdf save.rdf function, and conversion to Turtle/N3 was also easy with the Apache Jena command-line tool rdfcat.
Rauma I left un-tripled - although I could have added an IF function to trap it, and then FILTER out all images and texts.
I would be more than happy if you'd like to comment on anything related to this exercise, especially on the CONSTRUCT part!
The R code for querying DBpedia and drawing bar charts, and for CONSTRUCTing RDF triples.