In this first part I will show you how to extract RDF from Wikipedia with the help of the DBpedia Extraction Framework. In future parts we are going to import this data into Virtuoso and create a Solr/Lucene index for it.
First of all, you need to install Java 7, Scala, Mercurial and Maven. Then open a terminal and go to the directory where you want to install the extraction framework. You can then check out the DBpedia Extraction Framework from here (this is one line!):
$ hg clone http://dbpedia.hg.sourceforge.net:8000/hgroot/dbpedia/extraction_framework dbpedia-extraction
Once the clone has finished, build the framework and change into the dump directory:
$ cd /path/to/mercurialrepo/dbpedia-extraction
$ mvn install
$ cd dump
Once this is done, you need to edit the config files according to your needs. First we need to configure what we want to download. You can download this config file and adjust it.
Next we need to configure what we want to extract. Here are two config files which work for English and German. I’ve only activated some of the extractors; a full list can be found here. Also make sure that you only have “extractor.$Language_Code” entries for the languages you want to extract. Otherwise you will get error messages for trying to extract from non-existent data.
The last thing to edit is the pom.xml in the dump directory. Go to the download launcher in the pom file and adjust the name of your config file (no change is needed if you use the config I supplied). Then go to the extraction launcher and change the name of the configuration file there as well. Now you should be able to start the download via:
$ mvn scala:run -Dlauncher=download
and run the extraction with:
$ mvn scala:run -Dlauncher=extraction
The data directory specified in the German and English config files should now contain several N-Triples or Turtle files. Congratulations! If anything went wrong, please drop me a comment!
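If you want a quick sanity check of the result, something like the following sketch could help. It lists the N-Triples and Turtle files under a data directory together with their line counts. The function name `list_rdf_dumps` and the `.nt`/`.ttl` file extensions are assumptions of mine; the framework may name its files differently or write compressed files, which this sketch ignores.

```shell
#!/bin/sh
# Hypothetical helper: list RDF dump files with their line counts.
# Assumes uncompressed files ending in .nt or .ttl (an assumption,
# not something the extraction framework guarantees).
list_rdf_dumps() {
    data_dir="$1"
    find "$data_dir" -type f \( -name '*.nt' -o -name '*.ttl' \) | sort |
    while IFS= read -r file; do
        # N-Triples holds one triple per line, so the line count is a
        # rough triple count (blank and comment lines are counted too).
        lines=$(wc -l < "$file" | tr -d ' ')
        printf '%s\t%s\n' "$lines" "$file"
    done
}
```

You would call it as `list_rdf_dumps /path/to/data`, where the path is a placeholder for the data directory from your config file. An empty result means no matching files were found there.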
I used to tail my log files in the console/terminal with “$ tail -f somelog.log”. But this does not really work if you have Eclipse in fullscreen mode (in Mac OS X). I recently stumbled across a very nice plugin which basically lets you do the same thing in a view inside Eclipse, just like the Console or Outline view. The plugin also supports multiple log files, so you can easily switch between a debug and a normal log file. Furthermore, you can define rules (word or regex match) which, if they match, change the background and foreground color of that particular line… exceptions can’t hide 🙂 To install this extension, copy the following URL, go to Eclipse > Help > Install New Software and install the “Log Viewer Feature”. Restart Eclipse and open the Log Viewer with Window > Show View > Other > Log Viewer. You then need to add all the log files you want to tail and create your custom rules (see the picture for more info). Please also visit and star this project on its Google Code site.
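If you are stuck in a plain terminal, a rough approximation of such a highlighting rule can be sketched with awk. This is not how the plugin works internally; the function name `highlight_log`, the default pattern and the choice of red are mine, purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: copy stdin to stdout, wrapping every line that
# matches a regex in ANSI red. The default pattern is an assumption.
highlight_log() {
    pattern="${1:-Exception|ERROR}"
    awk -v pat="$pattern" '
        $0 ~ pat { printf "\033[31m%s\033[0m\n", $0; next }  # matching line: red
        { print }                                            # everything else as-is
    '
}
```

A possible use would be `tail -f somelog.log | highlight_log 'Exception|WARN'`. Depending on your awk implementation, lines may arrive in chunks rather than one by one because of buffering.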
Since the recent OS update enabled fullscreen applications, all I wanted to do was use Eclipse in fullscreen mode. But since the most recent Eclipse release does not support this feature, it seemed impossible. Then Alex Blewitt came to the rescue. In his recent blog post he describes what is necessary to make Eclipse fullscreen compatible. So go ahead and read it here, or open Eclipse > Install New Software > Add “http://github.bandlem.com/” and install the extension. Thanks Alex!
Have you ever wanted to toggle breakpoints, or view or even edit a variable’s content while your PHP script is running? Sick and tired of “var_dump()” and “echo” scattered all around your code? Then you should keep reading…