Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional)

So you’re the guy who is allowed to set up a local DBpedia mirror or, more generally, a local Linked Data mirror for your work group? OK, today is your lucky day and you’re in the right place. I hope you’ll be able to benefit from my many hours of trial and error. If anything goes wrong (or everything works fine), feel free to leave a comment below.

Versions of this guide

There are four older versions of this guide:

  • Oct. 2010: The first version focusing on DBpedia 3.5 – 3.6 and Virtuoso 6.1
  • May 2012: A bigger update to DBpedia 3.7 (new local language versions) and Virtuoso 6.1.5+ (with a lot of updates making pre-processing of the dumps easier)
  • Apr. 2014: Update to DBpedia 3.9 and Virtuoso 7
  • Nov. 2014: Update to DBpedia 2014 and other Datasets and Virtuoso 7.1.0

In this step-by-step guide I’ll tell you how to install a local Linked Data mirror of DBpedia 2015-04, hosting a combination of the regular English and (as an example) the i18n German datasets, adding up to nearly 850 M triples.

I’ll also mention how you can add the following datasets / vocabularies adding up to nearly 6 G triples:

As DBpedia is quite modular and has many internationalized (i18n) versions, it has its own section in this guide. The other datasets don’t, as they need at most minor repacking and a single line to load, as explained below.

Used Versions

  • DBpedia 2015-04
  • Virtuoso OpenSource 7.2.1
  • Ubuntu 14.04 LTS or Debian 8

Prerequisites

A strong machine with root access and enough RAM: We used a VM with 4 cores and 32 GB of RAM for DBpedia only. If you intend to also load Freebase and other datasets, I recommend at least 64 GB of RAM (we actually ended up using a 16-core, 256 GB RAM server in our research group). For installing, I recommend more than 128 GB of free disk space for DBpedia alone and 512 GB if you want to load Freebase as well. The space is needed for downloading and repacking the datasets as well as for the database file, which grows during the import (mine grew to 64 GB for DBpedia and 320 GB with all the datasets mentioned above).

This guide applies to a clean install. Please check that there’s no older version of Virtuoso installed with dpkg -l | grep virtuoso ; which isql ; which isql-vt (no output is good). If there is, please know what you’re doing. Virtuoso 6 and 7 use different default locations for their DBs, but in general newer versions should be able to upgrade older DB files if correctly configured to use the same DB file. In general I’d suggest either uninstalling the older version and its config files and then installing the new one according to this guide, or isolating the newer one with the docker approach mentioned below.

For the impatient and docker affine

As an alternative to the following sections, which explain how to build everything from source yourself and go into details about the DBpedia dump files, I also provide a docker image (source) that you can use to automate and simplify the process a lot:

dump_dir=~/dumps/dbpedia/2015-04
db_dir=~/virtuoso_db
mkdir -p "$dump_dir"
cd "$dump_dir"

# downloading
wget -r -nc -nH --cut-dirs=1 -np -l1 \
    -A '*.nt.bz2' -A '*.owl' -R '*unredirected*' \
    http://downloads.dbpedia.org/2015-04/core/

# repacking
sudo apt-get install pigz pbzip2
for i in */*.nt.bz2 ; do echo $i ; pbzip2 -dc "$i" | pigz - > "${i%bz2}gz" && rm "$i"; done
mkdir classes
cd classes
wget http://downloads.dbpedia.org/2015-04/dbpedia_2015-04.owl
cd

# install some VAD packages for DBpedia into our db which we'll keep in db_dir
docker run -d --name dbpedia-vadinst \
    -v "$db_dir":/var/lib/virtuoso-opensource-7 \
    joernhees/virtuoso run &&
docker exec dbpedia-vadinst wait_ready &&
docker exec dbpedia-vadinst isql-vt PROMPT=OFF VERBOSE=OFF BANNER=OFF \
    "EXEC=vad_install('/usr/share/virtuoso-opensource-7/vad/rdf_mappers_dav.vad');" &&
docker exec dbpedia-vadinst isql-vt PROMPT=OFF VERBOSE=OFF BANNER=OFF \
    "EXEC=vad_install('/usr/share/virtuoso-opensource-7/vad/dbpedia_dav.vad');" &&
docker stop dbpedia-vadinst &&
docker rm -v dbpedia-vadinst &&

# starting the import
docker run --rm \
    -v "$db_dir":/var/lib/virtuoso-opensource-7 \
    -v "$dump_dir"/classes:/import:ro \
    joernhees/virtuoso import 'http://dbpedia.org/resource/classes#' &&
# docker import of the actual data (will use 64 GB RAM and take about 1 hour)
docker run --rm \
    -v "$db_dir":/var/lib/virtuoso-opensource-7 \
    -v "$dump_dir"/core:/import:ro \
    -e "NumberOfBuffers=$((64*85000))" \
    joernhees/virtuoso import 'http://dbpedia.org' &&

# running the local endpoint on port 8891 with 32 GB RAM:
docker run --name dbpedia \
    -v "$db_dir":/var/lib/virtuoso-opensource-7 \
    -p 8891:8890 \
    -e "NumberOfBuffers=$((32*85000))" \
    joernhees/virtuoso run

# access one of the following for example:
# http://localhost:8891/sparql
# http://localhost:8891/resource/Bonn
# http://localhost:8891/conductor (user: dba, pw: dba)

The manual version

Download and build Virtuoso

We’ll download Virtuoso OpenSource: either from SourceForge or GitHub (make sure you get v7.2.1 as in this guide or a newer version).

Unlike in earlier versions of this guide we’ll now first build the .deb packages and then install them with apt-get.

As building will install a lot of extra packages that you only need for building, I prepared another docker image (source) that will do the whole build inside a container for you and put the resulting .deb packages (and DBpedia VAD) into your ~/virtuoso_deb folder:

docker run --rm -it -v ~/virtuoso_deb:/export/ joernhees/dpkg_build \
    https://github.com/openlink/virtuoso-opensource/releases/download/v7.2.1/virtuoso-opensource-7.2.1.tar.gz \
    -j5
# this should run for about 15 minutes
# compilation by default sadly does not create the dbpedia VAD package, so
# to do that, the above command stops after compilation in interactive mode.
# in there just execute this:
cd /tmp/build/virtuoso*/ &&
./configure --with-layout=debian --enable-dbpedia-vad &&
cd binsrc &&
make &&
cp dbpedia/dbpedia_dav.vad /export &&
exit

If you used this, you can skip the following down to installing the .deb packages.

If not, build manually: the following downloads the source archive into ~/virtuoso_deb on the server, extracts it and switches to the source directory:

mkdir ~/virtuoso_deb
cd ~/virtuoso_deb
wget https://github.com/openlink/virtuoso-opensource/releases/download/v7.2.1/virtuoso-opensource-7.2.1.tar.gz
tar -xvzf virtuoso-opensource-7.2.1.tar.gz
cd virtuoso-opensource-7.2.1  # or newer, depending what you got

Afterwards you can use the following to install the build dependencies and actually build the .deb packages:

# install build tools
sudo apt-get install -y build-essential devscripts
# to install Virtuoso build dependencies
mk-build-deps -irt'apt-get --no-install-recommends -yV' && dpkg-checkbuilddeps
# to build Virtuoso with 5 processes in parallel
# choose something like your server's #CPUs + 1
dpkg-buildpackage -us -uc -j5

This will take about 15 min.
Afterwards if everything worked out, you should have the *.deb files in ~/virtuoso_deb.

We continue to also build the DBpedia VAD:

./configure --with-layout=debian --enable-dbpedia-vad && \
cd binsrc && make && \
cp dbpedia/dbpedia_dav.vad ~/virtuoso_deb/

Finally, let’s create a small local repository out of the .deb files you just built. The advantage of this is that you can simply install virtuoso-server with its dependencies with apt. In theory you could also resolve them manually and install everything with dpkg -i ..., but where’s the fun in that?

cd ~/virtuoso_deb
dpkg-scanpackages ./ | gzip > Packages.gz

Installing Virtuoso

No matter if you used the docker or manual building approach for the .deb packages of Virtuoso, you should now be able to install them with apt-get install ... after telling it where to look for the files for example by doing this:

# the redirect has to run as root and apt needs an absolute path, hence tee and $HOME
echo "deb file:$HOME/virtuoso_deb ./" | sudo tee -a /etc/apt/sources.list.d/virtuoso_local_packages.list
sudo apt-get update

After this just install Virtuoso with the following command (it should warn you about untrusted sources of the Virtuoso packages, which is because we just built them ourselves):

sudo apt-get install virtuoso-server \
  virtuoso-vad-bpel \
  virtuoso-vad-conductor \
  virtuoso-vad-demo \
  virtuoso-vad-doc \
  virtuoso-vad-isparql \
  virtuoso-vad-ods \
  virtuoso-vad-rdfmappers \
  virtuoso-vad-sparqldemo \
  virtuoso-vad-syncml \
  virtuoso-vad-tutorial

The above will ask you for a DBA password. Please pick one.

Installing the VAD packages here will actually not install them in the Virtuoso DB file, but just move them in the right place so they can for example be installed as mentioned later.

To also move the DBpedia VAD in place for later you can just run this:

sudo cp ~/virtuoso_deb/dbpedia_dav.vad /usr/share/virtuoso-opensource-7/vad/

Configuring Virtuoso

Now change the following values in /etc/virtuoso-opensource-7/virtuoso.ini. The performance tuning settings follow http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning:

# note: Virtuoso ignores lines starting with whitespace and stuff after a ;
[Parameters]
# you need to include the directory where your datasets will be downloaded
# to, in our case /usr/local/data/datasets:
DirsAllowed = ., /usr/share/virtuoso-opensource-7/vad, /usr/local/data/datasets
# IMPORTANT: for performance also do this
[Parameters]
# the following two are as suggested by comments in the original .ini
# file in order to use the RAM on your server:
NumberOfBuffers = 2720000
MaxDirtyBuffers = 2000000
# each buffer caches a 8K page of data and occupies approx. 8700 bytes of
# memory. It's suggested to set this value to 65 % of ram for a db only server
# so if you have 32 GB of ram: 32*1000^3*0.65/8700 = 2390804
# default is 2000 which will use 16 MB ram ;)
# Make sure to remove whitespace if you uncomment existing lines!
[Database]
MaxCheckpointRemap = 625000
# set this to roughly 1/4th of NumberOfBuffers
# (exactly 1/4th of the 2720000 above would be 680000)
[SPARQL]
# I like to increase the ResultSetMaxrows, MaxQueryCostEstimationTime
# and MaxQueryExecutionTime drastically as it's a local store where we
# do quite complex queries... up to you (don't do this if a lot of people
# use it).
# In any case for the importer to be more robust add the following setting
# to this section:
ShortenLongURIs = 1
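
If your server has a different amount of RAM, the memory settings above can be scaled. The following is only a sketch under a few assumptions: the helper name virtuoso_buffers is made up for illustration, the factor of 85000 buffers per GB is the one used in the docker commands of this guide, and keeping MaxDirtyBuffers at the same ratio as in the 32 GB example is my guess at a sensible default. Compare the result with the table in the comments of your own virtuoso.ini before using it.

```shell
#!/bin/sh
# Sketch: print memory-related virtuoso.ini settings for a given amount
# of RAM (in GB) that Virtuoso is allowed to use.
# ASSUMPTION: 85000 buffers per GB, as in the docker commands of this guide.
# ASSUMPTION: MaxDirtyBuffers keeps the 2000000 : 2720000 ratio (25/34)
#             of the 32 GB example.
virtuoso_buffers() {
    ram_gb=$1
    number_of_buffers=$((ram_gb * 85000))
    max_dirty_buffers=$((number_of_buffers * 25 / 34))
    # exactly 1/4th of NumberOfBuffers
    max_checkpoint_remap=$((number_of_buffers / 4))
    printf 'NumberOfBuffers = %d\n' "$number_of_buffers"
    printf 'MaxDirtyBuffers = %d\n' "$max_dirty_buffers"
    printf 'MaxCheckpointRemap = %d\n' "$max_checkpoint_remap"
}

# example: a server where Virtuoso may use 32 GB
virtuoso_buffers 32
```

For 32 GB this reproduces the NumberOfBuffers and MaxDirtyBuffers values shown above. The exact quarter for MaxCheckpointRemap comes out slightly higher than the rounded value in the config.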

Afterwards restart Virtuoso:

sudo /etc/init.d/virtuoso-opensource-7 restart

You should now have a running Virtuoso server.

DBpedia URIs (en) vs. DBpedia IRIs (i18n)

DBpedia 2015-04 consists of several datasets: one “standard” English version and several localized versions for other languages (i18n). The standard version mints URIs by going through all English Wikipedia articles. For all of these, the Wikipedia cross-language links are used to extract corresponding labels in other languages for the en URIs (e.g., core/labels-en-uris_de.nt.bz2). This is problematic, as for example articles which exist only in the German Wikipedia won’t be extracted. To solve this problem the i18n versions exist and create IRIs in the form of de.dbpedia.org for every article in the German Wikipedia (e.g., core-i18n/de/labels_de.nt.bz2).

This approach has several implications. For backwards compatibility reasons the standard DBpedia makes statements about URIs such as http://dbpedia.org/resource/Gerhard_Schr%C3%B6der while the local chapters, like the German one, make statements about IRIs such as http://de.dbpedia.org/resource/Gerhard_Schröder (note the ö). In other words and as written above: the standard DBpedia uses URIs to identify things, while the localized versions use IRIs. This also means that http://dbpedia.org/resource/Gerhard_Schröder shouldn’t work. That said, clicking the link will actually work, as there is magic going on in your browser to give you what you probably meant. Using curl (curl -i -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Gerhard_Schröder) or SPARQLing the endpoint will nevertheless not be so nice/sloppy and can cause quite some headache. Observe how the following two SPARQL queries return different results: select * where { dbpedia:Gerhard_Schröder ?p ?o. } vs. select * where { <http://dbpedia.org/resource/Gerhard_Schr%C3%B6der> ?p ?o. }. In order to mitigate this historic problem a bit, DBpedia actually offers owl:sameAs links from IRIs to URIs (core/iri-same-as-uri_en.nt.bz2), which you should load, so you at least have a link to what you want if someone tries to get info about an IRI.

As the standard DBpedia provides labels, abstracts and a couple other things in several languages, there are two types of files in the localized DBpedia folders: There are triples directly associating the English URIs with for example the German labels ({core,core-i18n/de}/labels-en-uris_de.nt.bz2) and there are the localized triple files which associate for example the DE IRIs with the German labels (core-i18n/de/labels_de.nt.bz2).

Downloading the DBpedia dump files, de-duplication & Repacking

For our group we decided that we wanted a reasonably complete mirror of the standard DBpedia (EN) (have a look at the core directory, which contains all datasets loaded into the public DBpedia SPARQL Endpoint), but also the i18n versions for the German DBpedia loaded in separate graphs, as well as each of their pagelink datasets in yet another separate graph each. For this we download the corresponding files in (NT) format as follows. If you need something different do so (and maybe report back if there were problems and how you solved them).

# see comment above, you could also get another DBpedia version...
mkdir -p /usr/local/data/datasets/dbpedia/2015-04
cd /usr/local/data/datasets/dbpedia/2015-04
wget -r -nc -nH --cut-dirs=1 -np -l1 -A '*.nt.bz2' -A '*.owl' -R '*unredirected*' http://downloads.dbpedia.org/2015-04/{core/,core-i18n/en,core-i18n/de,dbpedia_2015-04.owl}

As already mentioned, DBpedia 2015-04 introduced a core folder which contains all files loaded on the public DBpedia endpoint. Be aware that if you download other folders as above, you’ll be downloading some files twice (e.g., labels-en-uris_de.nt.bz2 can be found in both the core folder and the core-i18n/de folder). The core-i18n/en folder in particular contains many duplicates of files from core. If you want to see which downloaded files are duplicates (independent of their name) and especially which core-i18n/en files were not loaded on the public endpoint, and so are not in core, you can do the following:

# compute md5 hashes for all downloaded files
find . -mindepth 2 -type f -print0 | xargs -0 md5sum > md5sums

# first check if there are duplicates in other folders without core
LC_ALL=C sort md5sums | grep -v '/core/' | uniq -w32 -D
ba3fc042b14cb41e6c4282a6f7c45e02  ./core-i18n/en/instance-types-dbtax-dbo_en.nt.bz2
ba3fc042b14cb41e6c4282a6f7c45e02  ./core-i18n/en/instance_types_dbtax-dbo.nt.bz2

So it seems the ./core-i18n/en/instance-types-dbtax-dbo_en.nt.bz2 and ./core-i18n/en/instance_types_dbtax-dbo.nt.bz2 files are actually the same.

To list all the files in core-i18n/en which are duplicates do this:

# list all dup files in core-i18n/en
LC_ALL=C sort md5sums | uniq -w32 -D | grep '/core-i18n/en'
068975f6dd60f29d13c8442b0dbe403d  ./core-i18n/en/skos-categories_en.nt.bz2
14a770f293524a5713f741a1a448bcfa  ./core-i18n/en/short-abstracts_en.nt.bz2
1904ad5bc4579fd7efe7f40673c32f79  ./core-i18n/en/specific-mappingbased-properties_en.nt.bz2
1958649209bc90944c65eccd30d37c6c  ./core-i18n/en/infobox-property-definitions_en.nt.bz2
2774d36ce14e0143ca4fa25ed212a598  ./core-i18n/en/external-links_en.nt.bz2
314162db2acb516a1ef5fcb3a2c7df2b  ./core-i18n/en/geonames_links_en.nt.bz2
3b42f351fc30f6b6b97d3f2a16ef6db3  ./core-i18n/en/instance-types-transitive_en.nt.bz2
3b61b11bdcb50a0d44ca8f4bd68f4762  ./core-i18n/en/revision-ids_en.nt.bz2
43a8b17859c50d37f4cab83573c2992e  ./core-i18n/en/instance_types_sdtyped-dbo_en.nt.bz2
4c847b2754294c555236d09485200435  ./core-i18n/en/instance-types_en.nt.bz2
63e2cde88e7bdefb6739c62aa234fc1e  ./core-i18n/en/category-labels_en.nt.bz2
64cbbac14769aadf560496b4d948d5e1  ./core-i18n/en/interlanguage-links-chapters_en.nt.bz2
75f2d135459c824feee1d427e4165a4f  ./core-i18n/en/transitive-redirects_en.nt.bz2
82fe80c3868a89d54fec26c919a4fa50  ./core-i18n/en/revision-uris_en.nt.bz2
8407c84d262b573418326bdd8f591b95  ./core-i18n/en/mappingbased-properties_en.nt.bz2
87df057913a05dbb5666f360d20fa542  ./core-i18n/en/freebase-links_en.nt.bz2
8cc921fbab5d02ad83b1fda2f87c23f0  ./core-i18n/en/wikipedia-links_en.nt.bz2
9152e34db96df2dd4991e78b7e53ff3f  ./core-i18n/en/article-categories_en.nt.bz2
94b48e9df78f746e60a9d0c1aafa3241  ./core-i18n/en/infobox-properties_en.nt.bz2
a254ce4596d045cc047959831edd318a  ./core-i18n/en/disambiguations_en.nt.bz2
ab29899e43fab1c6f060cdb8955c5b19  ./core-i18n/en/images_en.nt.bz2
ae046e03be0cf29eac1e3b8a8b3d6b03  ./core-i18n/en/persondata_en.nt.bz2
b4710d36b8dc915f07f5cec2d9971a27  ./core-i18n/en/page-ids_en.nt.bz2
ba3fc042b14cb41e6c4282a6f7c45e02  ./core-i18n/en/instance-types-dbtax-dbo_en.nt.bz2
ba3fc042b14cb41e6c4282a6f7c45e02  ./core-i18n/en/instance_types_dbtax-dbo.nt.bz2
bd90ce4064a120794b5eb5a8d024a97d  ./core-i18n/en/long-abstracts_en.nt.bz2
e4c422d1d23c69eff3b9d7d7df3f2f80  ./core-i18n/en/homepages_en.nt.bz2
eafc557cde69fd1cd8f78565c385ee16  ./core-i18n/en/iri-same-as-uri_en.nt.bz2
ef48deae48c9c9c5e17585e3f0243663  ./core-i18n/en/labels_en.nt.bz2
fa8800165c7e80509a4ebddc5f0caf90  ./core-i18n/en/geo-coordinates_en.nt.bz2

# to delete the duplicates from /core-i18n/en, leaving just one of each:
LC_ALL=C sort md5sums | uniq -w32 -D | grep '/core-i18n/en' | uniq -w32 | cut -d' ' -f3 | xargs rm

# afterwards these should be left:
ls -1 core-i18n/en/*
core-i18n/en/anchor-text_en.nt.bz2
core-i18n/en/article-templates_en.nt.bz2
core-i18n/en/flickr-wrappr-links_en.nt.bz2
core-i18n/en/genders_en.nt.bz2
core-i18n/en/instance_types_dbtax-dbo.nt.bz2
core-i18n/en/instance_types_dbtax_ext.nt.bz2
core-i18n/en/instance_types_lhd_dbo_en.nt.bz2
core-i18n/en/instance_types_lhd_ext_en.nt.bz2
core-i18n/en/interlanguage-links_en.nt.bz2
core-i18n/en/out-degree_en.nt.bz2
core-i18n/en/page-length_en.nt.bz2
core-i18n/en/page-links_en.nt.bz2
core-i18n/en/pnd_en.nt.bz2
core-i18n/en/redirects_en.nt.bz2
core-i18n/en/topical-concepts_en.nt.bz2

As Virtuoso can only import plain (uncompressed) or gzipped files, but the DBpedia dumps are bzipped, you can either repack them into gzip format or extract them. On our server the import was noticeably slower from extracted files than from gzipped ones (not to mention the vast amount of disk space wasted by the extracted files). File access becomes a bottleneck if you have a couple of cores idling. This is why I decided on repacking all the files from bz2 to gz. As you can see, I do the repacking with the parallel versions of bzip2 and gzip (pbzip2 and pigz). If that’s not suitable for you, feel free to change it. You might also want to change this if you want to do it in parallel to downloading. The repacking process below took about 30 minutes but was worth it in the end. The more CPUs you have, the more you can parallelize this process.

# if you want to save space do this:
sudo apt-get install pigz pbzip2
for i in core/*.nt.bz2 core-i18n/*/*.nt.bz2 ; do echo $i ; pbzip2 -dc "$i" | pigz - > "${i%bz2}gz" && rm "$i" ; done

# else do:
#pbzip2 -d core/*.nt.bz2 core-i18n/*/*.nt.bz2

# notice that the extraction (and repacking) of *.bz2 takes quite a while (about 30 minutes)
# gzipped data is reasonably packed, but still very fast to access (in contrast to bz2), so maybe this is the best choice.
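
Since the loop above removes each .bz2 file right after repacking, it might be worth testing the resulting archives before starting the import. A minimal sketch, assuming a helper named check_gz_files (my own name, not part of any tool) and using plain gzip -t, which files written by pigz should pass as well. Note that this only detects damaged gzip streams (for example after running out of disk space), not a bzip2 decompression that already failed earlier in the pipe.

```shell
#!/bin/sh
# check_gz_files DIR
# Hypothetical helper: tests every *.gz file below DIR with "gzip -t"
# and prints the path of each file that fails the test.
# Prints nothing if all archives are intact.
check_gz_files() {
    find "$1" -type f -name '*.gz' | while IFS= read -r f; do
        gzip -t "$f" 2>/dev/null || printf '%s\n' "$f"
    done
}

# usage (path as used in this guide):
# check_gz_files /usr/local/data/datasets/dbpedia/2015-04
```

Any file listed by this check would have to be downloaded and repacked again.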

Data Cleaning and The bulk loader scripts

In contrast to the previous versions of this article, the Virtuoso import will take care of shortening overly long IRIs itself. It also seems that the bulk loader script is included in more recent Virtuoso versions, so for reference only: see the old version for the cleaning script, and VirtBulkRDFLoaderExampleDbpedia and http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderScript for info about the bulk loader scripts.

Importing DBpedia dumps into Virtuoso

Now AFTER the re-/unpacking of the DBpedia dumps we will register all files in the DBpedia dir (recursively ld_dir_all) to be added to the DBpedia graph. If you use this method make sure that only files reside in the given subtree that you really want to import.
Also don’t forget to import the dbpedia_2015-04.owl file!
If you only want one directory’s files to be added (non recursive) use ld_dir('dir', '*.*', 'graph');.
If you manually want to add some files, use ld_add('file', 'graph');.
See the VirtBulkRDFLoaderScript file for details.

Be warned that it might be a bad idea to import the normal and i18n dataset into the same graph if you didn’t select specific languages, as it might introduce a lot of duplicates that are hard to disentangle.

In order to keep track (and easily reproduce) what was selected and imported into which graph, I actually link (ln -s) the repacked files into a directory structure beneath /usr/local/data/datasets/dbpedia/2015-04/importedGraphs/ and import from there instead. To make sure you think about this, I use that path below, so it won’t work if you didn’t pay attention. If you really want to import all downloaded files, just import /usr/local/data/datasets/dbpedia/2015-04/.

Also be aware that if you load certain parts of the dumps into different graphs (as I did with the pagelinks and the i18n version of the DE dataset), only triples from the http://dbpedia.org graph will be shown when you visit the local pages with your browser (SPARQL is unaffected by this)!

So if you only want to load the same datasets as loaded on the official endpoint then importing the core folder (first section below) and dbpedia_2015-04.owl file should be enough.

The following will prepare the linking for the datasets we loaded:

cd /usr/local/data/datasets/dbpedia/2015-04/
mkdir importedGraphs
cd importedGraphs

mkdir dbpedia.org
cd dbpedia.org
# ln -s ../../dbpedia*.owl ./  # see below!
ln -s ../../core/*.nt.gz ./
cd ..

mkdir ext.dbpedia.org
cd ext.dbpedia.org
ln -s ../../core-i18n/en/anchor-text_en.nt.gz ./
ln -s ../../core-i18n/en/article-templates_en.nt.gz ./
ln -s ../../core-i18n/en/genders_en.nt.gz ./
ln -s ../../core-i18n/en/instance_types_dbtax-dbo.nt.gz ./
ln -s ../../core-i18n/en/instance_types_dbtax_ext.nt.gz ./
ln -s ../../core-i18n/en/instance_types_lhd_dbo_en.nt.gz ./
ln -s ../../core-i18n/en/instance_types_lhd_ext_en.nt.gz ./
ln -s ../../core-i18n/en/out-degree_en.nt.gz ./
ln -s ../../core-i18n/en/page-length_en.nt.gz ./
cd ..

mkdir pagelinks.dbpedia.org
cd pagelinks.dbpedia.org
ln -s ../../core-i18n/en/page-links_en.nt.gz ./
cd ..

mkdir topicalconcepts.dbpedia.org
cd topicalconcepts.dbpedia.org
ln -s ../../core-i18n/en/topical-concepts_en.nt.gz ./
cd ..


mkdir de.dbpedia.org
cd de.dbpedia.org
ln -s ../../core-i18n/de/article-categories_de.nt.gz ./
ln -s ../../core-i18n/de/article-templates_de.nt.gz ./
ln -s ../../core-i18n/de/category-labels_de.nt.gz ./
ln -s ../../core-i18n/de/disambiguations_de.nt.gz ./
ln -s ../../core-i18n/de/external-links_de.nt.gz ./
ln -s ../../core-i18n/de/freebase-links_de.nt.gz ./
ln -s ../../core-i18n/de/geo-coordinates_de.nt.gz ./
ln -s ../../core-i18n/de/geonames_links_de.nt.gz ./
ln -s ../../core-i18n/de/homepages_de.nt.gz ./
ln -s ../../core-i18n/de/images_de.nt.gz ./
ln -s ../../core-i18n/de/infobox-properties_de.nt.gz ./
ln -s ../../core-i18n/de/infobox-property-definitions_de.nt.gz ./
ln -s ../../core-i18n/de/instance-types_de.nt.gz ./
ln -s ../../core-i18n/de/instance_types_lhd_dbo_de.nt.gz ./
ln -s ../../core-i18n/de/instance_types_lhd_ext_de.nt.gz ./
ln -s ../../core-i18n/de/instance-types-transitive_de.nt.gz ./
ln -s ../../core-i18n/de/interlanguage-links-chapters_de.nt.gz ./
ln -s ../../core-i18n/de/interlanguage-links_de.nt.gz ./
ln -s ../../core-i18n/de/iri-same-as-uri_de.nt.gz ./
ln -s ../../core-i18n/de/labels_de.nt.gz ./
ln -s ../../core-i18n/de/long-abstracts_de.nt.gz ./
ln -s ../../core-i18n/de/mappingbased-properties_de.nt.gz ./
ln -s ../../core-i18n/de/out-degree_de.nt.gz ./
ln -s ../../core-i18n/de/page-ids_de.nt.gz ./
ln -s ../../core-i18n/de/page-length_de.nt.gz ./
ln -s ../../core-i18n/de/persondata_de.nt.gz ./
ln -s ../../core-i18n/de/pnd_de.nt.gz ./
ln -s ../../core-i18n/de/revision-ids_de.nt.gz ./
ln -s ../../core-i18n/de/revision-uris_de.nt.gz ./
ln -s ../../core-i18n/de/short-abstracts_de.nt.gz ./
ln -s ../../core-i18n/de/skos-categories_de.nt.gz ./
ln -s ../../core-i18n/de/specific-mappingbased-properties_de.nt.gz ./
ln -s ../../core-i18n/de/transitive-redirects_de.nt.gz ./
ln -s ../../core-i18n/de/wikipedia-links_de.nt.gz ./
cd ..

mkdir pagelinks.de.dbpedia.org
cd pagelinks.de.dbpedia.org
ln -s ../../core-i18n/de/page-links_de.nt.gz ./
cd ..

This should have prepared your importedGraphs directory. From this directory you can run the following command which prints out the necessary isql-vt commands to register your graphs for importing:

for g in * ; do echo "ld_dir_all('$(pwd)/$g', '*.*', 'http://$g');" ; done
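
As ln -s happily creates links to files that don't exist (for example after a typo or if a dump file is missing), it may be worth checking the importedGraphs directory for dangling symlinks before registering anything. A small sketch, assuming a helper named list_dangling_links (a name I made up for this example):

```shell
#!/bin/sh
# list_dangling_links DIR
# Hypothetical helper: prints every symlink below DIR whose target
# does not exist. Prints nothing if all links resolve.
list_dangling_links() {
    find "$1" -type l | while IFS= read -r link; do
        # [ -e ] follows the symlink, so it fails for dangling links
        [ -e "$link" ] || printf '%s\n' "$link"
    done
}

# usage (path as used in this guide):
# list_dangling_links /usr/local/data/datasets/dbpedia/2015-04/importedGraphs
```

An empty output means every link points to an existing file.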

One more thing (thanks to Romain): In order for the DBpedia.vad package (which is installed at the end) to work correctly, the dbpedia_2015-04.owl file needs to be imported into graph http://dbpedia.org/resource/classes#.

Note: In the following I will assume that your Virtuoso isql command is called isql-vt. If you lack such a command, it might be called isql or isql-v, but this usually means you installed it using some other method than described here.

isql-vt # enter Virtuoso isql mode
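-- we are in sql mode now
-- note: on our server the datasets live below /usr/local/data/datasets/remote/,
-- adjust the paths if you downloaded to /usr/local/data/datasets/dbpedia/2015-04 as above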
ld_add('/usr/local/data/datasets/remote/dbpedia/2015-04/dbpedia_2015-04.owl', 'http://dbpedia.org/resource/classes#');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org', '*.*', 'http://dbpedia.org');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org', '*.*', 'http://de.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org', '*.*', 'http://ext.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/pagelinks.dbpedia.org', '*.*', 'http://pagelinks.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/pagelinks.de.dbpedia.org', '*.*', 'http://pagelinks.de.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/topicalconcepts.dbpedia.org', '*.*', 'http://topicalconcepts.dbpedia.org');

-- do the following to see which files were registered to be added:
SELECT * FROM DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;
EXIT;

You can now also register other datasets like Freebase, DBLP, Yago, Umbel and Schema.org … that you want to be loaded after downloading them to the appropriate directories like this:

ld_add('/usr/local/data/datasets/remote/schema.org/2015-11-04/all.nt', 'http://schema.org');
ld_dir_all('/usr/local/data/datasets/remote/umbel/External Ontologies', '*.n3', 'http://umbel.org/umbel/rc');
ld_add('/usr/local/data/datasets/remote/umbel/Ontology/umbel.n3', 'http://umbel.org/umbel');
ld_add('/usr/local/data/datasets/remote/umbel/Reference Structure/umbel_reference_concepts.n3', 'http://umbel.org/umbel/rc');
ld_add('/usr/local/data/datasets/remote/yago/yago3/2015-11-04/yagoLabels.ttl.gz', 'http://yago-knowledge.org/resource');

ld_add('/usr/local/data/datasets/remote/dblp/l3s/2015-11-04/dblp.nt.gz', 'http://dblp.l3s.de');

ld_dir_all('/usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026', '*.nt.gz', 'http://www.wikidata.org');
ld_dir_all('/usr/local/data/datasets/remote/freebase/2015-08-09', '*.nt.gz', 'http://rdf.freebase.com');
ld_dir_all('/usr/local/data/datasets/remote/linkedgeodata/2014-09-09', '*.*', 'http://linkedgeodata.org');

Our full DB.DBA.LOAD_LIST currently looks like this:

SELECT ll_graph, ll_file FROM DB.DBA.LOAD_LIST;
ll_graph                               ll_file
VARCHAR                                VARCHAR NOT NULL
____________________________________

http://dblp.l3s.de                     /usr/local/data/datasets/remote/dblp/l3s/2015-11-04/dblp.nt.gz
http://dbpedia.org/resource/classes#   /usr/local/data/datasets/remote/dbpedia/2015-04/dbpedia_2015-04.owl
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/amsterdammuseum_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/article-categories_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/bbcwildlife_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/bookmashup_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/bricklink_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/category-labels_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/cordis_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/dailymed_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/dblp_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/dbpedia_2015-04.owl
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/dbtune_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/disambiguations_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/diseasome_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/drugbank_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/eunis_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/eurostat_linkedstatistics_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/eurostat_wbsg_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/external-links_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/factbook_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/flickrwrappr_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/freebase-links_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/gadm_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/geo-coordinates_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/geonames_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/geonames_links_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/geospecies_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/gho_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/gutenberg_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/homepages_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/images_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/infobox-properties_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/infobox-property-definitions_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/instance-types-transitive_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/instance-types_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/instance_types_sdtyped-dbo_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/interlanguage-links-chapters_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/iri-same-as-uri_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/italian_public_schools_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_ar.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_de.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_es.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_fr.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_it.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_ja.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_nl.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_pl.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_pt.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_ru.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels-en-uris_zh.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/labels_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/linkedgeodata_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/linkedmdb_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/lobid.org-manifestation.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/lobid.org-organization.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_ar.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_de.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_es.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_fr.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_it.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_ja.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_nl.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_pl.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_pt.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_ru.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts-en-uris_zh.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/long-abstracts_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/mappingbased-properties_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/musicbrainz_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/nuts_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/nytimes_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/opencyc_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/openei_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/page-ids_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/persondata_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/revision-ids_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/revision-uris_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/revyu_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_ar.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_de.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_es.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_fr.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_it.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_ja.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_nl.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_pl.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_pt.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_ru.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts-en-uris_zh.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/short-abstracts_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/sider_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/skos-categories_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/specific-mappingbased-properties_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/tcm_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/transitive-redirects_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/transparency_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/uk-university_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/umbel_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/uscensus_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/viaf_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/wikicompany_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/wikipedia-links_en.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/wordnet_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/yago_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/yago_taxonomy.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/yago_type_links.nt.gz
http://dbpedia.org                     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/dbpedia.org/yago_types.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/article-categories_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/article-templates_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/category-labels_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/disambiguations_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/external-links_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/freebase-links_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/geo-coordinates_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/geonames_links_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/homepages_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/images_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/infobox-properties_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/infobox-property-definitions_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/instance-types-transitive_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/instance-types_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/instance_types_lhd_dbo_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/instance_types_lhd_ext_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/interlanguage-links-chapters_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/interlanguage-links_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/iri-same-as-uri_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/labels_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/long-abstracts_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/mappingbased-properties_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/out-degree_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/page-ids_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/page-length_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/persondata_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/pnd_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/revision-ids_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/revision-uris_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/short-abstracts_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/skos-categories_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/specific-mappingbased-properties_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/transitive-redirects_de.nt.gz
http://de.dbpedia.org                  /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/de.dbpedia.org/wikipedia-links_de.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/anchor-text_en.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/article-templates_en.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/genders_en.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/instance_types_dbtax-dbo.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/instance_types_dbtax_ext.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/instance_types_lhd_dbo_en.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/instance_types_lhd_ext_en.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/out-degree_en.nt.gz
http://ext.dbpedia.org                 /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/ext.dbpedia.org/page-length_en.nt.gz
http://pagelinks.dbpedia.org           /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/pagelinks.dbpedia.org/page-links_en.nt.gz
http://pagelinks.de.dbpedia.org        /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/pagelinks.de.dbpedia.org/page-links_de.nt.gz
http://topicalconcepts.dbpedia.org     /usr/local/data/datasets/remote/dbpedia/2015-04/importedGraphs/topicalconcepts.dbpedia.org/topical-concepts_en.nt.gz
http://rdf.freebase.com                /usr/local/data/datasets/remote/freebase/2015-08-09/fb2w.nt.gz
http://rdf.freebase.com                /usr/local/data/datasets/remote/freebase/2015-08-09/freebase-rdf-2015-08-09-00-01.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Abutters.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Abutters.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-AerialwayThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-AerialwayThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-AerowayThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-AerowayThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Amenity.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Amenity.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-BarrierThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-BarrierThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Boundary.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Boundary.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Craft.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Craft.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-CyclewayThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-CyclewayThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-EmergencyThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-EmergencyThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-HistoricThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-HistoricThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Leisure.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Leisure.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-LockThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-LockThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-ManMadeThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-ManMadeThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-MilitaryThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-MilitaryThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Office.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Office.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Place.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Place.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-PowerThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-PowerThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-PublicTransportThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-PublicTransportThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-RailwayThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-RailwayThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-RouteThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-RouteThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Shop.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-Shop.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-SportThing.node.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-SportThing.way.sorted.nt.gz
http://linkedgeodata.org               /usr/local/data/datasets/remote/linkedgeodata/2014-09-09/2014-09-09-ontology.sorted.nt.gz
http://schema.org                      /usr/local/data/datasets/remote/schema.org/2015-11-04/all.nt
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/External Ontologies/dbpedia-ontology.n3
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/External Ontologies/geonames.n3
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/External Ontologies/opencyc.n3
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/External Ontologies/same-as.n3
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/External Ontologies/schema.org.n3
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/External Ontologies/wikipedia.n3
http://umbel.org/umbel                 /usr/local/data/datasets/remote/umbel/Ontology/umbel.n3
http://umbel.org/umbel/rc              /usr/local/data/datasets/remote/umbel/Reference Structure/umbel_reference_concepts.n3
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-instances.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-properties.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-property-taxonomy.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-simple-statements.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-sitelinks.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-statements.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-taxonomy.nt.gz
http://www.wikidata.org                /usr/local/data/datasets/remote/wikidata/tools.wmflabs.org/wikidata-exports/rdf/exports/20151026/wikidata-terms.nt.gz
http://yago-knowledge.org/resource     /usr/local/data/datasets/remote/yago/yago3/2015-11-04/yagoLabels.ttl.gz

219 Rows. -- 8 msec.

OK, now comes the fun (and long) part: about 1.5 hours for DBpedia alone (the new Virtuoso 7 is cool 😉), plus about 6 hours for Freebase… Now that we have registered the files to be added, let’s finally start the process. Fire up screen if you haven’t already. (For more detailed metering than below see VirtTipsAndTricksGuideLDMeterUtility.)

sudo apt-get install screen
screen isql-vt
rdf_loader_run();
-- DO NOT USE THE DB BESIDES THE FOLLOWING COMMANDS:
-- depending on the number of CPU cores and your IO performance you can run
-- more rdf_loader_run(); commands in other isql-vt sessions, which will
-- speed up the import process.
-- you can watch the progress from another isql-vt session with:
-- select * from DB.DBA.LOAD_LIST;
-- if you need to stop the loading for any reason: rdf_load_stop();
-- if you want to force stopping: rdf_load_stop(1);
checkpoint;
commit WORK;
checkpoint;
EXIT;
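The comments above mention starting additional rdf_loader_run(); sessions by hand. A minimal sketch of doing the same from the shell follows. It assumes isql-vt connects with its default settings, as in the other commands of this guide; the function name run_loaders and the ISQL variable are hypothetical helpers, not part of Virtuoso.

```shell
# Hypothetical helper: start N bulk loader sessions in parallel and wait for them.
# ISQL can be overridden; it defaults to the isql-vt client used in this guide.
ISQL="${ISQL:-isql-vt}"

run_loaders() {
    local n="${1:-2}"   # number of parallel sessions (assumption: tune to cores and IO)
    local i
    for i in $(seq 1 "$n"); do
        # each session picks up not yet loaded files from DB.DBA.LOAD_LIST
        "$ISQL" BANNER=OFF VERBOSE=OFF 'EXEC=rdf_loader_run();' &
    done
    wait                # block until every loader session has finished
    # a final checkpoint writes the loaded data to the database file
    "$ISQL" BANNER=OFF VERBOSE=OFF 'EXEC=checkpoint;'
}
```

Inside screen you would call it as, for example, run_loaders 4. How many sessions actually help depends on your machine, so treat the number as something to experiment with.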

After this, take a look at /var/lib/virtuoso/db/virtuoso.log and run this:

isql-vt BANNER=OFF VERBOSE=OFF 'EXEC=SELECT * FROM DB.DBA.LOAD_LIST WHERE ll_error IS NOT NULL;'

Should you find any errors in there… FIX THEM! You might be able to use the dump, but it’s incomplete in those cases: any error aborts the loading of the corresponding file and the loader continues with the next one, so you only get the part of that file up to the place where the error occurred. (Should you find errors you can’t fix, please leave a comment.)
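While the import is still running, a per-state summary is often easier to read than the full LOAD_LIST. The following is a sketch under assumptions: as far as I know ll_state is 0 for files not started yet, 1 for files being loaded and 2 for finished files (including those that stopped on an error, which is why the ll_error query above is still needed). The function names and the ISQL variable are hypothetical.

```shell
# ISQL is an overridable, hypothetical variable; it defaults to isql-vt.
ISQL="${ISQL:-isql-vt}"

# Count registered files per loader state
# (assumed meaning: 0 = not started, 1 = loading, 2 = finished).
load_progress() {
    "$ISQL" BANNER=OFF VERBOSE=OFF \
        'EXEC=SELECT ll_state, COUNT(*) FROM DB.DBA.LOAD_LIST GROUP BY ll_state ORDER BY ll_state;'
}

# Print the summary every N seconds (default 60) until interrupted with Ctrl-C.
watch_progress() {
    local interval="${1:-60}"
    while true; do
        date
        load_progress
        sleep "$interval"
    done
}
```

Run watch_progress 300 in a second terminal to get a snapshot every five minutes.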

Final polishing

You can & should now install the DBpedia and RDF Mappers packages from the Virtuoso Conductor.
http://your-server:8890

login: dba
pw: dba

Go to System Admin / Packages. Install the DBpedia (v. 1.4.30) and rdf_mappers (v. 1.34.74) packages (takes about 5 minutes).

Testing your local mirror

Go to the SPARQL endpoint of your server at http://your-server:8890/sparql (or, in isql-vt, prefix the query with the keyword SPARQL):

sparql SELECT COUNT(*) WHERE { ?s ?p ?o } ;

This shouldn’t take long in Virtuoso 7 anymore and for me now returns 849,521,186 for DBpedia (en+de) or 5,959,006,725 with all the datasets mentioned above.

I also like this query showing all the graphs and how many triples are in them:

sparql SELECT ?g COUNT(*) AS ?c { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC(?c);
g                                                            c
LONG VARCHAR                                                 LONG VARCHAR
__________________________________________________________

http://rdf.freebase.com                                      3126890738
http://linkedgeodata.org                                     1013866920
http://www.wikidata.org                                      841008708
http://dbpedia.org                                           411914840
http://pagelinks.dbpedia.org                                 158878272
http://de.dbpedia.org                                        119876594
http://ext.dbpedia.org                                       99042212
http://dblp.l3s.de                                           81987210
http://pagelinks.de.dbpedia.org                              59622795
http://yago-knowledge.org/resource                           44963422
http://umbel.org/umbel/rc                                    480616
http://www.openlinksw.com/schemas/RDF_Mapper_Ontology/1.0/   256065
http://topicalconcepts.dbpedia.org                           157560
http://dbpedia.org/resource/classes#                         28880
http://schema.org                                            8727
http://localhost:8890/DAV/                                   4806
http://www.openlinksw.com/schemas/virtrdf#                   2472
http://umbel.org/umbel                                       1584
http://open.vocab.org/terms                                  1480
http://purl.org/ontology/bibo/                               1226
http://purl.org/goodrelations/v1                             937
http://purl.org/dc/terms/                                    857
http://www.openlinksw.com/schemas/opengraph                  804
http://www.openlinksw.com/schemas/linkedin                   741
http://www.openlinksw.com/schemas/googleplus                 696
http://www.openlinksw.com/schemas/google-base                691
http://www.openlinksw.com/schemas/cv                         661
virtrdf-label                                                638
http://xmlns.com/foaf/0.1/                                   557
http://rdfs.org/sioc/ns#                                     553
http://www.openlinksw.com/schemas/evri                       482
http://www.openlinksw.com/schemas/crunchbase                 444
http://bblfish.net/work/atom-owl/2006-06-06/                 386
http://scot-project.org/scot/ns#                             332
http://www.openlinksw.com/schemas/zillow                     311
http://www.w3.org/2004/02/skos/core                          252
http://www.openlinksw.com/schemas/cnet                       225
http://www.openlinksw.com/schemas/tesco                      183
http://www.openlinksw.com/schemas/bestbuy                    172
http://www.w3.org/2002/07/owl#                               160
http://www.w3.org/2002/07/owl                                160
http://www.openlinksw.com/schemas/angel#                     144
http://www.openlinksw.com/schemas/amazon                     143
http://purl.org/dc/elements/1.1/                             139
http://www.w3.org/2007/05/powder-s#                          117
http://www.openlinksw.com/schemas/twitter                    103
http://www.openlinksw.com/schemas/stackoverflow#             102
http://www.openlinksw.com/schemas/klout                      90
http://www.w3.org/2000/01/rdf-schema#                        87
http://www.w3.org/1999/02/22-rdf-syntax-ns#                  85
http://www.openlinksw.com/schemas/ebay                       79
http://www.openlinksw.com/schema/attribution#                68
http://www.openlinksw.com/schemas/nyt                        41
http://www.openlinksw.com/schemas/wolframalpha#              32
http://www.openlinksw.com/schemas/oplbase                    26
http://www.openlinksw.com/schemas/cert#                      23
http://www.openlinksw.com/schemas/dbpedia-spotlight#         21
http://www.openlinksw.com/schemas/money                      21
http://localhost:8890/sparql                                 14
http://dbpedia.org/schema/property_rules#                    12
dbprdf-label                                                 6
http://www.w3.org/ns/ldp#                                    3

62 ROWS. -- 58092 msec.

Congratulations, you just imported nearly 850 million triples (or nearly 6 G triples for all datasets).

Backing up this initial state

Now is a good moment to back up the whole db (takes about half an hour):

sudo -i
cd /
/etc/init.d/virtuoso-opensource stop &&
tar -cvf - /var/lib/virtuoso | lzop > virtuoso-7.2.1-DBDUMP-$(date '+%F')-dbpedia-2015-04-en_de.tar.lzop &&
/etc/init.d/virtuoso-opensource start

Afterwards you might want to repack this with xz (lzma) like this:

# apt-get install xz-utils pxz
for f in virtuoso-7.2.1-DBDUMP-*.tar.lzop ; do lzop -d -c "$f" | pxz > "${f%.lzop}.xz" ; done
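
Should you ever need to go back to this state, restoring is roughly the reverse of the backup. The following is only a sketch under a few assumptions: restore_backup is a made-up helper name, the archive was created with the tar | lzop pipeline above, and Virtuoso is stopped while you run it. As tar drops the leading / when it creates the archive, extracting relative to / puts everything back under /var/lib/virtuoso. Passing a scratch directory as second argument lets you inspect a backup without touching the live db.

```shell
# Sketch: unpack a backup created with `tar -cvf - /var/lib/virtuoso | lzop`.
# restore_backup is a hypothetical helper, not part of Virtuoso.
restore_backup() {
  archive="$1"
  root="${2:-/}"   # where to unpack; defaults to / (the real location)
  lzop -dc "$archive" | tar -xf - -C "$root"
}

# usage (as root, with the server stopped):
#   /etc/init.d/virtuoso-opensource stop
#   restore_backup /path/to/your-backup.tar.lzop
#   /etc/init.d/virtuoso-opensource start
```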

Yay, done 😉
As always, feel free to leave comments if I made a mistake or to tell us about your problems or how happy you are :D.

Thanks

Many thanks to the DBpedia team for their endless efforts of providing us all with a great dataset. Also many thanks to the Virtuoso crew for releasing an OpenSource version of their DB.

Updates

  • 2015-12-07: added a check for older installed versions.

Setting up a local DBpedia 2014 mirror with Virtuoso 7.1.0

Newer version available: Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional)

So you’re the guy who is allowed to set up a local DBpedia mirror or more generally a local Linked Data mirror for your work group? OK, today is your lucky day and you’re in the right place. I hope you’ll be able to benefit from my many hours of trial and error. If anything goes wrong (or everything works fine), feel free to leave a comment below.

Versions of this guide

There are three older versions of this guide:

  • Oct. 2010: The first version focusing on DBpedia 3.5 – 3.6 and Virtuoso 6.1
  • May 2012: A bigger update to DBpedia 3.7 (new local language versions) and Virtuoso 6.1.5+ (with a lot of updates making pre-processing of the dumps easier)
  • Apr. 2014: Update to DBpedia 3.9 and Virtuoso 7

In this step by step guide I’ll tell you how to install a local Linked Data mirror of DBpedia 2014, hosting a combination of the regular English and (as an example) the i18n German datasets, adding up to over half a billion triples. If this isn’t enough you can also follow the links to the Freebase, DBLP, Yago, Umbel and Schema.org datasets / vocabularies, adding up to over 3.5 billion triples.

Let’s jump in.

Used Versions

  • DBpedia 2014
  • Virtuoso OpenSource 7.1.0
  • Ubuntu 14.04 LTS

Prerequisites

A strong machine with root access and enough RAM: We used a VM with 4 cores and 32 GB of RAM for DBpedia only. If you intend to also load Freebase and other datasets I recommend at least 64 GB of RAM (we actually ended up using a 16 core, 256 GB RAM server). For installing I recommend more than 128 GB of free HD space for DBpedia alone and 256 GB if you want to load Freebase as well. You need this space for downloading and repacking the datasets, as well as for the growing database file when importing (mine grew to 50 GB for DBpedia and 180 GB with Freebase).

Let’s go

Download and install virtuoso

Go and download Virtuoso OpenSource from http://sourceforge.net/projects/virtuoso/ (make sure you get v7.1.0 as in this guide or a newer version).

Put the file in your home dir on the server, then extract it and switch to the directory:

cd ~
tar -xvzf virtuoso-7.1.0.tar.gz
cd virtuoso-opensource-7.1.0 # or newer, depending what you got

Now do the following to install the prerequisites and then build virtuoso:

sudo aptitude install libxml2-dev libssl-dev autoconf libgraphviz-dev \
     libmagickcore-dev libmagickwand-dev dnsutils gawk bison flex gperf

# NOTICE: the following will _not_ install into /usr/local but into /usr
# (so might clash with packages by your distribution if you install
# "the" virtuoso package)
# You'll find the db in /var/lib/virtuoso/db !
# check output for errors and FIX THEM! (e.g., install missing packages)
export CFLAGS="-O2 -m64"
./configure --with-layout=debian --enable-dbpedia-vad --enable-rdfmappers-vad

# the following will build with 5 processes in parallel
# choose something like your server's #CPUs + 1
make -j5

This will take about 5 min

sudo make install
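
If you don't want to count cores by hand, the job count for make can be derived on the fly. A small sketch, assuming nproc (GNU coreutils) or getconf is available on your box; make_jobs is just an illustrative name:

```shell
# Suggest a parallel job count for make: number of CPUs + 1.
# make_jobs is a hypothetical helper name.
make_jobs() {
  cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
  echo $(( cpus + 1 ))
}

# usage (in the virtuoso source dir):
#   make -j"$(make_jobs)"
```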

Now change the following values in /var/lib/virtuoso/db/virtuoso.ini. The performance tuning settings follow http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning:

# note: virtuoso ignores lines starting with whitespace and stuff after a ;
[Parameters]
# you need to include the directory where your datasets will be downloaded
# to, in our case /usr/local/data/datasets:
DirsAllowed = ., /usr/share/virtuoso/vad, /usr/local/data/datasets
# IMPORTANT: for performance also do this
[Parameters]
# the following two are as suggested by comments in the original .ini
# file in order to use the RAM on your server:
NumberOfBuffers = 2720000
MaxDirtyBuffers = 2000000
# each buffer caches a 8K page of data and occupies approx. 8700 bytes of
# memory. It's suggested to set this value to 65 % of ram for a db only server
# so if you have 32 GB of ram: 32*1000^3*0.65/8700 = 2390804
# default is 2000 which will use 16 MB ram ;)
# Make sure to remove whitespace if you uncomment existing lines!
[Database]
MaxCheckpointRemap = 625000
# set this to 1/4th of NumberOfBuffers
[SPARQL]
# I like to increase the ResultSetMaxrows, MaxQueryCostEstimationTime
# and MaxQueryExecutionTime drastically as it's a local store where we
# do quite complex queries... up to you (don't do this if a lot of people
# use it).
# In any case for the importer to be more robust add the following setting
# to this section:
ShortenLongURIs = 1
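
The 65 % rule from the comments above is easy to script if you need the number for a different amount of RAM. This is only a rough sketch: number_of_buffers is a made-up helper, and the 65 % and 8700 bytes per buffer are simply the figures quoted in the comments. The value actually used above (2720000) comes from the comments in the original .ini file, so treat the computed number as a ballpark.

```shell
# NumberOfBuffers for a given amount of RAM (in GB), following the rule of
# thumb quoted in the ini comments: 65 % of RAM / approx. 8700 bytes per buffer.
# number_of_buffers is a hypothetical helper name.
number_of_buffers() {
  echo $(( $1 * 1000 * 1000 * 1000 * 65 / 100 / 8700 ))
}

number_of_buffers 32   # prints 2390804
```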

The next step installs an init-script (autostart) and starts the virtuoso server. (If you’ve changed directories to edit /var/lib/virtuoso/db/virtuoso.ini, go back to the virtuoso source dir!):

sudo cp debian/init.d /etc/init.d/virtuoso-opensource &&
sudo chmod a+x /etc/init.d/virtuoso-opensource &&
sudo bash debian/virtuoso-opensource.postinst.debhelper

You should now have a running virtuoso server.

DBpedia URIs (en) vs. DBpedia IRIs (i18n)

DBpedia 2014 consists of several datasets: one “standard” English version and several localized versions for other languages (i18n). The standard version mints URIs by going through all English Wikipedia articles. For all of these the Wikipedia cross-language links are used to extract corresponding labels in other languages for the en URIs (e.g., de/labels_en_uris_de.nt.bz2). This is problematic, as for example articles which are only in the German Wikipedia won’t be extracted. To solve this problem the i18n versions exist and create IRIs in the form of de.dbpedia.org for every article in the German Wikipedia (e.g., de/labels_de.nt.bz2).

This approach has several implications. For backwards compatibility reasons the standard DBpedia makes statements about URIs such as http://dbpedia.org/resource/Gerhard_Schr%C3%B6der while the local chapters, like the German one, make statements about IRIs such as http://de.dbpedia.org/resource/Gerhard_Schröder (note the ö). In other words and as written above: the standard DBpedia uses URIs to identify things, while the localized versions use IRIs. This also means that http://dbpedia.org/resource/Gerhard_Schröder shouldn’t work. That said, clicking the link will actually work, as there is magic going on in your browser to give you what you probably meant. Fetching it with curl (curl -i -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Gerhard_Schröder) or SPARQLing the endpoint is not that forgiving and can cause quite some headache: compare select * where { dbpedia:Gerhard_Schröder ?p ?o. } with select * where { <http://dbpedia.org/resource/Gerhard_Schr%C3%B6der> ?p ?o. }. To mitigate this historic problem a bit, DBpedia offers owl:sameAs links from IRIs to URIs (en/iri_same_as_uri_en), which you should load so that you at least have a link to what you want if someone asks for info about an IRI.
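
To make the difference more tangible, here's a rough sketch of how an IRI maps to a URI: every byte outside printable ASCII is percent-encoded (the generic mapping described in RFC 3987). iri_to_uri is a hypothetical helper, and DBpedia's own encoding rules may treat some ASCII characters differently, so take it as an illustration only.

```shell
# Percent-encode every byte that is not printable ASCII (hypothetical helper).
# Existing %XX escapes are left alone, so URIs pass through unchanged.
iri_to_uri() {
  # od prints each byte as a 3-digit octal number, one per line after tr
  printf '%s' "$1" | od -An -v -to1 | tr -s ' ' '\n' | while read -r oct; do
    [ -n "$oct" ] || continue
    code=$(( 0$oct ))                      # octal -> decimal
    if [ "$code" -ge 33 ] && [ "$code" -le 126 ]; then
      printf '%b' "\\0$oct"                # printable ASCII: keep as is
    else
      printf '%%%02X' "$code"              # everything else: %XX
    fi
  done
  echo
}

# usage:
#   iri_to_uri 'http://dbpedia.org/resource/Gerhard_Schröder'
#   → http://dbpedia.org/resource/Gerhard_Schr%C3%B6der
```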

As the standard DBpedia provides labels, abstracts and a couple other things in several languages, there are two types of files in the localized DBpedia folders: There are triples directly associating the English URIs with for example the German labels (de/labels_en_uris_de) and there are the localized triple files which associate for example the DE IRIs with the German labels (de/labels_de).

Downloading the DBpedia dump files & Repacking

For our group we decided that we wanted a reasonably complete mirror of the standard DBpedia (EN) (have a look at datasets loaded into the public DBpedia SPARQL Endpoint), but also the i18n versions for the German DBpedia loaded in separate graphs, as well as each of their pagelink datasets in another separate graph. For this we download the corresponding files in (NT) format as follows. If you need something different do so (and maybe report back if there were problems and how you solved them).

Another hint: Virtuoso can only import plain (uncompressed) or gzipped files, but the DBpedia dumps are bzipped, so you either have to repack them into gzip format or extract them. On our server importing from extracted files was noticeably slower than from gzipped ones (not to mention the vast amount of disk space wasted on the extracted files). File access becomes a bottleneck if you have a couple of cores idling. This is why I decided on repacking all the files from bz2 to gz. As you can see, I do the repacking per folder in parallel; if that’s not suitable for you, feel free to change it. You might also want to change this if you want to repack while still downloading. The repacking process below took about 1 hour but was worth it in the end. The more CPUs you have, the more you can parallelize this process.

# see comment above, you could also get the all_language.tar or another DBpedia version...
mkdir -p /usr/local/data/datasets/dbpedia/2014
cd /usr/local/data/datasets/dbpedia/2014
wget -r -nc -nH --cut-dirs=1 -np -l1 -A '*.nt.bz2' -A '*.owl' -R '*unredirected*' http://downloads.dbpedia.org/2014/{en/,de/,links/,dbpedia_2014.owl}

# if you want to save space do this:
for d in */ ; do for i in "${d%/}"/*.bz2 ; do bzcat "$i" | gzip > "${i%.bz2}.gz" && rm "$i" ; done & done
# else do:
#bunzip2 */*.bz2 &

# notice that the extraction (and repacking) of *.bz2 takes quite a while (about 1 hour)
# gzipped data is reasonably packed, but still very fast to access (in contrast to bz2), so maybe this is the best choice.
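
One thing to keep in mind with the one-liner above: the && only looks at gzip's exit status, so a truncated download can still end up as a (shortened) .gz file. If you prefer to be a bit more careful, a variant along these lines may help. It is just a sketch, repack_bz2_to_gz is a made-up name, and it trades some extra CPU time (bzip2 -t reads the file once more) for the check:

```shell
# Repack one .bz2 file to .gz and only delete the original if everything
# checked out. repack_bz2_to_gz is a hypothetical helper name.
repack_bz2_to_gz() {
  src="$1"
  dst="${src%.bz2}.gz"
  bzip2 -t "$src" || return 1                    # corrupt or truncated download?
  bzip2 -dc "$src" | gzip > "$dst" || return 1   # repack
  gzip -t "$dst" || return 1                     # make sure the result is readable
  rm -- "$src"
}

# usage, again one background job per folder:
#   for d in */ ; do ( for i in "$d"*.bz2 ; do repack_bz2_to_gz "$i" ; done ) & done ; wait
```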

Data Cleaning and The bulk loader scripts

In contrast to the previous versions of this article, the Virtuoso import will take care of shortening overly long IRIs itself. Also, it seems the bulk loader script is included in more recent Virtuoso versions, so for reference only: see the old version for the cleaning script, and VirtBulkRDFLoaderExampleDbpedia and http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderScript for info about the bulk loader scripts.

Importing DBpedia dumps into virtuoso

Now AFTER the re-/unpacking of the DBpedia dumps we will register all files in the dbpedia dir (recursively ld_dir_all) to be added to the dbpedia graph. If you use this method make sure that only files reside in the given subtree that you really want to import.
Also don’t forget to import the dbpedia_2014.owl file (first step in the script below)!
If you only want one directory’s files to be added (non recursive) use ld_dir('dir', '*.*', 'graph');.
If you manually want to add some files, use ld_add('file', 'graph');.
See the VirtBulkRDFLoaderScript file for details.

Be warned that it might be a bad idea to import the normal and i18n dataset into one graph if you didn’t select specific languages, as it might introduce a lot of duplicates.

In order to keep track (and easily reproduce) what was selected and imported into which graph, I actually link (ln -s) the repacked files into a directory structure beneath /usr/local/data/datasets/dbpedia/2014/importedGraphs/ and import from there instead. To make sure you think about this, I use that path below, so it won’t work if you didn’t pay attention. If you really want to import all downloaded files, just import /usr/local/data/datasets/dbpedia/2014/.

Also be aware that if you load certain parts of the dumps into different graphs (as I did with the pagelinks and the i18n versions of the DE and FR datasets), only triples from the http://dbpedia.org graph will be shown when you visit the local pages with your browser (SPARQL is unaffected by this)!

So if you want to load the same datasets as loaded on the official endpoint (but restricted to the EN and DE ones), the following should do the trick to link them up for the next steps:

cd /usr/local/data/datasets/dbpedia/2014/
mkdir importedGraphs
cd importedGraphs

mkdir dbpedia.org
cd dbpedia.org
# ln -s ../../dbpedia_2014.owl ./ # see below!
ln -s ../../links/* ./

ln -s ../../en/article_categories_en.nt.gz ./
ln -s ../../en/category_labels_en.nt.gz ./
ln -s ../../en/disambiguations_en.nt.gz ./
ln -s ../../en/external_links_en.nt.gz ./
ln -s ../../en/freebase_links_en.nt.gz ./
ln -s ../../en/geo_coordinates_en.nt.gz ./
ln -s ../../en/geonames_links_en_en.nt.gz ./
ln -s ../../en/homepages_en.nt.gz ./
ln -s ../../en/images_en.nt.gz ./
ln -s ../../en/infobox_properties_en.nt.gz ./
ln -s ../../en/infobox_property_definitions_en.nt.gz ./
ln -s ../../en/instance_types_en.nt.gz ./
ln -s ../../en/instance_types_heuristic_en.nt.gz ./
ln -s ../../en/interlanguage_links_chapters_en.nt.gz ./
ln -s ../../en/iri_same_as_uri_en.nt.gz ./
ln -s ../../en/labels_en.nt.gz ./
ln -s ../../en/long_abstracts_en.nt.gz ./
ln -s ../../en/mappingbased_properties_cleaned_en.nt.gz ./
ln -s ../../en/page_ids_en.nt.gz ./
ln -s ../../en/persondata_en.nt.gz ./
ln -s ../../en/redirects_transitive_en.nt.gz ./
ln -s ../../en/revision_ids_en.nt.gz ./
ln -s ../../en/revision_uris_en.nt.gz ./
ln -s ../../en/short_abstracts_en.nt.gz ./
ln -s ../../en/skos_categories_en.nt.gz ./
ln -s ../../en/specific_mappingbased_properties_en.nt.gz ./
ln -s ../../en/wikipedia_links_en.nt.gz ./

ln -s ../../de/labels_en_uris_de.nt.gz ./
ln -s ../../de/long_abstracts_en_uris_de.nt.gz ./
ln -s ../../de/short_abstracts_en_uris_de.nt.gz ./

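# the fr files are not fetched by the wget call above: add fr/ to its list
# or skip the next three lines (they would only create dangling links)
ln -s ../../fr/labels_en_uris_fr.nt.gz ./
ln -s ../../fr/long_abstracts_en_uris_fr.nt.gz ./
ln -s ../../fr/short_abstracts_en_uris_fr.nt.gz ./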
cd ..


mkdir ext.dbpedia.org
cd ext.dbpedia.org
ln -s ../../en/genders_en.nt.gz ./
ln -s ../../en/out_degree_en.nt.gz ./
ln -s ../../en/page_length_en.nt.gz ./
cd ..

mkdir pagelinks.dbpedia.org
cd pagelinks.dbpedia.org
ln -s ../../en/page_links_en.nt.gz ./
cd ..

mkdir topicalconcepts.dbpedia.org
cd topicalconcepts.dbpedia.org
ln -s ../../en/topical_concepts_en.nt.gz ./
cd ..


mkdir de.dbpedia.org
cd de.dbpedia.org
ln -s ../../de/article_categories_de.nt.gz ./
ln -s ../../de/category_labels_de.nt.gz ./
ln -s ../../de/disambiguations_de.nt.gz ./
ln -s ../../de/external_links_de.nt.gz ./
ln -s ../../de/freebase_links_de.nt.gz ./
ln -s ../../de/geo_coordinates_de.nt.gz ./
ln -s ../../de/homepages_de.nt.gz ./
ln -s ../../de/images_de.nt.gz ./
ln -s ../../de/infobox_properties_de.nt.gz ./
ln -s ../../de/infobox_property_definitions_de.nt.gz ./
ln -s ../../de/instance_types_de.nt.gz ./
ln -s ../../de/interlanguage_links_chapters_de.nt.gz ./
ln -s ../../de/iri_same_as_uri_de.nt.gz ./
ln -s ../../de/labels_de.nt.gz ./
ln -s ../../de/long_abstracts_de.nt.gz ./
ln -s ../../de/mappingbased_properties_de.nt.gz ./
ln -s ../../de/out_degree_de.nt.gz ./
ln -s ../../de/page_ids_de.nt.gz ./
ln -s ../../de/page_length_de.nt.gz ./
ln -s ../../de/persondata_de.nt.gz ./
ln -s ../../de/pnd_de.nt.gz ./
ln -s ../../de/redirects_transitive_de.nt.gz ./
ln -s ../../de/revision_ids_de.nt.gz ./
ln -s ../../de/revision_uris_de.nt.gz ./
ln -s ../../de/short_abstracts_de.nt.gz ./
ln -s ../../de/skos_categories_de.nt.gz ./
ln -s ../../de/specific_mappingbased_properties_de.nt.gz ./
ln -s ../../de/wikipedia_links_de.nt.gz ./
cd ..

mkdir pagelinks.de.dbpedia.org
cd pagelinks.de.dbpedia.org
ln -s ../../de/page_links_de.nt.gz ./
cd ..
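
As all of these are plain symlinks, a typo or a file you didn't download leaves a dangling link behind. A quick sanity check, sketched here with a made-up helper name (broken_links), lists links whose target doesn't exist. It relies on find -L, which follows symlinks, so anything still reported as a link is a broken one:

```shell
# List dangling symlinks below a directory (defaults to the current one).
# broken_links is a hypothetical helper name.
broken_links() {
  find -L "${1:-.}" -type l
}

# usage (from within importedGraphs); no output means all links resolve:
#   broken_links .
```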

This should have prepared your importedGraphs directory. From this directory you can run the following command, which prints out the necessary isql commands to register your graphs for importing:

for g in * ; do echo "ld_dir_all('$(pwd)/$g', '*.*', 'http://$g');" ; done
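
If you keep other files (notes, scripts, …) next to the graph directories, the loop above will happily print a line for them as well. A slightly more defensive variant, again only a sketch with a made-up name (print_ld_statements), restricts itself to sub-directories and takes the base directory as an argument:

```shell
# Print one ld_dir_all statement per sub-directory of the given base dir,
# using the directory name as graph name (http://<dirname>).
# print_ld_statements is a hypothetical helper name.
print_ld_statements() {
  base="$1"
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue          # also skips the unexpanded glob
    graph=$(basename "$dir")
    printf "ld_dir_all('%s', '*.*', 'http://%s');\n" "${dir%/}" "$graph"
  done
}

# usage (from within importedGraphs):
#   print_ld_statements "$(pwd)"
```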

One more thing (thanks to Romain): In order for the DBpedia.vad package (which is installed at the end) to work correctly, the dbpedia_2014.owl file needs to be imported into graph http://dbpedia.org/resource/classes#.

Note: In the following I will assume that your Virtuoso isql command is called isql. If you don’t have such a command, it might be called isql-vt, but this usually means you installed Virtuoso using some other method than the one described here.

isql # enter virtuoso sql mode
-- we are in sql mode now
ld_add('/usr/local/data/datasets/dbpedia/2014/dbpedia_2014.owl', 'http://dbpedia.org/resource/classes#');
ld_dir_all('/usr/local/data/datasets/dbpedia/2014/importedGraphs/dbpedia.org', '*.*', 'http://dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/2014/importedGraphs/de.dbpedia.org', '*.*', 'http://de.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/2014/importedGraphs/ext.dbpedia.org', '*.*', 'http://ext.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/2014/importedGraphs/pagelinks.dbpedia.org', '*.*', 'http://pagelinks.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/2014/importedGraphs/pagelinks.de.dbpedia.org', '*.*', 'http://pagelinks.de.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/2014/importedGraphs/topicalconcepts.dbpedia.org', '*.*', 'http://topicalconcepts.dbpedia.org');

-- do the following to see which files were registered to be added:
SELECT * FROM DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;
EXIT;

You can now also register any other datasets that you want to be loaded, like Freebase, DBLP, Yago, Umbel and Schema.org … Our full DB.DBA.LOAD_LIST currently looks like this (on our server the datasets live below an additional remote/ directory):

SELECT ll_graph, ll_file FROM DB.DBA.LOAD_LIST;
ll_graph                             ll_file
VARCHAR                              VARCHAR NOT NULL
____________________________________

http://dblp.l3s.de                   /usr/local/data/datasets/remote/dblp/l3s/2014-11-08/dblp.nt.gz
http://dbpedia.org/resource/classes# /usr/local/data/datasets/remote/dbpedia/2014/dbpedia_2014.owl
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/amsterdammuseum_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/article_categories_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/bbcwildlife_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/bookmashup_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/bricklink_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/category_labels_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/cordis_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/dailymed_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/dblp_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/dbtune_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/disambiguations_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/diseasome_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/drugbank_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/eunis_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/eurostat_linkedstatistics_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/eurostat_wbsg_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/external_links_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/factbook_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/flickrwrappr_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/freebase_links_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/gadm_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/geo_coordinates_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/geonames_links_en_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/geospecies_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/gho_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/gutenberg_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/homepages_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/images_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/infobox_properties_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/infobox_property_definitions_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/instance_types_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/instance_types_heuristic_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/interlanguage_links_chapters_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/iri_same_as_uri_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/italian_public_schools_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/labels_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/labels_en_uris_de.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/labels_en_uris_fr.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/linkedgeodata_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/linkedmdb_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/long_abstracts_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/long_abstracts_en_uris_de.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/long_abstracts_en_uris_fr.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/mappingbased_properties_cleaned_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/musicbrainz_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/nytimes_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/opencyc_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/openei_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/page_ids_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/persondata_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/redirects_transitive_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/revision_ids_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/revision_uris_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/revyu_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/short_abstracts_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/short_abstracts_en_uris_de.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/short_abstracts_en_uris_fr.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/sider_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/skos_categories_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/specific_mappingbased_properties_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/tcm_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/umbel_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/uscensus_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/wikicompany_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/wikipedia_links_en.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/wordnet_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/yago_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/yago_taxonomy.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/yago_type_links.nt.gz
http://dbpedia.org                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/dbpedia.org/yago_types.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/article_categories_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/category_labels_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/disambiguations_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/external_links_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/freebase_links_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/geo_coordinates_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/homepages_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/images_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/infobox_properties_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/infobox_property_definitions_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/instance_types_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/interlanguage_links_chapters_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/iri_same_as_uri_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/labels_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/long_abstracts_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/mappingbased_properties_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/out_degree_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/page_ids_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/page_length_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/persondata_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/pnd_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/redirects_transitive_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/revision_ids_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/revision_uris_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/short_abstracts_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/skos_categories_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/specific_mappingbased_properties_de.nt.gz
http://de.dbpedia.org                /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/de.dbpedia.org/wikipedia_links_de.nt.gz
http://ext.dbpedia.org               /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/ext.dbpedia.org/genders_en.nt.gz
http://ext.dbpedia.org               /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/ext.dbpedia.org/out_degree_en.nt.gz
http://ext.dbpedia.org               /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/ext.dbpedia.org/page_length_en.nt.gz
http://pagelinks.dbpedia.org         /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/pagelinks.dbpedia.org/page_links_en.nt.gz
http://pagelinks.de.dbpedia.org      /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/pagelinks.de.dbpedia.org/page_links_de.nt.gz
http://topicalconcepts.dbpedia.org   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/topicalconcepts.dbpedia.org/topical_concepts_en.nt.gz
http://rdf.freebase.com              /usr/local/data/datasets/remote/freebase/2014-11-02/freebase-rdf-2014-11-02-00-00.gz
http://schema.org                    /usr/local/data/datasets/remote/schema.org/2014-11-08/all.nt
http://umbel.org/umbel/rc/           /usr/local/data/datasets/remote/umbel/External Ontologies/dbpediaOntology.n3
http://umbel.org/umbel/rc/           /usr/local/data/datasets/remote/umbel/External Ontologies/schema.org.n3
http://umbel.org/umbel               /usr/local/data/datasets/remote/umbel/Ontology/umbel.n3
http://umbel.org/umbel/rc/           /usr/local/data/datasets/remote/umbel/Reference Structure/umbel_reference_concepts.n3
http://yago-knowledge.org/resource/  /usr/local/data/datasets/remote/yago/yago2/2012-12/yagoLabels.ttl.gz

114 Rows. -- 5 msec.

OK, now comes the fun (and long) part: about 1.5 hours for DBpedia alone (the new Virtuoso 7 is cool 😉), plus roughly 3 hours for Freebase… Now that we have registered the files to be added, let’s finally start the process. Fire up screen if you didn’t already. (For more detailed metering than below see VirtTipsAndTricksGuideLDMeterUtility.)

sudo aptitude install screen
screen isql
rdf_loader_run();
-- DO NOT USE THE DB BESIDES THE FOLLOWING COMMANDS:
-- depending on the amount of CPUs and your IO performance you can run
-- more rdf_loader_run(); commands in other isql sessions which will
-- speed up the import process.
-- you can watch the progress from another isql session with:
-- select * from DB.DBA.LOAD_LIST;
-- if you need to stop the loading for any reason: rdf_load_stop ();
-- if you want to force stopping: rdf_load_stop(1);
checkpoint;
commit WORK;
checkpoint;
EXIT;
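
If you would rather start several loaders from the shell than open several isql sessions by hand, a sketch like the following might help. It assumes that isql reaches the server on the default port 1111 with the dba/dba login used in this guide and that your isql accepts the exec="…" argument; start_loaders and the ISQL variable are names made up for this example.

```shell
# Start N bulk loader processes in parallel, wait for all of them,
# then write a checkpoint. ISQL can be overridden (e.g. ISQL=isql-vt).
start_loaders() {
  n=$1
  isql_cmd=${ISQL:-isql}
  i=0
  while [ "$i" -lt "$n" ]; do
    # assumption: port 1111 and login dba/dba, adjust to your setup
    "$isql_cmd" 1111 dba dba exec="rdf_loader_run();" &
    i=$((i + 1))
  done
  wait  # block until every background loader has finished
  "$isql_cmd" 1111 dba dba exec="checkpoint;"
}
```

Calling start_loaders 2 inside screen would run two loaders side by side. How many make sense depends on your CPUs and IO, so start small and watch DB.DBA.LOAD_LIST.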

After this:
Take a look at the /var/lib/virtuoso/db/virtuoso.log file. Should you find any errors in there… FIX THEM! You could still use what was loaded, but it’s incomplete then: any error aborts the loading of the corresponding file and the loader continues with the next one, so you’re only using the part of that file up to the place where the error occurred. (Should you find errors you can’t fix please leave a comment.)
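
A quick way to skim the log is a case-insensitive search for the word "error". This is only a rough sketch: the exact wording of Virtuoso's log messages is not covered here, so treat a clean result as a hint and still read the log yourself. scan_virtuoso_log is a made-up helper name.

```shell
# Print all log lines mentioning "error" (with line numbers).
# Returns 1 if any such line was found, 0 otherwise.
scan_virtuoso_log() {
  log=${1:-/var/lib/virtuoso/db/virtuoso.log}
  if grep -n -i 'error' "$log"; then
    return 1
  fi
  echo "no lines mentioning errors found in $log"
}
```

Run it without arguments to check the default log location from this guide, or pass another path as the first argument.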

Final polishing

You can & should now install the DBpedia and RDF Mappers packages from the Virtuoso Conductor.
http://your-server:8890

login: dba
pw: dba

Go to System Admin / Packages. Install the dbpedia (v. 1.4.28) and rdf_mappers (v. 1.34.74) packages (takes about 5 minutes).

Testing your local mirror

Go to the sparql-endpoint of your server http://your-server:8890/sparql (or in isql prefix with: SPARQL)

sparql SELECT COUNT(*) WHERE { ?s ?p ?o } ;

This shouldn’t take long in Virtuoso 7 anymore and for me now returns 695,553,624 for DBpedia (en+de), 3,543,872,243 with DBpedia (en+de), Freebase, DBLP, Yago, Umbel and Schema.org.

I also like this query showing all the graphs and how many triples are in them:

sparql SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
g                                                           callret-1
LONG VARCHAR                                                LONG VARCHAR
____________________________________________________________

http://rdf.freebase.com                                     2760013365
http://dbpedia.org                                          375176108
http://pagelinks.dbpedia.org                                149707899
http://de.dbpedia.org                                       92508750
http://dblp.l3s.de                                          72519345
http://pagelinks.de.dbpedia.org                             55804533
http://ext.dbpedia.org                                      21900162
http://yago-knowledge.org/resource                          15372307
http://umbel.org/umbel/rc                                   403452
http://www.openlinksw.com/schemas/RDF_Mapper_Ontology/1.0/  256065
http://topicalconcepts.dbpedia.org                          149638
http://dbpedia.org/resource/classes                         27063
http://schema.org                                           8727
http://localhost:8890/DAV/                                  6187
http://www.openlinksw.com/schemas/virtrdf#                  2639
http://umbel.org/umbel                                      1702
http://OPEN.vocab.org/terms                                 1480
http://purl.org/ontology/bibo/                              1226
http://purl.org/goodrelations/v1                            937
http://purl.org/dc/terms/                                   857
http://www.openlinksw.com/schemas/opengraph                 804
http://www.openlinksw.com/schemas/linkedin                  741
http://www.openlinksw.com/schemas/googleplus                696
http://www.openlinksw.com/schemas/google-base               691
http://www.openlinksw.com/schemas/cv                        661
virtrdf-label                                               638
http://xmlns.com/foaf/0.1/                                  557
http://rdfs.org/sioc/ns#                                    553
http://www.openlinksw.com/schemas/evri                      482
http://www.openlinksw.com/schemas/crunchbase                444
http://bblfish.net/WORK/atom-owl/2006-06-06/                386
http://scot-project.org/scot/ns#                            332
http://www.openlinksw.com/schemas/zillow                    311
http://www.w3.org/2004/02/skos/core                         252
http://www.openlinksw.com/schemas/cnet                      225
http://www.openlinksw.com/schemas/tesco                     183
http://www.openlinksw.com/schemas/bestbuy                   172
http://www.w3.org/2002/07/owl#                              160
http://www.w3.org/2002/07/owl                               160
http://www.openlinksw.com/schemas/angel#                    144
http://www.openlinksw.com/schemas/amazon                    143
http://purl.org/dc/elements/1.1/                            139
http://www.w3.org/2007/05/powder-s#                         117
http://www.openlinksw.com/schemas/twitter                   103
http://www.openlinksw.com/schemas/stackoverflow#            102
http://www.openlinksw.com/schemas/klout                     90
http://www.w3.org/2000/01/rdf-schema#                       87
http://www.w3.org/1999/02/22-rdf-syntax-ns#                 85
http://www.openlinksw.com/schemas/ebay                      79
http://www.openlinksw.com/schema/attribution#               68
http://www.openlinksw.com/schemas/nyt                       41
http://www.openlinksw.com/schemas/wolframalpha#             32
http://www.openlinksw.com/schemas/oplbase                   26
http://www.openlinksw.com/schemas/cert#                     23
http://www.openlinksw.com/schemas/money                     21
http://www.openlinksw.com/schemas/dbpedia-spotlight#        21
http://localhost:8890/sparql                                14
http://dbpedia.org/schema/property_rules#                   12
dbprdf-label                                                6

59 ROWS. -- 61717 msec.

Congratulations, you just imported over half a billion triples (or over 3.5 G triples).

Backing up this initial state

Now is a good moment to back up the whole db (takes about half an hour):

sudo -i
cd /
/etc/init.d/virtuoso-opensource stop &&
tar -cvf - /var/lib/virtuoso | lzop > virtuoso-7.1.0-DBDUMP-$(date '+%F')-dbpedia-2014-en_de.tar.lzop &&
/etc/init.d/virtuoso-opensource start

Afterwards you might want to repack this with xz (lzma) like this:

# aptitude install xz-utils
for f in virtuoso-7.1.0-DBDUMP-*.tar.lzop ; do lzop -d -c "$f" | xz > "${f%.lzop}.xz" ; done

Yay, done 😉
As always, feel free to leave comments if I made a mistake or to tell us about your problems or how happy you are :D.

Our database dump file

In case you really want exactly the same state of the public datasets that we have loaded (as described above) you can download our database dump (57 GB, md5sum, including: DBpedia 2014 en,de,links,dbpedia_2014.owl, Freebase, DBLP, Yago, Umbel and Schema.org).

Thanks

Many thanks to the DBpedia team for their endless efforts of providing us all with a great dataset. Also many thanks to the Virtuoso crew for releasing an opensource version of their DB.

Updates

  • 2014-11-11: Added link to our Dump-File
  • 2014-11-24: Thanks to Romain: Load dbpedia_2014.owl into graph http://dbpedia.org/resource/classes# for DBpedia.vad to find it when resolving http://your-server:8890/ontology/author for example.

Setting up a local DBpedia 3.9 mirror with Virtuoso 7

Newer version available: Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional)

I just found this aged post in my drafts folder, maybe someone will still like it…

So you’re the guy who is allowed to set up a local DBpedia mirror or more generally a local Linked Data mirror for your work group? OK, today is your lucky day and you’re in the right place. I hope you’ll be able to benefit from my many hours of trial and error. If anything goes wrong, feel free to leave me a comment below.

Versions of this guide

There are two older versions of this guide:

  • Oct. 2010: The first version focusing on DBpedia 3.5 – 3.6 and Virtuoso 6.1
  • May 2012: A bigger update to DBpedia 3.7 (new local language versions) and Virtuoso 6.1.5+ (with a lot of updates making pre-processing of the dumps easier)

With the recent release of Virtuoso 7 (way faster, thanks to OpenLink!) and DBpedia 3.9 I again felt the urge to update this guide as a couple of things changed.

In this step by step guide I’ll tell you how to install a local Linked Data mirror of the DBpedia 3.9 hosting a combination of the regular English and (exemplary) the i18n German datasets adding up to over half a billion triples.

Let’s jump in.

Used Versions

  • DBpedia 3.9 + 3.9-i18n dataset
  • Virtuoso OpenSource 7.0.0
  • Ubuntu 12.04 LTS

Prerequisites

A strong machine with root access and enough RAM: We use a VM with 4 Cores and 32 GBs of RAM. For installing I recommend more than 128 GB free HD space, especially for downloading and repacking the datasets, as well as the growing database file when importing (mine grew to 41 GBs).

Let’s go

Download and install virtuoso

Go and download virtuoso opensource from http://sourceforge.net/projects/virtuoso/ (make sure you get v7.0.0 as in this guide or a newer version).

Put the file in your home dir on the server, then extract it and switch to the directory:

cd ~
tar -xvzf virtuoso-7.0.0.tar.gz
cd virtuoso-opensource-7.0.0 # or newer, depending what you got

Now do the following to install the prerequisites and then build virtuoso:

sudo aptitude install libxml2-dev libssl-dev autoconf libgraphviz-dev \
     libmagickcore-dev libmagickwand-dev dnsutils gawk bison flex gperf

# NOTICE: this will _not_ install into /usr/local but into /usr
# (so might clash with packages by your distribution if you install
# "the" virtuoso package)
# You'll find the db in /var/lib/virtuoso/db !
# check output for errors and FIX THEM! (e.g., install missing packages)
export CFLAGS="-O2 -m64"
./configure --with-layout=debian

# the following will build with 5 processes in parallel
# choose something like your server's #CPUs + 1
make -j5

This will take about 5 min

sudo make install

Now change the following values in /var/lib/virtuoso/db/virtuoso.ini, the performance tuning stuff is according to http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning:

# note: virtuoso ignores lines starting with whitespace and stuff after a ;
[Parameters]
# you need to include the directory where your datasets will be downloaded
# to, in our case /usr/local/data/datasets:
DirsAllowed = ., /usr/share/virtuoso/vad, /usr/local/data/datasets
# IMPORTANT: for performance also do this
[Parameters]
# the following two are as suggested by comments in the original .ini
# file in order to use the RAM on your server:
NumberOfBuffers = 2720000
MaxDirtyBuffers = 2000000
# each buffer caches a 8K page of data and occupies approx. 8700 bytes of
# memory. It's suggested to set this value to 65 % of ram for a db only server
# so if you have 32 GB of ram: 32*1000^3*0.65/8700 = 2390804
# default is 2000 which will use 16 MB ram ;)
# Make sure to remove whitespace if you uncomment existing lines!
[Database]
MaxCheckpointRemap = 625000
# set this to 1/4th of NumberOfBuffers
[SPARQL]
# I like to increase the ResultSetMaxrows, MaxQueryCostEstimationTime
# and MaxQueryExecutionTime drastically as it's a local store where we
# do quite complex queries... up to you (don't do this if a lot of people
# use it).
# In any case for the importer to be more robust add the following setting
# to this section:
ShortenLongURIs = 1
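
The rule of thumb from the comments above (65 % of RAM divided by roughly 8700 bytes per buffer, and a quarter of that for MaxCheckpointRemap) can be turned into a small helper. This is only a sketch of that arithmetic; suggest_buffers is a made-up name, and the results differ a bit from the values listed above, which follow the suggestions in the original .ini file.

```shell
# Print buffer settings for a db-only server with the given RAM in GB,
# using: RAM * 1000^3 * 0.65 / 8700 (integer arithmetic, rounded down).
suggest_buffers() {
  ram_gb=$1
  buffers=$(( ram_gb * 1000000000 * 65 / 100 / 8700 ))
  echo "NumberOfBuffers = $buffers"
  # 1/4th of NumberOfBuffers, as suggested in the comment above
  echo "MaxCheckpointRemap = $(( buffers / 4 ))"
}
```

For the 32 GB example, suggest_buffers 32 prints NumberOfBuffers = 2390804, the same number as in the comment, and MaxCheckpointRemap = 597701.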

The next step installs an init-script (autostart) and starts the virtuoso server. (If you’ve changed directories to edit /var/lib/virtuoso/db/virtuoso.ini, go back to the virtuoso source dir!):

sudo cp debian/init.d /etc/init.d/virtuoso-opensource &&
sudo chmod a+x /etc/init.d/virtuoso-opensource &&
sudo bash debian/virtuoso-opensource.postinst.debhelper

You should now have a running virtuoso server.

DBpedia URIs (en) vs. DBpedia IRIs (i18n)

The DBpedia 3.9 consists of several datasets: one “standard” English version and several localized versions for other languages (i18n). The standard version mints URIs by going through all English Wikipedia articles. For all of these the Wikipedia cross-language links are used to extract corresponding labels in other languages for the en URIs (e.g., de/labels_en_uris_de.nt.bz2). This is problematic as for example articles which are only in the German Wikipedia won’t be extracted. To solve this problem the i18n versions exist and create IRIs in the form of de.dbpedia.org for every article in the German Wikipedia (e.g., de/labels_de.nt.bz2).

This approach has several implications. For backwards compatibility reasons the standard DBpedia makes statements about URIs such as http://dbpedia.org/resource/Gerhard_Schr%C3%B6der while the local chapters, like the German one, make statements about IRIs such as http://de.dbpedia.org/resource/Gerhard_Schröder (note the ö). In other words and as written above: the standard DBpedia uses URIs to identify things, while the localized versions use IRIs. This also means that http://dbpedia.org/resource/Gerhard_Schröder shouldn’t work. That said, clicking the link will actually work as there is magic going on in your browser to give you what you probably meant. Using curl (curl -i -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Gerhard_Schröder) or SPARQLing the endpoint is not as forgiving and can cause quite some headache: select * where { dbpedia:Gerhard_Schröder ?p ?o. } vs. select * where { <http://dbpedia.org/resource/Gerhard_Schr%C3%B6der> ?p ?o. }. In order to mitigate this historic problem a bit, DBpedia actually offers owl:sameAs links from IRIs to URIs in en/iri_same_as_uri_en, which you should load, so you at least have a link to what you want if someone tries to get info about an IRI.

As if this isn’t confusing enough there is another trap: If you were to download the .ttl files then you suddenly have all statements associated with the IRI for the standard DBpedia (unlike the online endpoint). The only reason I can think of for this inconsistency is that at some point the actual inconsistency of URIs in EN vs IRIs in everything else will be resolved. For now these files are most certainly not what you want! So use the .nt files!

As the standard DBpedia provides labels, abstracts and a couple other things in several languages, there are two types of files in the localized DBpedia folders: There are triples directly associating the English URIs with for example the German labels (de/labels_en_uris_de) and there are the localized triple files which associate for example the DE IRIs with the German labels (de/labels_de).

Downloading the DBpedia dump files & Repacking

For our group we decided that we wanted a reasonably complete mirror of the standard DBpedia (EN) (have a look at datasets loaded into the public DBpedia SPARQL Endpoint), but also the i18n versions for the German and French DBpedia loaded in separate graphs, as well as each of their pagelink datasets in another separate graph. For this we download the corresponding files in (NT) format (also see previous section with remarks about the TTL files!). If you need something different do so (and maybe report back if there were problems and how you solved them).

Another hint: Virtuoso can only import plain (uncompressed) or gzipped files, the DBpedia dumps are bzipped, so you either repack them into gzip format or extract them. On our server the importing procedure was noticeably slower from extracted files than from gzipped ones (ignoring the vast amount of wasted disk space for the extracted files). File access becomes a bottleneck if you have 4 cores idling. This is why I decided on repacking all the files from bz2 to gz. As you can see I do the repacking per folder in parallel, if that’s not suitable for you, feel free to change it. You might also want to change this if you want to do it in parallel to downloading. The repackaging process below took about 1 hour but was worth it in the end. The more CPUs you have, the more you can parallelize this process.

sudo -i # get root
# see comment above, you could also get the all_language.tar or another DBpedia version...
mkdir -p /usr/local/data/datasets/dbpedia/3.9
cd /usr/local/data/datasets/dbpedia/3.9
wget -r -nc -nH --cut-dirs=1 -np -l1 -A '*.nt.bz2' -A '*.owl' -R '*unredirected*' http://downloads.dbpedia.org/3.9/{en/,de/,fr/,links/,wikidata/,dbpedia_3.9.owl}

# if you want to save space do this:
for d in */ ; do for i in "${d%/}"/*.bz2 ; do bzcat "$i" | gzip > "${i%.bz2}.gz" && rm "$i" ; done & done
# else do:
#bunzip2 */*.bz2 &

# notice that the extraction (and repacking) of *.bz2 takes quite a while (about 1 hour)
# gzipped data is reasonably packed, but still very fast to access (in contrast to bz2), so maybe this is the best choice.
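
Before deleting anything else or starting the import, it may be worth checking that every repacked file is a well-formed gzip stream (a full disk during repacking is an easy way to end up with a truncated one). The following is a minimal sketch; check_gz is a made-up helper name. Note that it can only tell whether the .gz itself is intact, not whether bzcat delivered all the data.

```shell
# Test every *.gz file in a directory with "gzip -t".
# Prints the corrupt ones and returns 1 if any was found.
check_gz() {
  dir=$1
  bad=0
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue   # directory without any .gz files
    if ! gzip -t "$f" 2>/dev/null; then
      echo "corrupt: $f"
      bad=1
    fi
  done
  return "$bad"
}
```

You would call it once per download folder, for example check_gz en and check_gz de from within the dataset directory.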

Data Cleaning and The bulk loader scripts

In contrast to the previous versions of this article the virtuoso import will take care of shortening too long IRIs itself. Also it seems the bulk loader script is included in the more recent Virtuoso versions, so as a reference only: see the old version for the cleaning script and VirtBulkRDFLoaderExampleDbpedia and
http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderScript
for info about the bulk loader scripts.

Importing DBpedia dumps into virtuoso

Now AFTER the re-/unpacking of the DBpedia dumps we will register all files in the dbpedia dir (recursively ld_dir_all) to be added to the dbpedia graph. If you use this method make sure that only files reside in the given subtree that you really want to import.
Also don’t forget to import the dbpedia_3.9.owl file (last step in the script below)!
If you only want one directory’s files to be added (non recursive) use ld_dir.
If you manually want to add some files, use ld_add.
See the VirtBulkRDFLoaderScript file for args to pass.

Be warned that it might be a bad idea to import the normal and i18n dataset into one graph if you didn’t select specific languages, as it might introduce a lot of duplicates.

In order to keep track what was selected and imported into which graph, I actually link (ln -s) the repacked files into a directory structure beneath /usr/local/data/datasets/dbpedia/3.9/importedGraphs/ and import from there instead. To make sure you think about this, I use that path below, so it won’t work if you didn’t pay attention. If you really want to import all downloaded files, just import /usr/local/data/datasets/dbpedia/3.9/.

Also be aware that if you load certain parts of dumps into different graphs (such as I did with the pagelinks, as well as the i18n versions of the DE and FR datasets), only triples from the http://dbpedia.org graph will be shown when you visit the local pages with your browser (SPARQL is unaffected by this)!

So if you want to load the same datasets as loaded on the official endpoint (but restricted to the EN, DE and FR ones), the following should do the trick to link them up for the next steps:

cd /usr/local/data/datasets/dbpedia/3.9/
mkdir -p importedGraphs/dbpedia.org
cd importedGraphs/dbpedia.org
ln -s \
  ../../en/article_categories_en.nt.gz \
  ../../en/category_labels_en.nt.gz \
  ../../en/disambiguations_en.nt.gz \
  ../../en/external_links_en.nt.gz \
  ../../en/geo_coordinates_en.nt.gz \
  ../../en/homepages_en.nt.gz \
  ../../en/images_en.nt.gz \
  ../../en/instance_types_en.nt.gz \
  ../../en/instance_types_heuristic_en.nt.gz \
  ../../en/interlanguage_links_chapters_en.nt.gz \
  ../../en/iri_same_as_uri_en.nt.gz \
  ../../en/labels_en.nt.gz \
  ../../en/long_abstracts_en.nt.gz \
  ../../en/mappingbased_properties_cleaned_en.nt.gz \
  ../../en/page_ids_en.nt.gz \
  ../../en/persondata_en.nt.gz \
  ../../en/pnd_en.nt.gz \
  ../../en/raw_infobox_properties_en.nt.gz \
  ../../en/raw_infobox_property_definitions_en.nt.gz \
  ../../en/redirects_transitive_en.nt.gz \
  ../../en/revision_ids_en.nt.gz \
  ../../en/revision_uris_en.nt.gz \
  ../../en/short_abstracts_en.nt.gz \
  ../../en/skos_categories_en.nt.gz \
  ../../en/specific_mappingbased_properties_en.nt.gz \
  ../../en/wikipedia_links_en.nt.gz \
  ../../de/labels_en_uris_de.nt.gz \
  ../../de/long_abstracts_en_uris_de.nt.gz \
  ../../de/pnd_en_uris_de.nt.gz \
  ../../de/short_abstracts_en_uris_de.nt.gz \
  ../../fr/labels_en_uris_fr.nt.gz \
  ../../fr/long_abstracts_en_uris_fr.nt.gz \
  ../../fr/short_abstracts_en_uris_fr.nt.gz \
  ../../links/amsterdammuseum_links.nt.gz \
  ../../links/bbcwildlife_links.nt.gz \
  ../../links/bookmashup_links.nt.gz \
  ../../links/bricklink_links.nt.gz \
  ../../links/cordis_links.nt.gz \
  ../../links/dailymed_links.nt.gz \
  ../../links/dblp_links.nt.gz \
  ../../links/dbtune_links.nt.gz \
  ../../links/diseasome_links.nt.gz \
  ../../links/drugbank_links.nt.gz \
  ../../links/eunis_links.nt.gz \
  ../../links/eurostat_linkedstatistics_links.nt.gz \
  ../../links/eurostat_wbsg_links.nt.gz \
  ../../links/factbook_links.nt.gz \
  ../../links/flickrwrappr_links.nt.gz \
  ../../links/freebase_links.nt.gz \
  ../../links/gadm_links.nt.gz \
  ../../links/geonames_links.nt.gz \
  ../../links/geospecies_links.nt.gz \
  ../../links/gho_links.nt.gz \
  ../../links/gutenberg_links.nt.gz \
  ../../links/italian_public_schools_links.nt.gz \
  ../../links/linkedgeodata_links.nt.gz \
  ../../links/linkedmdb_links.nt.gz \
  ../../links/musicbrainz_links.nt.gz \
  ../../links/nytimes_links.nt.gz \
  ../../links/opencyc_links.nt.gz \
  ../../links/openei_links.nt.gz \
  ../../links/revyu_links.nt.gz \
  ../../links/sider_links.nt.gz \
  ../../links/tcm_links.nt.gz \
  ../../links/umbel_links.nt.gz \
  ../../links/uscensus_links.nt.gz \
  ../../links/wikicompany_links.nt.gz \
  ../../links/wordnet_links.nt.gz \
  ../../links/yago_links.nt.gz \
  ../../links/yago_taxonomy.nt.gz \
  ../../links/yago_type_links.nt.gz \
  ../../links/yago_types.nt.gz \
  ../../dbpedia_3.9.owl \
  ./
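
Keep in mind that ln -s happily creates links to files that don’t exist (a typo, or a dataset that wasn’t downloaded), and such dangling links are easy to miss. A small check along these lines can list them before you register the directory; find_dangling is a made-up helper name for this sketch.

```shell
# List symlinks in a directory whose target does not exist.
# Returns 1 if at least one dangling link was found.
find_dangling() {
  dir=$1
  missing=0
  for f in "$dir"/*; do
    # -L: is a symlink; ! -e: the target it points to is missing
    if [ -L "$f" ] && [ ! -e "$f" ]; then
      echo "dangling: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Run it on each graph directory, e.g. find_dangling importedGraphs/dbpedia.org, and fix or remove whatever it reports.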

Note: in the following I will assume that your virtuoso isql command is called isql. If you don’t have such a command it might be called isql-vt, but this usually means you installed Virtuoso using some other method than the one described here.

isql # enter virtuoso sql mode
-- we are in sql mode now
ld_dir_all('/usr/local/data/datasets/dbpedia/3.9/importedGraphs/dbpedia.org', '*.*', 'http://dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/3.9/importedGraphs/de.dbpedia.org', '*.*', 'http://de.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/3.9/importedGraphs/pagelinks.dbpedia.org', '*.*', 'http://pagelinks.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/3.9/importedGraphs/pagelinks.de.dbpedia.org', '*.*', 'http://pagelinks.de.dbpedia.org');
ld_dir_all('/usr/local/data/datasets/dbpedia/3.9/importedGraphs/topicalconcepts.dbpedia.org', '*.*', 'http://topicalconcepts.dbpedia.org');

-- do the following to see which files were registered to be added:
SELECT * FROM DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;
EXIT;
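
Since each directory below importedGraphs is named after the graph it should be loaded into, the ld_dir_all lines can also be generated instead of typed. The following is a minimal sketch under that naming assumption; gen_ld_dir_all is a made-up helper name, and it will produce broken SQL if a path contains a single quote.

```shell
# Print one ld_dir_all statement per subdirectory of the given base dir,
# using the subdirectory name as the host part of the graph IRI.
gen_ld_dir_all() {
  base=${1%/}
  for d in "$base"/*/; do
    [ -d "$d" ] || continue   # no subdirectories at all
    d=${d%/}                  # strip the trailing slash from the glob
    graph=${d##*/}            # last path component, e.g. de.dbpedia.org
    printf "ld_dir_all('%s', '*.*', 'http://%s');\n" "$d" "$graph"
  done
}
```

Call it with your importedGraphs directory as argument, review the output and paste it into the isql session.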

OK, now comes the fun (and long) part: about 1.5 hours (the new Virtuoso 7 is cool 😉)… We registered the files to be added, now let’s finally start the process. Fire up screen if you didn’t already.

sudo aptitude install screen
screen isql
rdf_loader_run();
-- DO NOT USE THE DB BESIDES THE FOLLOWING COMMANDS:
-- (I had some warnings about a possibly corrupt db in the log,
-- when I visited the virtuoso conductor during the first run...)
-- you can watch the progress from another isql session with:
-- select * from DB.DBA.LOAD_LIST;
-- if you need to stop the loading for any reason: rdf_load_stop ();
-- if you want to force stopping: rdf_load_stop(1);
checkpoint;
commit WORK;
checkpoint;
EXIT;

After this:
Take a look at the /var/lib/virtuoso/db/virtuoso.log file. Should you find any errors in there… FIX THEM! You could still use what was loaded, but it’s incomplete then: any error aborts the loading of the corresponding file and the loader continues with the next one, so you’re only using the part of that file up to the place where the error occurred. (Should you find errors you can’t fix in the way I did above, please leave a comment.)

Final polishing

You can & should now install the DBpedia and RDF Mappers packages from the Virtuoso Conductor.
http://your-server:8890

login: dba
pw: dba

Go to System Admin / Packages. Install the dbpedia (v. 1.3.83) and rdf_mappers (v. 1.34.72) packages (takes about 5 minutes).

Testing your local mirror

Go to the sparql-endpoint of your server http://your-server:8890/sparql (or in isql prefix with: SPARQL)

sparql SELECT COUNT(*) WHERE { ?s ?p ?o } ;

This shouldn’t take long in Virtuoso 7 anymore and for me now returns 567,173,934.
I also like this query showing all the graphs and how many triples are in them:

sparql SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
g                                                          callret-1
LONG VARCHAR                                               LONG VARCHAR
___________________________________________________________

http://dbpedia.org                                         312505120
http://pagelinks.dbpedia.org                               136591822
http://de.dbpedia.org                                      67997676
http://pagelinks.de.dbpedia.org                            49664737
http://www.openlinksw.com/schemas/RDF_Mapper_Ontology/1.0/ 256065
http://topicalconcepts.dbpedia.org                         136887
http://localhost:8890/DAV/                                 4709
http://www.openlinksw.com/schemas/virtrdf#                 2617
http://OPEN.vocab.org/terms                                1480
http://purl.org/ontology/bibo/                             1226
http://purl.org/goodrelations/v1                           937
http://purl.org/dc/terms/                                  857
http://www.openlinksw.com/schemas/opengraph                804
http://www.openlinksw.com/schemas/linkedin                 741
http://www.openlinksw.com/schemas/googleplus               696
http://www.openlinksw.com/schemas/google-base              691
http://www.openlinksw.com/schemas/cv                       661
virtrdf-label                                              638
http://xmlns.com/foaf/0.1/                                 557
http://rdfs.org/sioc/ns#                                   553
http://www.openlinksw.com/schemas/evri                     482
http://www.openlinksw.com/schemas/crunchbase               444
http://bblfish.net/WORK/atom-owl/2006-06-06/               386
http://scot-project.org/scot/ns#                           332
http://www.openlinksw.com/schemas/zillow                   311
http://www.w3.org/2004/02/skos/core                        252
http://www.openlinksw.com/schemas/cnet                     225
http://www.openlinksw.com/schemas/tesco                    183
http://www.openlinksw.com/schemas/bestbuy                  172
http://www.w3.org/2002/07/owl#                             160
http://www.w3.org/2002/07/owl                              160
http://www.openlinksw.com/schemas/angel#                   144
http://www.openlinksw.com/schemas/amazon                   143
http://purl.org/dc/elements/1.1/                           139
http://www.w3.org/2007/05/powder-s#                        117
http://www.openlinksw.com/schemas/twitter                  103
http://www.openlinksw.com/schemas/stackoverflow#           102
http://www.openlinksw.com/schemas/klout                    90
http://www.w3.org/2000/01/rdf-schema#                      87
http://www.w3.org/1999/02/22-rdf-syntax-ns#                85
http://www.openlinksw.com/schemas/ebay                     79
http://www.openlinksw.com/schema/attribution#              68
http://www.openlinksw.com/schemas/nyt                      41
http://www.openlinksw.com/schemas/wolframalpha#            32
http://www.openlinksw.com/schemas/oplbase                  26
http://www.openlinksw.com/schemas/cert#                    23
http://www.openlinksw.com/schemas/money                    21
http://www.openlinksw.com/schemas/dbpedia-spotlight#       21
http://localhost:8890/sparql                               14
http://dbpedia.org/schema/property_rules#                  12
dbprdf-label                                               6

51 ROWS. -- 4563 msec.

Congratulations, you just imported over half a billion triples.

Backing up this initial state

Now is a good moment to back up the whole db (takes about half an hour):

sudo -i
cd /
/etc/init.d/virtuoso-opensource stop &&
tar -cvjf virtuoso-7.0.0-DBDUMP-dbpedia-3.9-en_de-$(date '+%F').tar.bz2 /var/lib/virtuoso &&
/etc/init.d/virtuoso-opensource start

Yay, done 😉
As always, feel free to leave comments if I made a mistake or to tell us about your problems or how happy you are :D.

Thanks

Many thanks to the DBpedia team for their endless efforts of providing us all with a great dataset. Also many thanks to the Virtuoso crew for releasing an opensource version of their DB.

Setting up a local DBpedia 3.7 mirror with Virtuoso 6.1.5+

Newer version available: Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional)

Nearly 1.5 years after I initially published a post about how to set up a local DBpedia mirror, I recently revisited the problem myself to set up a local mirror of the DBpedia 3.7.

Unlike the previous updates so many things have changed that I decided to put them into a separate post instead of continuing to update the old one making it more and more complicated.
Two of the most severe changes are that Virtuoso 6.1.5+ includes a setting making the importer more robust so the repacking of the files isn’t needed anymore and the changes of DBpedia 3.7 to also provide internationalized versions causing a couple of problems / inconsistencies.

In this step by step guide I’ll tell you how to install a local mirror of the DBpedia 3.7 hosting a combination of the regular English and the i18n German datasets adding up to nearly half a billion triples!!!
Let’s jump in.

Versions

DBpedia 3.7 + 3.7-i18n dataset. Virtuoso 6.1.5+ (Actually it’s a 6.1.6-dev version with some bugfixes for the DBpedia VAD files, as detailed below). Ubuntu 12.04 LTS.

Prerequisites

A strong machine with root access and enough RAM: We used a VM with 4 Cores and 32 GBs of RAM. For installing I recommend more than 128 GB free HD space, especially for downloading and repacking the datasets, as well as the growing database file when importing (mine grew to 45 GBs).

Let’s go

Download and install virtuoso

Go and download virtuoso opensource:

Initially I started this guide with virtuoso-opensource-6.1.5 from http://sourceforge.net/projects/virtuoso/, but later on in the process I ran into problems with the DBpedia VAD file, which is used for resolving and content negotiation of instances via HTTP. If you only intend to use the SPARQL endpoint you can download that version, but if you want to be able to actually resolve the local versions of the http://dbpedia.org/resource/Kaiserslautern pages with content negotiation, you need a version with some bugfixes from github:
https://github.com/openlink/virtuoso-opensource/tree/674df8668d7dd3018b3a8a14c23702c583d64961. If 6.1.6 is officially released by the time you read this, I’d probably use the official release.

To get that version from github do the following:

cd ~
git clone https://github.com/openlink/virtuoso-opensource.git virtuoso-opensource
cd virtuoso-opensource
# the following command will actually set your working directory to the
# correct revision
git checkout 674df8668d7dd3018b3a8a14c23702c583d64961
./autogen.sh

(Skip the following if you got the version from github.)

If you downloaded one of the release .tar.gz files instead: Put the file in your home dir on the server, then do the following.

cd ~
tar -xvzf virtuoso-*
cd virtuoso-opensource-6.1.5 # or 6.1.6, depending what you got

Alright, now no matter how you got that virtuoso version do the following to install the prerequisites and then build virtuoso:

sudo aptitude install libxml2-dev libssl-dev autoconf libgraphviz-dev \
     libmagickcore-dev libmagickwand-dev dnsutils gawk bison flex gperf
export CFLAGS="-O2 -m64"
./configure --with-layout=debian
# NOTICE: this will _not_ install into /usr/local but into /usr
# (so might clash with packages by your distribution if you install
# "the" virtuoso package)
# You'll find the db in /var/lib/virtuoso/db !
# check output for errors and FIX THEM! (e.g., install missing packages)
make -j5

This will take about 1 hour. In parallel, you might want to start with downloading the DBpedia files (next section) and come back.

sudo make install

Now change the following values in /var/lib/virtuoso/db/virtuoso.ini, the performance tuning stuff is according to http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning:

# note: virtuoso ignores lines starting with whitespace
[Parameters]
# you need to include the directory where your datasets will be downloaded
# to, in our case /usr/local/data/datasets:
DirsAllowed = ., /usr/share/virtuoso/vad, /usr/local/data/datasets
# IMPORTANT: for performance also do this
[Parameters]
# the following two are as suggested by comments in the original .ini
# file in order to use the RAM on your server:
NumberOfBuffers = 2720000
MaxDirtyBuffers = 2000000
# each buffer caches a 8K page of data and occupies approx. 8700 bytes of
# memory. It's suggested to set this value to 65 % of ram for a db only server
# so if you have 32 GB of ram: 32*1000^3*0.65/8700 = 2390804
# default is 2000 which will use 16 MB ram ;)
# Make sure to remove whitespace if you uncomment existing lines!
[Database]
MaxCheckpointRemap = 625000
# set this to 1/4th of NumberOfBuffers
[SPARQL]
# I like to increase the ResultSetMaxrows, MaxQueryCostEstimationTime
# and MaxQueryExecutionTime drastically as it's a local store where we
# do quite complex queries... up to you (don't do this if a lot of people
# use it).
# In any case for the importer to be more robust add the following setting
# to this section:
ShortenLongURIs = 1
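
The rule of thumb from the comments above (65 % of RAM, roughly 8700 bytes per buffer, MaxCheckpointRemap at 1/4th of NumberOfBuffers) can be sketched as a small shell helper. This is only a rough sketch: the function names are made up for this example, and the values used in the listing above (2720000 buffers) are somewhat higher than what the formula yields for 32 GB.

```shell
#!/usr/bin/env bash
# Sketch only: derive buffer settings from the RAM size in GB, following the
# rule of thumb quoted in the virtuoso.ini comments above.
# Function names are hypothetical, not part of Virtuoso.

number_of_buffers() {
    local ram_gb="$1"
    # integer arithmetic for: ram_gb * 1000^3 * 0.65 / 8700
    echo $(( ram_gb * 1000 * 1000 * 1000 * 65 / 100 / 8700 ))
}

max_checkpoint_remap() {
    # 1/4th of NumberOfBuffers
    echo $(( $(number_of_buffers "$1") / 4 ))
}
```

For a 32 GB machine, `number_of_buffers 32` prints 2390804 (the number from the comment above) and `max_checkpoint_remap 32` prints 597701.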

The next step installs an init-script (autostarts) and starts the virtuoso server. (If you’ve changed directories to edit /var/lib/virtuoso/db/virtuoso.ini, go back to the virtuoso source dir!):

sudo cp debian/init.d /etc/init.d/virtuoso-opensource &&
sudo chmod a+x /etc/init.d/virtuoso-opensource &&
sudo bash debian/virtuoso-opensource.postinst.debhelper

Downloading the DBpedia dump files and a word about problems / inconsistencies in them

DBpedia 3.7 is split into two separate datasets: one standard version and one i18n version. The standard version mints URIs by going through all English Wikipedia articles. For all of these, the cross-language links are used to extract corresponding labels for the en URIs. This is problematic, as for example articles which only exist in the German Wikipedia won’t be extracted. To solve this problem the i18n version exists and creates IRIs in the form of de.dbpedia.org for every article in the German Wikipedia. There are also interlinking datasets providing owl:sameAs between the new URIs and the ones in the corresponding other datasets. Note that the i18n IDs for concepts are IRIs, while the ones in the English Wikipedia are URIs.

Also, even though the i18n dataset includes all languages, only the Greek (el), German (de) and Russian (ru) Wikipedia have minted their own IRIs. The others are broken… they use URIs starting with http://dbpedia.org but are linked to their corresponding language codes in the interlanguage links (e.g., the French interlanguage links falsely point to fr.dbpedia.org). So it’s a mess! If you have a cleaned version of the datasets let us know or just wait for DBpedia 3.8 as we all do 😉

Besides that, the el, de and ru i18n files ending in .nt.gz are actually not valid NT files, because the IRIs are UTF-8 encoded. After finding this out I simply renamed all the German files to .n3.gz. As Turtle (TTL) is a subset of N3 and Virtuoso actually uses a TTL parser (also for NT, which is a subset of Turtle), I guess that renaming wasn’t all that important for Virtuoso. Still, I had a bad feeling about having files with wrong endings flying around.

We decided that we only needed the German and English files in NT format. If you need something different, go ahead (and maybe report back if there were problems and how you solved them). If you decide to download the all-languages tar, make sure to exclude the NQ files from the later importing steps. One simple way to do this is to move everything you don’t want to import out of the directory. Also don’t forget to import the dbpedia_3.*.owl file (downloaded in the script below)!
Another hint: Virtuoso can only import plain (uncompressed) or gzipped files. The DBpedia dumps are bzipped, so you either repack them into gzip format or extract them. On our server the import was noticeably slower from extracted files than from gzipped ones (ignoring the vast amount of wasted disk space for the extracted files): file access becomes the bottleneck while your 4 cores are idling. This is why I decided on repacking all the files from bz2 to gz. As you can see, I do the en and de repacking in parallel. If that’s not suitable for you, feel free to change it. You might also want to change this if you want to repack in parallel to downloading. The repacking process below took about 1 hour but was worth it in the end. The more CPUs you have, the more you can parallelize this process.

sudo -i # get root
# see comment above, you could also get the all_language.tar or another DBpedia version...
mkdir -p /usr/local/data/datasets/dbpedia/3.7/3.7/en
cd /usr/local/data/datasets/dbpedia/3.7/3.7/en
wget -r -np -nd -nc -A'*.nt.bz2' http://downloads.dbpedia.org/3.7/en/

# if you want to save space do this:
for i in *.bz2 ; do bzcat $i | gzip > ${i%.bz2}.gz && rm $i ; done &
# else do:
#bunzip2 *&

cd ..
wget http://downloads.dbpedia.org/3.7/dbpedia_3.7.owl

mkdir -p ../3.7-i18n/de && cd ../3.7-i18n/de
wget -r -np -nd -nc -A'*.nt.bz2' http://downloads.dbpedia.org/3.7-i18n/de/
# if you want to save space do this:
for i in *.nt.bz2 ; do bzcat $i | gzip > ${i%.nt.bz2}.n3.gz && rm $i ; done &
# else do:
#bunzip2 *

# notice that the extraction (and repacking) of *.bz2 takes quite a while (about 1 hour)
# gzipped data is reasonably packed, but still very fast to access (in contrast to bz2), so maybe this is the best choice.
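
As a rough sketch of how the repacking loops above could be spread over more cores, the following bash helpers repack each .bz2 file in its own process. All function names here are made up for this example. It assumes bash, bzip2, gzip and an xargs that supports -P and -r (as the GNU version on Ubuntu does). Unlike the one-liners above, it only deletes the original if both bzcat and gzip succeeded.

```shell
#!/usr/bin/env bash
# Sketch only: parallel bz2 -> gz repacking. Function names are hypothetical.

# gz_name FILE [NEW_EXT]: print the target name for a .bz2 dump,
# optionally swapping the inner extension (e.g. nt -> n3).
gz_name() {
    local src="$1" new_ext="${2:-}"
    local base="${src%.bz2}"
    if [ -n "$new_ext" ]; then
        base="${base%.*}.$new_ext"
    fi
    echo "$base.gz"
}

# repack FILE [NEW_EXT]: recompress one file, remove the original on success.
# Runs in a subshell so that pipefail does not leak into the caller.
repack() (
    set -o pipefail
    dst=$(gz_name "$@")
    bzcat "$1" | gzip > "$dst" && rm "$1"
)

# repack_all JOBS [NEW_EXT]: repack all *.bz2 files in the current directory,
# running up to JOBS repack processes at the same time.
repack_all() {
    local jobs="$1" new_ext="${2:-}"
    export -f gz_name repack
    find . -maxdepth 1 -name '*.bz2' -print0 |
        xargs -0 -r -n 1 -P "$jobs" bash -c 'repack "$2" "$1"' _ "$new_ext"
}
```

In the en directory this would be called as `repack_all 4`, in the de directory as `repack_all 4 n3` to get the .n3.gz names used above.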

Data Cleaning and The bulk loader scripts

In contrast to the previous version of this article, the Virtuoso import will take care of shortening overly long IRIs itself. Also, it seems the bulk loader script is included in the more recent Virtuoso versions, so as a reference only: see the old version for the cleaning script, and VirtBulkRDFLoaderExampleDbpedia and http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtBulkRDFLoaderScript for info about the bulk loader scripts.

Importing DBpedia dumps into virtuoso

Now AFTER the re-/unpacking of the DBpedia dumps we will register all files in the dbpedia dir (recursively ld_dir_all) to be added to the dbpedia graph. As mentioned above: If you use this method make sure that only files reside in the given subtree that you really want to import.
If you only want one directory’s files to be added (non recursive) use ld_dir.
If you manually want to add some files, use ld_add.
See the VirtBulkRDFLoaderScript file for args to pass.

Be warned that it might be a bad idea to import the normal and i18n dataset into one graph if you didn’t select specific languages, as it might introduce a lot of duplicates. In order to keep track what was selected and imported into which graph (see Note 2 below), we linked (ln -s) the files from the English (orig) and German (i18n) into a directory structure beneath /usr/local/data/datasets/dbpedia/3.7/importedGraphs/ and imported from there instead. To make sure you think about this, I use that path below, so it won’t work if you didn’t pay attention. If you really want, just import /usr/local/data/datasets/dbpedia/3.7/.
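
The linking described above could look roughly like the following sketch. The helper name is made up, readlink -f assumes GNU coreutils, and the file name patterns in the usage example are assumptions; check them against what you actually downloaded.

```shell
#!/usr/bin/env bash
# Sketch only: symlink selected dump files into one directory per graph.
# link_into_graph is a hypothetical helper, not part of Virtuoso.

# link_into_graph SRC_DIR GRAPH_DIR PATTERN...
link_into_graph() {
    local src_dir="$1" graph_dir="$2"
    shift 2
    mkdir -p "$graph_dir"
    local pattern file
    for pattern in "$@"; do
        # $pattern is left unquoted on purpose so the shell expands the glob
        for file in "$src_dir"/$pattern; do
            [ -e "$file" ] || continue   # pattern matched nothing
            ln -sf "$(readlink -f "$file")" "$graph_dir/"
        done
    done
}
```

For example, `link_into_graph /usr/local/data/datasets/dbpedia/3.7/3.7/en /usr/local/data/datasets/dbpedia/3.7/importedGraphs/pagelinks.dbpedia.org 'page_links_*'` would collect the (assumed) page links files for their own graph.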

Note: in the following I will assume that your Virtuoso isql command is called isql. If you don’t have such a command, it might be called isql-vt.
Note 2: in our case we actually decided not to import all the files into just one graph but instead used separate graphs for en and de as well as for the pagelinks, infoboxprops, extlinks and interlanguage_links dumps. Be warned though that only a certain number of triples from the http://dbpedia.org graph will be shown when you visit the local pages with your browser.

isql # enter virtuoso sql mode
-- we are in sql mode now
ld_dir_all('/usr/local/data/datasets/dbpedia/3.7/importedGraphs/dbpedia.org', '*.*', 'http://dbpedia.org');
-- do the following to see which files were registered to be added:
SELECT * FROM DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;
EXIT;

OK, now comes the fun (and long) part: about 7 hours… We registered the files to be added, now let’s finally start the process. Fire up screen (see comment) if you didn’t already.

sudo aptitude install screen
screen isql
rdf_loader_run();
-- DO NOT USE THE DB BESIDES THE FOLLOWING COMMANDS:
-- (I had some warnings about a possibly corrupt db in the log,
-- when I visited the virtuoso conductor during the first run...)
-- you can watch the progress from another isql session with:
-- select * from DB.DBA.LOAD_LIST;
-- if you need to stop the loading for any reason: rdf_load_stop ();
-- if you want to force stopping: rdf_load_stop(1);
checkpoint;
commit WORK;
checkpoint;
EXIT;

After this:
Take a look into the /var/lib/virtuoso/db/virtuoso.log file. Should you find any errors in there… FIX THEM! You might use the dump, but it’s incomplete then. Any error quits out of the loading of the corresponding file and continues with the next one, so you’re only using the part of that file up to the place where the error occurred. (Should you find errors you can’t fix in the way I did above, please leave a comment.)
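
A minimal sketch for scanning the log from the shell, assuming that problems show up as lines containing the word "error" (this is an assumption about the log wording, so it is no replacement for actually reading the log). The function name is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: print log lines that mention "error" (case-insensitive).
# Returns 0 if no such line exists, 1 if at least one was found,
# 2 if the log file cannot be read. log_errors is a hypothetical helper.
log_errors() {
    local logfile="$1"
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    if grep -i -n 'error' "$logfile"; then
        return 1
    fi
    return 0
}
```

Usage: `log_errors /var/lib/virtuoso/db/virtuoso.log && echo "no errors found"`.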

Final polishing

You can & should now install the DBpedia and RDF Mappers packages from the Virtuoso Conductor.
http://your-server:8890

login: dba
pw: dba

Go to System Admin / Packages. Install the dbpedia and rdf_mappers packages (takes about 5 minutes).

Testing your local mirror

Go to the sparql-endpoint of your server http://your-server:8890/sparql (or in isql prefix with: SPARQL)

SELECT COUNT(*) WHERE { ?s ?p ?o } ;

This might take about 15 minutes and then returns 437,768,995. Subsequent queries are a lot faster (if you find another way (preferably automatic) to warm up the caches, please leave me a note).
I also like this query showing all the graphs and how many triples are in them:

SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
g                                                          callret-1
LONG VARCHAR                                               LONG VARCHAR
___________________________________________________________

http://dbpedia.org                                         131477215
http://pagelinks.dbpedia.org                               118039661
http://rawinfoboxproperties.dbpedia.org                    83705116
http://pagelinks.de.dbpedia.org                            41397135
http://extlinks.dbpedia.org                                31354613
http://de.dbpedia.org                                      19748791
http://rawinfoboxproperties.de.dbpedia.org                 11076144
http://interlanguagelinks.de.dbpedia.org                   694064
http://www.openlinksw.com/schemas/RDF_Mapper_Ontology/1.0/ 256065
http://localhost:8890/DAV                                  4009
http://www.openlinksw.com/schemas/virtrdf#                 2066
http://OPEN.vocab.org/terms                                1480
http://purl.org/ontology/bibo/                             1226
http://purl.org/goodrelations/v1                           937
http://purl.org/dc/terms/                                  857
http://www.openlinksw.com/schemas/opengraph                804
http://www.openlinksw.com/schemas/googleplus               696
http://www.openlinksw.com/schemas/google-base              691
http://www.openlinksw.com/schemas/cv                       661
virtrdf-label                                              638
http://www.openlinksw.com/schemas/linkedin                 613
http://xmlns.com/foaf/0.1/                                 557
http://rdfs.org/sioc/ns#                                   553
http://www.openlinksw.com/schemas/evri                     482
http://www.openlinksw.com/schemas/crunchbase               426
http://bblfish.net/WORK/atom-owl/2006-06-06/               386
http://scot-project.org/scot/ns#                           332
http://www.openlinksw.com/schemas/zillow                   311
http://www.w3.org/2004/02/skos/core                        252
http://www.openlinksw.com/schemas/cnet                     225
http://www.openlinksw.com/schemas/tesco                    183
http://www.openlinksw.com/schemas/bestbuy                  172
http://www.w3.org/2002/07/owl#                             167
http://www.w3.org/2002/07/owl                              160
http://www.openlinksw.com/schemas/angel#                   144
http://www.openlinksw.com/schemas/amazon                   143
http://purl.org/dc/elements/1.1/                           139
http://www.w3.org/2007/05/powder-s#                        117
http://www.openlinksw.com/schemas/twitter                  103
http://www.openlinksw.com/schemas/stackoverflow#           102
http://www.openlinksw.com/schemas/klout                    90
http://www.w3.org/2000/01/rdf-schema#                      87
http://www.w3.org/1999/02/22-rdf-syntax-ns#                85
http://www.openlinksw.com/schemas/ebay                     79
http://www.openlinksw.com/schema/attribution#              68
http://www.openlinksw.com/schemas/nyt                      41
http://www.openlinksw.com/schemas/oplbase                  26
http://www.openlinksw.com/schemas/cert#                    23
http://www.openlinksw.com/schemas/dbpedia-spotlight#       21
http://www.openlinksw.com/schemas/money                    21
http://dbpedia.org/schema/property_rules#                  12
dbprdf-label                                               6

52 ROWS. -- 1711753 msec.

Congratulations, you just imported nearly half a billion triples.

Backing up this initial state

Now is a good moment to back up the whole db (takes about half an hour):

sudo -i
cd /
/etc/init.d/virtuoso-opensource stop &&
tar -cvf - /var/lib/virtuoso | gzip --fast > virtuoso-6.1.6-dev-DBDUMP-dbpedia-3.7-en_de-$(date '+%F').tar.gz &&
/etc/init.d/virtuoso-opensource start

Yay, done 😉
As always, feel free to leave comments to tell us about your problems or how happy you are :D.

Thanks

Many thanks to the DBpedia team for their endless efforts of providing us all with a great dataset. Also many thanks to the Virtuoso crew for releasing an opensource version of their DB; especially to Hugh Williams and Patrick van Kleef for helping me out with a couple of problems in the newer version.

Setting up a local DBpedia mirror with Virtuoso

Newer version available: Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional)

So you’re the guy who is allowed to setup a local DBpedia mirror for your work group? OK, today is your lucky day and you’re in the right place. I hope you’ll be able to benefit from my hours of trials and errors 😉