Setting up a local DBpedia 2014 mirror with Virtuoso 7.1.0

Newer version available: Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional)

So you’re the guy who is allowed to set up a local DBpedia mirror, or more generally a local Linked Data mirror, for your work group? OK, today is your lucky day and you’re in the right place. I hope you’ll be able to benefit from my many hours of trial and error. If anything goes wrong (or everything works fine), feel free to leave a comment below.

Versions of this guide

There are three older versions of this guide:

  • Oct. 2010: The first version focusing on DBpedia 3.5 – 3.6 and Virtuoso 6.1
  • May 2012: A bigger update to DBpedia 3.7 (new local language versions) and Virtuoso 6.1.5+ (with a lot of updates making pre-processing of the dumps easier)
  • Apr. 2014: Update to DBpedia 3.9 and Virtuoso 7

In this step-by-step guide I’ll tell you how to install a local Linked Data mirror of DBpedia 2014, hosting a combination of the regular English and (as an example) the i18n German datasets, adding up to over half a billion triples. If this isn’t enough, you can also follow the links to the Freebase, DBLP, Yago, Umbel and datasets / vocabularies adding up to over 3.5 billion triples.

Let’s jump in.

Used Versions

  • DBpedia 2014
  • Virtuoso OpenSource 7.1.0
  • Ubuntu 14.04 LTS


A strong machine with root access and enough RAM: we used a VM with 4 cores and 32 GB of RAM for DBpedia only. If you intend to also load Freebase and other datasets, I recommend at least 64 GB of RAM (we actually ended up using a 16-core server with 256 GB of RAM). For the installation I recommend more than 128 GB of free disk space for DBpedia alone, or 256 GB if you want to load Freebase as well. Most of it is needed for downloading and repacking the datasets and for the database file, which grows while importing (mine grew to 50 GB for DBpedia and 180 GB with Freebase).

Let’s go

Download and install virtuoso

Go and download Virtuoso Open Source (make sure you get v7.1.0 as in this guide, or a newer version).

Put the file in your home dir on the server, then extract it and switch to the directory:

cd ~
tar -xvzf virtuoso-7.1.0.tar.gz
cd virtuoso-opensource-7.1.0 # or newer, depending what you got

Now do the following to install the prerequisites and then build virtuoso:

sudo aptitude install libxml2-dev libssl-dev autoconf libgraphviz-dev \
     libmagickcore-dev libmagickwand-dev dnsutils gawk bison flex gperf

# NOTICE: the following will _not_ install into /usr/local but into /usr
# (so might clash with packages by your distribution if you install
# "the" virtuoso package)
# You'll find the db in /var/lib/virtuoso/db !
# check output for errors and FIX THEM! (e.g., install missing packages)
export CFLAGS="-O2 -m64"
./configure --with-layout=debian --enable-dbpedia-vad --enable-rdfmappers-vad 

# the following will build with 5 processes in parallel
# choose something like your server's #CPUs + 1
make -j5

This will take about 5 minutes.

sudo make install
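The -j5 above is hard-coded. The "#CPUs + 1" rule of thumb from the build comments can also be computed on the spot. Here is a minimal bash sketch. It assumes GNU coreutils' nproc, with getconf as a fallback. The function name build_jobs is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of parallel make jobs = number of CPUs + 1,
# following the rule of thumb from the build comments above.
build_jobs() {
  local cpus
  # nproc is GNU coreutils; getconf is the fallback if nproc is missing
  cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
  echo $(( cpus + 1 ))
}

# Usage (inside the virtuoso source dir):
#   make -j"$(build_jobs)"
```
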

Now change the following values in /var/lib/virtuoso/db/virtuoso.ini (the performance tuning settings follow the suggestions in the comments of the original .ini file):

# note: virtuoso ignores lines starting with whitespace and stuff after a ;
# you need to include the directory where your datasets will be downloaded
# to, in our case /usr/local/data/datasets:
DirsAllowed = ., /usr/share/virtuoso/vad, /usr/local/data/datasets
# IMPORTANT: for performance also do this
# the following two are as suggested by comments in the original .ini
# file in order to use the RAM on your server:
NumberOfBuffers = 2720000
MaxDirtyBuffers = 2000000
# each buffer caches an 8K page of data and occupies approx. 8700 bytes of
# memory. It's suggested to set this value to 65 % of RAM for a db-only server,
# so if you have 32 GB of RAM: 32*1000^3*0.65/8700 = 2390804
# default is 2000, which will only use 16 MB of RAM ;)
# Make sure to remove the leading whitespace if you uncomment existing lines!
MaxCheckpointRemap = 625000
# set this to 1/4 of NumberOfBuffers
# I like to increase the ResultSetMaxrows, MaxQueryCostEstimationTime
# and MaxQueryExecutionTime drastically as it's a local store where we
# do quite complex queries... up to you (don't do this if a lot of people
# use it).
# In any case for the importer to be more robust add the following setting
# to this section:
ShortenLongURIs = 1
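The rules of thumb in these comments can be turned into numbers for a different amount of RAM. Here is a small bash sketch. The function name is hypothetical. MaxDirtyBuffers at roughly 3/4 of NumberOfBuffers is an assumption, based on the ratio of the sample values above. Note that the values in the block above come from the sample .ini, so they are a bit higher than what the 65 % formula gives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive virtuoso.ini buffer settings from the RAM size
# in GB, using the rules of thumb from the comments above:
#   NumberOfBuffers    = 65 % of RAM / ~8700 bytes per buffer
#   MaxDirtyBuffers    = ~3/4 of NumberOfBuffers (assumption, roughly the
#                        ratio of the sample values 2000000 / 2720000)
#   MaxCheckpointRemap = 1/4 of NumberOfBuffers
virtuoso_buffers() {
  local ram_gb=$1 buffers
  # bash integer arithmetic (64 bit) is plenty for realistic RAM sizes
  buffers=$(( ram_gb * 1000**3 * 65 / 100 / 8700 ))
  printf 'NumberOfBuffers = %d\n' "$buffers"
  printf 'MaxDirtyBuffers = %d\n' $(( buffers * 3 / 4 ))
  printf 'MaxCheckpointRemap = %d\n' $(( buffers / 4 ))
}
```

For example, virtuoso_buffers 32 prints NumberOfBuffers = 2390804 (the figure from the comment above), MaxDirtyBuffers = 1793103 and MaxCheckpointRemap = 597701.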

The next step installs an init-script (autostart) and starts the virtuoso server. (If you’ve changed directories to edit /var/lib/virtuoso/db/virtuoso.ini, go back to the virtuoso source dir!):

sudo cp debian/init.d /etc/init.d/virtuoso-opensource &&
sudo chmod a+x /etc/init.d/virtuoso-opensource &&
sudo bash debian/virtuoso-opensource.postinst.debhelper

You should now have a running virtuoso server.
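Before moving on, it can be worth checking that the server really answers. Here is a minimal bash sketch that polls the SPARQL endpoint with curl until it responds or the tries run out. It defaults to http://localhost:8890/sparql, port 8890 being the one used for the endpoint later in this guide. The name wait_for_endpoint is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll an HTTP endpoint until it answers with a
# non-error status, or give up after the given number of tries.
wait_for_endpoint() {
  local url=${1:-http://localhost:8890/sparql} tries=${2:-30} i
  for (( i = 0; i < tries; i++ )); do
    # -f: fail on HTTP errors, -sS: silent but still show real errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    sleep 2
  done
  return 1
}

# Usage:
#   wait_for_endpoint && echo "virtuoso is up"
```
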

DBpedia URIs (en) vs. DBpedia IRIs (i18n)

DBpedia 2014 consists of several datasets: one “standard” English version and several localized versions for other languages (i18n). The standard version mints URIs by going through all English Wikipedia articles. For all of these, the Wikipedia cross-language links are used to extract corresponding labels in other languages for the en URIs (e.g., de/labels_en_uris_de.nt.bz2). This is problematic because, for example, articles that only exist in the German Wikipedia won’t be extracted. The i18n versions exist to solve this problem: they create their own IRIs for every article in the German Wikipedia (e.g., de/labels_de.nt.bz2).

This approach has several implications. For backwards compatibility reasons, the standard DBpedia makes statements about URIs, while the local chapters, like the German one, make statements about IRIs such as the German chapter’s IRI for Gerhard_Schröder (note the ö). In other words, and as written above: the standard DBpedia uses URIs to identify things, while the localized versions use IRIs. This also means that requesting an English DBpedia resource in its IRI form (with a literal ö) shouldn’t work. That said, clicking such a link will actually work, because your browser does some magic to give you what you probably meant. Using curl (curl -i -L -H "Accept: application/rdf+xml" with that IRI) or SPARQLing the endpoint will not be so forgiving, though, and can cause quite a headache: select * where { dbpedia:Gerhard_Schröder ?p ?o. } vs. select * where { <> ?p ?o. }. To mitigate this historic problem a bit, DBpedia actually offers owl:sameAs links from IRIs to URIs (en/iri_same_as_uri_en). You should load these, so you at least have a link to what you want if someone tries to get info about an IRI.
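The practical difference between the two identifiers is percent-encoding. The IRI-to-URI mapping of RFC 3987 (section 3.1) writes each non-ASCII character as its UTF-8 bytes in %XX form, so ö becomes %C3%B6. Here is a minimal bash sketch that encodes only non-ASCII bytes and keeps everything else as-is. The function name is hypothetical. DBpedia's own escaping of ASCII special characters may differ, so treat this as an illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an IRI to a URI by percent-encoding every
# non-ASCII byte of its UTF-8 representation (RFC 3987, section 3.1).
iri_to_uri() {
  local b out=''
  # od dumps the raw bytes as two-digit hex values, independent of the locale
  for b in $(printf '%s' "$1" | od -An -v -tx1); do
    if (( 16#$b >= 16#80 )); then
      out+="%${b^^}"                  # non-ASCII byte: %XX with uppercase hex
    else
      out+=$(printf '%b' "\\x$b")     # ASCII byte: keep the character itself
    fi
  done
  printf '%s\n' "$out"
}
```

For example, iri_to_uri 'http://dbpedia.org/resource/Gerhard_Schröder' prints http://dbpedia.org/resource/Gerhard_Schr%C3%B6der, which you can then use with curl or inside <...> in a SPARQL query.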

As the standard DBpedia provides labels, abstracts and a couple of other things in several languages, there are two types of files in the localized DBpedia folders. Some triple files directly associate the English URIs with, for example, the German labels (de/labels_en_uris_de). Others are the localized triple files, which associate, for example, the DE IRIs with the German labels (de/labels_de).

Downloading the DBpedia dump files & Repacking

For our group we decided that we wanted a reasonably complete mirror of the standard DBpedia (EN) (have a look at the datasets loaded into the public DBpedia SPARQL endpoint). We also wanted the i18n versions of the German DBpedia loaded into separate graphs, and each of their pagelink datasets in yet another separate graph. For this we download the corresponding files in N-Triples (NT) format as follows. If you need something different, adapt the selection (and maybe report back if there were problems and how you solved them).

Another hint: Virtuoso can only import plain (uncompressed) or gzipped files, but the DBpedia dumps are bzipped. So you either repack them into gzip format or extract them. On our server, importing from extracted files was noticeably slower than from gzipped ones, even ignoring the vast amount of disk space wasted by the extracted files. With uncompressed files, file access becomes the bottleneck while a couple of cores sit idle. This is why I decided to repack all the files from bz2 to gz. As you can see, I do the repacking per folder in parallel. If that’s not suitable for you, feel free to change it, for example if you want to repack in parallel to downloading. The repacking below took about 1 hour, but it was worth it in the end. The more CPUs you have, the more you can parallelize this process.

# see comment above, you could also get the all_language.tar or another DBpedia version...
mkdir -p /usr/local/data/datasets/dbpedia/2014
cd /usr/local/data/datasets/dbpedia/2014
wget -r -nc -nH --cut-dirs=1 -np -l1 -A '*.nt.bz2' -A '*.owl' -R '*unredirected*'{en/,de/,links/,dbpedia_2014.owl}

# if you want to save space do this:
for d in */ ; do for i in "${d%/}"/*.bz2 ; do bzcat "$i" | gzip > "${i%.bz2}.gz" && rm "$i" ; done & done
# else do:
#bunzip2 */*.bz2 &

# notice that the extraction (and repacking) of *.bz2 takes quite a while (about 1 hour)
# gzipped data is reasonably packed, but still very fast to access (in contrast to bz2), so maybe this is the best choice.
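The one-liner above returns immediately, because every directory’s loop runs as a background job (the job numbers it prints are discussed in the comments). It also has a subtler problem: a pipeline’s exit status is that of its last command. So && rm deletes the .bz2 even if bzcat failed, for example on a truncated download. Here is a slightly more defensive bash sketch. It checks both exit codes via PIPESTATUS, tests the new .gz with gzip -t and waits for all jobs. The function names are made up.

```shell
#!/usr/bin/env bash
# Repack every *.bz2 file in a directory to *.gz and only delete the .bz2
# once the new .gz is known to be good.
repack_dir() {
  local d=${1%/} i gz
  local -a rc
  for i in "$d"/*.bz2; do
    [ -e "$i" ] || continue                 # directory without .bz2 files
    gz=${i%.bz2}.gz
    bzcat -- "$i" | gzip > "$gz"
    rc=("${PIPESTATUS[@]}")                 # exit codes of bzcat and gzip
    if [ "${rc[0]}" -eq 0 ] && [ "${rc[1]}" -eq 0 ] && gzip -t -- "$gz"; then
      rm -- "$i"
    else
      echo "repacking failed, keeping $i" >&2
      rm -f -- "$gz"
    fi
  done
}

# One background job per directory; returns only when all jobs are done.
repack_all() {
  local d
  for d in "$@"; do
    repack_dir "$d" &
  done
  wait
}

# Usage, from /usr/local/data/datasets/dbpedia/2014:
#   repack_all */
```
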

Data Cleaning and The bulk loader scripts

In contrast to the previous versions of this article, the Virtuoso importer now takes care of shortening overly long IRIs itself. The bulk loader script also seems to be included in more recent Virtuoso versions. So, as a reference only: see the old version of this guide for the cleaning script, and VirtBulkRDFLoaderExampleDbpedia for info about the bulk loader scripts.

Importing DBpedia dumps into virtuoso

Now, AFTER the re-/unpacking of the DBpedia dumps, we will register all files in the dbpedia dir (recursively, with ld_dir_all) to be added to the dbpedia graph. If you use this method, make sure that the given subtree contains only files you really want to import.
Also don’t forget to import the dbpedia_2014.owl file (first step in the script below)!
If you only want one directory’s files to be added (non recursive) use ld_dir('dir', '*.*', 'graph');.
If you manually want to add some files, use ld_add('file', 'graph');.
See the VirtBulkRDFLoaderScript file for details.

Be warned that it might be a bad idea to import the normal and i18n dataset into one graph if you didn’t select specific languages, as it might introduce a lot of duplicates.

In order to keep track (and easily reproduce) what was selected and imported into which graph, I actually link (ln -s) the repacked files into a directory structure beneath /usr/local/data/datasets/dbpedia/2014/importedGraphs/ and import from there instead. To make sure you think about this, I use that path below, so it won’t work if you didn’t pay attention. If you really want to import all downloaded files, just import /usr/local/data/datasets/dbpedia/2014/.

Also be aware that if you load certain parts of the dumps into different graphs (as I did with the pagelinks and with the i18n versions of the DE and FR datasets), only triples from the graph will be shown when you visit the local pages with your browser (SPARQL is unaffected by this)!

So if you want to load the same datasets as loaded on the official endpoint (but restricted to the EN and DE ones), the following should do the trick to link them up for the next steps:

cd /usr/local/data/datasets/dbpedia/2014/
mkdir importedGraphs
cd importedGraphs

# ln -s ../../dbpedia_2014.owl ./ # see below!
ln -s ../../links/* ./

ln -s ../../en/article_categories_en.nt.gz ./
ln -s ../../en/category_labels_en.nt.gz ./
ln -s ../../en/disambiguations_en.nt.gz ./
ln -s ../../en/external_links_en.nt.gz ./
ln -s ../../en/freebase_links_en.nt.gz ./
ln -s ../../en/geo_coordinates_en.nt.gz ./
ln -s ../../en/geonames_links_en_en.nt.gz ./
ln -s ../../en/homepages_en.nt.gz ./
ln -s ../../en/images_en.nt.gz ./
ln -s ../../en/infobox_properties_en.nt.gz ./
ln -s ../../en/infobox_property_definitions_en.nt.gz ./
ln -s ../../en/instance_types_en.nt.gz ./
ln -s ../../en/instance_types_heuristic_en.nt.gz ./
ln -s ../../en/interlanguage_links_chapters_en.nt.gz ./
ln -s ../../en/iri_same_as_uri_en.nt.gz ./
ln -s ../../en/labels_en.nt.gz ./
ln -s ../../en/long_abstracts_en.nt.gz ./
ln -s ../../en/mappingbased_properties_cleaned_en.nt.gz ./
ln -s ../../en/page_ids_en.nt.gz ./
ln -s ../../en/persondata_en.nt.gz ./
ln -s ../../en/redirects_transitive_en.nt.gz ./
ln -s ../../en/revision_ids_en.nt.gz ./
ln -s ../../en/revision_uris_en.nt.gz ./
ln -s ../../en/short_abstracts_en.nt.gz ./
ln -s ../../en/skos_categories_en.nt.gz ./
ln -s ../../en/specific_mappingbased_properties_en.nt.gz ./
ln -s ../../en/wikipedia_links_en.nt.gz ./

ln -s ../../de/labels_en_uris_de.nt.gz ./
ln -s ../../de/long_abstracts_en_uris_de.nt.gz ./
ln -s ../../de/short_abstracts_en_uris_de.nt.gz ./

ln -s ../../fr/labels_en_uris_fr.nt.gz ./
ln -s ../../fr/long_abstracts_en_uris_fr.nt.gz ./
ln -s ../../fr/short_abstracts_en_uris_fr.nt.gz ./
cd ..

ln -s ../../en/genders_en.nt.gz ./
ln -s ../../en/out_degree_en.nt.gz ./
ln -s ../../en/page_length_en.nt.gz ./
cd ..

ln -s ../../en/page_links_en.nt.gz ./
cd ..

ln -s ../../en/topical_concepts_en.nt.gz ./
cd ..

ln -s ../../de/article_categories_de.nt.gz ./
ln -s ../../de/category_labels_de.nt.gz ./
ln -s ../../de/disambiguations_de.nt.gz ./
ln -s ../../de/external_links_de.nt.gz ./
ln -s ../../de/freebase_links_de.nt.gz ./
ln -s ../../de/geo_coordinates_de.nt.gz ./
ln -s ../../de/homepages_de.nt.gz ./
ln -s ../../de/images_de.nt.gz ./
ln -s ../../de/infobox_properties_de.nt.gz ./
ln -s ../../de/infobox_property_definitions_de.nt.gz ./
ln -s ../../de/instance_types_de.nt.gz ./
ln -s ../../de/interlanguage_links_chapters_de.nt.gz ./
ln -s ../../de/iri_same_as_uri_de.nt.gz ./
ln -s ../../de/labels_de.nt.gz ./
ln -s ../../de/long_abstracts_de.nt.gz ./
ln -s ../../de/mappingbased_properties_de.nt.gz ./
ln -s ../../de/out_degree_de.nt.gz ./
ln -s ../../de/page_ids_de.nt.gz ./
ln -s ../../de/page_length_de.nt.gz ./
ln -s ../../de/persondata_de.nt.gz ./
ln -s ../../de/pnd_de.nt.gz ./
ln -s ../../de/redirects_transitive_de.nt.gz ./
ln -s ../../de/revision_ids_de.nt.gz ./
ln -s ../../de/revision_uris_de.nt.gz ./
ln -s ../../de/short_abstracts_de.nt.gz ./
ln -s ../../de/skos_categories_de.nt.gz ./
ln -s ../../de/specific_mappingbased_properties_de.nt.gz ./
ln -s ../../de/wikipedia_links_de.nt.gz ./
cd ..

ln -s ../../de/page_links_de.nt.gz ./
cd ..
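With this many ln -s lines, a typo or a file that was never downloaded easily produces a dangling symlink. For example, the fr files linked above are not part of the wget selection (en/, de/, links/). Here is a bash sketch of a hypothetical helper that creates the graph directory, links files with absolute targets and reports missing sources instead of linking them. Using absolute rather than ../../ relative targets is a design choice of the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: symlink the given dump files into a graph directory
# (created if needed). Missing source files are reported and skipped instead
# of becoming dangling links; the return code is 1 if anything was missing.
link_into_graph() {
  local dir=$1 f abs missing=0
  shift
  mkdir -p -- "$dir" || return 1
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
      continue
    fi
    abs="$(cd -- "$(dirname -- "$f")" && pwd)/$(basename -- "$f")"
    ln -sf -- "$abs" "$dir/"
  done
  return "$missing"
}
```

A possible usage, where <graph-dir> is a placeholder for your graph directory name:
  cd /usr/local/data/datasets/dbpedia/2014
  link_into_graph importedGraphs/<graph-dir> en/labels_en.nt.gz en/long_abstracts_en.nt.gz
To find dangling links left over from earlier attempts, GNU find can list them with find -L importedGraphs -type l.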

This should have prepared your importedGraphs directory. From this directory you can run the following command, which prints out the necessary isql commands to register your graphs for importing:

for g in * ; do echo "ld_dir_all('$(pwd)/$g', '*.*', 'http://$g');" ; done
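With many graph directories, it can help to write these statements to a file and review them before pasting them into isql. Here is the same loop as a bash function. The name gen_ld_dir_all is hypothetical. It resolves the base directory to an absolute path, like $(pwd) in the one-liner. It also skips anything that is not a directory. As in the one-liner, the graph name is simply http:// plus the directory name.

```shell
#!/usr/bin/env bash
# Print one ld_dir_all() call per graph directory below $1, using the
# directory name as graph name (http://<dirname>), like the one-liner above.
gen_ld_dir_all() {
  local base g name
  base=$(cd -- "$1" && pwd) || return 1    # absolute path, like $(pwd)
  for g in "$base"/*/; do
    [ -d "$g" ] || continue                # glob matched no directories
    name=${g%/}
    name=${name##*/}
    printf "ld_dir_all('%s/%s', '*.*', 'http://%s');\n" "$base" "$name" "$name"
  done
}

# Usage:
#   gen_ld_dir_all /usr/local/data/datasets/dbpedia/2014/importedGraphs > register.sql
#   less register.sql    # check, then paste into your isql session
```
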

One more thing (thanks to Romain): In order for the DBpedia.vad package (which is installed at the end) to work correctly, the dbpedia_2014.owl file needs to be imported into graph

Note: In the following I will assume that your Virtuoso isql command is called isql. If you don’t have such a command, it might be called isql-vt, but this usually means you installed Virtuoso using a different method than the one described here.

isql # enter virtuoso sql mode
-- we are in sql mode now
ld_add('/usr/local/data/datasets/remote/dbpedia/2014/dbpedia_2014.owl', '');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/', '*.*', '');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/', '*.*', '');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/', '*.*', '');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/', '*.*', '');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/', '*.*', '');
ld_dir_all('/usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/', '*.*', '');

-- do the following to see which files were registered to be added:
select * from DB.DBA.LOAD_LIST;
-- if unsatisfied use:
-- delete from DB.DBA.LOAD_LIST;

You can now also register other datasets like Freebase, DBLP, Yago, Umbel and … that you want to be loaded. Our full DB.DBA.LOAD_LIST currently looks like this:

select ll_graph, ll_file from DB.DBA.LOAD_LIST;
ll_graph                             ll_file
VARCHAR                              VARCHAR NOT NULL
____________________________________
                   /usr/local/data/datasets/remote/dblp/l3s/2014-11-08/dblp.nt.gz
                   /usr/local/data/datasets/remote/dbpedia/2014/dbpedia_2014.owl
                   /usr/local/data/datasets/remote/dbpedia/2014/importedGraphs/…   (one row per file linked there)
                   /usr/local/data/datasets/remote/freebase/2014-11-02/freebase-rdf-2014-11-02-00-00.gz
                   /usr/local/data/datasets/remote/
                   /usr/local/data/datasets/remote/umbel/External Ontologies/dbpediaOntology.n3
                   /usr/local/data/datasets/remote/umbel/External Ontologies/
                   /usr/local/data/datasets/remote/umbel/Ontology/umbel.n3
                   /usr/local/data/datasets/remote/umbel/Reference Structure/umbel_reference_concepts.n3
                   /usr/local/data/datasets/remote/yago/yago2/2012-12/yagoLabels.ttl.gz

114 Rows. -- 5 msec.

OK, now comes the fun (and long) part: about 1.5 hours for DBpedia alone (the new Virtuoso 7 is cool 😉), plus about 3 hours for Freebase. Now that the files are registered, let’s finally start the process. Fire up screen if you didn’t already. (For more detailed metering than below, see VirtTipsAndTricksGuideLDMeterUtility.)

sudo aptitude install screen
screen isql
rdf_loader_run();
-- depending on the amount of CPUs and your IO performance you can run
-- more rdf_loader_run(); commands in other isql sessions which will
-- speed up the import process.
-- you can watch the progress from another isql session with:
-- select * from DB.DBA.LOAD_LIST;
-- if you need to stop the loading for any reason: rdf_load_stop ();
-- if you want to force stopping: rdf_load_stop(1);
commit work;
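The parallel loading described in the comments above can also be driven from the shell. Here is a bash sketch. It assumes that Virtuoso’s isql accepts the statement to run via its exec= argument, and that the server listens on the default ISQL port 1111 with the dba account. VIRT_PORT, VIRT_USER and VIRT_PASS are made-up variable names. The function starts N loaders in the background, waits for all of them and finishes with a checkpoint, so the loaded data gets written to the database file.

```shell
#!/usr/bin/env bash
# Start N bulk loaders in parallel (one isql session each), wait for all of
# them, then run a checkpoint. Connection settings default to 1111/dba/dba.
run_loaders() {
  local n=${1:-2} i
  local port=${VIRT_PORT:-1111} user=${VIRT_USER:-dba} pass=${VIRT_PASS:-dba}
  for (( i = 0; i < n; i++ )); do
    isql "$port" "$user" "$pass" exec="rdf_loader_run();" &
  done
  wait                                    # all loaders have returned
  isql "$port" "$user" "$pass" exec="checkpoint;"
}

# Usage, e.g. inside screen:
#   run_loaders 4
```
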

After this:
Take a look at the /var/lib/virtuoso/db/virtuoso.log file. Should you find any errors in there… FIX THEM! You could still use the dump, but it will be incomplete. Any error aborts the loading of the corresponding file, and the loader continues with the next one. So you only get the part of that file up to the place where the error occurred. (Should you find errors you can’t fix, please leave a comment.)
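Here is a small bash sketch to check the log and the loader’s own bookkeeping in one go. It assumes the ll_error column of DB.DBA.LOAD_LIST, which the bulk loader fills for files that failed, and isql’s exec= argument. VIRT_PORT, VIRT_USER and VIRT_PASS are made-up variable names with the defaults 1111/dba/dba. The function name is hypothetical too.

```shell
#!/usr/bin/env bash
# After loading: show error lines from virtuoso.log and list the files for
# which the bulk loader recorded an error in DB.DBA.LOAD_LIST.
check_load_errors() {
  local log=${1:-/var/lib/virtuoso/db/virtuoso.log}
  echo "== errors in $log =="
  grep -n -i 'error' -- "$log"
  echo "== files with errors in DB.DBA.LOAD_LIST =="
  isql "${VIRT_PORT:-1111}" "${VIRT_USER:-dba}" "${VIRT_PASS:-dba}" \
    exec="select ll_file, ll_error from DB.DBA.LOAD_LIST where ll_error is not null;"
}
```
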

Final polishing

You can & should now install the DBpedia and RDF Mappers packages from the Virtuoso Conductor.

login: dba
pw: dba

Go to System Admin / Packages. Install the dbpedia (v. 1.4.28) and rdf_mappers (v. 1.34.74) packages (takes about 5 minutes).

Testing your local mirror

Go to the SPARQL endpoint of your server at http://your-server:8890/sparql (or, in isql, prefix the query with SPARQL):

sparql SELECT count(*) WHERE { ?s ?p ?o } ;

This shouldn’t take long in Virtuoso 7 anymore and for me now returns 695,553,624 for DBpedia (en+de), 3,543,872,243 with DBpedia (en+de), Freebase, DBLP, Yago, Umbel and
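For scripted checks, the same counts can be fetched over HTTP. Here is a bash sketch with curl. It assumes the endpoint honors Accept: text/csv, the SPARQL 1.1 CSV result format, which Virtuoso supports. The function names are hypothetical, and the endpoint defaults to http://localhost:8890/sparql.

```shell
#!/usr/bin/env bash
# Send a SPARQL query to the HTTP endpoint and print the result as CSV.
sparql_csv() {
  local query=$1 endpoint=${2:-http://localhost:8890/sparql}
  curl -fsS -G "$endpoint" \
    -H 'Accept: text/csv' \
    --data-urlencode "query=$query"
}

# Print just the value of a single-value query: take the last CSV line and
# drop quotes and the CR of CSV's CRLF line endings.
sparql_count() {
  sparql_csv "$1" "${2:-}" | tail -n 1 | tr -d '"\r'
}

# Usage:
#   sparql_count 'SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }'
#   sparql_csv 'SELECT ?g (COUNT(*) AS ?n) WHERE { GRAPH ?g { ?s ?p ?o } } GROUP BY ?g ORDER BY DESC(?n)'
```
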

I also like this query showing all the graphs and how many triples are in them:

sparql SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;
g                                                           callret-1
LONG VARCHAR                                                LONG VARCHAR
____________________________________________________________
                                                            2760013365
                                                            375176108
                                                            149707899
                                                            92508750
                                                            72519345
                                                            55804533
                                                            21900162
                                                            15372307
                                                            403452
                                                            256065
                                                            149638
                                                            27063
                                                            8727
http://localhost:8890/DAV/                                  6187
                                                            2639
                                                            1702
                                                            1480
                                                            1226
                                                            937
                                                            857
                                                            804
                                                            741
                                                            696
                                                            691
                                                            661
virtrdf-label                                               638
                                                            557
                                                            553
                                                            482
                                                            444
                                                            386
                                                            332
                                                            311
                                                            252
                                                            225
                                                            183
                                                            172
                                                            160
                                                            160
                                                            144
                                                            143
                                                            139
                                                            117
                                                            103
                                                            102
                                                            90
                                                            87
                                                            85
                                                            79
                                                            68
                                                            41
                                                            32
                                                            26
                                                            23
                                                            21
                                                            21
http://localhost:8890/sparql                                14
                                                            12
dbprdf-label                                                6

59 Rows. -- 61717 msec.

Congratulations, you just imported over half a billion triples (or over 3.5 G triples).

Backing up this initial state

Now is a good moment to backup the whole db (takes about half an hour):

sudo -i
cd /
/etc/init.d/virtuoso-opensource stop &&
tar -cvf - /var/lib/virtuoso | lzop > virtuoso-7.1.0-DBDUMP-$(date '+%F')-dbpedia-2014-en_de.tar.lzop &&
/etc/init.d/virtuoso-opensource start

Afterwards you might want to repack this with xz (lzma) like this:

# aptitude install xz-utils
for f in virtuoso-7.1.0-DBDUMP-*.tar.lzop ; do lzop -d -c "$f" | xz > "${f%lzop}.xz" ; done
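Restoring works the other way round. The author’s answer in the comments below describes it as a replace-all-or-nothing process: stop Virtuoso, move the current db out of the way, extract the backup and start Virtuoso again. Here is a bash sketch of these steps. It assumes the archive was created from / as above, so tar stored the paths as var/lib/virtuoso/…. It also assumes GNU tar detects the compression by itself, which it does for gzip and for the .tar.xz repack. For the .tar.lzop file, extraction may need tar’s --lzop option. The root and init-script arguments exist only so the hypothetical function can be tried on a scratch directory.

```shell
#!/usr/bin/env bash
# Restore a backup created as above: stop virtuoso, move the current db out
# of the way, extract the archive, start virtuoso again.
# Arguments: archive, optional root dir (default /), optional init script.
restore_backup() {
  local archive=$1 root=${2:-/}
  local init=${3:-/etc/init.d/virtuoso-opensource}
  local db=${root%/}/var/lib/virtuoso
  "$init" stop || return 1
  if [ -e "$db" ]; then
    mv -- "$db" "$db.bak.$(date '+%F-%H%M%S')" || return 1
  fi
  # entries are stored as var/lib/virtuoso/..., so extract relative to root
  tar -xf "$archive" -C "$root" || return 1
  "$init" start
}

# Usage (as root):
#   restore_backup virtuoso-7.1.0-DBDUMP-2014-11-11-dbpedia-2014-en_de.tar.xz
```
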

Yay, done 😉
As always, feel free to leave comments if I made a mistake, or to tell us about your problems or how happy you are :D.

Our database dump file

In case you really want exactly the same state of the public datasets that we have loaded (as described above) you can download our database dump (57 GB, md5sum, including: DBpedia 2014 en,de,links,dbpedia_2014.owl, Freebase, DBLP, Yago, Umbel and


Many thanks to the DBpedia team for their endless efforts of providing us all with a great dataset. Also many thanks to the Virtuoso crew for releasing an opensource version of their DB.


  • 2014-11-11: Added link to our Dump-File
  • 2014-11-24: Thanks to Romain: Load dbpedia_2014.owl into graph for DBpedia.vad to find it when resolving http://your-server:8890/ontology/author for example.

24 thoughts on “Setting up a local DBpedia 2014 mirror with Virtuoso 7.1.0”

  1. Romain Beaumont

    Thank you for these articles !
    One small thing I found changed between dbpedia 3.9 and dbpedia 2014 : the ontology needs to be registered under named graph instead of or the dbpedia vad won’t find url like http://server:8890/ontology/author (well it just displays the sameas property)
    So for example :
    ld_add('/home/dbpedia/dbpedia_2014/2014/dbpedia_2014.owl', '');

  2. Pingback: DBpedia 2014 Stats – Top Subjects, Predicates and Objects | Jörn's Blog

  3. Khaled Yacout

    Thanks for the post!
    Just wanted to note that, when you run this command to generate the make file:
    ./configure --with-layout=debian --enable-dbpedia-vad --enable-rdfmappers-vad
    an error message may appear “checking validity of the OpenSSL headers in /usr… configure: error: bad. Check config.log for details” To Resolve this, you have to install libssl-dev:
    apt-get install libssl-dev

    1. joern Post author

      It’s a replace all or nothing backup, so my process is:
      1. shut down the virtuoso process with /etc/init.d/virtuoso-opensource stop (or via isql with the shutdown; command).
      2. move the current db out of the way (usually it resides in /var/lib/virtuoso, so you could do mv /var/lib/virtuoso /var/lib/virtuoso.bak)
      3. extract the backup and move it back to /var/lib/virtuoso
      4. start the virtuoso daemon again with /etc/init.d/virtuoso-opensource start

  4. alexe

    thanks for the awesome guide. The whole process seems to have worked out. But now when i try for example:

    select ?subject ?predicate ?object
    where {
    dbpedia-owl:wikiPageRedirects* ?subject.
    ?subject ?predicate ?object.
    }

    I receive the following error:
    SPARQL compiler, line 5: Undefined namespace prefix at ‘dbpedia-owl’ before ‘*’

    Romain’s example with http://server:8890/ontology/author also didn’t work; it returned 404 – File not found.
    So I think I am missing something here. Could anyone help out?
    Thanks in advance!

    1. joern Post author


      Thanks. The first problem indicates that the `dbpedia-owl` prefix isn’t defined on your server, which as far as I know is part of the DBpedia VAD file. If that isn’t installed properly, it could also explain the second error, so maybe check that again?

      In the meantime you should be able to get the SPARQL query running by explicitly defining the prefix:
      PREFIX dbpedia-owl: <>
      select ?subject ?predicate ?object
      where {
      <> dbpedia-owl:wikiPageRedirects* ?subject.
      ?subject ?predicate ?object.
      }
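
When the query is sent from a script instead of isql, the same workaround applies. A minimal sketch with hypothetical helper names: it prepends explicit `PREFIX` declarations (the namespace IRI is a parameter, not filled in here) and builds a GET URL for the local endpoint, passing the query in the `query` parameter defined by the SPARQL protocol.

```python
from urllib.parse import urlencode


def with_prefixes(query: str, prefixes: dict[str, str]) -> str:
    """Prepend explicit PREFIX declarations so the query needs no server-side defaults."""
    decls = "".join(f"PREFIX {name}: <{iri}>\n" for name, iri in prefixes.items())
    return decls + query


def endpoint_url(query: str, endpoint: str = "http://localhost:8890/sparql") -> str:
    """Build a GET URL that carries the URL-encoded query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})
```

Declaring prefixes in the query itself makes it portable between servers, whether or not their VADs register the same prefixes.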

  5. Elias

    Thanks for the great guide, but I have a problem with repacking bz2 to gz.
    After executing this
    for d in */ ; do for i in "${d%/}"/*.bz2 ; do bzcat "$i" | gzip > "${i%.bz2}.gz" && rm "$i" ; done & done
    I immediately get this:
    [187] 99745
    [188] 99746
    [189] 99747

    Can you explain what’s wrong please?

    1. joern Post author

      Nothing is wrong; it’s just telling you the process IDs of the background jobs (one per directory). If you want to wait for all of them to finish, run “wait”; “fg” only brings the most recent job back to the foreground.
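
The shell loop runs one background job per directory and repacks that directory's files sequentially. A Python sketch of the same idea, with hypothetical function names, that returns only once every job has finished (the equivalent of `wait`). It uses threads instead of processes on the assumption that CPython's bz2 and zlib modules release the GIL while (de)compressing, so the jobs can overlap.

```python
import bz2
import gzip
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional


def repack_file(src: Path) -> Path:
    """Recompress one .bz2 file to .gz, then delete it (bzcat | gzip && rm)."""
    dst = src.with_suffix(".gz")  # e.g. labels_en.nt.bz2 -> labels_en.nt.gz
    with bz2.open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, 1024 * 1024)  # stream in 1 MiB chunks
    src.unlink()  # only reached if the copy above did not raise
    return dst


def repack_dir(directory: Path) -> list[Path]:
    """Repack the .bz2 files of one directory one after another (the inner loop)."""
    return [repack_file(p) for p in sorted(directory.glob("*.bz2"))]


def repack_tree(root: Path, workers: Optional[int] = None) -> list[Path]:
    """Run one job per subdirectory in parallel; return after all jobs finished."""
    subdirs = sorted(d for d in root.iterdir() if d.is_dir())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = list(pool.map(repack_dir, subdirs))  # blocks like `wait`
    return [path for batch in batches for path in batch]
```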

  6. Elod Barna Bodo

    Hi. Thank you for this great post!
    I am trying to install Virtuoso 7.2.1 and load the DBpedia 2015-04 data.
    However, I ran into problems running these commands:
    sudo cp debian/init.d /etc/init.d/virtuoso-opensource &&
    sudo chmod a+x /etc/init.d/virtuoso-opensource &&
    sudo bash debian/virtuoso-opensource.postinst.debhelper

    1. There is no init.d file in the debian folder, only virtuoso-opensource-7.init, so I used that for the first command; I hope that is OK.

    2. There is no ‘virtuoso-opensource.postinst.debhelper’ in the debian folder, just ‘virtuoso-opensource-7.postinst’, so I tried to use that, but then I got an error message:
    ‘Template parse error near `# These templates have been reviewed by the debian-l10n-english’, in stanza #1 of virtuoso-opensource-7.templates’

    Any ideas for a solution?
    Thank you!

    1. joern Post author

      The problem seems to be that the Debian packaging was changed in 7.2.0+ and no longer works as it did in 7.1.0.

      I plan to release another version of this guide soon, but until then you might want to build the Debian package from source and install it… have a look at and

  7. Karwan Jacksi


    Many thanks for this great post!
    I was wondering whether you are going to post an updated version of this one soon that works with Virtuoso 7.2+ and maybe DBpedia 2015-04?

    Thanks again for this one.

  8. Karwan Jacksi

    Hello again,

    I could do it with DBpedia 2014 with no errors :), but I have a question: when I query the endpoint from isql (command line) it works fine, but from the browser (http://localhost:8890/sparql) it fails with “transaction timed out”. Any idea?

    Thanks again

    `Virtuoso S1T00 Error SR171: Transaction timed out`

    `SPARQL query:
    define sql:big-data-const 0
    define sql:signal-void-variables 1

    SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2`

    1. joern Post author

      I could imagine that this has to do with the default limits in your virtuoso.ini: if you query via isql, for example, the MaxQueryCostEstimationTime and MaxQueryExecutionTime limits in the [SPARQL] section don’t apply. For me the default settings work quite well, so maybe consider increasing the RAM or putting your DB on an SSD as well.

      1. Karwan Jacksi

        Actually I thought about that, but with 24 GB of RAM plus 28 GB of swap and a Core i7 processor I didn’t expect to need to change the ini file :D. When I monitored my machine, only 15 GB of RAM were in use, so I thought there was still plenty of headroom.
        Anyway, yes, that was the reason: I changed the ini file to the following settings and now it’s working 🙂

        Many thanks.

        MaxQueryCostEstimationTime = 1200
        MaxQueryExecutionTime = 360
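
If you prefer to script such a change, here is a minimal sketch of a hypothetical helper that sets keys inside the [SPARQL] section of virtuoso.ini while keeping comments, key order and the other sections intact (Python's configparser would drop the comments when writing the file back). Virtuoso reads virtuoso.ini at startup, so it needs a restart afterwards.

```python
import re

SECTION_HEADER = re.compile(r"^\s*\[(.+?)\]\s*$")


def set_ini_values(text: str, section: str, values: dict[str, str]) -> str:
    """Set keys in one [section] of ini text, keeping comments, order and other sections.

    Existing keys are rewritten in place; missing keys are appended to the section,
    and the section itself is appended if it does not exist yet.
    """
    out: list[str] = []
    pending = dict(values)  # keys that still have to be written
    in_section = False
    for line in text.splitlines():
        header = SECTION_HEADER.match(line)
        if header:
            if in_section:  # leaving the target section: add the keys it lacked
                out.extend(f"{k} = {v}" for k, v in pending.items())
                pending.clear()
            in_section = header.group(1).strip() == section
            out.append(line)
            continue
        if in_section and "=" in line and not line.lstrip().startswith(";"):
            key = line.split("=", 1)[0].strip()
            if key in pending:
                out.append(f"{key} = {pending.pop(key)}")
                continue
        out.append(line)
    if in_section:  # the target section was the last one in the file
        out.extend(f"{k} = {v}" for k, v in pending.items())
        pending.clear()
    if pending:  # the section did not exist at all
        out.append(f"[{section}]")
        out.extend(f"{k} = {v}" for k, v in pending.items())
    return "\n".join(out) + "\n"
```

Usage would be along the lines of `path.write_text(set_ini_values(path.read_text(), "SPARQL", {"MaxQueryCostEstimationTime": "1200", "MaxQueryExecutionTime": "360"}))`, where `path` points to your installation's virtuoso.ini.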

  9. Vijin K P

    Great guide for setting up a local DBpedia mirror. Hats off.
    Could you please tell me how to load a partly loaded file again? Loading it failed because of RAM issues.

    1. joern Post author

      thanks 😉

      Uhm, in theory you should be able to just re-add the file(s) and run rdf_loader_run() again. I only did this for smaller files though. If bigger files fail, I usually restore a previous backup and re-run the importer, just to make sure nothing ends up in a bogus state.
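
Re-adding can be scripted too. A minimal sketch with hypothetical helper names, assuming the bulk loader's `DB.DBA.load_list` table with its `ll_file` and `ll_error` columns; failed entries can be listed with `SELECT ll_file, ll_error FROM DB.DBA.load_list WHERE ll_error IS NOT NULL;`. The script removes the stale load-list entry before calling `ld_add()` again, on the assumption that an existing entry could otherwise keep the file from being queued a second time; check this against your Virtuoso version.

```python
def sql_literal(value: str) -> str:
    """Quote a value as an SQL string literal (single quotes are doubled)."""
    return "'" + value.replace("'", "''") + "'"


def requeue_script(failed: list[tuple[str, str]]) -> str:
    """isql commands that re-queue failed (file, graph) pairs and rerun the loader."""
    lines: list[str] = []
    for path, graph in failed:
        # drop the old load-list row so the file is registered afresh (assumption)
        lines.append(f"DELETE FROM DB.DBA.load_list WHERE ll_file = {sql_literal(path)};")
        lines.append(f"ld_add({sql_literal(path)}, {sql_literal(graph)});")
    lines += ["rdf_loader_run();", "checkpoint;"]
    return "\n".join(lines) + "\n"
```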

      1. Vijin K P

        Thanks for the reply. In my case, all failed files are large. I will try the theory first 😛

  10. Susmita

    Hi, thank you for this post. It’s a great guide for setting up a local DBpedia mirror. I have done it, but I’m facing a problem.
    I have run: ld_add('/usr/local/data/datasets/dbpedia/2014/dbpedia_2014.owl', '');
    For the query “sparql SELECT ?g COUNT(*) { GRAPH ?g {?s ?p ?o.} } GROUP BY ?g ORDER BY DESC 2;” I am getting:
    g callret-1
    _______________________________________________________________________________ 359784292 149707899 21896212 27063
    http://localhost:8890/DAV/ 3003 2475 160
    http://localhost:8890/sparql 14 3

    but http://my-server:8890/ontology/author is not working and throws this error:
    Error HTTP/1.1 404 File not found
    The requested URL was not found URI = ‘/ontology/author’

    Could you please tell me the solution?

    Thanks again.

    1. joern Post author

      Hmm, I’m actually not sure, but I guess the DBpedia VAD might have changed. I’ll have a look next time I update the guide.

  11. Pingback: Setting up a Linked Data mirror from RDF dumps (DBpedia 2015-04, Freebase, Wikidata, LinkedGeoData, …) with Virtuoso 7.2.1 and Docker (optional) | Jörn's Blog
