corese-benchmark

Description

This repository provides tools and utilities to benchmark a set of Java-based triple stores, namely Corese, RDF4J and Jena. Its aims are to

  • compare Corese with the other triple stores
  • compare different versions of Corese

Its principles are:

  • Focusing on performance measurements, such as loading time, memory usage, query time, number of threads/CPUs, etc.
  • Using the native core Java libraries instead of the server versions of the triple stores. The code is written in Groovy, a scripting language for the JVM.
  • Producing reusable CSV exports of the performance measurements that can be used in other contexts.
  • Building upon existing RDF or SPARQL benchmarks such as
    • Bowlogna SPARQL Benchmark
    • BSBM Berlin SPARQL Benchmark
    • DBPedia datasets
    • etc.
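As an illustration of the reusable CSV exports mentioned above, the following Python sketch times an arbitrary operation and appends one measurement row. The helper name and column layout are assumptions for illustration, not the repository's actual schema:

```python
import csv
import time
from pathlib import Path

def record_measurement(csv_path, store, dataset, action, fn):
    """Time fn() and append one measurement row to a CSV export.
    The column layout here is illustrative, not the repository's schema."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    is_new = not Path(csv_path).exists()
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            # Write the header only when creating the file
            writer.writerow(["triplestore", "dataset", "action", "seconds"])
        writer.writerow([store, dataset, action, f"{elapsed:.3f}"])
    return result
```

Appending rows rather than overwriting lets several runs (different stores, different datasets) accumulate in one CSV that downstream plotting code can consume.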

Links to dashboard

  • The minisite with dynamic versions of the plots is available at ...
  • You can also have a look at the image versions of the plots in the dashboard folder. See the HOW TO run it section below to run the benchmark locally and generate a new version of the plots.

Organisation of the repository

There are 2 main parts of the code:

  • The Groovy/Java code

    • is versioned in the src folder
    • processes the input data using the 3 triple stores: loading and querying (WIP)
    • saves the CSVs containing the measurements in the out folder. Examples of previous runs are already provided.
  • The workflow automation code, written in Python and versioned in the python-utils folder. The main steps automated are:

      1. creating the input folder, then downloading and saving the input data in it
      2. launching the benchmark.groovy script
      3. launching the plot-compare.py script, which saves the plot files in the public folder
  • 2 versions of the workflow are available:

    • workflow.py to compare given versions of the 3 triple stores
    • workflow-corese-versions.py to compare 2 or more given versions of Corese.
  • The latest results that we version in this repo are visible in the dashboard folder. If you run the benchmark yourself, updated plots will be saved in this folder.
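The automated steps above could be chained roughly as follows. This is a sketch, not the repository's actual workflow.py: the function names are hypothetical, the download step is omitted, and the commands mirror the manual invocations shown in the HOW TO section:

```python
import subprocess
from pathlib import Path

def build_commands(input_dir, out_dir, stores):
    """Assemble the benchmark and plotting steps as subprocess command
    lists, following the command shapes shown in this README."""
    return [
        ["./gradlew", "runGroovyScript",
         f"--args={input_dir} {out_dir} {','.join(stores)}"],
        ["python", "plot-compare.py", str(out_dir)],
    ]

def run_workflow(input_dir, out_dir, stores, dry_run=True):
    # Step 1: create the input folder (downloading the data is omitted here)
    Path(input_dir).mkdir(parents=True, exist_ok=True)
    cmds = build_commands(input_dir, out_dir, stores)
    if not dry_run:
        for cmd in cmds:  # Steps 2 and 3: benchmark, then plots
            subprocess.run(cmd, check=True)
    return cmds
```

The dry_run flag makes the command assembly testable without actually launching Gradle or the plotting script.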

HOW TO run it

Run the workflow.py automation script

  • activate the conda environment
conda activate benchmark_env
  • launch the script
(benchmark_env) cd python-utils
(benchmark_env) python workflow.py
# or
(benchmark_env) python workflow-corese-versions.py

For workflow-corese-versions.py:

  • adjust the versions as needed by modifying the following line in the script file
coreseVersions = ["4.0.1","4.6.3","local"]
  • if you want to test a local version:
    • add "local" to the coreseVersions list
    • put the jar of the corese-core version in the 'libs' directory
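A hypothetical sketch of how the version list might be resolved to jar artifacts, with "local" mapped to the jar dropped in the libs directory. The naming scheme and function are assumptions for illustration; the real script's logic may differ:

```python
from pathlib import Path

def resolve_corese_jar(version, libs_dir="libs"):
    """Map a version string to a corese-core jar path.
    'local' looks in the libs directory; other entries are assumed
    to follow a corese-core-<version>.jar naming scheme."""
    if version == "local":
        jars = sorted(Path(libs_dir).glob("corese-core*.jar"))
        if not jars:
            raise FileNotFoundError(f"no corese-core jar found in {libs_dir}/")
        return jars[0]
    return Path(f"corese-core-{version}.jar")
```

Failing fast when the libs directory holds no jar gives a clearer error than a later classpath failure inside the benchmark.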

Run the benchmark.groovy alone

  • first, build the execution environment
./gradlew clean build
  • then run it, making sure to pass the path to the input directory, the path to the output directory, and the list of triple store names, e.g.:
./gradlew runGroovyScript --args="/path/to/directory /path/to/outdirectory rdf4j.5.1.2,jena.4.10.0,corese.4.6.3"

Run the plot-compare.py alone

Assuming the Python environment benchmark_env has been activated:

(benchmark_env) cd python-utils
(benchmark_env) python plot-compare.py
# or, optionally, indicating the folder to read the CSV files from
(benchmark_env) python plot-compare.py outputdirectory

It loops through the content of the given directory, plots the loading time and memory usage, and generates

  • a png and an html version of each plot
  • an index.html file to be used as the dashboard
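A minimal sketch of the directory scan described above, assuming measurement CSVs with triplestore and loading_seconds columns; the actual column names in the repository's CSVs may differ:

```python
import csv
from pathlib import Path
from collections import defaultdict

def mean_loading_times(csv_dir):
    """Scan a directory of measurement CSVs and compute the mean
    loading time per triple store. Column names are assumptions."""
    samples = defaultdict(list)
    for csv_file in Path(csv_dir).glob("*.csv"):
        with open(csv_file, newline="") as f:
            for row in csv.DictReader(f):
                samples[row["triplestore"]].append(float(row["loading_seconds"]))
    return {store: sum(v) / len(v) for store, v in samples.items()}
```

Aggregating per store over all CSVs in the folder is what makes it possible to compare several runs (or several versions) on one plot.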

Datasets

Bowlogna

  • Bowlogna benchmark dataset (from this link)
  • Synthetic dataset built according to a model describing relations between students, universities, and course programs.
  • It is made of 10 files, formally equivalent but each containing different data. Each file loaded adds ~1.2 million triples.
  • Total size: ~12 million triples
  • Reference article: SIMPDA2011 paper

DBPedia sample

  • DBPedia is a project that translates Wikipedia content into RDF
  • We sampled 10 files from the dump folder available online at https://downloads.dbpedia.org/3.5.1/en/:
    • redirects_en.nt
    • disambiguations_en.nt
    • homepages_en.nt
    • geo_coordinates_en.nt
    • instance_types_en.nt
    • category_labels_en.nt
    • skos_categories_en.nt
    • images_en.nt
    • specific_mappingbased_properties_en.nt
    • persondata_en.nt
  • total size is ~20 million triples

Metrics for measuring and comparing triple store performance

Typical Configuration used

  • inMemoryStore
  • inference level (check whether levels are comparable):
    • no inference
    • RDFS
  • format to be parsed :
    • nt
    • turtle
    • trig
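The configuration axes above form a small benchmark matrix; enumerating every combination could look like this sketch (illustrative, not the repository's actual code):

```python
from itertools import product

stores = ["corese", "rdf4j", "jena"]
inference_levels = ["none", "RDFS"]
formats = ["nt", "turtle", "trig"]

# Every (store, inference, format) combination to benchmark in-memory
configurations = [
    {"store": s, "inference": i, "format": f}
    for s, i, f in product(stores, inference_levels, formats)
]
print(len(configurations))  # 3 stores x 2 levels x 3 formats = 18
```

Driving the benchmark from such an explicit matrix keeps the CSV output labelled consistently across all runs.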
