This is an English translation of a Japanese blog. Some content may not be fully translated.
AWS

Exporting RDF Data from Amazon Neptune

Introduction

Let’s try exporting RDF data to Turtle using the Neptune tool available in awslabs.

amazon-neptune-tools/neptune-export at master · awslabs/amazon-neptune-tools https://github.com/awslabs/amazon-neptune-tools/tree/master/neptune-export

Please refer to readme.md for detailed usage instructions.

amazon-neptune-tools/readme.md at master · awslabs/amazon-neptune-tools https://github.com/awslabs/amazon-neptune-tools/blob/master/neptune-export/readme.md

awslabs/amazon-neptune-tools https://github.com/awslabs/amazon-neptune-tools/blob/master/neptune-export/docs/export-rdf.md

Notes

Regarding Exporting an RDF Graph, there is a note: At present neptune-export supports exporting an RDF dataset to Turtle with a single-threaded long-running query. Depending on the data volume, the export may be long-running due to single-threaded operation. You should be mindful of execution time and load on the target instance. It may be worth considering options such as creating a clone and using a separate instance, or using the Read Replica side.

Environment Check

Let’s try it out. Data is minimized for verification. The loaded data is as follows:

SELECT *
WHERE {
  ?s ?p ?o .
}
LIMIT 100
OFFSET 0

Execution

Create the output directory:

mkdir -p /home/ec2-user/output

Download the neptune-export tool from awslabs:

#sudo yum -y install git
git clone https://github.com/awslabs/amazon-neptune-tools.git

Install Maven:

sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo sed -i s/\$releasever/6/g /etc/yum.repos.d/epel-apache-maven.repo
sudo yum install -y apache-maven
mvn --version

Build the jar file:

cd /home/ec2-user/amazon-neptune-tools/neptune-export
mvn clean install

The neptune-export.jar is built in the target directory:

[ec2-user@bastin neptune-export]$ ls -l target
total 62712
drwxrwxr-x 4 ec2-user ec2-user       28 Feb 24 05:13 classes
drwxrwxr-x 3 ec2-user ec2-user       25 Feb 24 05:13 generated-sources
drwxrwxr-x 3 ec2-user ec2-user       30 Feb 24 05:13 generated-test-sources
drwxrwxr-x 2 ec2-user ec2-user       28 Feb 24 05:13 maven-archiver
drwxrwxr-x 3 ec2-user ec2-user       35 Feb 24 05:13 maven-status
-rw-rw-r-- 1 ec2-user ec2-user   202719 Feb 24 05:13 neptune-export-1.0-SNAPSHOT.jar
-rw-rw-r-- 1 ec2-user ec2-user 64006996 Feb 24 05:14 neptune-export.jar
drwxrwxr-x 2 ec2-user ec2-user     4096 Feb 24 05:13 surefire-reports
drwxrwxr-x 3 ec2-user ec2-user       17 Feb 24 05:13 test-classes
[ec2-user@bastin neptune-export]$

Run neptune-export.sh

cd /home/ec2-user/amazon-neptune-tools/neptune-export
sh ./bin/neptune-export.sh export-rdf -e neptestdb.xxxxxxxxx.ap-northeast-1.neptune.amazonaws.com -d /home/ec2-user/output

Note: The command must be run from neptune-export, one level above bin, not from within bin. This is because the script searches for neptune-export.jar and stores it in a variable.

jar=$(find . -name neptune-export.jar)
java -jar ${jar} "$@"

Note: When specifying the Neptune instance name, do NOT include “https”. This will cause an error.

Completed export-rdf in 0 seconds
An error occurred while exporting from Neptune:
java.lang.RuntimeException: org.eclipse.rdf4j.query.QueryEvaluationException: https: Name or service not known
	at com.amazonaws.services.neptune.rdf.NeptuneSparqlClient.executeQuery(NeptuneSparqlClient.java:166)
	at com.amazonaws.services.neptune.rdf.io.ExportRdfGraphJob.execute(ExportRdfGraphJob.java:31)

After execution, a ttl file is output in the output directory. The triples match correctly.

cd /home/ec2-user/output/1584768727668/statements
[ec2-user@bastin statements]$ cat statements-0.ttl

<http://aws.amazon.com/neptune/a> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Satomi_Ishihara> .
<http://aws.amazon.com/neptune/b> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Haruka_Ayase> .
<http://aws.amazon.com/neptune/c> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Honda_Tsubasa> .
<http://aws.amazon.com/neptune/d> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Haruka_Ayase> .

The available output formats are “turtle (default)”, “nquads”, and “json (neptuneStreamsJson)”.

For json (neptuneStreamsJson), the result looked like this:

[ec2-user@bastin neptune-export]$ sh ./bin/neptune-export.sh export-rdf --format neptuneStreamsJson -e neptestdb.xxxxxxxxx.ap-northeast-1.neptune.amazonaws.com -d /home/ec2-user/output
Creating statement files

/home/ec2-user/output/1584769323164

[ec2-user@bastin statements]$ cat statements-0.json | jq
{
  "eventId": {
    "commitNum": -1,
    "opNum": 0
  },
  "data": {
    "stmt": "<http://aws.amazon.com/neptune/a> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Satomi_Ishihara> ."
  },
  "op": "ADD"
}
{
  "eventId": {
    "commitNum": -1,
    "opNum": 0
  },
  "data": {
    "stmt": "<http://aws.amazon.com/neptune/b> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Haruka_Ayase> ."
  },
  "op": "ADD"
}
{
  "eventId": {
    "commitNum": -1,
    "opNum": 0
  },
  "data": {
    "stmt": "<http://aws.amazon.com/neptune/c> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Honda_Tsubasa> ."
  },
  "op": "ADD"
}
{
  "eventId": {
    "commitNum": -1,
    "opNum": 0
  },
  "data": {
    "stmt": "<http://aws.amazon.com/neptune/d> <http://xmlns.com/foaf/0.1/interest> <http://dbpedia.org/resource/Haruka_Ayase> ."
  },
  "op": "ADD"
}

Side Note

I initially encountered an error and couldn’t complete successfully, but after posting on Stack Overflow, someone fixed it. I’m grateful.

amazon web services - regarding about export of neptune data - Stack Overflow https://stackoverflow.com/questions/60429428/regarding-about-export-of-neptune-data

Suggest an edit on GitHub