Neo4j is one of the world’s leading graph database management systems, with support for the AWS, Azure, and Google Cloud platforms.
In this article, we’ll walk you through how to get set up in AWS. This includes:
- Hosting the Neo4j Community edition on EC2
- Using the Neo4j Python driver to execute transactions against the database
- Creating a Lambda function to access the database
Create the EC2 Instance
- Sign in to your AWS account
- Open the EC2 service
- Click on Instances, then Launch Instance
- Select the AWS Marketplace menu item, then search for Neo4j
- Select Neo4j Graph Database – Community Edition
- Select the EC2 Instance Type. Neo4j currently recommends m4.large or higher
- Click Next: Configure Instance Details
- Select a VPC and subnet to launch the EC2 instance into
- Click Next: Add Storage
- Select your Root volume size and attach any additional EBS volumes you need. (Neo4j stores its data on local volumes, so ensure you have enough space to store the required data)
- Select Review and Launch
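The console steps above can also be scripted. The following is a minimal sketch, assuming boto3 is used and that you have already subscribed to the Neo4j listing in the AWS Marketplace. The function names are hypothetical, the AMI ID and subnet ID are values you must supply yourself, and the root device name and 100 GB volume size are placeholder assumptions that depend on the AMI and on your data.

```python
def build_launch_request(ami_id, subnet_id, instance_type="m4.large",
                         root_volume_gb=100, root_device="/dev/sda1"):
    """Build the keyword arguments for the EC2 run_instances call.

    root_device and root_volume_gb are assumptions: check the AMI's
    root device name and size the volume for your own data.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "SubnetId": subnet_id,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                "DeviceName": root_device,
                "Ebs": {"VolumeSize": root_volume_gb, "VolumeType": "gp2"},
            }
        ],
    }


def launch_neo4j_instance(ec2_client, **kwargs):
    """Launch a single instance and return its instance ID."""
    response = ec2_client.run_instances(**build_launch_request(**kwargs))
    return response["Instances"][0]["InstanceId"]
```

With real credentials, this would be called as launch_neo4j_instance(boto3.client("ec2"), ami_id="...", subnet_id="..."). Passing the client in as an argument keeps the request-building logic easy to check without touching AWS.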
Connect to the EC2 Instance
- Look up the public IP of the EC2 instance, then go to
https://MY_PUBLIC_IP:7473
- Enter neo4j as the username, and neo4j as the password
- After connecting for the first time, you will be prompted to set up a new password. Once you have entered one, click Change Password
- You should now be connected to the Neo4j database
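If the page does not load, one likely cause is that the instance's security group does not allow inbound traffic on the port. The sketch below is one way to check reachability from your own machine using only the Python standard library. The helper name port_is_open is hypothetical. Port 7473 is the HTTPS port used above, and 7687 is Neo4j's default Bolt port, which the Python driver uses.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts
        return False
```

For example, port_is_open("MY_PUBLIC_IP", 7473) checks the browser port and port_is_open("MY_PUBLIC_IP", 7687) checks the Bolt port.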
Extract Data Using AWS Lambda
So, you’ve got a Neo4j database hosted on an EC2 machine, and can access the GUI via your web browser. But what if you want to run queries against the database programmatically?
One way to achieve this is using Lambda, Python, and the Neo4j Python driver.
Creating the SSM Parameters
When starting a new Neo4j session using the Python driver, you need to specify the uri, username, and password.
The values for each of these items can be stored in the AWS SSM Parameter Store. Just remember to attach the necessary IAM policy to your Lambda function role. The AmazonSSMReadOnlyAccess managed policy will work, although the function only needs the ssm:GetParameter action.
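If you would rather grant only ssm:GetParameter than attach the broader managed policy, a policy document along the following lines could be used. This is a sketch under assumptions: the helper name is hypothetical, the region and account ID are placeholders, and the /Prod/Neo4j prefix matches the parameter names used by the Lambda code in this article. If the password is encrypted with a customer managed KMS key, the role would likely also need kms:Decrypt on that key.

```python
import json


def build_ssm_read_policy(region, account_id, prefix="/Prod/Neo4j"):
    """Return an IAM policy document allowing reads of parameters under prefix."""
    # Parameter ARNs append the parameter name directly after "parameter"
    resource = f"arn:aws:ssm:{region}:{account_id}:parameter{prefix}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["ssm:GetParameter"],
                "Resource": resource,
            }
        ],
    }


# Placeholder region and account ID, for illustration only
policy_json = json.dumps(build_ssm_read_policy("us-east-1", "123456789012"), indent=2)
```

The resulting JSON can be pasted into an inline policy on the Lambda function's role.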
To create our SSM parameters:
- Open AWS Systems Manager, then click on Parameter Store
- Create your 3 parameters, e.g. /Prod/Neo4j/uri, /Prod/Neo4j/username, and /Prod/Neo4j/password. uri and username can be of type String, while password should be SecureString
We can now retrieve the values from SSM using our Lambda function, instead of having to hard code them.
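As a rough sketch, the same parameters could be created and read back with boto3 instead of the console. The helper names below are hypothetical, and the client is passed in so the logic can be exercised without AWS access. The parameter names are the ones the Lambda code in this article reads. A typical uri value would be something like bolt://MY_PUBLIC_IP:7687, assuming the default Bolt port.

```python
# Parameter name -> SSM parameter type
PARAMETERS = {
    "/Prod/Neo4j/uri": "String",
    "/Prod/Neo4j/username": "String",
    "/Prod/Neo4j/password": "SecureString",
}


def store_neo4j_settings(ssm_client, values):
    """Create or overwrite the three parameters from a {name: value} dict."""
    for name, param_type in PARAMETERS.items():
        ssm_client.put_parameter(
            Name=name, Value=values[name], Type=param_type, Overwrite=True
        )


def load_neo4j_settings(ssm_client):
    """Return {'uri': ..., 'username': ..., 'password': ...} read from SSM."""
    settings = {}
    for name in PARAMETERS:
        # WithDecryption only affects SecureString values
        response = ssm_client.get_parameter(Name=name, WithDecryption=True)
        settings[name.rsplit("/", 1)[-1]] = response["Parameter"]["Value"]
    return settings
```

Inside a Lambda function, load_neo4j_settings(boto3.client("ssm")) would return the three values in a single dictionary.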
Creating the Lambda Function
Our Lambda function will be written in Python, but Neo4j also has drivers for a wide range of other programming languages. The full list can be found in the Neo4j documentation.
import logging
import traceback

import boto3
from neo4j import GraphDatabase

ssm = boto3.client('ssm')
logging.getLogger().setLevel(logging.INFO)


def lambda_handler(event, context):
    logging.info("Running handler")

    # Look up the connection details in the SSM Parameter Store
    db_uri = ssm.get_parameter(Name='/Prod/Neo4j/uri')
    username = ssm.get_parameter(Name='/Prod/Neo4j/username')
    password = ssm.get_parameter(Name='/Prod/Neo4j/password', WithDecryption=True)
    uri = db_uri['Parameter']['Value']
    username = username['Parameter']['Value']
    password = password['Parameter']['Value']

    # Connect to the Neo4j database and open a new session
    session = connect_db(uri, username, password)
    try:
        # Read data from the database
        treatment_data = read_from_db(session)
    finally:
        # Close our database session, even if the read fails
        disconnect_db(session)
    return treatment_data


def connect_db(uri, user, password):
    try:
        driver = GraphDatabase.driver(uri, auth=(user, password))
        session = driver.session()
    except Exception as error:
        msg = "".join(traceback.format_tb(error.__traceback__))
        logging.error(
            "Error connecting to Neo4j database. %s:%s\n%s",
            type(error), error, msg,
        )
        # Re-raise so the handler fails instead of continuing without a session
        raise
    logging.info("Successfully connected to Neo4j database")
    return session


def disconnect_db(session):
    logging.info("Closing Neo4j session")
    session.close()


def read_from_db(session):
    result = session.read_transaction(data_to_read)
    return result


def write_to_db(session):
    result = session.write_transaction(data_to_write)
    return result


def data_to_read(tx):
    # Replace CYPHER_QUERY and field_name with your own query and return field
    cypher_query = '''
    CYPHER_QUERY
    '''
    result = tx.run(cypher_query)
    result_list = [record["field_name"] for record in result]
    return result_list


def data_to_write(tx):
    # Replace CYPHER_QUERY and field_name with your own query and return field
    cypher_query = '''
    CYPHER_QUERY
    '''
    result = tx.run(cypher_query)
    result_list = [record["field_name"] for record in result]
    return result_list
You can also pass parameter values into your Cypher query.
Example:
If we wanted to pass in status and name as variables, we would use $status and $name in our Cypher query, then pass in the values using result = tx.run(cypher_query, {'status':'ACTIVE', 'name':'Untreated'}).
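To make the parameter-passing idea concrete, here is a small sketch of a transaction function that takes status and name as arguments. The Treatment label, its properties, and the function names are hypothetical and only illustrate the shape of the call. It relies on session.read_transaction forwarding extra positional arguments to the transaction function.

```python
def find_treatment_names(tx, status, name):
    """Run a parameterized read query and return the matching names."""
    # $status and $name are filled in from the dictionary passed to tx.run
    cypher_query = """
    MATCH (t:Treatment {status: $status, name: $name})
    RETURN t.name AS field_name
    """
    result = tx.run(cypher_query, {"status": status, "name": name})
    return [record["field_name"] for record in result]


def read_treatments(session, status, name):
    """Execute the query inside a managed read transaction."""
    return session.read_transaction(find_treatment_names, status, name)
```

Passing values as parameters rather than formatting them into the query string lets the driver handle quoting and avoids Cypher injection.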
Creating a Lambda Layer for Neo4j
When you’re writing your Python code in the inline code editor of Lambda, you’ll get an import error if you try to import anything from the neo4j package, e.g. from neo4j import GraphDatabase, because the package is not included in the Lambda Python runtime.
To resolve this, you’ll need to upload the Neo4j package files to a new Layer in Lambda.
Creating the Neo4j Lambda Layer:
- Open a local Terminal window (Mac) or CMD (Windows)
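- Create a new folder called Neo4j
- Navigate to that folder, then run:
pip install neo4j -t python
- Now compress the python folder into a zip file, e.g. Neo4j.zip, so that python/ sits at the top level of the archive. Lambda looks for a Python layer's packages under this python/ directory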
- Open the AWS Lambda service via the AWS Console
- Select Layers, then Create Layer
- Upload your Neo4j.zip file
- Select the compatible runtimes. In our case, this is all of the available Python versions
- Name your Layer e.g. Neo4j_v4_0_0, then click Create
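The zip for a layer can also be built with a short script rather than by hand. The sketch below uses only the standard library and places every file under a top-level python/ directory, which is where Lambda looks for Python packages in a layer. The function name is hypothetical, and package_dir is whichever folder pip installed the neo4j package into.

```python
import zipfile
from pathlib import Path


def build_layer_zip(package_dir, zip_path):
    """Zip the files in package_dir under a top-level python/ prefix.

    zip_path should live outside package_dir so the archive
    does not end up containing itself.
    """
    package_dir = Path(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(package_dir.rglob("*")):
            if path.is_file():
                arcname = Path("python") / path.relative_to(package_dir)
                archive.write(path, arcname.as_posix())
    return zip_path
```

Opening the resulting archive should show entries such as python/neo4j/__init__.py.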
With the Neo4j Lambda Layer created, we can now create our Neo4j Lambda function.
Creating the Neo4j Lambda Function:
- Open the AWS Lambda Service
- Select Functions, then Create Function
- Choose to Author from scratch and provide a name for your function
- Change the Runtime to one of the 3.x Python versions, then Create function
- From the Configuration tab, select Layers, then Add a layer
- From the Name dropdown, you should be able to select your Neo4j Layer. Version will be 1, if this is a new layer
- Click Add
- You should now be able to call
from neo4j import GraphDatabase
from your Lambda function without error
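A quick way to confirm that the layer is being picked up is a throwaway handler that reports whether the neo4j package can be found, without actually connecting to the database. This is only a diagnostic sketch; the keys in the returned dictionary are arbitrary names chosen for illustration.

```python
import importlib.util


def lambda_handler(event, context):
    """Report whether the neo4j package is importable in this runtime."""
    spec = importlib.util.find_spec("neo4j")
    return {
        "neo4j_importable": spec is not None,
        # For a layer, the path is expected to start with /opt/python
        "location": spec.origin if spec is not None else None,
    }
```

Running this with an empty test event should return neo4j_importable as true once the layer is attached.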
If you’re still seeing the error, another option is to package all of your dependencies in a zip file alongside your Lambda function code, then upload this zip to Lambda.
Uploading Neo4j as a Lambda package:
- Open a local Terminal window (Mac) or CMD (Windows)
- Create a new folder called Neo4j
- Navigate to the new Neo4j folder, then run:
pip install neo4j -t .
- Add your lambda_function.py file inside the same folder
- Now compress the contents of the Neo4j folder (not the directory itself)
- Open the AWS Lambda service via the AWS Console
- Select Functions, then Create Function
- Name the function and select one of the Python 3.x runtimes
- At the Function Code window, click the Actions dropdown and select Upload a .zip file
- Upload your zip file
- Save the function, then create and run a test event
- This should now be successful
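The upload step can be scripted as well. Below is a minimal sketch, assuming boto3 and a function that already exists. The helper name is hypothetical, and the Lambda client is passed in as an argument. Direct uploads like this are subject to a size limit, so larger packages would need to go through S3 instead.

```python
from pathlib import Path


def upload_function_zip(lambda_client, function_name, zip_path):
    """Replace a function's code with the given zip and return its ARN."""
    zip_bytes = Path(zip_path).read_bytes()
    response = lambda_client.update_function_code(
        FunctionName=function_name, ZipFile=zip_bytes
    )
    return response["FunctionArn"]
```

With real credentials, this would be called as upload_function_zip(boto3.client("lambda"), "my-function", "Neo4j.zip"), where my-function is a placeholder name.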