Discovery Graphs - talk companion

This is the companion post for my talk on Discovery Graphs. Here you will find links to the resources I mention in the talk, along with a summary of the steps I took to create a Discovery Graph.

The talk is an introduction to graph databases via my experience of using a graph database in a discovery project. The challenge was to communicate all the information we found in a structured and visual way.

Event listing: Meetup link

What is Graph Theory?

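Graph Theory is a branch of Discrete Mathematics that studies structures made of nodes connected by edges. It is widely applied in computer science to problems such as routing, recommendations, and dependency analysis.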

Further Reading: Graph Theory – Wikipedia

Background on Graph Databases

Learning & Training Resources

The Neo4j Academy offers excellent free courses on graph databases, with videos, text guides, and hands-on exercises.

Browse courses: Neo4j Graph Academy
Courses I mentioned in the talk
- Fundamentals of Graph Databases
- Graph Data Modelling Fundamentals

Books

Learning Neo4j – free from Neo4j in exchange for your email address – Download Here
Graph Powered Analytics and Machine Learning – A great resource for graph analytics

Getting hands on

Neo4j offers multiple deployment options:

Neo4j Desktop – My choice for stability during the talk
Aura Cloud – Free tier available for experimentation

Other graph databases and tools worth exploring:

OrientDB – I have used it for experimentation
Cosmos DB (Azure) – Supports a graph API (TinkerPop-based), though visualization requires add-ons
Gephi – Not a database, but a tool that specializes in graph visualization (Gephi Website)

Steps to Recreate the Demo

1. Abstract Graph Model

Start by iteratively designing the graph structure:

Nodes = nouns (entities)
Edges = verbs (relationships)

Use an instance graph approach to refine and validate the model
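To make the instance graph idea concrete, here is a minimal sketch, not the code from the talk. It treats the abstract model as a set of allowed (noun, verb, noun) patterns and checks a handful of example edges against it. Every label, relationship type, and identifier below is hypothetical.

```python
from dataclasses import dataclass

# Abstract model: allowed (source label, relationship type, target label) patterns.
# All labels and relationship types are hypothetical examples.
ABSTRACT_MODEL = {
    ("Person", "WORKS_IN", "Team"),
    ("Team", "OWNS", "System"),
    ("System", "FEEDS", "System"),
}


@dataclass(frozen=True)
class Node:
    id: str
    label: str  # the noun


@dataclass(frozen=True)
class Edge:
    source: Node
    rel_type: str  # the verb
    target: Node


def violations(edges):
    """Return the instance edges whose pattern is not in the abstract model."""
    return [
        e for e in edges
        if (e.source.label, e.rel_type, e.target.label) not in ABSTRACT_MODEL
    ]


# A small instance graph built from concrete examples.
alice = Node("p1", "Person")
data_team = Node("t1", "Team")
crm = Node("s1", "System")

instance_graph = [
    Edge(alice, "WORKS_IN", data_team),
    Edge(data_team, "OWNS", crm),
    Edge(alice, "OWNS", crm),  # not in the model: fix the data or extend the model
]

bad = violations(instance_graph)
for edge in bad:
    print(f"({edge.source.label})-[{edge.rel_type}]->({edge.target.label}) is not in the model")
```

Each violation is a prompt for the next design iteration: either the example is wrong, or the model is missing a relationship.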

2. Organising the Data

Used pandas for data transformation
Split data into separate files per entity type
Added unique identifiers and labels where required
Cleaned and deduplicated the dataset
Matched entities to reveal the connections
Exported CSV files for the nodes
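The steps above could look something like the following in pandas. This is a sketch under assumptions rather than the code I used: the raw data, the column names, the id prefixes, and the output folder `graph_csv` are all made up for illustration.

```python
from pathlib import Path

import pandas as pd

# Hypothetical raw discovery notes: one row per observation.
raw = pd.DataFrame(
    {
        "person": ["Alice", "alice ", "Bob", "Bob"],
        "system": ["CRM", "CRM", "Billing", "CRM"],
    }
)


def build_nodes(values, label, prefix):
    """Clean, deduplicate and assign a unique id to one entity type."""
    names = values.str.strip()
    # Deduplicate case-insensitively, keeping the first spelling seen.
    names = names[~names.str.lower().duplicated()]
    nodes = names.to_frame(name="name").reset_index(drop=True)
    nodes.insert(0, "id", [f"{prefix}{i + 1}" for i in range(len(nodes))])
    nodes["label"] = label
    return nodes


people = build_nodes(raw["person"], "Person", "p")
systems = build_nodes(raw["system"], "System", "s")

# One CSV file per entity type (the output folder name is an assumption).
out_dir = Path("graph_csv")
out_dir.mkdir(exist_ok=True)
people.to_csv(out_dir / "people.csv", index=False)
systems.to_csv(out_dir / "systems.csv", index=False)
```

Keeping one function per concern makes it easier to add a new entity type later without touching the others.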

3. Creating the Graph Edges

Generated datasets that represent each relationship as a pair of unique identifiers, making sure every connection points in the correct direction. Exported CSV files for the edges.
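As a rough illustration of this step, the sketch below joins observations to node tables to swap names for identifiers, then fixes the direction by naming the columns `start_id` and `end_id`. The tables, column names, the `USES` relationship type, and the file name are assumptions for the example, not the data from the project.

```python
import pandas as pd

# Hypothetical node tables, shaped like the node files from step 2.
people = pd.DataFrame({"id": ["p1", "p2"], "name": ["Alice", "Bob"]})
systems = pd.DataFrame({"id": ["s1", "s2"], "name": ["CRM", "Billing"]})

# Hypothetical observations linking a person to a system they use.
observations = pd.DataFrame(
    {
        "person": ["Alice", "Bob", "Bob", "Alice"],
        "system": ["CRM", "Billing", "CRM", "CRM"],
    }
)

edges = (
    observations
    # Replace each name with the identifier of the matching node.
    .merge(people.rename(columns={"id": "start_id", "name": "person"}), on="person")
    .merge(systems.rename(columns={"id": "end_id", "name": "system"}), on="system")
    .loc[:, ["start_id", "end_id"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
edges["type"] = "USES"  # direction: (Person)-[:USES]->(System)

edges.to_csv("uses.csv", index=False)
```

Because the default merge is an inner join, observations that mention an unknown person or system are silently dropped, so it is worth comparing row counts before and after.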

4. Automating Data Updates

Refactored the code for efficient reprocessing, allowing new discovery data to be easily included in an updated database.

5. Loading Data into Neo4j

Placed CSV files in Neo4j’s import directory (see screenshot and description below)
Verified load privileges (Cypher Documentation)
Created and saved Cypher scripts for easy re-execution when needed.
Ensured correct load order (nodes first, then edges).

Screenshot showing how to find Neo4j desktop import folder location

In Neo4j Desktop, I clicked the three-dot (ellipsis) menu next to the database within the Neo4j project, then clicked Open folder, then Import.
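One way to keep the load repeatable is to hold the Cypher statements in an ordered list, nodes first and edges second. The sketch below assumes three hypothetical files (`people.csv`, `systems.csv`, `uses.csv`) sitting in the import folder, with hypothetical labels and column names. The optional `run_load` helper uses the official `neo4j` Python driver, which is an assumption on my part, since the scripts can equally be pasted into the Neo4j UI.

```python
# Cypher statements kept in order so a re-run always loads nodes before edges.
# File names, labels and properties are hypothetical; file:/// is assumed to
# resolve to the database's import directory (the Neo4j default).
NODE_LOADS = [
    """
    LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
    MERGE (p:Person {id: row.id})
    SET p.name = row.name
    """,
    """
    LOAD CSV WITH HEADERS FROM 'file:///systems.csv' AS row
    MERGE (s:System {id: row.id})
    SET s.name = row.name
    """,
]

EDGE_LOADS = [
    """
    LOAD CSV WITH HEADERS FROM 'file:///uses.csv' AS row
    MATCH (p:Person {id: row.start_id})
    MATCH (s:System {id: row.end_id})
    MERGE (p)-[:USES]->(s)
    """,
]


def load_script():
    """Return every statement in load order: all nodes, then all edges."""
    return [statement.strip() for statement in NODE_LOADS + EDGE_LOADS]


def run_load(uri, user, password):
    """Run the script against a live database (needs the `neo4j` package)."""
    from neo4j import GraphDatabase  # imported lazily; only needed to execute

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            for statement in load_script():
                session.run(statement).consume()


for statement in load_script():
    print(statement.splitlines()[0])
```

Using `MERGE` rather than `CREATE` means re-running the script after a data update does not duplicate nodes or relationships that already exist.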

6. Navigating the Graph

Using the Neo4j UI, entities can be explored interactively. Cypher queries can be scripted to retrieve specific relationships efficiently (see the Cypher cheatsheet).

Additional resources: Neo4j resources
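As an example of a scripted query, the sketch below builds a Cypher statement that returns everything a named node points to through one relationship type. The function name, the `Person` label, the `USES` relationship type, and the `name` property are all hypothetical and would need to match your own model.

```python
def neighbours_query(label, rel_type):
    """Build a Cypher query for the targets of one relationship type from a named node."""
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so they are checked and interpolated; the node name stays a parameter ($name).
    for token in (label, rel_type):
        if not token.isidentifier():
            raise ValueError(f"unsafe identifier: {token!r}")
    return (
        f"MATCH (n:{label} {{name: $name}})-[r:{rel_type}]->(m) "
        "RETURN n.name AS source, type(r) AS relationship, m.name AS target"
    )


query = neighbours_query("Person", "USES")
print(query)
```

The resulting string can be pasted into the Neo4j UI with `$name` set as a parameter, or passed to a driver session along with a value for `name`.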

Final Thoughts

I found Graph Databases to be a great way to structure and organise data collected in a discovery project. The discovery project also provided an opportunity to get hands-on with graph databases in a relatively low-risk way, as this was not going to be a production system. I highly recommend experimenting with graph databases: they make you think differently about data and let you see the possibilities first-hand.

Let me know your thoughts!
