This is the companion post for my talk on Discovery Graphs. Here you will find links to the resources I mention in the talk, along with a summary of the steps I took to create a Discovery Graph.
The talk is an introduction to graph databases through my experience of using one in a discovery project. The challenge was to communicate all the information we found in a structured and visual way.

Event listing: Meetup link

What is Graph Theory?
Graph Theory is part of Discrete Mathematics, widely applied in computer science to solve complex problems.
Further reading: Graph Theory – Wikipedia
Background on Graph Databases
Learning & Training Resources
Neo4j GraphAcademy offers excellent free courses on graph databases, with videos, text guides, and hands-on exercises.
- Browse courses: Neo4j Graph Academy
- Courses I mentioned in the talk
Books
- Learning Neo4j – free from Neo4j in exchange for your email address – Download Here
- Graph Powered Analytics and Machine Learning – A great resource for graph analytics

Getting Hands-On
Neo4j offers multiple deployment options:
- Neo4j Desktop – My choice for stability during the talk
- Aura Cloud – A free tier is available for experimentation

Other graph databases and tools worth exploring:
- OrientDB – I have used this for experimentation
- Azure Cosmos DB – Supports a graph API (Gremlin, based on Apache TinkerPop), though visualization requires add-ons
- Gephi – Specializes in graph visualization (Gephi Website)


Steps to Recreate the Demo
1. Abstract Graph Model
Start by designing the graph structure iteratively:
- Nodes = nouns (entities)
- Edges = verbs (relationships)
Use an instance graph (a small sample of real entities and relationships drawn using the model) to refine and validate it.
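A minimal Python sketch of checking an instance graph against the abstract model. The labels (`Team`, `System`, `Dataset`) and relationship types (`OWNS`, `PRODUCES`, `USES`) are hypothetical placeholders for illustration; the model is simply the set of allowed (start label, relationship, end label) patterns, and each sample edge is checked against it.

```python
from dataclasses import dataclass

# Hypothetical abstract model: node labels are nouns, relationship types are verbs.
MODEL_LABELS = {"Team", "System", "Dataset"}
# Allowed patterns as (start label, relationship type, end label).
MODEL_PATTERNS = {
    ("Team", "OWNS", "System"),
    ("System", "PRODUCES", "Dataset"),
    ("Team", "USES", "Dataset"),
}


@dataclass(frozen=True)
class Node:
    node_id: str
    label: str


@dataclass(frozen=True)
class Edge:
    start: Node
    rel_type: str
    end: Node


def validate_instance(edges):
    """Check sample edges against the abstract model and list any mismatches."""
    problems = []
    for edge in edges:
        for node in (edge.start, edge.end):
            if node.label not in MODEL_LABELS:
                problems.append(f"unknown label {node.label!r} on node {node.node_id}")
        pattern = (edge.start.label, edge.rel_type, edge.end.label)
        if pattern not in MODEL_PATTERNS:
            problems.append(
                f"unexpected pattern ({pattern[0]})-[:{pattern[1]}]->({pattern[2]})"
            )
    return problems


# A tiny instance graph in which the second edge points the wrong way.
finance = Node("t1", "Team")
ledger = Node("s1", "System")
print(validate_instance([Edge(finance, "OWNS", ledger), Edge(ledger, "OWNS", finance)]))
# prints ['unexpected pattern (System)-[:OWNS]->(Team)']
```

Each mismatch is a prompt to either fix the sample data or extend the model, which is the iterative loop described in this step.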
2. Organising the Data
- Used pandas for data transformation
- Split the data into separate files per entity type
- Added unique identifiers and labels where required
- Cleaned and deduplicated the dataset
- Matched entities to reveal the connections between them
- Exported CSV files for the nodes
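A minimal pandas sketch of the node-side steps, assuming a hypothetical raw table with one row per finding and `team` and `system` columns. The column names, the `team-0` style identifiers and the `*_nodes.csv` file names are illustrative, not taken from the project.

```python
from pathlib import Path

import pandas as pd


def build_node_tables(raw):
    """Split raw discovery rows into one clean, deduplicated table per entity type.

    Assumes a hypothetical input with one row per finding and 'team' and
    'system' columns naming the entities involved.
    """
    # Trim stray whitespace so "Finance " and "Finance" deduplicate together,
    # then drop rows with a missing entity.
    cleaned = raw.assign(
        team=raw["team"].str.strip(),
        system=raw["system"].str.strip(),
    ).dropna(subset=["team", "system"])

    tables = {}
    for column, label in (("team", "Team"), ("system", "System")):
        nodes = (
            cleaned[[column]]
            .drop_duplicates()
            .sort_values(column)
            .rename(columns={column: "name"})
            .reset_index(drop=True)
        )
        # Unique identifiers such as "team-0" that edge files can refer to.
        nodes.insert(0, "id", [f"{column}-{i}" for i in range(len(nodes))])
        nodes["label"] = label
        tables[label] = nodes
    return tables


def export_node_csvs(tables, out_dir):
    """Write one CSV per entity type, e.g. team_nodes.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for label, nodes in tables.items():
        nodes.to_csv(out / f"{label.lower()}_nodes.csv", index=False)
```

The identifiers are assigned after sorting, so they are only stable while the set of entities stays the same. That is fine when the whole graph is rebuilt from scratch on each run, but an incremental load would need identifiers that never change.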
3. Creating the Graph Edges
Generated datasets representing the relationships, using the unique identifiers and ensuring each relationship points in the correct direction. Exported CSV files for the edges.
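A pandas sketch of turning raw rows into a directed edge table, assuming hypothetical `team`/`system` columns in the raw data and node tables with `id` and `name` columns. The edge table holds only identifiers, and its `start_id` and `end_id` columns fix the direction of each relationship.

```python
import pandas as pd


def build_edge_table(raw, start_nodes, end_nodes, start_col, end_col, rel_type):
    """Map each raw (start, end) pair onto node IDs, giving one directed edge per pair.

    start_nodes and end_nodes are node tables with 'id' and 'name' columns.
    Pairs whose names have no matching node are dropped by the inner joins.
    """
    pairs = (
        raw[[start_col, end_col]]
        .apply(lambda col: col.str.strip())
        .dropna()
        .drop_duplicates()
    )
    edges = (
        pairs.merge(start_nodes[["id", "name"]], left_on=start_col, right_on="name")
        .rename(columns={"id": "start_id"})
        .drop(columns="name")
        .merge(end_nodes[["id", "name"]], left_on=end_col, right_on="name")
        .rename(columns={"id": "end_id"})
    )
    edges["type"] = rel_type
    # Column order records the direction: (start_id)-[type]->(end_id).
    return edges[["start_id", "end_id", "type"]].reset_index(drop=True)
```

Writing the result with `edges.to_csv("team_uses_system_edges.csv", index=False)` gives a file with the header `start_id,end_id,type` (the file name is hypothetical).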
4. Automating Data Updates
Refactored the code for efficient reprocessing, allowing new discovery data to be easily included in an updated database.
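One way to keep reprocessing cheap is to rebuild everything from the raw batches, but only when they have changed. A minimal sketch, assuming each new round of discovery data arrives as another CSV file in a raw folder; the folder layout, the `.inputs.sha256` stamp file and the single `combined.csv` output are hypothetical stand-ins for the real transform steps.

```python
import hashlib
from pathlib import Path

import pandas as pd


def inputs_fingerprint(input_dir):
    """Hash the names and contents of every raw CSV batch, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(Path(input_dir).glob("*.csv")):
        digest.update(path.name.encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def refresh(input_dir, output_dir):
    """Rebuild the exports from all raw batches; return False if nothing changed."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    stamp = out / ".inputs.sha256"
    fingerprint = inputs_fingerprint(input_dir)
    if stamp.exists() and stamp.read_text() == fingerprint:
        return False  # same inputs as the last run, so the exports are current

    batches = [pd.read_csv(p) for p in sorted(Path(input_dir).glob("*.csv"))]
    if not batches:
        raise FileNotFoundError(f"no raw CSV files found in {input_dir}")
    combined = pd.concat(batches, ignore_index=True).drop_duplicates()

    # Placeholder for the real steps (split entities, add IDs, build edges).
    combined.to_csv(out / "combined.csv", index=False)
    stamp.write_text(fingerprint)
    return True
```

Dropping a new batch file into the raw folder and calling `refresh` again is then the whole update process.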
5. Loading Data into Neo4j
- Placed CSV files in Neo4j’s import directory (see screenshot and description below)
- Verified load privileges (Cypher Documentation)
- Created and saved Cypher scripts for easy re-execution when needed.
- Ensured correct load order (nodes first, then edges).

In Neo4j Desktop, I clicked the ellipsis (three dots) next to the database within the Neo4j project, then clicked Open folder, then Import.
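The saved Cypher scripts can also be generated, which keeps the load order fixed. A sketch that writes `LOAD CSV` statements for hypothetical file names and columns (`id`, `name`, `start_id`, `end_id`); `file:///` paths resolve against the import directory. The uniqueness constraints use Neo4j 5 syntax and are an assumption rather than part of the original steps, included because they speed up the `MERGE` and `MATCH` lookups on `id`.

```python
def node_statement(label, filename):
    """Create or update one node per row of a node CSV in the import directory."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
        f"MERGE (n:{label} {{id: row.id}})\n"
        "SET n.name = row.name;"
    )


def edge_statement(start_label, rel_type, end_label, filename):
    """Connect existing nodes by ID; rows whose nodes are missing match nothing."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
        f"MATCH (a:{start_label} {{id: row.start_id}})\n"
        f"MATCH (b:{end_label} {{id: row.end_id}})\n"
        f"MERGE (a)-[:{rel_type}]->(b);"
    )


def load_script(node_files, edge_files):
    """Constraints first, then every node file, then every edge file."""
    parts = [
        f"CREATE CONSTRAINT {label.lower()}_id IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.id IS UNIQUE;"
        for label in node_files
    ]
    parts += [node_statement(label, name) for label, name in node_files.items()]
    parts += [
        edge_statement(start, rel, end, name)
        for (start, rel, end), name in edge_files.items()
    ]
    return "\n\n".join(parts)


script = load_script(
    {"Team": "team_nodes.csv", "System": "system_nodes.csv"},
    {("Team", "USES", "System"): "team_uses_system_edges.csv"},
)
print(script)
```

The output can be saved as a `.cypher` file and pasted into Neo4j Browser or run with `cypher-shell -f`. Re-running it is safe because `MERGE` does not duplicate existing nodes or relationships.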
6. Navigating the Graph
Using the Neo4j UI, you can explore entities interactively, and Cypher queries can be scripted to retrieve specific relationships efficiently. See the Cypher cheatsheet.
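A sketch of scripting such a query from Python, assuming the hypothetical `Team`/`System` labels, a `USES` relationship and a `name` property. The node name is passed as a Cypher parameter; labels and relationship types cannot be ordinary parameters in the classic Cypher syntax used here, so they are checked to be plain identifiers before being formatted in.

```python
def relationship_query(start_label, rel_type, end_label, name):
    """Build a parameterised Cypher query listing what a named node connects to."""
    # Reject anything that is not a plain identifier before formatting it in.
    for part in (start_label, rel_type, end_label):
        if not part.isidentifier():
            raise ValueError(f"unsafe label or relationship type: {part!r}")
    query = (
        f"MATCH (a:{start_label} {{name: $name}})-[r:{rel_type}]->(b:{end_label})\n"
        "RETURN a.name AS source, type(r) AS relationship, b.name AS target\n"
        "ORDER BY target"
    )
    return query, {"name": name}


query, params = relationship_query("Team", "USES", "System", "Finance")
print(query)
print(params)
```

With the official `neo4j` Python driver, the pair can be passed as `session.run(query, params)`; in Neo4j Browser, the same query runs after setting `:param name => 'Finance'`.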
Additional Resources
Neo4j resources
Final Thoughts
I found graph databases to be a great way to structure and organise the data collected in a discovery project. The project also gave me a relatively low-risk way to get hands-on with graph databases, as it was never going to be a production system. I highly recommend experimenting with graph databases: they make you think differently about data, and you get to see the possibilities first hand.
Let me know your thoughts!