Discovery Graphs – talk companion

This is the companion post for my talk on Discovery Graphs. Here you will find links to the resources I mention in the talk, along with a summary of the steps I took to create a Discovery Graph.

The talk is an introduction to graph databases via my experience of using a graph database in a discovery project. The challenge was to communicate all the information we found in a structured and visual way.

Dumpster fire database

Event listing: Meetup link

What is Graph Theory?

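Graph theory is a branch of discrete mathematics that studies structures made of nodes (vertices) connected by edges. It is widely applied in computer science to problems such as routing, dependency analysis and network modelling.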

Further Reading: Graph Theory – Wikipedia

Background on Graph Databases

Learning & Training Resources

The Neo4j Academy offers excellent free courses on graph databases, with videos, text guides, and hands-on exercises.

Books

Learning Neo4j (book cover)

Getting hands on

Neo4j offers multiple deployment options:

  • Neo4j Desktop – My choice for stability during the talk
  • Aura Cloud – Free tier available for experimentation

Other graph databases and tools worth exploring:

  • OrientDB – I have used it for experimentation
  • Cosmos DB (Azure) – Supports a graph API (based on Apache TinkerPop), though visualization requires add-ons
  • Gephi – A graph visualization tool rather than a database (Gephi Website)

Steps to Recreate the Demo

1. Abstract Graph Model

Start by iteratively designing the graph structure:

  • Nodes = nouns (entities)
  • Edges = verbs (relationships)

Use an instance graph approach, drawing a small example graph from real records, to refine and validate the model.
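As an illustration, an abstract model and a small instance graph can be sketched in plain Python. The entity types, names and relationship types below are hypothetical and not taken from the discovery project; the point is only to show nodes as nouns, edges as verbs, and a quick check that every edge connects two known nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str     # unique identifier
    label: str  # the noun, e.g. "Team" or "System"
    name: str


@dataclass(frozen=True)
class Edge:
    source: str  # id of the start node
    type: str    # the verb, e.g. "OWNS"
    target: str  # id of the end node


# Hypothetical instance graph: a handful of concrete examples of the model.
nodes = [
    Node("t1", "Team", "Payments"),
    Node("s1", "System", "Ledger"),
    Node("s2", "System", "Invoicing"),
]
edges = [
    Edge("t1", "OWNS", "s1"),
    Edge("s2", "READS_FROM", "s1"),
]


def dangling_edges(nodes, edges):
    """Return edges whose source or target id is not a known node."""
    known = {n.id for n in nodes}
    return [e for e in edges if e.source not in known or e.target not in known]


def describe(nodes, edges):
    """Render each edge as a readable (noun)-[VERB]->(noun) sentence."""
    by_id = {n.id: n for n in nodes}
    return [
        f"({by_id[e.source].name})-[{e.type}]->({by_id[e.target].name})"
        for e in edges
    ]
```

Reading the output of `describe(nodes, edges)` aloud is a cheap way to test whether the nouns and verbs in the model make sense before any data is loaded.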

2. Organising the Data

  • Used pandas for data transformation
  • Split data into separate files per entity type
  • Added unique identifiers and labels where required
  • Cleaned and deduplicated the dataset
  • Matched entities to reveal the connections
  • Exported CSV files for the nodes
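The steps above can be sketched roughly as follows. This is a plain-Python stand-in that uses the standard `csv` module rather than pandas, so that it runs without extra dependencies; the raw records, the `s` id prefix and the column names (`id`, `label`, `name`) are assumptions made for the example.

```python
import csv
import io

# Hypothetical raw discovery records: system names as captured in notes.
raw_systems = ["Ledger", " ledger ", "Invoicing", "Ledger", "CRM"]


def build_nodes(raw_names, label, prefix):
    """Clean, deduplicate and assign a unique id to each entity."""
    seen = {}
    for name in raw_names:
        cleaned = " ".join(name.split())  # trim and collapse whitespace
        key = cleaned.casefold()          # case-insensitive matching key
        if cleaned and key not in seen:
            seen[key] = {
                "id": f"{prefix}{len(seen) + 1}",
                "label": label,
                "name": cleaned,
            }
    return list(seen.values())


def to_csv(rows, fieldnames):
    """Serialise rows to CSV text (write this to a file for a real export)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


system_nodes = build_nodes(raw_systems, label="System", prefix="s")
systems_csv = to_csv(system_nodes, ["id", "label", "name"])
```

Keeping one node file per entity type, each with its own id column, means the edge files only ever need to reference identifiers.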

3. Creating the Graph Edges

Generated datasets to represent relationships using unique identifiers, ensuring correct connection directionality. Exported CSV files for the edges.
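A minimal sketch of this step, again in plain Python rather than pandas. The observations, the id lookup and the column names `start_id`, `type` and `end_id` are hypothetical. Each observation is read as "source, verb, target", so the order of the tuple fixes the direction of the edge, and anything that cannot be matched to a known node is set aside for review instead of being dropped silently.

```python
import csv
import io

# Hypothetical lookup from cleaned entity name to node id.
system_ids = {"ledger": "s1", "invoicing": "s2", "crm": "s3"}

# Hypothetical raw observations: (source, relationship, target).
observations = [
    ("Invoicing", "reads from", "Ledger"),
    ("CRM", "sends data to", "Invoicing"),
    ("Invoicing", "reads from", "Ledger"),  # duplicate observation
    ("CRM", "reads from", "Billing"),       # target is not a known node
]


def build_edges(observations, ids):
    """Translate named observations into id-based, directed, unique edges."""
    edges, unmatched, seen = [], [], set()
    for source, verb, target in observations:
        start = ids.get(source.strip().casefold())
        end = ids.get(target.strip().casefold())
        if start is None or end is None:
            unmatched.append((source, verb, target))
            continue
        edge = (start, verb.strip().upper().replace(" ", "_"), end)
        if edge not in seen:
            seen.add(edge)
            edges.append({"start_id": edge[0], "type": edge[1], "end_id": edge[2]})
    return edges, unmatched


edges, unmatched = build_edges(observations, system_ids)

# Serialise to CSV text (write this to a file for a real export).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["start_id", "type", "end_id"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(edges)
edges_csv = buffer.getvalue()
```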

4. Automating Data Updates

Refactored the code for efficient reprocessing, allowing new discovery data to be easily included in an updated database.

5. Loading Data into Neo4j

  • Placed CSV files in Neo4j’s import directory (see screenshot and description below).
  • Verified load privileges (Cypher Documentation).
  • Created and saved Cypher scripts for easy re-execution when needed.
  • Ensured correct load order (nodes first, then edges).

Screenshot showing how to find the Neo4j Desktop import folder location

In Neo4j Desktop, I right-clicked the ellipsis (three dots) next to the database within the Neo4j project and chose Open folder, then Import.
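To make the load repeatable, the Cypher scripts can be generated rather than typed by hand. The sketch below builds `LOAD CSV` statements as text, with all node statements placed before the edge statements. The file names, labels, relationship type and CSV columns (`id`, `name`, `start_id`, `end_id`) are assumptions for the example, and it assumes one edge file per relationship type so the type can be written literally in the statement. The resulting script would still need to be run in Neo4j, for example by pasting it into the browser.

```python
import re

# Only allow simple names for labels and relationship types, because they
# are inserted into the statement text rather than passed as parameters.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name):
    if not IDENTIFIER.match(name):
        raise ValueError(f"not a simple Cypher identifier: {name!r}")
    return name


def node_statement(filename, label):
    """LOAD CSV statement that creates or updates one node per row."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
        f"MERGE (n:{_check(label)} {{id: row.id}})\n"
        f"SET n.name = row.name;"
    )


def edge_statement(filename, rel_type, start_label, end_label):
    """LOAD CSV statement that connects existing nodes, start to end."""
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{filename}' AS row\n"
        f"MATCH (a:{_check(start_label)} {{id: row.start_id}})\n"
        f"MATCH (b:{_check(end_label)} {{id: row.end_id}})\n"
        f"MERGE (a)-[:{_check(rel_type)}]->(b);"
    )


def build_script(node_files, edge_files):
    """Nodes first, then edges, so every MATCH finds its nodes."""
    statements = [node_statement(f, label) for f, label in node_files]
    statements += [edge_statement(*args) for args in edge_files]
    return "\n\n".join(statements)


script = build_script(
    node_files=[("systems.csv", "System"), ("teams.csv", "Team")],
    edge_files=[("owns.csv", "OWNS", "Team", "System")],
)
```

Using `MERGE` rather than `CREATE` means the script can be re-run against an updated set of CSV files without duplicating nodes or relationships that already exist.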

6. Navigating the Graph

Using the Neo4j UI, entities can be explored interactively, and Cypher queries can be scripted to retrieve specific relationships. Reference: Cypher cheatsheet.
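As an example of a scripted query, the sketch below keeps a Cypher statement in Python alongside its parameters. The `name` property and the example value are assumptions carried over from a hypothetical node file. The query finds everything directly connected to a named entity, in either direction, and returns the relationship type with each neighbour.

```python
# Cypher text with a $name parameter, so the value is never spliced into
# the query string.
NEIGHBOURS_QUERY = (
    "MATCH (n {name: $name})-[r]-(m)\n"
    "RETURN type(r) AS relationship, labels(m) AS labels, m.name AS neighbour\n"
    "ORDER BY relationship, neighbour"
)


def neighbours_of(name):
    """Return the query text and the parameters to run it with."""
    return NEIGHBOURS_QUERY, {"name": name}


query, params = neighbours_of("Ledger")
```

The same text can be pasted into the Neo4j browser after setting the parameter there, or passed together with `params` to a database driver.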

Additional resources: Neo4j resources

Final Thoughts

I found graph databases to be a great way to structure and organise data collected in a discovery project. The discovery project also provided an opportunity to get hands-on with graph databases in a relatively low-risk way, as this was not going to be a production system. I highly recommend experimenting with graph databases to make you think differently about data and see the possibilities first-hand.

Let me know your thoughts!
