Graphs Data Structure Guide

What is a Graph?

A graph is an ordered set of paired vertices and a set of edges, used to create an interconnected network.

Edges can be directed or undirected. Weight can be applied to an edge to represent relative cost of moving between vertices (nodes).

This guide walks you through everything you need to know about the graph data structure, with clear, commented code samples.

Pros and cons of graphs
Types of graphs
Graph data structures
Big O space and time complexity
Graph search: BFS and DFS
How to code a Graph in Python
Coding Breadth First Search in Python
Coding Depth First Search in Python

Pros and Cons of Graphs

Pros:

Great for representing links and relationships. e.g. Facebook friendships or roads connecting cities.

Cons:

Most graph algorithms scale poorly. Big O time complexity of O(n log n).

Types of Graphs

Directed: One-way arrow connecting the nodes
Undirected: Two-way arrow connecting the nodes, or no arrows
Connected: Path exists between every pair of vertices
Cyclic: Graph contains a cycle
Acyclic: Graph without cycles
Weighted: Each edge has a weight
Unweighted: No weights have been applied to the edges

Graph Data Structures

There are three main graph data structures:

Edge Lists
Adjacency Matrices
Adjacency Lists

Edge Lists

Simple to implement, but slow to search for an edge. O(E) time.
Take every edge in a graph and represent in array
e.g. [0,1] for the edge from vertices 0 to vertices 1

Sample edge list in Python:

 
graph = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [4, 6], [5, 1], [5, 2]]

Adjacency Matrices

An NxN boolean matrix, where N is number of nodes
Edges are represented by ‘true’ value
If graph is undirected, matrix will be symmetric
Fast O(1) lookup, but not space efficient
Only really useful when almost all nodes connected to all nodes
Represented in a matrices

Sample adjacency matrix (directed) in Python:

 
  graph = [
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0]
]

Sample adjacency matrix (undirected) in Python:

 
  graph = [
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0]
]

Adjacency List

An adjacency list is the most common way to represent a graph. A hash table data structure is used, whereby each key is a vertex, and the value is the vertices it’s connected to (usually in the form of a list).

Properties of an adjacency list:

Fast lookup, easy to find neighbours
Each vertex (node) stores a list of adjacent vertices
Can also be represented using two classes: Node and Graph
e.g. Vertex 0 has edges pointing to 1 and 4, so, 0 -> [1, 4]

Sample adjacency list in Python:

 
  graph = {
    1: [2, 5],
    2: [1, 5],
    3: [2, 4],
    4: [3, 5, 6],
    5: [1, 2, 4, 6],
    6: [4, 5],
}

Big O Space and Time Complexity

In the majority of cases, the best option will be to use an adjacency list. The main advantage being the efficiency of the O(V) lookup for a vertex. They’re also incredibly space efficient.

Graph Structure	Storage	Add Vertex	Add Edge	Remove Vertex	Remove Edge	Query
Adjacency List	O(V + E)	O(1)	O(1)	O(V + E)	O(E)	O(V)
Edge List	O(V + E)	O(1)	O(1)	O(E)	O(E)	O(E)
Adjacency Matrix	O(V²)	O(V²)	O(1)	O(V²)	O(1)	O(1)

* V = No. of vertices. E = No. of edges.

Graph Search: BFS and DFS

Two most common ways to search a graph are breadth-first search (BFS) and depth-first search (DFS).

This section will cover how to implement the following in Python:

A graph
Breadth First Search
Depth First Search

Which is better, DFS or BFS?

Depth First:

Simple to implement than BFS when using recursion
Better when the item searched for is faraway
e.g. Simulating chess to end of game combinations

Breadth First:

BFS always finds the shortest path, given an undirected and unweighted graph
Often requires more memory than a DFS, due to storing all neighbours
Better when the item searched for is near the top
e.g. Social networks

Depth-first Search

From a start node (often the root), visit the first child (node one step away). Now visit the first child of this child (node two steps away from start node). Continue until you reach the end of this path. Go back to the start node, and repeat the process for each of its other child nodes until all nodes are visited.

Steps to implement in code:

Use a stack (LIFO)
For the first node, add it to stack, then immediately pop it off
Add first node’s direct neighbours to the stack, in any order
Mark them as ‘visited’ in the graph
Pop the first of these neighbours off the stack
Add its unvisited neighbours to the stack, and mark them as ‘visited’
Pop one of these neighbours off the stack, and mark it as visited
Repeat until you reach a node with no unvisted nodes connected to current node
Pop remaining nodes off the stack, adding and popping their neighbours as you go

Breadth-first Search

From a start node (often the root), visit all the immediate children (nodes one step away). Now visit the children of the children (nodes two steps away from start node). Continue until you reach the end of the graph (visited all nodes).

The time complexity of BFS is O(V + E), where V is the number of nodes and E is the number of edges.

Steps to implement in code:

Use a queue (FIFO)
For the first node, add it to queue, then immediately pop it off
Add first node’s direct neighbours to the queue, in any order
Mark them as ‘visited’ in the graph
Pop the first of these neighbours off the queue
Add its unvisited neighbours to the queue, and mark them as ‘visited’
When hit a node in the queue that has all its neighbours marked as ‘visited’, pop the next one off the queue
Repeat until all nodes popped off the queue and marked as ‘visited’

Useful resources:

MIT Algorithms and Data Structure Course

Bidirectional Search

Bidirectional search can be used to find the shortest path between a source node and destination node.

This process requires two simultaneous breadth-first searches, one using the source node as its starting point, and the other using the destination node. When their searches collide, this is the path.

Coding a Graph in Python

Before we can traverse a graph, we first need to create it. The following code allows you to create a graph that follows the adjacency list structure.

Basic operations are:

add_vertex: Creates a new vertex.
add_edge: Creates an edge between two vertices.
show_vertices: Displays a list of all vertices in the graph.
show_edges: Displays a list of tuples representing edges between vertices.
find_path: Finds a path between a start vertex and end vertex.

 
class Graph:

    def __init__(self, graph_dict = {}):
        """Initializes a graph object.
        
        If no graph exists, a new, empty dictionary will be used.
        
        Attributes:
            graph_dict: A Python dictionary (hash table) to hold new items.
        """
        self.graph_dict = graph_dict

    def add_vertex(self, vertex):
        """Adds a new vertex to the graph, if it doesn't already exist.
        
        A new vertex is added to the graph, with no connected edges.
        If the vertex already exists, raises error.
        
        Args:
            vertex: The new vertex being added to the graph.
            
        """
        if vertex not in self.graph_dict:
            self.graph_dict[vertex] = []
        else:
            raise ValueError("Vertex already exists in graph.")
               
    def add_edge(self, start, end): 
        """Adds a new edge to the graph, if it doesn't already exist.
        
        If the start node doesn't already exist, it will be created.
        If the end node doesn't already exist, it will be created.
        If the edge doesn't exist, it will be created.
        
        Args:
            start: The start vertex for the edge.
            end: The end vertex for the edge.
            
        """
        if end not in self.graph_dict:
            self.add_vertex(end)
            
        if start not in self.graph_dict:
            self.graph_dict[start] = [end]
        else:
            self.graph_dict[start].append(end)

    def show_vertices(self):
        """Get a list of vertices in the graph.
        
        Returns:
            Python list of all vertices in the graph.
        
        """
        return list(self.graph_dict.keys())                                

    def show_edges(self):
        """Display each graph edge as a tuple. (node, neighbour)."""
        for node in self.graph_dict:
            for neighbour in self.graph_dict[node]:
                print("(",node,", ",neighbour,")")

    def find_path(self, start, end, path = []):
        """Search for a path between two vertices in the graph.
        
        Args:
            start: Starting vertex.
            end: End vertex.
            path: Python list to store the sequence of vertices in a path
                  between the start vertex and end vertex.
                
        Returns:
            A list containing the sequence of vertices from start vertex to
            end vertex, if a path exists.
            
        """
        path = path + [start]   
        if start == end:
            return path

        for node in self.graph_dict[start]:
            if node not in path:
                newPath = self.find_path(node, end, path)

            if newPath:
                return newPath

            return None

Coding Breadth First Search in Python

For the following implementation of BFS, we will be using an adjacency list to represent the graph.

 
# Create an adjacency list as our graph structure
graph = {1: [2, 3],
         2: [4, 5],
         3: [5],
         4: [6],
         5: [6],
         6: [7],
         7: []}

def bfs(graph, start):
    """Performs a Breadth First Search (BFS).
    
    Adds starting vertex to queue, then immediately pops it off.

    Loops through each neighbour of each vertex, adding them to the
    set of visited vertices, before moving on to the next vertex.

    Next vertex to check is always taken from the front of the queue.

    Args:
        graph: The graph.
        start: The starting vertex for the search.

    Returns:
        Python list of vertices in the order they were traversed.
    
    """
    visited = set()
    queue = [start]   

    visited.add(start)

    while queue:
        vertex = queue.pop(0)
        
        for neighbour in graph[vertex]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)

    return visited

bfs(graph, 1) # {1, 2, 3, 4, 5, 6, 7}

Coding Depth First Search in Python

For the iterative implementation of DFS found below, we will be using an adjacency list to represent the graph.

 
# Create an adjacency list as our graph structure
graph = {1: [2, 3],
         2: [4, 5],
         3: [5],
         4: [6],
         5: [6],
         6: [7],
         7: []}

def dfs(graph, start):
    """Performs a Depth First Search (DFS).

    Adds starting vertex to stack, then immediately pops it off.

    Loops through each neighbour of a vertex, adding them to the
    stack. Once all neighbours are on the stack, they are popped off
    and added to the path. Repeats for each vertex.

    Next vertex to check is always taken from the top of the stack.

    Args:
        graph: The graph.
        start: The starting vertex for the search.

    Returns:
        Python list of vertices in the order they were traversed.
    
    """
    path = []
    stack = [start]

    while stack:
        vertex = stack.pop()
        
        if vertex in path:
            continue

        path.append(vertex)

        for neighbor in graph[vertex]:
            stack.append(neighbor) 

    return path

dfs(graph, 1) # [1, 3, 5, 6, 7, 2, 4]