What is a Graph?
A graph is an ordered set of paired vertices and a set of edges, used to create an interconnected network.
Edges can be directed or undirected. Weight can be applied to an edge to represent relative cost of moving between vertices (nodes).
This guide walks you through everything you need to know about the graph data structure, with clear, commented code samples.
- Pros and cons of graphs
- Types of graphs
- Graph data structures
- Big O space and time complexity
- Graph search: BFS and DFS
- How to code a Graph in Python
- Coding Breadth First Search in Python
- Coding Depth First Search in Python
Pros and Cons of Graphs
Pros:
- Great for representing links and relationships. e.g. Facebook friendships or roads connecting cities.
Cons:
- Most graph algorithms scale poorly. Big O time complexity of O(n log n).
Types of Graphs
- Directed: One-way arrow connecting the nodes
- Undirected: Two-way arrow connecting the nodes, or no arrows
- Connected: Path exists between every pair of vertices
- Cyclic: Graph contains a cycle
- Acyclic: Graph without cycles
- Weighted: Each edge has a weight
- Unweighted: No weights have been applied to the edges
Graph Data Structures
There are three main graph data structures:
- Edge Lists
- Adjacency Matrices
- Adjacency Lists
Edge Lists
- Simple to implement, but slow to search for an edge. O(E) time.
- Take every edge in a graph and represent in array
- e.g. [0,1] for the edge from vertices 0 to vertices 1
Sample edge list in Python:
graph = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [4, 6], [5, 1], [5, 2]]
Adjacency Matrices
- An NxN boolean matrix, where N is number of nodes
- Edges are represented by ‘true’ value
- If graph is undirected, matrix will be symmetric
- Fast O(1) lookup, but not space efficient
- Only really useful when almost all nodes connected to all nodes
- Represented in a matrices
Sample adjacency matrix (directed) in Python:
graph = [ [0, 1, 0, 0, 1, 0], [0, 0, 1, 0, 1, 0], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0] ]
Sample adjacency matrix (undirected) in Python:
graph = [ [0, 1, 0, 0, 1, 0], [1, 0, 1, 0, 1, 0], [0, 1, 0, 1, 0, 0], [0, 0, 1, 0, 1, 1], [1, 1, 0, 1, 0, 0], [0, 0, 0, 1, 0, 0] ]
Adjacency List
An adjacency list is the most common way to represent a graph. A hash table data structure is used, whereby each key is a vertex, and the value is the vertices it’s connected to (usually in the form of a list).
Properties of an adjacency list:
- Fast lookup, easy to find neighbours
- Each vertex (node) stores a list of adjacent vertices
- Can also be represented using two classes:
Node
andGraph
- e.g. Vertex 0 has edges pointing to 1 and 4, so, 0 -> [1, 4]
Sample adjacency list in Python:
graph = { 1: [2, 5], 2: [1, 5], 3: [2, 4], 4: [3, 5, 6], 5: [1, 2, 4, 6], 6: [4, 5], }
Big O Space and Time Complexity
In the majority of cases, the best option will be to use an adjacency list. The main advantage being the efficiency of the O(V) lookup for a vertex. They’re also incredibly space efficient.
Graph Structure | Storage | Add Vertex | Add Edge | Remove Vertex | Remove Edge | Query |
---|---|---|---|---|---|---|
Adjacency List | O(V + E) | O(1) | O(1) | O(V + E) | O(E) | O(V) |
Edge List | O(V + E) | O(1) | O(1) | O(E) | O(E) | O(E) |
Adjacency Matrix | O(V²) | O(V²) | O(1) | O(V²) | O(1) | O(1) |
* V = No. of vertices. E = No. of edges.
Graph Search: BFS and DFS
Two most common ways to search a graph are breadth-first search (BFS) and depth-first search (DFS).
This section will cover how to implement the following in Python:
- A graph
- Breadth First Search
- Depth First Search
Which is better, DFS or BFS?
Depth First:
- Simple to implement than BFS when using recursion
- Better when the item searched for is faraway
- e.g. Simulating chess to end of game combinations
Breadth First:
- BFS always finds the shortest path, given an undirected and unweighted graph
- Often requires more memory than a DFS, due to storing all neighbours
- Better when the item searched for is near the top
- e.g. Social networks
Depth-first Search
From a start node (often the root), visit the first child (node one step away). Now visit the first child of this child (node two steps away from start node). Continue until you reach the end of this path. Go back to the start node, and repeat the process for each of its other child nodes until all nodes are visited.
Steps to implement in code:
- Use a stack (LIFO)
- For the first node, add it to stack, then immediately pop it off
- Add first node’s direct neighbours to the stack, in any order
- Mark them as ‘visited’ in the graph
- Pop the first of these neighbours off the stack
- Add its unvisited neighbours to the stack, and mark them as ‘visited’
- Pop one of these neighbours off the stack, and mark it as visited
- Repeat until you reach a node with no unvisted nodes connected to current node
- Pop remaining nodes off the stack, adding and popping their neighbours as you go
Breadth-first Search
From a start node (often the root), visit all the immediate children (nodes one step away). Now visit the children of the children (nodes two steps away from start node). Continue until you reach the end of the graph (visited all nodes).
The time complexity of BFS is O(V + E), where V is the number of nodes and E is the number of edges.
Steps to implement in code:
- Use a queue (FIFO)
- For the first node, add it to queue, then immediately pop it off
- Add first node’s direct neighbours to the queue, in any order
- Mark them as ‘visited’ in the graph
- Pop the first of these neighbours off the queue
- Add its unvisited neighbours to the queue, and mark them as ‘visited’
- When hit a node in the queue that has all its neighbours marked as ‘visited’, pop the next one off the queue
- Repeat until all nodes popped off the queue and marked as ‘visited’
Useful resources:
Bidirectional Search
Bidirectional search can be used to find the shortest path between a source node and destination node.
This process requires two simultaneous breadth-first searches, one using the source node as its starting point, and the other using the destination node. When their searches collide, this is the path.
Coding a Graph in Python
Before we can traverse a graph, we first need to create it. The following code allows you to create a graph that follows the adjacency list structure.
Basic operations are:
- add_vertex: Creates a new vertex.
- add_edge: Creates an edge between two vertices.
- show_vertices: Displays a list of all vertices in the graph.
- show_edges: Displays a list of tuples representing edges between vertices.
- find_path: Finds a path between a start vertex and end vertex.
class Graph: def __init__(self, graph_dict = {}): """Initializes a graph object. If no graph exists, a new, empty dictionary will be used. Attributes: graph_dict: A Python dictionary (hash table) to hold new items. """ self.graph_dict = graph_dict def add_vertex(self, vertex): """Adds a new vertex to the graph, if it doesn't already exist. A new vertex is added to the graph, with no connected edges. If the vertex already exists, raises error. Args: vertex: The new vertex being added to the graph. """ if vertex not in self.graph_dict: self.graph_dict[vertex] = [] else: raise ValueError("Vertex already exists in graph.") def add_edge(self, start, end): """Adds a new edge to the graph, if it doesn't already exist. If the start node doesn't already exist, it will be created. If the end node doesn't already exist, it will be created. If the edge doesn't exist, it will be created. Args: start: The start vertex for the edge. end: The end vertex for the edge. """ if end not in self.graph_dict: self.add_vertex(end) if start not in self.graph_dict: self.graph_dict[start] = [end] else: self.graph_dict[start].append(end) def show_vertices(self): """Get a list of vertices in the graph. Returns: Python list of all vertices in the graph. """ return list(self.graph_dict.keys()) def show_edges(self): """Display each graph edge as a tuple. (node, neighbour).""" for node in self.graph_dict: for neighbour in self.graph_dict[node]: print("(",node,", ",neighbour,")") def find_path(self, start, end, path = []): """Search for a path between two vertices in the graph. Args: start: Starting vertex. end: End vertex. path: Python list to store the sequence of vertices in a path between the start vertex and end vertex. Returns: A list containing the sequence of vertices from start vertex to end vertex, if a path exists. """ path = path + [start] if start == end: return path for node in self.graph_dict[start]: if node not in path: newPath = self.find_path(node, end, path) if newPath: return newPath return None
Coding Breadth First Search in Python
For the following implementation of BFS, we will be using an adjacency list to represent the graph.
# Create an adjacency list as our graph structure graph = {1: [2, 3], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: [7], 7: []} def bfs(graph, start): """Performs a Breadth First Search (BFS). Adds starting vertex to queue, then immediately pops it off. Loops through each neighbour of each vertex, adding them to the set of visited vertices, before moving on to the next vertex. Next vertex to check is always taken from the front of the queue. Args: graph: The graph. start: The starting vertex for the search. Returns: Python list of vertices in the order they were traversed. """ visited = set() queue = [start] visited.add(start) while queue: vertex = queue.pop(0) for neighbour in graph[vertex]: if neighbour not in visited: visited.add(neighbour) queue.append(neighbour) return visited bfs(graph, 1) # {1, 2, 3, 4, 5, 6, 7}
Coding Depth First Search in Python
For the iterative implementation of DFS found below, we will be using an adjacency list to represent the graph.
# Create an adjacency list as our graph structure graph = {1: [2, 3], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: [7], 7: []} def dfs(graph, start): """Performs a Depth First Search (DFS). Adds starting vertex to stack, then immediately pops it off. Loops through each neighbour of a vertex, adding them to the stack. Once all neighbours are on the stack, they are popped off and added to the path. Repeats for each vertex. Next vertex to check is always taken from the top of the stack. Args: graph: The graph. start: The starting vertex for the search. Returns: Python list of vertices in the order they were traversed. """ path = [] stack = [start] while stack: vertex = stack.pop() if vertex in path: continue path.append(vertex) for neighbor in graph[vertex]: stack.append(neighbor) return path dfs(graph, 1) # [1, 3, 5, 6, 7, 2, 4]