This is an old revision of the document!
Table of Contents
Chapter 3: Graphs
3.1: Basic Definitions and Applications
We define a graph as a collection V of vertices, or nodes, and a collection of edges between those vertices, represented as two-element subsets: {u, v}, where u and v are nodes. For a directed graph, we consider the ordered pair (u, v), in which u is the tail of the edge, and v is the head. Then, the edge leaves u and enters v.
We see graphs everywhere:
- Transportation networks
- Communication networks
- Information networks
- Social networks
- Dependency networks (e.g. college courses and pre-requisites).
A fundamental operation involving graphs is traversing a sequence of nodes connected by edges. Specifically, we call this traversing a path of an undirected graph, defined as a sequence of nodes in which each consecutive pair is joined by an edge. A path is simple if all its vertices are distinct (i.e. if to follow it we don't visit a node twice), and a cycle is a path in which all nodes are distinct except the first and last are equal (i.e. we cycle back to where we began). Additionally, we call an undirected graph connected if there exists a path between all pairs of points. We say a directed graph is strongly connected if there is a path in both directions between each pair of points. Finally, we define the distance between two nodes to be the minimum number of edges in a path between the two. If there exists no path, we simply denote that with a symbol such as the infinity symbol.
Trees
A tree is an undirected graph that is connected and does not contain a cycle. Note that deleting any edge from a tree will make it disconnected.
Given any tree, it is useful to root it at a particular node. Thence we let the others “fall” below that root node (defined by us) and the tree is visible. Additionally, the terms parents, children, ancestors, and descendants are self-explanatory. We have the following statement:
Let G be an undirected graph on n nodes. Any two of the following statements imply the third:
- G is connected.
- G does not contain a cycle.
- G has n - 1 edges.
Note that this implies that all trees of n nodes have n - 1 edges.
This was a good section which broke the ideas down to fundamentals in a manner both mathematically sound as well as illustrative: 9/10.
3.2: Graph Connectivity and Traversal
When considering a graph, a common question is, given two nodes s and t, is there a path between them? This is considered the question of node-to-node connectivity, or more specifically, s-t connectivity. We proceed to consider (at relatively high-level) two different natural algorithms to find such a path, or lack thereof.
Breadth-First Search (BFS)
A BFS consists of starting from our node s, and sequentially exploring outward in layers. Specifically,
- Layer L1 consists of all nodes that are neighbors of s (note that sometimes we may refer to the set containing only s as L0)
- Having defined layers L1, …, Lj, layer Lj+1 consists of all nodes that do not belong to an earlier layer and that have an edge to a node in layer Lj.
Note that Lj is the set of all nodes exactly j from s, as well as that if a node does not appear in any layers, then there is no path from s to it. In summation by the book:
For each j >= 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t iff t appears in some layer.
Additionally, the BFS is a natural way to produce a tree T rooted at s. Then this leads to the “dropping-off” of any non-tree graph edges, which the following fact mentions:
Let T be a BFS tree from graph G, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ by at most 1.
The BFS algorithm yields a set R that we'll call the connected component of G containing s. Once we have this, we need simply check if t is in s to check s-t connectivity. Additionally, the path between the two is easily recoverable by recording the nodes visited along each path, then tracing backward from t. Lastly, note that the BFS is only one method of ordering the search and exploration of the connected component R. Now we proceed to explore another.
Depth-First Search (DFS)
The depth-first search follows each path out to its end (a node where all nodes to which it's connected have already been explored), and then recursively follows all others (by backtracking to the last node which had one unexplored). These resulting DFS trees look very different to the BFS trees, largely because the nodes are visited in drastically different order. Thus we arrive at a similar statement, but one more focused on ancestry than levels:
Let T be a DFS tree in graph G, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.
Regardless of the search method, both find the exact same set of connected components, as stated below:
For any two nodes s and t in a graph, their connected components are either identical or disjoint.
We could think of this as illustrative of connection being transitive (at least in the undirected graph realm we're considering here).
This was also a very enjoyable section to read, clear and concise, but also covering all questions. 9/10.
3.3: Implementing Graph Traversal Using Queues and Stacks
We define a graph G = (V,E) with n = |V| nodes and m = |E| edges. We represent a given edge from node u to node v as (u,v). Note that a graph can have at most one edge between any pair of nodes, so m =< (n choose 2) =< n2, but also m >= n-1. We proceed to implement graph search algorithms in O(m + n) time, which we call linear in terms of the number of nodes and edges. There are two basic representations of graphs.
An adjacency matrix is an n x n matrix A where A[u,v] is 1 if the edge (u,v) is in the graph. The same works for the other direction, which supports directed graphs as well. Unfortunately, this representation requires Theta(n2) space, and also takes Theta(n) time to search if a certain edge exists from a given node. For more efficiency in both, particularly when there are far fewer edges, we can use a better representation.
Adjacency lists improve on much of these issues. What we have is an array Adj with an index for each node. At each Adj[u] there is a list containing all the nodes adjacent to u. Therefore, in an undirected graph, edge (u,v) appears twice–v appears in u's list, and vice versa. This very naturally extends to use for directed graphs then.
Adjacency lists require only O(m + n) space. This is because Adj obviously has n spaces, and then the total length of all the lists at those spaces is 2m. The latter is easily understood through defining the degree of a node v, nv, to be the number of incident edges to v. The length of the list at Adj[v] then is nv, and sum of this across all nodes is 2m.
Additionally, if we are currently at a given node, it is constant time to access any/all of its neighbors. This lends itself to searching efficiently, so in all of our algorithms we will use the adjacency list representation.
Implementing Breadth-First Search
To do the following algorithm for the adjacency list structure, we simply maintain an array Discovered of length n (one spot corresponding to each node), initially all marked False, and change them to True whenever we discover a given node. Then, we have a list L[i] for each layer, adding the nodes there when we discover them in a given layer. Here, s is our source node.
BFS(s)
Set Discovered[s] = true and Discovered[v] = false for all other v
Initialize L[0] to consist of just node s
Set layer counter i = 0
Set current BFS tree T to be an empty set
While L[i] is not empty:
Initialize an empty list L[i+1]
For each node u in L[i]:
Consider each edge (u,v) incident to u
If Discovered[v] = False then:
Set Discovered[v] = True
Add edge (u,v) to tree T
Add v to list L[i+1]
Endif
Endfor
i += 1
Endwhile
This algorithm runs in time O(m + n). Note that instead of various lists we could use a queue to maintain the levels.
Implementing Depth-First Search
For DFS, we make use of a stack, and then instead of the Discovered array, we use an Explored array, only marking each node as true after all of its edges have been explored. The algorithm is as follows:
DFS(s)
Initialize S to be a stack with one element s
While S is not empty:\
Take a node u from S
If Explored[u] = False then:
Set Explored[u] = True
For each edge (u,v) incident to u:
Add v to the stack S
Endfor
Endif
Endwhile
We can further use an array Parent to create the DFS tree by setting Parent[v] = u when we add node v to the stack via edge (u,v).
This algorithm runs in time O(m + n).
Additionally, we can find the set of all connected components using either algorithm, and this takes O(m + n) time as well. This is because doing one connected component removes those nodes from the next group of nodes in the graph G that must be searched next.
