Chapter 3: Graphs

3.1: Basic Definitions and Applications

We define a graph as a collection V of vertices, or nodes, together with a collection E of edges between those vertices, each represented as a two-element subset {u, v}, where u and v are nodes. For a directed graph, each edge is instead an ordered pair (u, v), in which u is the tail of the edge and v is the head; we say the edge leaves u and enters v.

We see graphs everywhere: transportation networks, communication networks, information networks such as the Web, and social networks can all be modeled naturally as graphs.

A fundamental operation involving graphs is traversing a sequence of nodes connected by edges. Specifically, we call such a sequence a path in an undirected graph, defined as a sequence of nodes in which each consecutive pair is joined by an edge. A path is simple if all of its nodes are distinct (i.e. following it never visits a node twice), and a cycle is a path in which the nodes are all distinct except that the first and last are equal (i.e. we cycle back to where we began). Additionally, we call an undirected graph connected if there exists a path between every pair of nodes, and we say a directed graph is strongly connected if there is a path in each direction between every pair of nodes. Finally, we define the distance between two nodes to be the minimum number of edges in a path between them. If there exists no path, we simply denote that with a symbol such as the infinity symbol.

Trees

A tree is an undirected graph that is connected and does not contain a cycle. Note that deleting any edge from a tree will make it disconnected.

Given any tree, it is often useful to root it at a particular node. We then let the other nodes "hang" below that root node (chosen by us), which makes the tree's structure easy to visualize. With a root fixed, the terms parent, child, ancestor, and descendant have their usual meanings. We have the following statement:

Let G be an undirected graph on n nodes. Any two of the following statements imply the third:

  1. G is connected.
  2. G does not contain a cycle.
  3. G has n - 1 edges.

Note that this implies that every tree on n nodes has exactly n - 1 edges.
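To make the characterization concrete, here is a minimal Python sketch (my own, not from the book) that tests whether an undirected graph is a tree by checking statements 1 and 3 and letting the theorem supply statement 2; nodes are assumed to be labeled 0 through n - 1.

from collections import deque

def is_tree(n, edges):
    """Check whether an undirected graph on nodes 0..n-1 with the given
    edge list is a tree, using the fact that 'connected' plus
    'n - 1 edges' together imply 'no cycle'."""
    if len(edges) != n - 1:
        return False
    # Build an adjacency list.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Check connectivity with a simple traversal from node 0.
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                queue.append(v)
    return all(seen)

# A path on 4 nodes is a tree; adding the edge (0, 3) creates a cycle.
print(is_tree(4, [(0, 1), (1, 2), (2, 3)]))          # True
print(is_tree(4, [(0, 1), (1, 2), (2, 3), (0, 3)]))  # False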

This was a good section which broke the ideas down to fundamentals in a manner both mathematically sound as well as illustrative: 9/10.

3.2: Graph Connectivity and Traversal

When considering a graph, a common question is, given two nodes s and t, is there a path between them? This is the question of node-to-node connectivity, or more specifically, s-t connectivity. We proceed to consider, at a relatively high level, two different natural algorithms to find such a path, or to determine that none exists.

Breadth-First Search (BFS)

A BFS consists of starting from our node s and sequentially exploring outward in layers. Specifically, layer L0 consists of just s, and for each j >= 0, layer L(j+1) consists of all nodes that do not belong to an earlier layer and that have an edge to some node in layer Lj.

Note that Lj is the set of all nodes at distance exactly j from s, and that if a node does not appear in any layer, then there is no path from s to it. In summary, as the book states:

For each j >= 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t iff t appears in some layer.

Additionally, BFS is a natural way to produce a tree T rooted at s: whenever a node is first discovered via an edge, that edge is added to T. The edges of G that do not end up in T (the non-tree edges) are constrained by the following fact:

Let T be a BFS tree from graph G, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ by at most 1.

The BFS algorithm yields a set R that we'll call the connected component of G containing s. Once we have this, we need simply check whether t is in R to answer the s-t connectivity question. Additionally, the path between the two is easily recoverable: we record, for each node, the edge along which it was first discovered, then trace backward from t to s. Lastly, note that BFS is only one way of ordering the search and exploration of the connected component R. Now we proceed to explore another.
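As a small illustration of the backward-tracing step (my own sketch, not the book's code), suppose the search recorded, for each discovered node v, the node parent[v] from which it was first reached; the path can then be rebuilt as follows.

def recover_path(parent, s, t):
    """Reconstruct an s-t path by walking backward from t through the
    recorded discovery edges; parent[v] is the node from which v was
    first discovered (assumed to have been filled in during the search)."""
    if t != s and t not in parent:
        return None          # t was never discovered, so no path exists
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])
    path.reverse()
    return path

# Hypothetical parent map from a search started at s = 'a'.
parent = {'b': 'a', 'c': 'a', 'd': 'b'}
print(recover_path(parent, 'a', 'd'))   # ['a', 'b', 'd']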

Depth-First Search (DFS)

The depth-first search follows each path out to its end (a node all of whose neighbors have already been explored), and then backtracks to the most recent node that still has an unexplored neighbor and continues from there. The resulting DFS trees look very different from BFS trees, largely because the nodes are visited in a drastically different order. We thus arrive at a similar statement, but one phrased in terms of ancestry rather than layers:

Let T be a DFS tree in graph G, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.

Regardless of the search method, both find the exact same set of connected components, as stated below:

For any two nodes s and t in a graph, their connected components are either identical or disjoint.

We could think of this as illustrative of connection being transitive (at least in the undirected graph realm we're considering here).

This was also a very enjoyable section to read, clear and concise, but also covering all questions. 9/10.

3.3: Implementing Graph Traversal Using Queues and Stacks

We define a graph G = (V,E) with n = |V| nodes and m = |E| edges. We represent a given edge from node u to node v as (u,v). Note that a graph can have at most one edge between any pair of nodes, so m <= (n choose 2) <= n^2; and if the graph is connected, then m >= n - 1. We proceed to implement graph search algorithms in O(m + n) time, which we call linear in the number of nodes and edges. There are two basic representations of graphs.

An adjacency matrix is an n x n matrix A where A[u,v] is 1 if the edge (u,v) is in the graph and 0 otherwise. For an undirected graph the matrix is symmetric (A[v,u] = 1 as well), and since the two directions can differ, directed graphs are supported too. Unfortunately, this representation requires Theta(n^2) space, and while checking whether a single edge (u,v) is present takes only constant time, examining all the edges incident to a given node takes Theta(n) time no matter how few neighbors that node has. For more efficiency in both respects, particularly when there are far fewer edges, we can use a better representation.
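A quick sketch of this representation in Python (illustrative only; nodes are assumed to be labeled 0 through n - 1):

def adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix for nodes labeled 0..n-1.
    A[u][v] is 1 if edge (u, v) is present and 0 otherwise; for an
    undirected graph the matrix is symmetric."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1
    return A

A = adjacency_matrix(3, [(0, 1), (1, 2)])
print(A[0][1], A[2][0])   # prints 1 0; checking a single edge is O(1),
                          # but scanning a row for all neighbors of u is Theta(n)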

Adjacency lists improve on both of these issues. We keep an array Adj with an index for each node, where Adj[u] is a list containing all the nodes adjacent to u. In an undirected graph, each edge (u,v) therefore appears twice: v appears in u's list, and u appears in v's list. This extends very naturally to directed graphs, where Adj[u] holds the nodes to which u has outgoing edges.

Adjacency lists require only O(m + n) space. This is because Adj has n entries, and the total length of all the lists at those entries is 2m. The latter is easily understood by defining the degree of a node v, nv, to be the number of edges incident to v. The length of the list at Adj[v] is then nv, and the sum of these lengths across all nodes is 2m, since each edge is counted once from each of its two endpoints.

Additionally, if we are currently at a given node, we can step to any one of its neighbors in constant time, and we can enumerate all of its neighbors in time proportional to its degree. This lends itself to efficient searching, so in all of our algorithms we will use the adjacency list representation.
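Here is a corresponding Python sketch of the adjacency list representation (again illustrative, with the same 0 through n - 1 labeling assumption); note how the total list length comes out to 2m:

def adjacency_list(n, edges, directed=False):
    """Build an adjacency list Adj for nodes labeled 0..n-1.
    Adj[u] holds the neighbors of u; in the undirected case each edge
    (u, v) is stored twice, once in Adj[u] and once in Adj[v]."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)
    return adj

adj = adjacency_list(4, [(0, 1), (0, 2), (2, 3)])
print(adj)                              # [[1, 2], [0], [0, 3], [2]]
print(sum(len(nbrs) for nbrs in adj))   # 6 = 2m, matching the degree-sum argument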

To implement the following algorithm with the adjacency list structure, we simply maintain an array Discovered of length n (one entry corresponding to each node), initially all marked False, and change an entry to True whenever we discover the corresponding node. Then, we keep a list L[i] for each layer, adding nodes to it as we discover them in that layer. Here, s is our source node.

BFS(s)

  Set Discovered[s] = true and Discovered[v] = false for all other v
  Initialize L[0] to consist of just node s
  Set layer counter i = 0
  Set current BFS tree T to be an empty set
  While L[i] is not empty:
    Initialize an empty list L[i+1]
    For each node u in L[i]:
      Consider each edge (u,v) incident to u
      If Discovered[v] = False then:
        Set Discovered[v] = True
        Add edge (u,v) to tree T
        Add v to list L[i+1]
      Endif
    Endfor
    i += 1
  Endwhile

This algorithm runs in time O(m + n). Note that instead of separate lists for each layer, we could maintain the layers in a single queue.
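For concreteness, here is a runnable Python version of the layer-by-layer BFS above (my own sketch, assuming the adjacency-list representation from earlier with nodes labeled 0 through n - 1):

def bfs_layers(adj, s):
    """Layer-by-layer BFS from s over an adjacency list adj.
    Returns the list of layers and the list of tree edges, mirroring
    the pseudocode above."""
    n = len(adj)
    discovered = [False] * n
    discovered[s] = True
    layers = [[s]]
    tree_edges = []
    i = 0
    while layers[i]:
        next_layer = []
        for u in layers[i]:
            for v in adj[u]:
                if not discovered[v]:
                    discovered[v] = True
                    tree_edges.append((u, v))
                    next_layer.append(v)
        layers.append(next_layer)
        i += 1
    return layers[:-1], tree_edges   # drop the final empty layer

adj = [[1, 2], [0, 3], [0, 3], [1, 2]]   # a 4-cycle
layers, tree = bfs_layers(adj, 0)
print(layers)   # [[0], [1, 2], [3]]; node 3 is at distance 2 from node 0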

For DFS, we make use of a stack, and instead of the Discovered array we use an Explored array, marking a node True only when we process it, taking it off the stack and adding all of its incident edges. The algorithm is as follows:

DFS(s)

  Initialize S to be a stack with one element s
  While S is not empty:
    Take a node u from S
    If Explored[u] = False then:
      Set Explored[u] = True
      For each edge (u,v) incident to u:
        Add v to the stack S
      Endfor
    Endif
  Endwhile

We can further use an array Parent to create the DFS tree by setting Parent[v] = u when we add node v to the stack via edge (u,v). This algorithm runs in time O(m + n).
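Here is a Python sketch of the stack-based DFS with the Parent map (my own translation of the pseudocode, assuming the adjacency-list representation with nodes 0 through n - 1):

def dfs(adj, s):
    """Iterative DFS from s using an explicit stack, following the
    pseudocode above. Returns the set of reachable nodes and a parent
    map encoding the DFS tree; parent[v] may be overwritten until v is
    actually popped and marked Explored."""
    n = len(adj)
    explored = [False] * n
    parent = {}
    stack = [s]
    while stack:
        u = stack.pop()
        if not explored[u]:
            explored[u] = True
            for v in adj[u]:
                if not explored[v]:
                    parent[v] = u
                    stack.append(v)
    reachable = {v for v in range(n) if explored[v]}
    return reachable, parent

adj = [[1, 2], [0, 3], [0, 3], [1, 2]]   # the same 4-cycle as before
print(dfs(adj, 0)[0])   # {0, 1, 2, 3}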

Additionally, we can find the set of all connected components using either algorithm, and this takes O(m + n) time in total. This is because, after finding one connected component, we only ever start a new search from a node that has not yet been discovered, so each node and edge is processed in the search for only one component.
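A short sketch of that loop over all nodes (illustrative, same adjacency-list assumptions as before):

def connected_components(adj):
    """Find all connected components by running a search from every
    node that has not yet been discovered; total time is O(m + n)."""
    n = len(adj)
    discovered = [False] * n
    components = []
    for s in range(n):
        if not discovered[s]:
            # Grow the component containing s with a simple stack-based search.
            comp, stack = [], [s]
            discovered[s] = True
            while stack:
                u = stack.pop()
                comp.append(u)
                for v in adj[u]:
                    if not discovered[v]:
                        discovered[v] = True
                        stack.append(v)
            components.append(comp)
    return components

adj = [[1], [0], [3], [2], []]    # two edges plus an isolated node
print(connected_components(adj))  # [[0, 1], [2, 3], [4]]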

This section explained the algorithms very well, I would give it a 9/10.

3.4: Testing Bipartiteness: An Application of Breadth-First Search

Recall that a bipartite graph is one whose nodes can be partitioned into sets X and Y such that each edge has one end in X and the other in Y.

A graph is bipartite if and only if it contains no odd cycles.

We can implement an algorithm for testing this using the BFS search tree. We simply maintain an array Color in which each node is assigned a color, red or blue, alternating with each subsequent layer. If, at the end, there is an edge whose two ends have the same color, then the graph is not bipartite. This runs in O(m + n) time as well, a nice recurring theme it seems. The proof of correctness hinges on the fact that if there is an edge of G between two nodes in the same layer, then following the BFS tree up from both endpoints to their lowest common ancestor closes an odd-length cycle, and therefore the graph is not bipartite.
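A Python sketch of the two-coloring test (my own, under the same adjacency-list assumptions); it returns False as soon as an edge joins two nodes of the same color:

from collections import deque

def is_bipartite(adj):
    """2-color the graph layer by layer with BFS; return False as soon
    as some edge joins two nodes of the same color."""
    n = len(adj)
    color = [None] * n            # None = undiscovered, else 0 (red) or 1 (blue)
    for s in range(n):            # handle every connected component
        if color[s] is not None:
            continue
        color[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]   # alternate colors between layers
                    queue.append(v)
                elif color[v] == color[u]:
                    return False              # same-color edge means an odd cycle
    return True

print(is_bipartite([[1], [0, 2], [1]]))        # True  (a path)
print(is_bipartite([[1, 2], [0, 2], [0, 1]]))  # False (a triangle)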

Obviously brief, this section complemented our class discussion of the process very well, though I do think it could have been a little more concise in the order of its claims; an 8/10.

3.5: Connectivity in Directed Graphs

In a directed graph, an edge (u,v) indicates an edge from u to v. To represent these, we simply give each node two lists: one containing the nodes to which it has outgoing edges, and the other containing the nodes from which it has incoming edges.

DFS and BFS run in essentially the same way on directed graphs, simply following the edges "out" of each node. The crucial distinction is in what these search trees represent: the set of nodes reachable from s, which plays the role of the connected component of s. Note that BFS still gives us the shortest path, if one exists. Additionally, we can define a new graph Grev, the same as G but with every edge direction reversed. Then, running a BFS or DFS from s in this reversed graph, if we reach a node v, that indicates v has a path to s in the original graph G.

Strong Connectivity

Searching the reversed graph becomes useful in determining the strong connectivity of a given graph. First, we define two nodes u and v to be mutually reachable if there is a path from u to v and a path from v to u. Additionally, note that mutual reachability is transitive.

If every pair of nodes in a graph is mutually reachable, then we call the graph strongly connected. To determine whether a given graph is strongly connected, we run the search mentioned above from some node s in both G and Grev. If either search fails to reach every node, then G is not strongly connected; if both reach every node, then every node is mutually reachable with s, and by transitivity every pair of nodes is mutually reachable, so G is strongly connected.
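Here is a Python sketch of that check (my own, assuming an adjacency list of outgoing edges with nodes labeled 0 through n - 1); it builds Grev explicitly and runs one search in each graph:

def is_strongly_connected(adj):
    """Check strong connectivity of a directed graph given as an
    adjacency list of outgoing edges: search from node 0 in G and in
    the reversed graph Grev, and require both searches to reach everything."""
    n = len(adj)
    if n == 0:
        return True

    # Build Grev by flipping every edge.
    rev = [[] for _ in range(n)]
    for u in range(n):
        for v in adj[u]:
            rev[v].append(u)

    def reaches_all(graph, s):
        seen = [False] * n
        seen[s] = True
        stack = [s]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
        return all(seen)

    return reaches_all(adj, 0) and reaches_all(rev, 0)

print(is_strongly_connected([[1], [2], [0]]))   # True  (a directed 3-cycle)
print(is_strongly_connected([[1], [2], []]))    # False (node 2 cannot reach 0)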

So, we can define the strong component of a given node to be the set of all nodes with which it is mutually reachable. This is analogous to the connected components of undirected graphs. Then, we have that:

For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.

Just as with the connected components, we can determine all of the strong components of a directed graph in, wait for it, O(m + n) time.

This was a lovely chapter that expounded on strong connectivity very smoothly; 10/10.

3.6: Directed Acyclic Graphs and Topological Ordering

A directed graph with no cycles is a directed acyclic graph, or DAG. We define a topological ordering of a directed graph to be an ordering of all of its nodes as v1, v2, …, vn so that for every edge (vi, vj), we have i < j. This immediately leads to the fact that if G has a topological ordering, then G is a DAG.

From the fact that a DAG must have a node with no incoming edges, we then find the stronger statement that:

G is a DAG iff it has a topological ordering.

Algorithm to compute a topological ordering of G:

  Find a node v with no incoming edges and order it first
  Delete v from G
  Recursively compute a topological ordering of G - {v} and append this after v

Thanks to the fact that a DAG always has a node with no incoming edges, we know that such a node v can always be found. The total running time is O(n^2), and if G is very dense (with Theta(n^2) edges), this is linear in the size of the input. If the graph is sparser, though, an O(m + n) running time would be a substantial improvement.

We can actually achieve this O(m + n) time if we are more efficient about finding nodes with no incoming edges. We simply maintain, for each node, a count of its remaining incoming edges; we decrement that count whenever one of its incoming neighbors is deleted, and we keep a list of the nodes whose count has reached zero, so there is always one at hand to delete next.
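A Python sketch of this more efficient version (my own, same adjacency-list assumptions; the deque and the in-degree counts are implementation choices, not the book's notation):

from collections import deque

def topological_order(adj):
    """Topological ordering of a DAG given as an adjacency list of
    outgoing edges, by repeatedly removing a node with no incoming
    edges; in-degree counts make this run in O(m + n) time."""
    n = len(adj)
    indegree = [0] * n
    for u in range(n):
        for v in adj[u]:
            indegree[v] += 1

    ready = deque(v for v in range(n) if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in adj[u]:          # "delete" u by decrementing its successors
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    if len(order) < n:            # some nodes were never freed: G has a cycle
        raise ValueError("graph is not a DAG")
    return order

adj = [[1, 2], [3], [3], []]      # edges 0->1, 0->2, 1->3, 2->3
print(topological_order(adj))     # [0, 1, 2, 3]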

This section also did a great job of explaining, and I am a big fan of giving us the general algorithm first and then a method to make it more efficient. This will prove very beneficial in developing and analyzing algorithms in the future as they grow more complex; 9/10.