This is an old revision of the document!

Chapter 3

In this chapter, we will introduce the basic definitions surrounding graphs, and list a spectrum of different algorithmic settings where graphs arise naturally. We will then discuss some basic algorithmic primitives for graphs, beginning with the problem of connectivity and developing some fundamental graph search techniques.

3.1 Basic Definitions and Applications

3.1.1 Definitions

A graph consists of a collection V of nodes (or vertices) and a collection E of edges, each of which “joins” two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = {u, v} for some u, v ∈ V, where we call u and v the ends of e.

A directed graph G’ consists of a set of nodes V and a set of directed edges E’, where each e’ ∈ E’ is an ordered pair (u, v). By contrast, an undirected graph is one in which the roles of u and v in a pair (u, v) for an edge e are interchangeable.

3.1.2 Examples of Applications of Graphs

Some well known examples of applications of graphs are given below:

Transportation networks (e.g. the map of routes served by an airline carrier)
Communication networks (e.g. a collection of computers connected via a network)
Information networks (e.g. the World Wide Web)
Social networks (e.g. Facebook)
Dependency networks (e.g for the list of courses offered by a college, we could have a node for each course and an edge from u to v if u is a prerequisite for v.)

3.1.3 Paths and Connectivity

A path in an undirected graph G = (V, E) is defined to be a sequence P of nodes v_1, v_2, … , v_k-1, v_k with the property that each consecutive pair v_i, v_i+1 is joined by an edge in G.

A path is called simple if all its nodes are distinct from one another.

A cycle is a path v_1, v_2, … , v_k-l, v_k in which k > 2, the first k-1 nodes are all distinct, and v_1 = v_k. In other words, the sequence of nodes “cycles back” to where it began.

We say a graph is connected if, for every pair of nodes u and v, there is a path from u to v.

3.1.4 Trees

An undirected graph is a tree if it is connected and does not contain a cycle. We say that, for nodes v and w, w is a descendant of v (or v is an ancestor of w) if v lies on the path from the root node to w; and we say that a node is a leaf if it has no descendants.

Important to note: every tree with n nodes that exactly n-1 edges. Any less than n-1 edges and the graph will no longer be a tree, and any more than n-1 edges the graph will contain a cycle.

The following theorem is true:

Let G be an undirected graph with n nodes. Any two of the following statements implies the remaining third:
  - G is connected.
  - G does not contain a cycle.
  - G has n-1 edges.

3.1.5 Comments

For me, this section was mostly review from CSCI-112, and from a high school Computer Science class. It was very easy to read and comprehend, but not particularly interesting, since it is basically just facts about trees–something that I already knew. That said, I did find the use of graphs in social networks kind of interesting, because I had never thought about graphs that way before.

3.2 Graph Connectivity and Graph Traversal

In this section, we describe two natural algorithms for the node-to-node connectivity problem at a high level: breadth-first search (BFS) and depth-first search (DFS).

3.2.1 Breadth-First Search (BFS)

In breadth-first search, we explore outward from starting node s in all possible directions, adding nodes one “layer” at a time. Thus we start with s and include all nodes that are joined by an edge to s–this is the first layer of the search. We then include all additional nodes that are joined by an edge to any node in the first layer–this is the second layer. We continue in this way until no new nodes are encountered.

For each j ≥ 1, layer L_j produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer.

BFS has the natural property of producing trees that are bushier and branched out.

3.2.2 Depth-First Search (DFS)

In depth-first search, we start from s and try the first edge leading out of it, to a node v. We then follow the first edge leading out of u, and continue in this way until we reached a “dead end”–a node for which we had already explored all its neighbors. We then backtrack until we get to a node with an unexplored neighbor, and resume from there. This algorithm is called depth-first search since it explores G by going as deeply as possible and only retreating when necessary.

The algorithm for DFS is relatively simple and straight-forward:

DFS(u):
   Mark u as "Explored" and add u to R 
   For each edge (u,v) incident to u
      If v is not marked "Explored" then 
         Recursively invoke DFS(v)
      Endif 
   Endfor

While DFS ultimately visits exactly the same set of nodes as BFS, it typically does so in a very different order: it probes its way down long paths, potentially getting very far from s, before backing up to try nearer unexplored nodes. As such, it produces trees that are deep and narrow, as opposed to the trees produced by BFS which are bushy and branched out.

3.2.3 Comments

This section was also part review from CSCI112, where we discussed breadth-first and depth-first graph traversals superficially. The section was, however, much more detailed than what we did in CSCI112. The book did a good job with tree diagrams explaining the algorithms for the traversals step-by-step–it made understanding the underlying concepts very easy (not to mention the fact that we have discussed the algorithms in detail in class). I'd say that this section was easy to comprehend, and relatively interesting as well.

3.3 Implementing Graph Traversal Using Queues and Stacks

3.3.1 Representing Graphs

There are two basic ways to represent graphs: by an adjacency matrix and by an adjacency list representation.

A graph G = (V, E) has two natural input parameters, the number of nodes |V|, and the number of edges |E|. We will use n = |V| and m = |E| to denote these, respectively. Running times will be given in terms of both of these two parameters. In this section, we aim to implement the basic graph search algorithms in time O(m+n), which we will refer to as linear time.

For a graph G = (V, E) with n nodes, assuming that the set of nodes is V = {1 ….. n}, we can represent the graph as an adjacency matrix, which is an n×n matrix A where A[u,v] = 1 if there is an edge between u and v, and A[u,v] = 0 otherwise. The adjacency matrix representation allows us to check in O(1) time if a given edge (u, v) is present in the graph. However, the representation takes Θ(n^2) space.

On the other hand, we could also represent the graph using the adjacency list representation. In the adjacency list representation, there is a record for each node v, containing a list of the nodes to which v has edges. To be precise, we have an array Adj, where Adj[v] is a record containing a list of all nodes adjacent to node v. For an undirected graph G = (V, E), each edge e = (v, w) ∈ E occurs on two adjacency lists: node w appears on the list for node v, and node v appears on the list for node w.

While an adjacency matrix requires O(n^2) space (since it uses an n×n matrix), the adjacency list representation requires only O(m+n) space. In the adjacency list representation, checking if a given edge (u, v) is present in the graph can take time proportional to the degree O(n_v): we have to follow the pointers on u’s adjacency list to see if edge v occurs on the list.

3.3.2 Implementing Breadth-First Search

The following is an algorithm that implements BFS:

BFS(s):
   Set Discovered[s] = true and Discovered[v] = false for all other v 
   Initialize L[0] to consist of the single element s
   Set the layer counter i=0
   Set the current BFS tree T=0
   While L[i] is not empty
      Initialize an empty list L[i+1] 
      For each node u ∈ L[i]
         Consider each edge (u, v) incident to u 
         If Discovered[v] = false then
            Set Discovered[v] = true
            Add edge (u, v) to the tree T
            Add v to the list L[i+1] 
         Endif
      Endfor
      Increment the layer counter i by one 
   Endwhile

The above implementation of the BFS algorithm runs in time O(m+n) (i.e., linear in the input size), if the graph is given by the adjacency list representation. (Detailed proof on page 91 of the textbook.)

3.3.3 Implementing Depth-First Search

The following algorithm implements BFS:

DFS(s):
   Initialize S to be a stack with one element s
   While S is not empty
      Take a node u from S
      If Explored[u] = false then
         Set Explored[u] = true
         For each edge (u, v) incident to u
            Add v to the stack S 
         Endfor
      Endif 
   Endwhile

The above implementation of the DFS algorithm runs in time O(m+n) (i.e., linear in the input size), if the graph is given by the adjacency list representation. (Detailed proof on page 94 of the textbook.)

3.3.4 Comments

This section was a relatively interesting read. I found myself struggling with the notion of O(m+n) initially–even after discussing it in class. Going back and reading the section made much more sense after in-class discussion. I guess I've never had to deal with two variables in terms of the input in order to determine the big-O, which made getting used to the idea a bit more challenging. That said, it does seem intuitive now that I have read this section a couple of times and we have discussed the concept in detail in class.

3.4 Testing Bipartiteness: An Application of the Breadth-First Search

A bipartite graph is one where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. For ease of understanding, we can imagine that the nodes in the set X are colored red, and the nodes in the set Y are colored blue.

If a graph G is bipartite, then it cannot contain an odd cycle. This is because if we have k red nodes, we must also have k blue nodes. Having an odd cycling means that G has 2n+1 nodes. There is no way to divide these 2n+1 nodes into k red nodes and k blue nodes.

3.4.1 An Algorithm for Testing Bipartiteness

We can modify the BFS algorithm slightly to create an algorithm that tests if a graph G is bipartite. We can do this by adding an extra array Color over the nodes. Whenever we get to a step in BFS where we are adding a node v to a list L[i+1], we assign Color[v] = red if i+1 is an even number, and Color[v] = blue if i+1 is an odd number. At the end of this procedure, we simply scan all the edges and determine whether there is any edge for which both ends received the same color. The total running time for the coloring algorithm is O(m+n), much like BFS.

Theorem:

Let G be a connected graph, and let L_1, L_2 .... be the layers produced by BFS starting at node s. Then exactly one of the following two things must hold:
   (i) There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even-numbered layers can be colored red, and the nodes in odd-numbered layers can be colored blue.
   (ii) There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.

The detailed proof of this theorem can be found on page 96 of the textbook.

3.4.2 Comments

Despite the its fancy-sounding name, the concept of bipartite graphs was very easy to follow. The algorithm to test for bipartiteness was more or less the BFS algorithm with a few steps added to it. I did think that testing for bipartiteness was an interesting application of the BFS algorithm, and because of that, I found this section a good read.

3.5 Connectivity in Directed Graphs

In a directed graph, the edge (u, v) has a direction: it goes from u to v. In this way, the relationship between u and v is asymmetric, and this has qualitative effects on the structure of the resulting graph.

3.5.1 Representing Directed Graphs

We can use a version of the adjacency list representation to represent directed graphs. In this representation, instead of each node having a single list of neighbors, each node has two lists associated with it: one list consists of nodes to which it has edges, and a second list consists of nodes from which it has edges.

3.5.2 The Graph Search Algorithms

The BFS and DFS algorithms are more or less the same for directed graphs as they are for undirected graphs.

For the BFS algorithm, we start at a node s, define a first layer of nodes to consist of all those to which s has an edge, define a second layer to consist of all additional nodes to which these first-layer nodes have an edge, and so forth. In this way, we discover nodes layer by layer as they are reached in this outward search from s, and the nodes in layer j are precisely those for which the shortest path from s has exactly j edges. As in the undirected case, this algorithm performs at most constant work for each node and edge, resulting in a running time of O(m+n). Note, however, that since in directed graphs, it is possible for a node s to have a path to a node t even though t has no path to s, what directed BFS is computing is merely the set of all nodes t with the property that s has a path to t; such nodes may or may not have paths back to s.

The DFS algorithm for directed graphs also runs in linear time and computes the same set of nodes. It is again a recursive procedure that tries to explore as deeply as possible, in this case only following edges according to their inherent direction. Thus, when DFS is at a node u, it recursively launches a depth-first search, in order, for each node to which u has an edge.

3.5.3 Strong Connectivity

A directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u.

Table of Contents