Chapter 3: Graphs
3.1 Basic Definitions and Applications
This section introduces graphs and some of the basic terminology related to them. Most of it is review from CS112, but there are a few important concepts that are helpful to revisit and remember. Graphs are incredibly useful data structures, especially for modeling a variety of networks, such as social or communication networks.
A path in an undirected graph is a sequence of nodes in which each consecutive pair is joined by an edge. A path is simple if all of its vertices are distinct, and a cycle is a path that loops back to the node where it started. A graph is connected if there is a path between every pair of nodes. A tree is an undirected graph that is connected and does not contain a cycle. You can picture it as picking one node as the “top” of the tree and letting all the other nodes hang down from it. Every tree on n nodes has exactly n-1 edges. The main theorem of this section is that for an undirected graph G on n nodes, any two of the following imply the third (I sketch a quick check of this after the list):
i) G is connected
ii) G does not contain a cycle
iii) G has n-1 edges.
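To convince myself of this, here is a small Python sketch (my own, not from the book; the dictionary-of-lists representation and the function name are just my choices). It tests properties (i) and (iii) directly, which by the theorem is enough to conclude G is a tree:

from collections import deque

def is_tree(adj):
    # adj maps each node to a list of its neighbors (undirected graph).
    nodes = list(adj)
    n = len(nodes)
    if n == 0:
        return True
    # Each undirected edge appears in two neighbor lists, so m is half the total length.
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    if m != n - 1:                      # property (iii)
        return False
    # Property (i): check connectivity with a BFS from an arbitrary node.
    start = nodes[0]
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n               # (i) and (iii) together imply (ii)

# A path on 4 nodes is a tree; adding the edge (2, 4) would create a cycle.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_tree(path))  # True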
3.2 Graph Connectivity and Graph Traversal
This section looks at how to efficiently determine whether there is a path from node s to node t in G. To do this, we compare breadth-first search (BFS) and depth-first search (DFS). BFS starts at s and explores outward in all directions, considering all nodes one step away, then all nodes two steps away, and so on. We call these layers, so layer i of the BFS tree contains the nodes that are i edges away from the starting node s. There is a path from s to t if and only if t appears in one of the layers. Another property of a BFS tree is that for any edge of G, its two endpoints lie in the same layer or in adjacent layers; their layer numbers differ by at most one.
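To make the layer structure concrete for myself, here is a minimal BFS sketch in Python (my own code, not the book's pseudocode; I assume the graph is given as a dictionary mapping each node to a list of its neighbors):

def bfs_layers(adj, s):
    # Layer 0 is just [s]; layer i holds the nodes exactly i edges from s.
    layers = [[s]]
    discovered = {s}
    while layers[-1]:
        next_layer = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in discovered:
                    discovered.add(v)
                    next_layer.append(v)
        layers.append(next_layer)
    return layers[:-1]   # drop the final empty layer

adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2], 5: [3]}
print(bfs_layers(adj, 1))  # [[1], [2, 3], [4, 5]]

So t is reachable from s exactly when t shows up in one of the returned layers.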
In DFS, you follow the first edge out of the node s and keep going until you reach a dead end, at which point you backtrack and try the next unexplored edge, and so on. The key property to remember about a DFS tree is that every edge of G that is not in the tree connects a node to one of its ancestors (equivalently, one of its descendants) in the tree.
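Here is a recursive DFS sketch (again my own Python, not the book's pseudocode) that records the parent of each node in the DFS tree; on small examples it is easy to check that every non-tree edge joins a node to one of its ancestors:

def dfs_tree(adj, s):
    explored = set()
    parent = {s: None}
    def explore(u):
        explored.add(u)
        for v in adj[u]:
            if v not in explored:
                parent[v] = u      # (u, v) becomes a tree edge
                explore(v)
    explore(s)
    return parent

adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
print(dfs_tree(adj, 1))  # {1: None, 2: 1, 3: 2, 4: 3}; the non-tree edge (1, 3) joins 3 to its ancestor 1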
Both BFS and DFS can be used to build the connected component containing a node. However, they visit nodes in a different order, and the BFS tree is typically short and bushy while the DFS tree is deep and narrow. Personally, the BFS algorithm and approach feel more logical to me. The point of these traversals is to show that for any two nodes in a graph, their connected components (each of which can be represented by such a tree) are either identical or disjoint.
3.3 Implementing Graph Traversal Using Queues and Stacks
A graph can be represented as an adjacency list or an adjacency matrix. In this book, we use the adjacency list. The main reason they chose this representation is that it corresponds well to the idea of “exploring” the graph: when the algorithm is looking at a node, it can read off the node's list of neighbors in constant time per neighbor. Overall, the adjacency list takes O(m+n) space, which is less than the O(n^2) space of the adjacency matrix.
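As a concrete picture, this is roughly what building an adjacency list looks like in Python (the book describes an array of lists indexed by node; a dictionary of lists is my stand-in for that):

def build_adjacency_list(nodes, edges):
    # O(m + n) space: one (possibly empty) list per node, two entries per edge.
    adj = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)   # undirected, so record the edge at both ends
    return adj

adj = build_adjacency_list([1, 2, 3, 4], [(1, 2), (1, 3), (2, 3), (3, 4)])
print(adj[3])  # [1, 2, 4] -- the neighbors of 3, read off in constant time each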
BFS can be implemented with an adjacency list and either a stack or a queue (we chose a queue). This flexibility exists because the algorithm can consider the nodes in each layer in any order. This lets us implement BFS in O(m+n) time; the algorithm is on pages 90-91. DFS, on the other hand, uses a stack to achieve a running time of O(m+n), which reflects the recursive nature of DFS and the fact that the order in which we visit the nodes matters. The algorithm is on page 93.
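The stack version of DFS looks something like this in Python (my rendering of the idea on page 93, not a verbatim copy of the pseudocode):

def dfs_stack(adj, s):
    # Explicit stack: each node is popped and marked at most once, and each
    # edge causes at most two pushes, giving O(m + n) time.
    explored = set()
    stack = [s]
    order = []                     # the order in which nodes are first explored
    while stack:
        u = stack.pop()            # LIFO: keep diving deeper before backtracking
        if u not in explored:
            explored.add(u)
            order.append(u)
            for v in adj[u]:
                if v not in explored:
                    stack.append(v)
    return order

adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
print(dfs_stack(adj, 1))  # [1, 3, 4, 2]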
Because BFS and DFS only spend time on the nodes and edges of the connected component they explore, we can find the set of all connected components in O(m+n) total time, even though we may have to run DFS or BFS multiple times.
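A sketch of that idea in Python (my own code, with the same dictionary-of-lists assumption as above): start a new BFS only from nodes that no earlier BFS has reached.

from collections import deque

def connected_components(adj):
    # Labels every node with a component id; each BFS only touches its own
    # component's nodes and edges, so the total work is O(m + n).
    component = {}
    current = 0
    for s in adj:
        if s in component:
            continue               # already reached by an earlier BFS
        current += 1
        component[s] = current
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in component:
                    component[v] = current
                    queue.append(v)
    return component

adj = {1: [2], 2: [1], 3: [4, 5], 4: [3], 5: [3], 6: []}
print(connected_components(adj))  # {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3}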
3.4 Testing Bipartiteness: An Application of Breadth-First Search
A graph is bipartite if all the nodes can be colored red or blue such that each edge has one red end and one blue end. The problem this section discusses is how to determine whether a graph G is bipartite. One easy way to rule a graph out is to look at cycles: if a graph has an odd-length cycle, it is not bipartite. The algorithm for testing bipartiteness makes use of the layers in a BFS tree. You choose a node s and color it red, then color all the nodes in layer 1 blue, all the nodes in layer 2 red, and so on, alternating colors by layer. Since we only add a coloring step, the running time of the algorithm is the same as that of BFS, which is O(m+n). If there is an edge between two nodes in the same layer, then there is an odd-length cycle and the graph is not bipartite; otherwise the coloring works.
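Here is what the coloring step might look like in Python (my own sketch, not the book's pseudocode; I assume a connected graph given as a dictionary of neighbor lists):

from collections import deque

def two_color(adj, s):
    # BFS from s, coloring even layers red and odd layers blue, then check
    # every edge; returns None if two endpoints got the same color.
    color = {s: 'red'}
    layer = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in layer:
                layer[v] = layer[u] + 1
                color[v] = 'red' if layer[v] % 2 == 0 else 'blue'
                queue.append(v)
    for u in adj:
        for v in adj[u]:
            if color[u] == color[v]:   # an edge inside a layer: odd cycle
                return None
    return color

triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}             # odd cycle
square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}    # even cycle
print(two_color(triangle, 1))  # None -- not bipartite
print(two_color(square, 1))    # {1: 'red', 2: 'blue', 4: 'blue', 3: 'red'}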
I have seen problems similar to this in other classes, such as the two-coloring problem and the more general problem of coloring adjacent nodes with the fewest colors possible. I think it is a very interesting and fairly intuitive problem.
3.5 Connectivity in Directed Graphs
This section gives a basic introduction to directed graphs and a few important concepts related to them. To represent a directed graph, we use the adjacency list as before, but each node keeps track of both the nodes it points to and the nodes that point to it (i.e., its outgoing and incoming edges). BFS and DFS trees work much the same way as with undirected graphs, but it is important to remember that even if there is a path from node s to node t, there may not be a path from t to s.
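A small sketch of that bookkeeping in Python (my own, and the names out_adj and in_adj are just my labels):

def build_directed_adjacency(nodes, edges):
    # For each node: the nodes it points to, and the nodes that point to it.
    out_adj = {v: [] for v in nodes}
    in_adj = {v: [] for v in nodes}
    for u, v in edges:             # directed edge u -> v
        out_adj[u].append(v)
        in_adj[v].append(u)
    return out_adj, in_adj

out_adj, in_adj = build_directed_adjacency([1, 2, 3], [(1, 2), (2, 3), (3, 1)])
print(out_adj[1], in_adj[1])  # [2] [3]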
In talking about a strongly connected graph, it is best to ask whether two nodes are mutually reachable, which means there is a path from the first node to the second and a path from the second back to the first. One nice property is that if u and v are mutually reachable and v and w are mutually reachable, then u and w are mutually reachable. This is an intuitive property that is very useful. The strong components of any two nodes in a graph are either disjoint or identical.
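Checking mutual reachability straight from the definition looks like this (my own Python sketch, using the out_adj representation from the previous snippet):

from collections import deque

def reachable(out_adj, s, t):
    # BFS along directed edges: is there a path from s to t?
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in out_adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

def mutually_reachable(out_adj, s, t):
    return reachable(out_adj, s, t) and reachable(out_adj, t, s)

out_adj = {1: [2], 2: [3], 3: [1], 4: []}
print(mutually_reachable(out_adj, 1, 3))  # True: 1->2->3 and 3->1
print(mutually_reachable(out_adj, 1, 4))  # False: there is no path from 1 to 4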
It seems to me that algorithms for directed graphs only require slight modifications to the algorithms for undirected graphs. In some cases, the algorithms are the same; the results just need to be interpreted differently.
3.6 Directed Acyclic Graphs and Topological Ordering
If an undirected graph has no cycles, then each of its connected components is a tree. If a directed graph has no cycles, it is called a DAG (directed acyclic graph). DAGs are useful for representing precedence relations and dependencies. A topological ordering of a DAG is an ordering of the nodes such that every edge goes from an earlier node to a later one; in other words, all the edges point forward. The goal of this section is to prove that every DAG has a topological ordering, and then to show that we can find one efficiently.
The key to finding a topological ordering is to find the first node. This is easier than it seems, because the first node must be one with no incoming edges. From there, you can delete that node and repeat, following the outgoing edges to build up the rest of the ordering. An algorithm to compute a topological ordering is on page 102. As far as analysis goes, the running time appears to be O(n^2), but it can be improved to O(m+n) if we keep track of the set of nodes that can be deleted at all times, so we spend only constant time per edge. I occasionally get confused by the O(m+n) running times in the book. They make more sense in class when we can label each step of the algorithm more precisely.
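Here is a sketch of the faster version in Python (my own code, not the pseudocode on page 102): keep an in-degree count for every node and a list of nodes whose count has dropped to zero.

from collections import deque

def topological_order(out_adj):
    # Repeatedly "delete" a node with no incoming edges; maintaining the
    # in-degree counts means we spend constant time per edge, O(m + n) total.
    indegree = {v: 0 for v in out_adj}
    for u in out_adj:
        for v in out_adj[u]:
            indegree[v] += 1
    ready = deque(v for v in out_adj if indegree[v] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in out_adj[u]:           # removing u drops each successor's in-degree
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)
    return order                       # comes back shorter than n only if G has a cycle

# Edges point from a prerequisite to the tasks that depend on it.
out_adj = {1: [2, 3], 2: [4], 3: [4], 4: []}
print(topological_order(out_adj))  # [1, 2, 3, 4]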