This section will focus, as the title suggests, on graphs and their uses. It starts with the basic definitions of graphs, moves on to problems where graphs "arise naturally," and finishes off with a discussion of connectivity and fundamental graph search techniques.
A graph consists of a collection V of nodes and a collection E of edges, each of which joins two nodes. As presented in the book, for an edge e in E and nodes u, v in V, we write e = {u, v}, where u and v are the ends of e. If we want to specify an order on the two ends, we can make a directed graph: each edge is an ordered pair (u, v), where u is called the tail and v the head, as the edge e will "leave node u and enter node v." By default, a graph is undirected. Also, vertex and node are interchangeable.
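To make the definitions concrete, here is a tiny sketch of my own (not from the book) of one common way to store a graph in code: a dictionary mapping each node to the set of its neighbors. The node names are made up for illustration.

<code python>
# A small undirected graph: each edge {u, v} is recorded in both directions.
undirected = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a"},
}

# The directed version: an edge (u, v) leaves its tail u and enters its head v,
# so it appears only in the tail's neighbor set.
directed = {
    "a": {"b", "c"},
    "b": set(),
    "c": set(),
}
</code>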
To give a deeper sense of what graphs are, the authors walked through a number of examples of graphs that arise in practice.
A path in an undirected graph is a sequence of nodes in which each pair of consecutive nodes is joined by an edge. A path is simple if all of its nodes are distinct. A cycle is a path in which the nodes are distinct and the last node is joined by an edge back to the first. All of these notions carry over to directed graphs, except that there the direction of each edge matters.
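To pin those definitions down, here is a hypothetical helper of my own (the names is_simple_path and is_cycle are mine) that tests a sequence of nodes against a graph stored as a neighbor dictionary like the one above:

<code python>
def is_simple_path(graph, nodes):
    """True if `nodes` is a simple path: all nodes distinct, and each
    consecutive pair is joined by an edge."""
    if len(set(nodes)) != len(nodes):
        return False
    return all(v in graph[u] for u, v in zip(nodes, nodes[1:]))

def is_cycle(graph, nodes):
    """True if `nodes` traces a cycle: a simple path (of length at least 3
    in an undirected graph) whose last node has an edge back to the first."""
    return (len(nodes) >= 3 and is_simple_path(graph, nodes)
            and nodes[0] in graph[nodes[-1]])
</code>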
An undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v. A directed graph is strongly connected if, for every pair of nodes u and v, there is a path from u to v and a path from v to u.
A tree is a graph that is connected and contains no cycle. Every n-node tree will have exactly n - 1 edges. Also, if we have G, an undirected graph on n nodes, then any two of the following three statements implies the third: (i) G is connected; (ii) G does not contain a cycle; (iii) G has n - 1 edges.
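That two-out-of-three fact gives a cheap tree test: check the edge count and connectivity, and acyclicity comes for free. A minimal sketch of mine, again assuming the dictionary-of-neighbor-sets representation:

<code python>
def is_tree(graph):
    """Check connectivity plus the n - 1 edge count; by the
    two-out-of-three property, acyclicity then follows automatically."""
    n = len(graph)
    m = sum(len(nbrs) for nbrs in graph.values()) // 2  # each edge stored twice
    if m != n - 1:
        return False
    # Connectivity check: can we reach every node from an arbitrary start?
    start = next(iter(graph))
    seen, stack = {start}, [start]
    while stack:
        for v in graph[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == n
</code>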
Overall, this section was a very good overarching introduction to graphs, and I felt a lot more confident in my knowledge of them after reading it. I would like to see more code showing how they are implemented to get a complete understanding of their uses in the real world, though, before I could say that I fully comprehend them. I would give this section an 8/10 for readability, as it was effective in relating graphs to real-world examples while providing thorough definitions of them.
The point of this section was to figure out some efficient ways to traverse graphs and determine their connectivity, as the title suggests.
The first algorithm discussed was Breadth-First Search. This was deemed "perhaps the simplest algorithm for determining connectivity." In it, you explore outward from a starting node s, adding a single layer at a time: every node adjacent to s forms the first layer, every not-yet-seen node adjacent to the first layer forms the next, and so on until no new nodes can be added (no node is ever added twice). This algorithm primarily helps us determine the shortest path from s to a particular node, since the layer a node ends up in is exactly the number of edges on a shortest path to it.
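Here is a minimal BFS sketch of my own (not the book's pseudocode), assuming the neighbor-dictionary representation from earlier; each node's layer number doubles as its shortest-path distance from s:

<code python>
from collections import deque

def bfs_layers(graph, s):
    """Return a dict mapping each reachable node to its BFS layer,
    i.e. its shortest-path distance (in edges) from s."""
    layer = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in layer:          # never add a node twice
                layer[v] = layer[u] + 1
                queue.append(v)
    return layer
</code>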
The next algorithm was Depth-First Search. An interesting way to remember depth-first search was as if it were a maze of rooms and you were trying to find your way out. You would start at a particular node s and keep walking into adjacent rooms, going as deep as you could; when you reached a dead end, you would retrace your steps and then enter the next room you hadn't entered yet. This is the basic interpretation of depth-first search, where you backtrack over previously traversed nodes whenever you reach a dead end. Thinking of DFS this way really helped me to remember its algorithm.
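The maze picture maps directly onto a short recursive sketch (again mine, not the book's): the chain of recursive calls is the trail of rooms behind you, and returning from a call is the backtracking step.

<code python>
def dfs(graph, u, visited=None):
    """Visit every node reachable from u, going as deep as possible
    before backtracking (the recursive return *is* the backtrack)."""
    if visited is None:
        visited = set()
    visited.add(u)
    for v in graph[u]:
        if v not in visited:
            dfs(graph, v, visited)
    return visited
</code>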
Overall, I'd give this section a 6/10. Its content was pretty simple, as I already had a good understanding of both algorithms, and in places its explanations were more exhaustive than they needed to be.
This section dealt with the implementation of the previously discussed algorithms. It talked about how to use both lists and arrays to represent these graphs and the benefits and shortcomings of both.
The adjacency matrix and adjacency list were the two "basic ways" to represent a graph. It is stated that, throughout the rest of the book, the adjacency list will be the preferred implementation. It is also stated that the sought-after running time for graph search is O(m + n) (referred to as linear time), and O(m) for connected graphs, because m >= n - 1 in a connected graph.
In an n x n adjacency matrix A, we can check for the presence of a given edge in constant time, which is its main advantage. Its disadvantages are that it takes up a fair amount of space, Θ(n^2), and that enumerating all the edges incident to a particular node takes Θ(n) time.
On the other hand, an adjacency list keeps a record for each node v containing a list of the nodes that v touches. The adjacency list therefore requires only O(m + n) space: one record per node, plus list entries whose total length is proportional to the number of edges. Checking whether a given edge is present is slower than with the adjacency matrix, since we move to the desired node's record and then scan through its list of neighbors, which takes time proportional to that node's degree.
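To see the trade-off side by side, here is a sketch of mine that builds both representations from a made-up edge list:

<code python>
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # hypothetical 4-node graph
n = 4

# Adjacency matrix: Theta(n^2) space, O(1) edge lookup.
matrix = [[False] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = True

# Adjacency list: O(m + n) space; edge lookup scans one node's neighbors.
adj = {u: [] for u in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix[0][1])   # True, in constant time
print(2 in adj[0])    # True, after scanning node 0's list
</code>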
To maintain the set of nodes waiting to be explored, either a stack or a queue can be used. A stack is last-in, first-out; a queue is first-in, first-out. For BFS, it is best to use a queue, so that nodes are processed layer by layer and the result matches the original BFS description. For DFS, we maintain the pending nodes in a stack, which makes the backtracking part of DFS very easy: all that has to be done is pop to go back. To make sure we know which node is the parent of a pushed node, we record the edge (node, parent[node]), preserving the integrity of the DFS tree. Both algorithms continue to run in O(m + n) time with these implementations.
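Here is a minimal iterative DFS along those lines (my sketch, not the book's pseudocode), keeping pending nodes on an explicit stack and recording each node's parent so the DFS tree edges (node, parent[node]) can be recovered:

<code python>
def dfs_tree(graph, s):
    """Iterative DFS from s; returns parent pointers defining the DFS tree."""
    parent = {s: None}
    visited = set()
    stack = [s]
    while stack:
        u = stack.pop()                 # popping is the backtracking step
        if u in visited:
            continue
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                parent[v] = u           # tree edge (v, parent[v])
                stack.append(v)
    return parent
</code>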
This was a good section; the implementation of the two algorithms using stacks and queues was useful information, and the clarification on adjacency lists and matrices was worth the read. 7/10.
As one with a colorful imagination might suspect, this section covered bipartite graphs. These are, by definition, graphs "where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y"; equivalently, the nodes in X can be colored red and the nodes in Y blue so that no edge has two ends of the same color. A graph that contains a cycle of odd length is automatically not bipartite.
To test whether a graph is bipartite, we can start by selecting a node and coloring it red. Each of its neighbors is colored blue, each of their (not already colored) neighbors red, and so on until every node is colored. If every edge has ends of opposite colors, we have a bipartite graph; if some edge has ends of the same color, we do not. This pairs naturally with BFS, since BFS follows the same layer-by-layer procedure. To implement the test with BFS, we simply add one line describing the coloring of nodes: when a node v is added to layer i + 1, set Color[v] = red if i + 1 is even and Color[v] = blue if i + 1 is odd. Scanning each edge at the end of the algorithm then tells us whether the graph is bipartite. The whole test still runs in O(m + n) time, since the added coloring step takes only constant time per node.
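A sketch of that BFS-based two-coloring test, in my own code (red and blue are encoded as 0 and 1, and the graph is assumed connected):

<code python>
from collections import deque

def is_bipartite(graph, s):
    """Two-color the graph by BFS layer parity, then check that
    no edge joins two nodes of the same color."""
    color = {s: 0}                       # 0 = red, 1 = blue
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in color:
                color[v] = 1 - color[u]  # next layer gets the opposite color
                queue.append(v)
    # Scan every edge at the end, as the section describes.
    return all(color[u] != color[v]
               for u in color for v in graph[u])
</code>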
In summary, either no edge of the graph joins two nodes in the same BFS layer, in which case the graph is bipartite, or some edge joins two nodes in the same layer, in which case the graph contains an odd-length cycle and is not bipartite.
This was a short section, so for that and its simple nature, I give it a 9/10.
This section started off by talking about how to represent directed graphs in algorithms, which is by using a version of the adjacency list. In this version, each individual node keeps two lists: one for the nodes it has edges to, and one for the nodes it has edges from.
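A sketch of that two-list representation, with edges I made up for illustration:

<code python>
edges = [("a", "b"), ("a", "c"), ("c", "b")]   # hypothetical directed edges

out_edges = {u: [] for u in "abc"}   # nodes this node has edges to
in_edges = {u: [] for u in "abc"}    # nodes this node has edges from

for u, v in edges:
    out_edges[u].append(v)
    in_edges[v].append(u)

print(out_edges["a"])   # ['b', 'c']
print(in_edges["b"])    # ['a', 'c']
</code>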
The interesting part here, which I have some questions about, was how a directed-graph version of BFS computes the set of nodes with a path from s, just as the undirected version does, but in this version a reachable node t does not necessarily have a path back to s. How is that possible if we keep lists of edges both to and from each node? Unless it means that the search only follows the "to" (outgoing) lists, so reaching t says nothing about getting back from t to s, which would then make sense. DFS also runs naturally on a directed graph, and both algorithms keep their respective running times of O(m + n), or O(m) on connected graphs.
If we want to find the nodes with paths to s instead of from s, we can simply run BFS or DFS from s on Grev, the graph with every edge reversed; the same algorithm then finds exactly the nodes that can reach s.
To find out whether a directed graph is strongly connected, we can run an algorithm where we start at any node in the graph and run BFS from that node, then run BFS from the same node in Grev. If either of these searches fails to reach every node, the graph is not strongly connected. But if both are successful, then every node is mutually reachable with our starting node, and therefore the graph is strongly connected.
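That test is short in code; here is my sketch, reusing a BFS-style reachability search and building Grev by hand:

<code python>
from collections import deque

def reachable(out_edges, s):
    """All nodes reachable from s by following directed edges."""
    seen, queue = {s}, deque([s])
    while queue:
        for v in out_edges[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_strongly_connected(out_edges):
    s = next(iter(out_edges))
    # Reverse every edge to get Grev.
    rev = {u: [] for u in out_edges}
    for u, nbrs in out_edges.items():
        for v in nbrs:
            rev[v].append(u)
    # Strongly connected iff s reaches everything in both G and Grev.
    n = len(out_edges)
    return len(reachable(out_edges, s)) == n and len(reachable(rev, s)) == n
</code>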
The last topic in this chapter, strong components, was a little fuzzy to me, and I could use some extra clarification, which I will seek out. Overall, this chapter was a little more in-depth than the previous few, though. 5/10 for readability and content.
This section started off with a bang: an undirected graph with no cycles has an extremely simple structure, since each of its connected components is just a tree. Boom! (I think I'm having too much fun with DokuWiki syntax; now back to the fun stuff.) Despite this, it is possible for a directed graph to have no cycles but still have a complex structure. This structure is called a DAG, or directed acyclic graph.
A simple way to think about DAGs is as a set of classes that have prerequisites, as was the example in class. This has to be a DAG, because prerequisites can't be circular: a course can't be (directly or indirectly) a prerequisite of one of its own prerequisites. To capture this, there is what is called a topological ordering, where for every edge (vi, vj), we have i < j. Thus, every edge points in the forward or "downhill" direction, and everything before a certain node must already be completed. This gives us the fact that if a graph G has a topological ordering, then G is a DAG.
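Checking that a given ordering is topological is nearly a one-liner; a sketch of mine with made-up prerequisite edges:

<code python>
edges = [("calc1", "calc2"), ("calc2", "calc3"), ("calc1", "linalg")]
order = ["calc1", "calc2", "linalg", "calc3"]   # a hypothetical ordering

position = {course: i for i, course in enumerate(order)}

# Topological: every edge (v_i, v_j) must point forward, i.e. i < j.
print(all(position[u] < position[v] for u, v in edges))   # True
</code>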
It is then proven that the converse is true as well: if G is a DAG, then G has a topological ordering. The proof works by finding a node v with no incoming edges (every DAG has at least one), putting it first in the ordering, deleting it from the graph, and repeating on what remains. The total running time of this algorithm can actually be brought down to O(m + n) by maintaining, for each node still in the graph, a count of its incoming edges from other remaining nodes, so that selecting and deleting a node costs only constant time per incident edge.
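That bookkeeping is exactly what a standard queue-based implementation does; here is a sketch of mine, using the out-edge dictionary representation from before:

<code python>
from collections import deque

def topological_order(out_edges):
    """Repeatedly remove a node with no incoming edges; O(m + n) because
    we maintain incoming-edge counts instead of rescanning the graph."""
    incoming = {u: 0 for u in out_edges}
    for u in out_edges:
        for v in out_edges[u]:
            incoming[v] += 1
    ready = deque(u for u in out_edges if incoming[u] == 0)
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in out_edges[u]:        # "delete" u by decrementing counts
            incoming[v] -= 1
            if incoming[v] == 0:
                ready.append(v)
    return order    # fewer than n nodes in the result means a cycle exists
</code>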
This section went more in-depth (first search; forgive the pun) than preceding sections, as did the last one. A question I still have that I have meant to pose over this chapter: why does O(m + n) list m first? I know it is equal either way; it just piques my curiosity. I give this chapter a 7/10 for personal reasons.