The reading wasn't as helpful as the past few weeks because I felt like I already knew a lot of it from class.
Recall from Chapter 1 that a graph G is simply a way of encoding pariwise relationships among a set of objects: it consists of a collection V of nodes and a collection E of edges, each of which “joins” two of the nodes. We thus represent an edge e ∈ E as a two-element subset of V: e = {u,v} for some u,v ∈ V, where we call u and v the ends of e. Edges indicated a symmetric relationsihp between their ends. WE use a directed graph when we want to show asymmetric relationships. A directed graph G' consists of a set of nodes V and a set of directed edges E'. Each e' ∈ E' is an ordered pair (u,v). We call u the tail of the edge and v the head. We also say that the edge leaves node u and enters node v. By default, graph will mean an undirected graph. Two notes: an edge should be written as a set of nodes {u,v}, but will frequently be written as (u,v); a node is often called a vertex.
We define a path in an undirected graph G = (V,E) to be a sequence P of nodes v_1, v_2, …, v_(k-1), v_k with the property that each consecutive pair v_i, v_(i+1) is joined by an edge in G. P is often called a path from v_1 to v_k, or a v_1 - v_k path. A path is called simple if all its vertices are distinct from one another. A cycle is a path v_1, v_2, …, v_(k-1), v_k, in which k > 2, the first k - 1 nodes are all distinct and v_1 = v_k (it “cycles” back to where it began).
All of these definitions carry over naturally to directed graphs with the following change: in a directed path or cycle, each pair of consecutive nodes has the property that (v_i, v_(i+1)) is an edge. In other words, the sequence of nodes in the path or cycle must respect the directionality of edges.
We say that an undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v. We say a directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. In addition to simply knowing about the existence of a path between some pair of nodes u and v, we may also want to know whether there is a short path. Thus we define the distance between two nodes u and v to be the minimum number of edges in a u-v path.
We say that an undirected graph is a tree if it is connected and does not contain a cycle. For thinking about the structure of a tree T, it is useful to root it at a particular node r. We “orient” each edge of T away from r; for each other node v, we declare the parent of v to be the node u that directly precedes v on its path from r; we decalre w to be a child of v if v is the parent of w. More generally, we say that w is a descendant of v if v lies on the path from the root to w; and we say that a node x is a leaf if it has no descendants.
I give this an 8. It was pretty straightforward to read.
Suppose we are given a graph G = (V,E) and two particular nodes s and t. We'd like to find an efficient algorithm that answers the question: Is there a path from s to t in G? → teh problem of determining s-t connectivity.
In this section, we describe two algorithms for this problem - BFS and DFS.
BFS is when we explore outward from s in all possible directions, adding nodes one “layer” at a time. Thus, we start with s and include all nodes that are joined by an edge to s (first layer). We then include all additional nodes that are joined by an edge to any node in the first layer (second layer). We continue until no new nodes are encountered. Layer L_1 consists of all nodes that are neighbors of s. Assuming that we have defined layers L_1,…,L_j, then layer L_(j+1) consists of all nodes that do not belong to an earlier layer and that have an edge to a node in layer L_j. (Layer j is the set of all nodes at distance exactly j from s.) A node fails to appear in any of the layers iff there is no path to it. Thus, BFS is also computing shortest paths to them. It also produces a tree T rooted at s on the set of nodes reachable from s.
The set of nodes discovered by the BFS algorithm is precisely those reachable from the starting node s. We will refer to this set R as the connected component of G containing s. Once we know that the c.c. containing s, we can simply check whether t belongs to it so as to answer the question of s-t connectivity.
Key property of algorithm: The set R produced at the end is precisely the c.c. of G containing s.
DFS explores G by going as deeply as possible and only retreating when necessary. DFS(u):
To apply this to s-t connectivity, we simply declare all nodes initially to be not explored and invoke DFS(s).
DFS brings about a different tree than BFS. (look at exs. in book)
For any two nodes s and t in a graph, their connected components are either identical or disjoint.
I give this an 8.
BFS and DFS differ essentially only in that one uses a queue and the other uses a stack.
Two basic ways to represent graphs: by an adjacency matrix and by an adjacency list representation. Linear time = O(m+n), n = |V|, m = |E|
The simplest way to represent a graph is by an adjacency matrix, which is an n x n matrix A where A[u,v] is equal to 1 if the graph contains the edge (u,v) and 0 otherwise. If the graph is undirected, the matrix A is symmetric. The adjacency matrix representation allows us to check in O(1) time if a given edge (u,v) is present in the graph. However, it has two disadvantages:
In the adjacency list representation, there is a record for each node v, containing a list of the nodes to which v has edges.
Compare am. and al. representations. An am. requires O(n^2) space; an al. requires O(m + n). In an am., we can check in O(1) time if a particular edge (u,v) is present in the graph. In the al., this can take time proportional to the degree O(n_v): we have to follow the pointers on u's al to see if edge v occurs on the list.
Two options to maintain a set of elements:
Both can be easily implemented via a doubly linked list. In a queue, a new element is added to the end while in a stack, the last element is placed in the first position on the list. (These are done in constant time.)
BFS(s):
Set Discovered[s] = true and Discovered[v] = false for all other v Initialize L[0] to consist of the single element s Set the layer counter i = 0 Set the current BFS tree T = ~0 While L[i] is not empty Initialize an empty list L[i+1] For each node u ∈ L[i] Consider each edge (u,v) incident to u If Discovered[v] = false then Set Discovered[v] = true Add edge (u,v) to the tree T Add v to the list L[i+1] Endif Endfor Increment the layer counter i by one Endwhile
The above implementation of the BFS algorithm runs in time O(m + n) if the graph is given by the adjacency list representation.
DFS(s):
Initialize S to be a stack with one element s While S is not empty Take a node u from S If Explored[u] = false then Set Explored[u] = true For each edge (u,v) incident to u Add v to the stack S Endfor Endif Endwhile
The above algorithm implements DFS in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section.
The above implementation of the DFS algorithm runs in time O(m+n) if the graph is given by the adjacency list representation.
(Is there a way to get to the left of the “==“Headings”==”?) I give this section a 9 on the topic, but only a 7 on the interesting/helpful. I feel like I got a good grasp on this from class/my notes.
The Problem How do we figure out if a graph is bipartite?
- If a graph G is bipartite, it cannot contain an odd cycle.
Designing the Algorithm In fact, there is a very simple procedure to test for bipartiteness, and its analysis can be used to show that odd cycles are the only obstacle. First, we assume the graph is G is connected, since otherwise we can first compute its connected components and analyze each of them separately. Next, we pick any node s ∈ V and color it red. It follows that all the neighbors of s must be colored blue. And then their neighbors must be red. Etc. At the end, we either have a valid coloring or we don't. If we don't, it's not bipartite. We can see that we can implement BFS to do this!
Analyzing the Algorithm Let G be a connected graph and let L1, L2 be the layers produced by BFS starting at node s. Then exactly one of the following two things must hold.
See book for proof (p96). This part was a 8 for interesting, but had minimal knowledge gain I feel like? It might have just been that I thought it was relatively simple when we talked briefly about it during class.
Remember: in directed graphs, the edge (u,v) has a direction: it goes from u to v. (relationship is asymmetric)
To represent a dg, we use aversion of the adjacency list representation. Now, instead of each node having a single list of neighbors, each node has two lists associated with it: one list consists of nodes to which it has edges and a second list consists of nodes from which it has edges.
The Graph Search Algorithm BFS starts at node s, defines first layer, second layer, etc. The nodes in layer j are precisely those for which the shortest path from s has exactly j edges. running time = O(m+n). DFS also runs in linear time.
Strong Connectivity (strongly connected = u → v ^ v → u) Two nodes u and v in a dg are mutually reachable if there is a path from u to v and also a path from v to u. If u and v are mutually reachable and v and w are m.r., then u and w are m.r.
There is a simple linear time algorithm to test if a directed graph is strongly connected. We pick any node s and run BFS starting from s. we then also run BFS starting from s in G^rev. If one of the two searches fail to reach every node, G is not strongly connected.
For any two nodes s and t in a dg, their strong components are either identical or disjoint.
If an undirected graph has no cycles, then each of its connected components is a tree. But it's possible for a dg to have no cycles and still “have a very rich structure”. If a dg has no cycles, we call it a DAG (directed acyclic graph). The Problem
3.18 If G has a topological ordering, then //G// is a DAG.
Does every DAG have a topological ordering? How do we find one efficiently?
D and A the Algorithm (Spoiler alert: The converse of 3.18 is true.) Which node do we put at the beginning of the topological ordering? Such a node would need to have no incoming edges. (In every DAG G, there is a node v with no incoming edges) Algorithm: To compute a topological ordering of G: Find a node v with no incoming edges and order it first Delete v from G Recursively compute a topological ordering of G-{v} and append this order after v
We can achieve a running time of O(m+n) by iteratively deleting nodes with no incoming edges. We can do this efficiently by declaring nodes “active” and maintaining: - for each node w, the number of incoming edges that w has from active nodes; - the set S of all active nodes in G that have no incoming edges from other active nodes. At the start, all nodes are active. This allows us to keep track of nodes that are eligible for deletion.
This section was an 8 to read. I must have been tired in class this day or else we didn't cover it very much because the acyclic stuff makes way more sense now.