3.1 Basic Definitions and Applications
Summary
Undirected graphs indicate symmetric relationships, while directed graphs depict asymmetric relationships. An undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v. Graphs can be used to depict a number of things, such as transportation networks, communication networks, dependency networks, information networks, and social networks. A directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. One of the fundamental operations in a graph is that of traversing a sequence of nodes connected by edges, also known as a path. An undirected graph is a tree if it is connected and does not contain a cycle. Rooted trees are crucial objects in computer science because they encode the notion of a hierarchy. If a graph is connected and does not contain a cycle, then it has n-1 edges. If a graph is connected and has n-1 edges, then it does not contain a cycle. If a graph does not contain a cycle and has n-1 edges, then it is connected.
Notes
A graph G is a way of encoding pairwise relationships among a set of objects: it consists of a collection V of nodes and a collection E of edges. Edges in an (undirected) graph indicate a symmetric relationship between their ends. Asymmetric relationships are defined in a directed graph, in which the roles of the two given elements are not interchangeable; there is a tail node (the edge leaves it) and a head node (the edge enters it). A node in a graph is also called a vertex.
Examples of Graphs
- Transportation networks
- A map of routes served by an airline carrier: graph, nodes are airports, there is an edge from u to v if there is a nonstop flight that departs from u and arrives at v
- Rail network: graph, node for each terminal, edge joining u and v if there is a section of railway track that goes between them without stopping at any intermediate terminal
- Communication networks
- Computers connected via a communication network: graph, node for each computer and an edge joining u and v if there is a direct physical link connecting them
- Alternatively: graph, a node is a set of all machines controlled by a single internet service provider, edge joining u and v if there is a peering relationship between them
- Wireless networks: graph, nodes are computing devices situated at locations in physical space, edge from u to v if v is close enough to u to receive a signal from it
- Information networks
- World Wide Web: directed graph, nodes are web pages, edge from u to v if u has a hyperlink to v
- Social networks
- Network: nodes are people, edge joins u and v if they are friends with one another
- Edge can mean different things beyond friendship: undirected edge (u, v) could mean u and v have had a romantic relationship or a financial relationship, directed edge (u, v) could mean that u seeks advice from v, or u lists v in their email-address book
- Dependency networks
- Captures interdependencies among a collection of objects
- A list of college courses at a university: node for each course, edge from u to v if u is a prerequisite for v
- A list of functions or modules in a large software system: node for each function, edge from u to v if u invokes v by a function call
- Ecosystem: nodes are different species, edge from u to v if u consumes v
Paths and Connectivity
- One of the fundamental operations in a graph is that of traversing a sequence of nodes connected by edges.
- A path in an undirected graph G = (V, E) is a sequence P of nodes v1, v2, …, v(k-1), vk with the property that each consecutive pair vi, v(i+1) is joined by an edge in G. P is often called a path from v1 to vk, or a v1-vk path.
- This definition carries over to directed graphs with the following change: in a directed path or cycle, each pair of consecutive nodes has the property that (vi, v(i+1)) is an edge. The path must respect the directionality of the edges.
- An undirected graph is connected if, for every pair of nodes u and v, there is a path from u to v.
- A directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u.
- The distance between two nodes u and v is the minimum number of edges in a u-v path.
Trees
- An undirected graph is a tree if it is connected and it does not contain a cycle. Trees are the simplest kind of connected graph: deleting any edge from a tree will disconnect it.
- It is useful to root a tree at a particular node r. We orient each edge of tree T away from r; for each other node v, we declare the parent of v to be the node u that directly precedes v on its path from r; we declare w to be a child of v if v is the parent of w.
- Rooted trees are fundamental objects in computer science because they encode the notion of a hierarchy.
- Each node other than the root has a single edge leading “upward” to its parent, and each edge leads upward from precisely one non-root node. Therefore, every n-node tree has exactly n-1 edges.
- Let G be an undirected graph on n nodes. Any two of the following statements implies the third.
- G is connected.
- G does not contain a cycle.
- G has n-1 edges.
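The two-out-of-three characterization above can be checked on a concrete graph. Below is a minimal Python sketch; the `is_connected` helper and the 4-node example are my own assumptions for illustration:

```python
from collections import deque

def is_connected(n, adj):
    """BFS from node 0; the graph is connected iff all n nodes are reached."""
    seen = {0}
    q = deque([0])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                q.append(v)
    return len(seen) == n

# A path 0-1-2-3: connected and has n-1 = 3 edges, so by the statement
# above it must also be acyclic, i.e. it is a tree.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
```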
Questions
- Do dependency networks have to be depicted as a directed graph? Can dependency networks be represented with undirected graphs?
- “A path in an undirected graph G = (V, E) is a sequence P of nodes v1, v2, …, v(k-1), vk with the property that each consecutive pair vi, v(i+1) is joined by an edge in G. P is often called a path from v1 to vk, or a v1-vk path.” I am slightly confused by this definition. What does “each consecutive pair vi, v(i+1) is joined by an edge” mean? Is i a position in the graph, or is i a position in the path within the graph?
Additional Information
In class, we did not go into any particular depth on directed graphs. So, after reading the section multiple times, I feel like I really understand directed graphs now and how they differ from undirected graphs. Asymmetric relationships are defined in a directed graph. The roles of the two given elements are not interchangeable; there is a tail node (edge leaves) and a head node (edge enters). (u, v) does not equal (v, u). In the instance of (u, v), u is the tail and v is the head. Moreover, a directed graph is considered strongly connected if, for every pair of nodes u and v, there is a path from u to v and a path from v to u. Finally, paths have to respect the direction of the edges. In other words, if there is an edge from u to v, the path cannot move in the direction v to u.
This section was a 9 out of 10 in readability because it was mostly an introduction section, which included a bunch of definitions. Thus, there was not a whole lot for me to have to wrap my brain around. Moreover, the definitions and properties of different graphs were explained very explicitly and clearly, which made comprehension easy. Also, the different examples of graphs really helped me understand the essence of graphs.
3.2 Graph Connectivity and Traversal
Summary
Notes
- Node-to-node connectivity
- Two ways of identifying s-t connectivity: breadth-first search and depth-first search
Breadth-First Search
- Breadth-first search explores outward from s in all possible directions, adding nodes one “layer” at a time
- We start at s and “flood” the graph with an expanding wave that grows to visit all nodes that it can reach. The layer that has a node represents the point in time at which the node is reached. Layers are defined as L0, L1, L2,…
- Layer L0 has the starting node. Layer L1 has all of the nodes that are neighbors to s (distance 1 from s). Layer Lj is the set of all nodes at distance j from s. If a node does not appear in any of the layers, there is no path from s to the node. BFS computes the shortest path from s to a node, not just connectivity.
- For each j >= 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer.
- BFS produces a tree T rooted at node s on the set of nodes reachable from s. For each node v other than s, v is first discovered when some node u in layer Lj is being examined and we find that u has an edge to the previously unseen node v. We then add edge (u, v) to tree T, and u becomes the parent of v. This tree is known as a breadth-first search tree.
- Let T be a BFS tree, let x and y be nodes in T belonging to layers Li and Lj, and let (x, y) be an edge of graph G. Then i and j differ by at most 1.
- x can be y’s parent, y can be x’s parent, or x and y can have a common ancestor, which added both of them.
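The layer-by-layer exploration described above can be sketched directly in Python. This is a hedged illustration, not the book's code; the dictionary graph and the function name are assumed:

```python
def bfs_layers(adj, s):
    """Return layers: layers[j] = all nodes at distance exactly j from s."""
    dist = {s: 0}
    layers = [[s]]            # L0 contains only the starting node
    while layers[-1]:
        nxt = []
        for u in layers[-1]:
            for v in adj[u]:
                if v not in dist:          # first time v is reached
                    dist[v] = len(layers)
                    nxt.append(v)
        layers.append(nxt)
    layers.pop()              # drop the final empty layer
    return layers

# Assumed example: a 4-cycle 0-1-3-2-0.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
```

Nodes never reached appear in no layer, which matches the fact that BFS computes shortest paths, not just connectivity.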
Exploring a Connected Component
- The set R of nodes discovered by the BFS algorithm is exactly the set of nodes reachable from the starting node s; this set is the connected component of G containing s
- We can build the component R by exploring G in any order, starting from s. If we find an edge (u, v) where u is in R but v is not, we can add v to R.
Exploring a Connected Component Algorithm
R will consist of nodes to which s has a path
Initially R = {s}
While there is an edge (u, v) where u is in R and v is not
  Add v to R
Endwhile
- The set R produced at the end of the above algorithm is the connected component of G containing s.
- For any node v in R, there is a path from s to v.
- For any node t in the component R, observe that it is easy to recover the actual path from s to t. We simply need to consider for each node v, the edge (u, v) that was considered in the iteration in which v was added to R. This is an s-t path.
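The component-growing procedure, together with path recovery via the recorded edges, might look like this in Python; the function names and the example graph are mine, for illustration (the pseudocode allows any scan order; this sketch scans in discovery order):

```python
def component_with_paths(adj, s):
    """Grow R from s, recording for each added node v the edge (u, v) used."""
    parent = {s: None}
    R = [s]
    i = 0
    while i < len(R):
        u = R[i]
        i += 1
        for v in adj[u]:
            if v not in parent:   # edge (u, v) with u in R, v not yet in R
                parent[v] = u
                R.append(v)
    return set(R), parent

def recover_path(parent, t):
    """Walk the recorded edges from t back to s, then reverse."""
    path = []
    while t is not None:
        path.append(t)
        t = parent[t]
    return path[::-1]

adj = {0: [1], 1: [0, 2], 2: [1], 3: []}   # node 3 is in another component
R, parent = component_with_paths(adj, 0)
```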
Depth-First Search
- In depth-first search, you’d start from s and try the first edge leading out of it, to a node v. Then, follow the first edge leading out of v, and continue until you hit a dead end. Then, backtrack until you get to a node with an unexplored neighbor, and go from there.
DFS Algorithm
DFS(u):
  Mark u as “Explored” and add u to R
  For each edge (u, v) incident to u:
    If v is not marked “Explored” then:
      Recursively invoke DFS(v)
    Endif
  Endfor
- To apply this to s-t connectivity, we simply declare all nodes to be initially not explored and invoke DFS(s).
- BFS and DFS are similar in that they both build the connected component containing s, and they achieve qualitatively similar levels of efficiency.
- Although both BFS and DFS yield a rooted tree T on the component containing s, the trees will have very different structures. BFS trees are widespread and bushy with layers. DFS trees are spindly, narrow, and deep.
- For a given recursive call DFS(u), all nodes that are marked “Explored” between the invocation and end of this recursive call are descendants of u in T.
- Let T be a DFS tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.
- Since y was not marked “Explored” when DFS(x) was first invoked, it is a node that was discovered between the invocation and end of the recursive call DFS(x).
The Set of All Connected Components
- For any two nodes s and t in a graph, their connected components are either identical or disjoint.
- If there is no path between s and t, then there cannot be a node v that is in the connected component of each.
- There is a natural algorithm for producing all the connected components of a graph, growing them one component at a time. We start with an arbitrary node s and use BFS or DFS to generate its connected component.
Questions
Additional Information
On a scale of 1 to 10, this section was an 8 in terms of readability. The differences between BFS and DFS were explained very well, and the two given algorithms were laid out very explicitly. The only reason this section was not a 10 was because, as usual, it takes me a couple of reads to fully understand the theorems (and what they mean in normal person language) and their proofs.
3.3 Implementing Graph Traversal
Summary
Notes
Representing Graphs
- Two basic ways to represent graphs: adjacency matrix and adjacency list
- We want to implement the basic graph search algorithms in time O(n + m)
- An adjacency matrix is an n x n matrix A where A[u, v] is equal to 1 if the graph contains the edge (u, v) and 0 otherwise. If the graph is undirected, the matrix A is symmetric. We can check if an edge exists in O(1) time.
- Two disadvantages of an adjacency matrix: takes O(n^2) space and examining all edges incident to a given node takes O(n) time, so examining all edges would take O(n^2)
- Adjacency lists work better for sparse graphs
- In an adjacency list, there is a record for each node v, containing a list of the nodes to which v has edges. We have an array Adj where Adj[v] is a record containing a list of all nodes adjacent to node v. In an undirected graph, node u appears on the list for node v and node v appears on the list for node u if there is an edge (u, v) in the graph.
- Adjacency list requires O(n + m) space.
- deg(v) = the number of edges incident to node v
- The adjacency matrix representation of a graph requires O(n^2) space, while the adjacency list representation requires only O(n + m) space.
- Since m ≤ n^2, the bound O(n + m) is never worse than O(n^2), and it is much better when the underlying graph is sparse
- The adjacency list representation takes O(deg(u)) time to check whether edge (u, v) is in the graph
- The list representation corresponds to a physical notion of exploring the graph, in which you learn the neighbors of a node u once you arrive at u, and can read them off in constant time per neighbor.
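The two representations can be built side by side; this small Python sketch uses an assumed 4-node example graph:

```python
# A small undirected graph on 4 nodes (0..3); the edge list is an
# assumed example, not taken from the text.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: O(n^2) space, O(1) edge lookup.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1      # symmetric, since the graph is undirected

# Adjacency list: O(n + m) space; neighbors of u read off in O(1) each.
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)
```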
Queues and Stacks
- A queue is a set from which we extract elements in first-in, first-out order
- A stack is a set from which we extract elements in last-in, first-out order
- Both will be implemented using a doubly linked list
- Insertions can be done in constant time in both
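In Python, `collections.deque` behaves like the doubly linked list described above and supports both disciplines with constant-time operations at either end; a minimal sketch:

```python
from collections import deque

d = deque()                 # doubly linked ends, O(1) insert and extract
d.append(1)
d.append(2)
d.append(3)

first = d.popleft()         # queue discipline: first-in, first-out
last = d.pop()              # stack discipline: last-in, first-out
```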
Implementing Breadth-First Search
- Adjacency list data structure is ideal for implementing breadth-first search
- We will maintain an array Discovered of length n and set Discovered[v] = true as soon as our search first sees v.
BFS Implementation
BFS(s):
  Set Discovered[s] = true and Discovered[v] = false for all other v
  Initialize L[0] to consist of the single element s
  Set the layer counter i = 0
  Set the current BFS tree T = Ø
  While L[i] is not empty:
    Initialize an empty list L[i+1]
    For each node u in the set L[i]
      Consider each edge (u, v) incident to u
        If Discovered[v] = false:
          Set Discovered[v] = true
          Add edge (u, v) to the tree T
          Add v to the list L[i+1]
        Endif
    Endfor
    Increment the layer counter i by one
  Endwhile
- The above implementation of the BFS algorithm runs in O(m + n) time, if the graph is given by the adjacency list representation.
- Each node occurs on at most one list, so the for loop runs at most n times over all iterations of the while loop. The for loop processing a node u can take less than O(n) time if u has only a few neighbors. The total time spent considering edges over the whole algorithm is O(m) – dependent on the number of edges. So the total time is O(m + n).
- Instead of using n separate lists L[i] for each layer Li, we can implement the algorithm using a single list L that we maintain as a queue. Algorithm will process nodes in the order they are first discovered. This implementation is still O(m + n).
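The single-queue variant might be implemented as follows; the example graph is an assumption of mine:

```python
from collections import deque

def bfs(adj, s):
    """Single-queue BFS: returns discovery order and BFS-tree edges."""
    discovered = {s}
    order, tree = [], []
    q = deque([s])
    while q:
        u = q.popleft()           # nodes come off in discovery order
        order.append(u)
        for v in adj[u]:
            if v not in discovered:
                discovered.add(v)
                tree.append((u, v))   # u becomes the parent of v
                q.append(v)
    return order, tree

# Assumed example graph: a 4-cycle 1-2-4-3-1.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
order, tree = bfs(adj, 1)
```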
Implementing Depth-First Search
- DFS is recursive and maintains the nodes to be processed in a stack.
- There is a distinction between the act of discovering a node v – seeing it for the first time – and the act of exploring a node v – scanning all edges incident to v, resulting in the potential discovery of other nodes.
- When DFS explores a node u, it scans the neighbors of u until it finds the first not-yet-explored node v and then immediately shifts its attention to exploring v. As we explore v, we add the neighbors of v to a list, in stack order, so these neighbors will be explored before we return to the other neighbors of u. We only come back to the other adjacent nodes when there are no new nodes left to explore.
DFS Implementation
DFS(s):
  Initialize S to be a stack with one element s
  While S is not empty:
    Take a node u from S
    If Explored[u] = false then
      Set Explored[u] = true
      For each edge (u, v) incident to u
        Add v to the stack S
      Endfor
    Endif
  Endwhile
- We will maintain an Explored array where Explored[v] is set to true when we scan v's incident edges. The adjacency list of a node being explored can be processed in any order. The algorithm pushes all adjacent nodes onto the stack before considering any of them.
- The above algorithm implements DFS, in the sense that it visits the nodes in exactly the same order as the recursive DFS procedure in the previous section (except that each adjacency list is processed in reverse order).
- To find the DFS tree, we need to have each node u on the stack S maintain the node that “caused” u to get added to the stack. We just need to use an array parent and set parent[v] = u when we add node v to the stack due to edge (u, v). It suffices to maintain one value parent[v] for each node v by simply overwriting the value parent[v] every time we add a new copy of v to the stack S, which is O(1).
- To count the number of stack operations, it suffices to count the number of nodes added to S, since each node must be added once for every time it is deleted from S.
- Node v will be added to the stack S every time one of its deg(v) adjacent nodes is explored, so the total number of nodes added to stack S is at most 2m. So, the total runtime of the algorithm is O(n + m).
- The above implementation of the DFS algorithm runs in O(m + n) time, if the graph is given by the adjacency list representation.
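Putting the pieces together – the stack, the Explored array, and the overwritten parent values – a Python sketch (the dict-based example graph is assumed):

```python
def dfs(adj, s):
    """Stack-based DFS with parent pointers, following the section's sketch."""
    explored = set()
    parent = {s: None}
    stack = [s]
    order = []
    while stack:
        u = stack.pop()
        if u not in explored:
            explored.add(u)
            order.append(u)
            for v in adj[u]:
                if v not in explored:
                    parent[v] = u   # overwritten each time v is re-pushed
                    stack.append(v)
    return order, parent

# Assumed example graph: a 4-cycle 1-2-4-3-1.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
order, parent = dfs(adj, 1)
# Adjacency lists are effectively processed in reverse order, as noted above.
```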
Finding a Set of All Connected Components
- BFS and DFS spend work only on edges and nodes in the connected component containing the starting node. They never see any of the nodes or edges in the other, disjoint components.
- The algorithm only spends a constant amount of work on a given edge or node in the iteration when the connected component it belongs to is under consideration. The total runtime is O(n + m).
Questions
Additional Information
The difference between a node being “explored” and a node being “discovered” was pretty confusing to me, especially because in class no clear distinction between the two was made. When a node is discovered, it is the first time the node is seen: an edge found while a previous node was being explored allowed us to see, and ultimately discover, this new node. In contrast, a node is explored when all incident edges to that node are scanned, potentially yielding new nodes for the connectivity algorithm to discover.
On a scale of 1 to 10, this section was a 7 in terms of readability. I was kind of confused about why the section was called 3.3 Implementing Graph Traversal Using Queues and Stacks when BFS was not initially explained using a queue. The queue version of the implementation felt kind of like an afterthought, especially because the queue versus the regular list implementation did not really make any improvements in terms of runtime. Maybe they did this to explicitly show the difference between BFS and DFS, but nonetheless it seemed like an afterthought. Besides that, the rest of the section was extremely well written and explained very clearly, especially the DFS implementation with stacks.
3.4 Bipartite Graphs: An Application of BFS
Summary
Notes
- A bipartite graph is one where the node set V can be partitioned into sets X and Y in such a way that every edge has one end in X and the other end in Y. Set X will be colored red and set Y will be colored blue
The Problem
- What are some natural examples of a non-bipartite graph, one where no such partition of V is possible?
- If a graph G is bipartite, then it cannot contain an odd cycle.
- Consider a cycle C of odd length, with the nodes numbered 1, 2, 3, …, 2k, 2k+1. If we color node 1 red, node 2 must be blue, node 3 must be red, and so on: we color odd nodes red and even nodes blue. We must then color node 2k+1 red, but it has an edge to node 1, which is also red. So a cycle of odd length cannot be two-colored, and a graph containing one cannot be bipartite.
Designing the Algorithm
- Odd cycles are the only obstacle to bipartite-ness.
- The coloring procedure described above is pretty much BFS: moving outward from s, coloring nodes as soon as we first encounter them.
- The algorithm is just performing BFS, coloring s red (L0), all the odd layers blue and all of the even layers red. This can be implemented on top of a BFS and by adding an extra array Color over the nodes. When we add a node v to list L[i+1], we assign Color[v] = red if i+1 is even and Color[v] = blue if i+1 is odd. The total runtime for this coloring algorithm is O(m+n).
Analyzing the Algorithm
- Let G be a connected graph, and let L1, L2,… be the layers produced by BFS starting at node s. Then exactly one of the following two statements must hold true.
- There is no edge of G joining two nodes of the same layer. In this case G is a bipartite graph in which the nodes in even layers can be colored red, and the nodes in odd layers can be colored blue.
- Proof: Every edge joins two nodes in adjacent layers. Our coloring procedure gives nodes in adjacent layers the opposite colors, so every edge has ends with opposite colors.
- There is an edge of G joining two nodes of the same layer. In this case, G contains an odd-length cycle, and so it cannot be bipartite.
- Proof: If two nodes x and y in the same layer are joined by an edge, then the cycle through x, y and their lowest common ancestor z has odd length, so the graph cannot be bipartite.
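The whole test – BFS coloring plus detection of a same-layer edge – fits in a short Python sketch; the function name and the two example graphs are assumptions for illustration:

```python
from collections import deque

def two_color(adj, s):
    """BFS two-coloring: red on even layers, blue on odd layers.
    Returns the coloring, or None if an edge joins two same-layer nodes."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
            elif dist[v] == dist[u]:
                return None   # same-layer edge: odd cycle, not bipartite
    return {v: ("red" if d % 2 == 0 else "blue") for v, d in dist.items()}

square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # even cycle
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}            # odd cycle
```

The elif works because, by the layer property above, any edge joins nodes whose layers differ by at most 1.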
Questions
Additional Information
The concept of an odd cycle was initially kind of confusing to me when it was explained in class. However, now to me it is clear that an odd cycle is a cycle that has an odd number of nodes in it and, therefore, has an odd number of edges. This would make a graph unable to be bipartite because starting at any point in the cycle, the first node you color and the last node that you color will have to be the same color. Therefore, the odd cycle within a graph keeps the graph from being defined as bipartite.
On a scale of 1 to 10, this section was a 9 in terms of readability. I was a little nervous about how easy it would be to comprehend this section because the section relied heavily on the concept of visualizing the binary coloring of theoretical nodes. However, the authors did a fantastic job of using the colors to clearly explain how to check if a graph is bipartite.
3.5 Connectivity in Directed Graphs
Summary
Notes
- In directed graphs, the edge (u, v) has a direction from u to v, so the relationship between u and v is asymmetric.
- In representing directed graphs, we will use a version of an adjacency list that we used for undirected graphs. Each node has two lists associated with it now: one list is made up of the nodes to which it has edges, and the second list has the nodes from which it has edges.
The Graph Search Algorithms
- BFS and DFS are almost the same in directed graphs as they are in undirected graphs. Both still run in O(n + m) runtime.
- Directed BFS computes the set of all nodes t with the property that s has a path to t. Such nodes may or may not have paths back to s.
- Directed DFS is a recursive procedure that tries to explore as deeply as possible only following edges according to their inherent direction.
- If we want the set of nodes with paths to s, we would define a new directed graph, G^rev, that we obtain from G, by reversing the direction of every edge.
Strong Connectivity
- A directed graph is strongly connected if, for every two nodes u and v, there is a path from u to v and a path from v to u. Two nodes u and v are mutually reachable in a directed graph if there is path from u to v and a path from v to u.
- If u and v are mutually reachable, and v and w are mutually reachable, then u and w are mutually reachable.
- There is a simple linear-time algorithm to test if a directed graph is strongly connected. Pick any node s and run BFS in G starting from s. Also, run BFS starting from s in G^rev. If one of these two searches fails to reach every node, then G is not strongly connected. If we find s has a path to every node and every node has a path to s, then s and v are mutually reachable for every v.
- We can define the strong component containing a node s in a directed graph to be the set of all v such that s and v are mutually reachable. We run BFS starting from s both in G and G^rev; the set of nodes reached by both searches is the set of nodes with paths to and from s, the strongly connected component. This is O(n+m).
- For any two nodes s and t in a directed graph, their strong components are either identical or disjoint.
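A Python sketch of the linear-time test, including how G^rev can be built by flipping each edge; the dict-of-lists representation, the function names, and the example graphs are all assumptions of mine:

```python
from collections import deque

def reachable(adj, s):
    """Set of nodes reachable from s, following edge directions."""
    seen = {s}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                q.append(v)
    return seen

def reverse(adj):
    """Build G^rev by flipping every edge (u, v) to (v, u)."""
    rev = {u: [] for u in adj}
    for u in adj:
        for v in adj[u]:
            rev.setdefault(v, []).append(u)
    return rev

def strongly_connected(adj):
    s = next(iter(adj))
    nodes = set(adj)
    return reachable(adj, s) == nodes and reachable(reverse(adj), s) == nodes

cycle = {0: [1], 1: [2], 2: [0]}   # strongly connected
chain = {0: [1], 1: [2], 2: []}    # not: no path back to 0
```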
Questions
- How exactly can you reverse the edges in graph G and copy them into graph G^rev? That process is a little confusing to me.
Additional Information
One thing that I was struggling to understand at first was how you can get the set of nodes with paths to s instead of getting the set of nodes that s has paths to. This was difficult for me to grasp at first because getting the set of nodes that s has a path to is just performing BFS or DFS with the additional condition of edge directions. However, after reading the section a second time, I was able to understand how you can get the set of nodes with paths to s. You merely need to reverse the direction of all of the edges in the original graph G, put them in a new graph, and traverse that one with BFS or DFS from s. I'm still a little confused on how exactly you can reverse the directions of the original graph. Hopefully, that will be explained to us in class.
On a scale of 1 to 10, this section was a 10 in terms of readability. This section was merely an extrapolation on what we learned in section 3.1 with directed graph connectivity and 3.2 and 3.3 with BFS and DFS. The authors did a fantastic job of explaining how directed graphs really aren't that different than undirected graphs; they just have the added condition of an edge having direction. This explanation went along well with their explanations of directed BFS, DFS, and strong connectivity.
3.6 Directed Acyclic Graphs and Topological Ordering
Summary
Notes
- It is possible for a directed graph to have no (directed) cycles and still have a very rich structure. If a directed graph has no cycles, we call it a directed acyclic graph (DAG).
The Problem
- DAGs can be used to represent precedence relations or dependencies.
- For a directed graph G, a topological ordering of G is an ordering of its nodes as v1, v2,… so that for every edge (vi, vj), we have i<j; all edges point forward in the ordering. A topological ordering on graphs provides an order in which the tasks can safely be performed.
- If G has a topological ordering then G is a DAG.
- Does every DAG have a topological ordering? How do we find one efficiently?
Designing and Analyzing the Algorithm
- Node v1 would need to have no incoming edges. In every DAG G, there is a node v with no incoming edges.
- If every node in G has an incoming edge, then the nodes form a cycle C.
- If G is a DAG, then G has a topological ordering.
To compute a topological ordering of G:
  Find a node v with no incoming edges and order it first
  Delete v from G
  Recursively compute a topological ordering of G-{v}
    and append this order after v
- Identifying a node v with no incoming edges, and deleting it from G, can be done in O(n) time. The algorithm runs for n iterations, so the total runtime is O(n^2).
- However, we can achieve a runtime of O(n + m) using the same high-level algorithm – iteratively deleting nodes with no incoming edges. We just need to be more efficient in finding these nodes. To do so, we maintain:
- For each node w, the number of incoming edges that w has from active nodes, and
- The set S of all active nodes in graph G that have no incoming edges from other active nodes.
- At initialization, all nodes are active, so we can initialize both quantities with one pass through all of the nodes and edges.
- Each iteration consists of selecting a node v from the set S and deleting it. After deleting v, we go through the nodes w to which v had an edge and subtract one from the number of active incoming edges that we are maintaining for w. If this makes the number of incoming active edges to w drop to zero, we add w to set S.
- This way, we keep track of all nodes eligible for deletion at all times and expend constant work per edge.
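The O(n + m) scheme described above is commonly known as Kahn's algorithm; here is a Python sketch (the example DAG is an assumption for illustration):

```python
from collections import deque

def topological_order(adj):
    """Kahn's algorithm: maintain active incoming-edge counts and the
    set S of nodes with no incoming edges from active nodes."""
    indeg = {u: 0 for u in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    S = deque(u for u in adj if indeg[u] == 0)
    order = []
    while S:
        v = S.popleft()
        order.append(v)          # v is "deleted" from the graph
        for w in adj[v]:
            indeg[w] -= 1        # one fewer active incoming edge for w
            if indeg[w] == 0:
                S.append(w)      # w is now eligible for deletion
    # If some nodes were never deleted, the remaining graph has a cycle.
    return order if len(order) == len(adj) else None

dag = {1: [2, 3], 2: [4], 3: [4], 4: []}
```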
Questions
Additional Information
At first, I was confused by the concept of implementing the topological ordering algorithm so that it has the tighter bound of O(m + n). But after reading that portion of the section again, it became a little clearer to me. What the authors are saying is that in order to get a tighter bound on the runtime, we need to be more efficient in finding the nodes that we are trying to delete. To do this, we need to keep track of the number of incoming active edges a given node has and the set of all active nodes in graph G that have no incoming edges from active nodes. This makes the algorithm run more efficiently because the next node to delete and add to the topological ordering is a node with no incoming active edges. This is because at that point, there are no processes that have precedence and need to be performed before the node with no incoming edges. The active incoming edges symbolize the fact that there are processes that must be performed before that given node can be popped off.
On a scale of 1 to 10, this section was an 8 in terms of readability. The only reason that this section was not a 10 is because it took me a couple of reads to fully understand topological ordering and the runtime of the algorithm used to compute it.
