Chapter 3: Graphs

Section 1: Basic Definitions and Applications

Summary:

Graphs are useful in discrete math, and they show up EVERYWHERE if you're looking at things that way! A graph (a.k.a. Undirected Graph) consists of a collection V of nodes (a.k.a. vertexes) and a collection E of edges (each edge joins two nodes and is represented by a two-element subset of V). Edges indicate a symmetric relationship. Directed graphs have asymmetric relationships (directed edges represented by ordered pairs of nodes). Examples of graphs: transportation networks (ex. airlines - airports are nodes, nonstop flights are edges), communication networks (ex. computer networks modeled by a node for each machine and an edge for direct physical links), information networks, (ex. web pages as nodes and directed edges when there's a link from one to another), social networks (people are nodes, friendships are edges), dependency networks (ex. directed graphs to show courses that have prerequisites) … etc. A path in an undirected graph is a sequence of nodes such that each consecutive pair is joined by an edge. A cycle is a path that can come back to the same node. Same for directed graphs, but the path has to consider the directionality of the edges. If a graph is connected, that means that there's a path from every node to every other node. Directed graphs are strongly connected if there's a path from every node to and from every other node. Distance between two nodes is the minimum number of edges in the path from one node to the other. A graph is a tree if it's connected and doesn't contain a cycle. Deleting any edge will disconnect a tree. Rooting a tree is like grabbing one node and letting the rest of the tree hang down. Parent is the node that directly precedes a node in the path from the root to some other node. A node is its parent's child. Ancestor/descendant means that a node is somewhere in the path from a node to the root. A node is a leaf if it doesn't have any descendants. Rooted trees encode the notion of a hierarchy (ex. employees are nodes and report to the employee at their parent node). Rooting a tree can make it easy to answer some questions … how many edges are there in a tree? Well every node except the root has exactly one edge to its parent so the tree has one less edge than it has nodes. In fact, any two of these implies the third: G is connected, G does not contain a cycle, G has n-1 edges.

Insights and Questions:

Seems like pretty basic stuff. It's interesting to learn about the different examples of things that can be represented by graphs/trees (but I don't think the book did a very good job of listing the most interesting ones … we mentioned food chains and pixels on a screen in class and those seem way more interesting than the ones listed … maybe not as useful, though).

Readability:

I thought it was pretty easy to read, but it would have been more clear if they'd fully discussed undirected graphs and then gone into directed graphs instead of flip flopping between the two.

Section 2: Graph Connectivity and Graph Traversal

Summary:

Given two nodes, s and t, how do we efficiently figure out if there is a path between them (s-t connectivity a.k.a. maze-solving problem)?

1. Breadth-First Search: explore outward from s in all possible directions, adding nodes one “layer” at a time. First layer is all nodes directly connected to s. Second layer is any nodes connected directly to any of the first layer nodes (that haven't already been explored, etc. until no new nodes are encountered. We “flood” the graph starting at s with a “wave” that expands to visit all of the nodes that it can possibly reach. The layer a node is in is equal to its distance from s (i.e. BFS determines shortest paths from s to any node that it can reach). It also produces a tree rooted at s (“Breadth-first search tree”). If there's an edge between two nodes, then they must either be in neighboring layers or in the same layer (the book gives a proof of this and some more formal definitions and algorithms for BFS … these can also be found in our notes).

When you do searches like this, the set of nodes that end up in the tree are the “connected component” of G containing s. Actually the order you visit the nodes doesn't matter, just that you continue growing the set R of connected components until there are no more edges leading out of R.

2. Depth-First Search: since the order you explore the nodes/edges doesn't matter, here's another pattern for exploring them. You start at node s and try the first edge out of it, leading to v. Then you follow the first node out of v and continue until you get to a “dead end.” Then you backtrack until you get to a node with an edge to an unexplored node. This algorithm, depth-first search, goes as deep as possible into the graph and retreats when necessary. DFS visits nodes in a very different order than BFS (usually). It also produces a tree rooted at s, but it's narrower and deep instead of short and fat (does not contain shortest paths). Nontree elements can only connect ancestors to descendants (book proves this).

The set of all connected components - there's a connected component associated to each node (contains all of the nodes that are reachable from that node), but what's the relationship between these connected components? They're either identical or disjoint (book gives a proof & we went over it in class, so it's in the notes/slides).

Insights and Questions:

The book's way of drawing DFS graphs looked like how we started to draw topological graphs. I'm interested to see how/if we use DFS to obtain that graph.

Readability:

I didn't like the way they represented DFS trees (graphically). I understood it, and it looks a lot like what we started to discuss at the end of the last class with directed graphs and their topological order, but I think for a first-time look at DFS trees, it's easier to conceive of them other ways.

Section 3: Implementing Graph Traversal Using Queues and Stacks

Summary:

This section goes into implementation details of graphs and BFS and DFS. Graphs can be represented with an adjacency matrix (an N by N matrix where the cell contains a 1 if there's an edge between the row nummber-th and column number-th nodes) and an adjacency list (a list of length N where each element is a list of all of the nodes that the i-th node has an edge to). For discussing running times with this, we talk about them in terms of the number of nodes (n) and the number of edges (m). It's not always clear what better running times are because it depends on the relationship between m and n. Aim for running times is O(m+n), which is considered linear because that's the amount of time it takes to just read the input. For connected graphs, O(m+n) is the same as O(m) because m >= n-1. Adjacency matrix takes nxn space, is simple, takes O(n) time to find all edges, but that's not super efficient for graphs that are sparsely edge-ed. Adjacency lists only take O(m+n) space (the book goes into proofs of this). The book says the adjacency list is generally better, more intuitive, and that's what it'll use. Algorithms for graph-processing often have an inner step of processing a set of elements. Store those in a linked list, but what order should we consider them in? Two natural options are to keep them in a queue (see them in first-in-first-out order) or a stack (see them in first-in-last-out order). BFS is implemented using a queue. Each time a node is discovered, it's added to a queue, and the algorithm always processes the edges out of the node that's currently first in the queue (the book goes into more specific algorithms for BFS and proves that it runs in O(m+n) time. DFS is implemented using a stack, but is otherwise almost identical to BFS. There are also some differences in how nodes are “discovered” and “explored” (DFS is more “impulsive” about exploring) and there's (again) an algorithm and proofs that this is explores in the same way as described in the previous section and that it runs in O(m+n) time (around p. 93).

Insights and Questions:

I actually find it EASIER to think about BFS and DFS in terms of stacks and queues than abstractly.

Readability:

I thought it was very readable. It's awesome how consistent they are with the order of explaining things from chapter to chapter and section to section (i.e. BFS before DFS always).

Section 4: Testing Bipartiteness: An Application of Breadth-First Search

Summary:

A bipartite graph is one where the nodes can be partitioned into two sets X and Y such that every node has one end in X and one end in Y. Imagine X = red and Y = blue. What are natural examples of nonbipartite graphs where no partition is possible? A bipartite graph cannot contain an odd-length cycle. How do you recognize if a graph is bipartite? Well, to we use a BFS to color the graph. We start with one node (any node) and color it red (could be blue, it doesn't matter). Then we color all of the nodes connected to it blue (i.e. the first layer in BFS), then we color all of the nodes connected to those nodes red (layer 2), etc. until we finish the BFS. This doesn't take any more time than BFS so it's O(m+n). Then the book goes through a proof of how this determines whether or not the graph is bipartite → if there's an edge between nodes of the same layer it is not bipartite, but if there are no edges between nodes in the same layer then it is bipartite.

Insights and Questions:

We said that bipartite graphs apply to stable matching situations, but what else can they describe?

Readability:

Very readable. And super short :) yay! Kind of confusing that it's an “application of BFS” but it also seems like something “big” itself. I think the title almost sells it short.

Section 5: Connectivity in Directed Graphs

Summary:

This section goes into the similarities/differences between directed and undirected graphs (and focuses on directed). Directed graphs are represented much like undirected with an adjacency list, but instead of every node having a list of its neighbors, it has a list of nodes it has out edges to and a list of nodes it has in edges to. BFS and DFS work almost exactly the same as in undirected graphs, but we just discover edges that a node has a to-edge to. The algorithm still performs in O(m+n) time. But all of the paths discovered by BFS (or DFS) don't necessarily have paths back to the starting node in this case. You can also do a reverse BFS (or DFS) on the graph by discovering the nodes that a node has a from-edge to. Then the book goes into strong connectivity, which means that for every pair of nodes u and v, there's a path from u to v and from v to u (we call this property “mutually reachable”). If u and v are mutually reachable and v and w are mutually reachable, then u and w are mutually reachable (book has a proof, but it's pretty obvious). This lets us say that if we do a BFS on the in edges and a BFS on the out edges, the nodes that are found in both searches are the “strong component.” Also, just like for any two nodes s and t, their connected components are either identical or disjoint, the strong component of two nodes s and t in a directed graph either have disjoint or identical strong components (book gives a proof p.99).

Insights and Questions:

None, really (sorry).

Readability:

Makes a lot of sense. It was nice that this was JUST an explanation of directed graphs. Makes me wonder why they discussed them so much earlier, but I guess it was a good setup.

Section 6: Directed Acyclic Graphs and Topological Ordering

Summary:

Undirected graphs without cycles are really simple (connected components = trees), but it's possible for directed graphs to be “rich” (i.e. complicated) without having any (directed) cycles. A directed graph without cycles is a directed acyclic graph (DAG). DAGs can be used to encode precedence relations or dependencies. Ex. tasks that depend on each other where nodes are the tasks and the directed edge from node i to j (i, j) means that i must be done before j. If there's a cycle, ordering of those tasks doesn't make any sense (everything in the cycle has to be done before everything else). A valid ordering of the “tasks”/nodes in a DAG is called a topological ordering. The tasks are “lined up” and all of the edges “point forward.” Then the book proves (p. 101) that if G has a topological ordering, it's a DAG. Does the reverse work? And how do you find a DAG's topological ordering? Yes, the reverse does work. The first node in the ordering has no incoming edges (proof: any DAG has at least one node without incoming edges). Put that node first and delete it from the graph. Recursively compute a topological ordering from the leftover graph. Book goes into a more rigorous explanation.

Insights and Questions:

The difference that they pointed out between directed and undirected graphs (that directed graphs with the same total number of edges as a “corresponding” undirected graph can be much more complicated) is really interesting. I guess it makes sense, but I hadn't really thought about comparing them in that way before.

Readability:

As always, this section was very readable. I am glad I read it after we went over it in class, though, mostly because the word “topological” is kind of scary (haha …).