Chapter 3: Graphs

3.1 Basic Definitions and Applications

A graph represents pairwise relationships between objects and consists of a set of nodes and a set of edges. To represent asymmetric relationships, we use a directed graph, which consists of a set of nodes and a set of directed edges. It is important to note that each directed edge has a tail and a head, and these positions are not interchangeable. A graph that is not directed is an undirected graph, which we will usually just call a “graph”. The following are some examples of graphs:

  • transportation networks
  • communication networks
  • information networks
  • social networks
  • dependency networks

When working with graphs, it is frequently necessary to traverse a sequence of nodes and edges. An important term here is a path, which is a sequence of nodes such that each consecutive pair of nodes is joined by an edge. A graph can also contain a cycle, a path that “cycles back” to its initial node. A graph is connected if every pair of nodes is joined by a path. The distance between two nodes is the minimum number of edges in a path between them.

An undirected graph (or simply, graph) can also be a tree, meaning the graph is connected and does not contain a cycle. There are two important facts about trees:

1. Every n-node tree has n - 1 edges.

2. Given G, an undirected graph with n nodes, any two of the following implies the third: i.) G is connected, ii.) G does not contain a cycle, iii.) G has n - 1 edges.
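The two facts above can be sanity-checked on a small example. The sketch below uses a hypothetical 5-node tree (example data, not from the text): it has exactly n − 1 edges, and a simple traversal confirms it is connected, so by fact 2 it is acyclic as well.

```python
# Hypothetical 5-node tree given as an undirected edge list.
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
n = 5

# Fact 1: every n-node tree has n - 1 edges.
assert len(edges) == n - 1

# Build adjacency lists and traverse from node 0 to test connectivity.
adj = {v: [] for v in range(n)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

seen = {0}
stack = [0]
while stack:
    u = stack.pop()
    for v in adj[u]:
        if v not in seen:
            seen.add(v)
            stack.append(v)

# Connected with n - 1 edges, so by fact 2 the graph contains no cycle.
assert len(seen) == n
```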

3.2 Graph Connectivity and Graph Traversal

There are two natural algorithms that can be used to solve the problem of determining s-t connectivity, i.e., whether there exists a path between nodes s and t in a graph G. The first is breadth-first search (BFS). Starting at node s, the algorithm examines every edge leaving s and places the nodes at the other ends of those edges into a layer. The pattern is repeated for all nodes in the layer just created, and so on until no new nodes are discovered. The layers (L1,…, Lj) can be characterized by these facts:

1. layer L1 consists of all nodes that are neighbors of s

2. layer Lj + 1 (assuming we have defined L1,…, Lj) consists of all nodes not belonging to an earlier layer but having an edge to a node in Lj.

The index of a layer is also the distance from s to every node in that layer. The tree produced by running the BFS algorithm is called a breadth-first search tree.

The second algorithm used to determine paths between nodes is depth-first search (DFS). This algorithm starts from s, follows the first edge to a connected node, then follows that node's first edge to a connected node, and so on until a “dead end” is reached. At this point the algorithm backtracks to a node that still has an unexplored edge and repeats the pattern from there. Like BFS, DFS produces a rooted tree, called a depth-first search tree, though it will likely have a very different structure due to the nature of the algorithm.

An important fact is given regarding the relationship between connected components within a graph: For any two nodes s and t in a graph, their connected components are either identical or disjoint.

This section was a good review and an excellent explanation of BFS and DFS as concepts. I do wish there had been better visuals because the dotted lines do not adequately demonstrate the process. I would still give it a 9 for readability.

3.3 Implementing Graph Traversal Using Queues and Stacks

The two basic ways to represent graphs are adjacency matrices and adjacency lists. When assessing the running time of algorithms that use either representation, we express the running time as a function of two parameters: n, the number of nodes, and m, the number of edges. The goal is to implement graph search algorithms in O(m + n) running time, referred to as linear time. An adjacency matrix is the simplest data structure to use for graph implementation, but it has its disadvantages. For one, it requires Θ(n²) space even when the graph has far fewer edges. In addition, examining all edges incident to a node takes Θ(n) time with an adjacency matrix, which is not ideal. Because of this, we prefer adjacency lists for graph implementation. The space required for an adjacency list is only O(m + n), and examining all edges incident to a node u requires only O(deg(u)) time, where deg(u) is the number of edges incident to u.
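The trade-off between the two representations can be made concrete with a small sketch. The 4-node graph below is a hypothetical example: the matrix answers “is (u, v) an edge?” in constant time but always occupies n × n cells, while the list stores only the edges that exist and lets us scan a node's neighbors in time proportional to its degree.

```python
# Hypothetical 4-node undirected graph given as an edge list.
n = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: Theta(n^2) space regardless of how many edges exist.
matrix = [[False] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = matrix[v][u] = True

# O(1) to test one edge, but Theta(n) to scan all neighbors of a node.
assert matrix[0][1] and not matrix[0][3]

# Adjacency list: O(m + n) space; scanning u's neighbors is O(deg(u)).
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

assert sorted(adj[2]) == [0, 1, 3]
```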

When processing a set of elements, which often occurs in graph search algorithms, it is critical to determine how best to represent those elements. Stacks and queues are frequently used in this role, because we must maintain a particular order for traversing the elements. Queue elements are extracted in first-in, first-out (FIFO) order, and stack elements are extracted in last-in, first-out (LIFO) order. The BFS algorithm uses a queue-like structure, as can be seen in the following sketch of the algorithm:

  BFS(s):
      set Discovered[s] = true and Discovered[v] = false for all other v
      initialize L[0] to consist of the single element s
      set layer counter i = 0
      set current BFS tree T = ø
      while L[i] is not empty:
          initialize an empty list L[i + 1]
          for each node u ∈ L[i]
              consider each edge (u, v) incident to u
              if Discovered[v] = false:
                  set Discovered[v] = true
                  add edge (u, v) to tree T
                  add v to list L[i + 1]
              endif
          endfor
          i += 1
      endwhile
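The pseudocode above can be turned into a short runnable Python sketch. The example graph at the bottom is hypothetical; the function returns the layers, whose indices equal the distances from s, along with the BFS tree edges.

```python
def bfs_layers(adj, s):
    """BFS following the sketch above: adj maps each node to its
    neighbors; returns (layers, tree edges)."""
    discovered = {s}
    layers = [[s]]           # L[0] consists of s alone
    tree = []
    while layers[-1]:
        next_layer = []      # L[i + 1], initially empty
        for u in layers[-1]:
            for v in adj[u]:
                if v not in discovered:
                    discovered.add(v)
                    tree.append((u, v))  # edge (u, v) joins the BFS tree
                    next_layer.append(v)
        layers.append(next_layer)
    return layers[:-1], tree  # drop the final empty layer

# Hypothetical example: layer index equals distance from node 1.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
layers, tree = bfs_layers(adj, 1)
assert layers == [[1], [2, 3], [4], [5]]
```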
      

The DFS algorithm, sketched below, uses a stack-inspired implementation:

  DFS(s):
      initialize S to be a stack with one element s
      while S is not empty:
          take a node u from S
          if Explored[u] = false:
              set Explored[u] = true
              for each edge (u, v) incident to u
                  add v to S
              endfor
          endif
      endwhile
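A minimal Python version of the stack-based DFS sketch, using a hypothetical example graph. Note that a node may be pushed onto the stack several times; the Explored check ensures it is processed only once.

```python
def dfs_order(adj, s):
    """Iterative DFS following the stack sketch above; returns nodes in
    the order they are first explored (adj maps node -> neighbors)."""
    explored = set()
    order = []
    stack = [s]
    while stack:
        u = stack.pop()          # LIFO: take the most recently added node
        if u not in explored:
            explored.add(u)
            order.append(u)
            for v in adj[u]:
                stack.append(v)  # neighbors may enter the stack repeatedly
    return order

# Hypothetical undirected example graph, given as adjacency lists.
adj = {1: [2, 3], 2: [1, 4], 3: [1], 4: [2]}
assert dfs_order(adj, 1)[0] == 1
assert set(dfs_order(adj, 1)) == {1, 2, 3, 4}
```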

Both the BFS and DFS algorithms shown above run in O(m + n) time when using an adjacency list representation.

Now that we have spent about one week discussing BFS and DFS, I feel very comfortable with the concepts and the running-time analysis. This section packed in a lot of information, which would have been challenging to understand if I had not already learned BFS and DFS in class. I would rate it a 7 as a result.

3.4 Testing Bipartiteness: An Application of Breadth-First Search

A graph is bipartite if the node set V can be partitioned into sets X and Y such that every edge has one end in X and the other end in Y. It is important to note that a graph G cannot be bipartite if it contains an odd cycle. We can use the BFS algorithm, combined with an extra array 'Color', to determine whether a graph is bipartite: we assign each node a color, then test the coloring to determine bipartiteness. We also know the following as a result of this algorithm:

Let G be a connected graph and L1, L2,… be the layers produced by BFS starting at node s. Then exactly one of the following holds:

  There is no edge of G joining two nodes of the same layer (G is bipartite)
  There is an edge of G joining two nodes of the same layer (G is not bipartite)
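The coloring idea can be sketched in Python: color s with one color, its layer-1 neighbors with the other, and so on, alternating by layer. Finding an edge between two same-colored nodes means two nodes of the same layer are joined, i.e., the graph contains an odd cycle. The example graphs are hypothetical, and the sketch assumes the graph is connected.

```python
from collections import deque

def is_bipartite(adj, s):
    """BFS two-coloring sketch: adj maps node -> neighbors; assumes the
    graph is connected. Color 0 and color 1 alternate by BFS layer."""
    color = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in color:
                color[v] = 1 - color[u]   # next layer gets the other color
                queue.append(v)
            elif color[v] == color[u]:
                return False              # same-layer edge: odd cycle found
    return True

# An even cycle (square) is bipartite; an odd cycle (triangle) is not.
square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
assert is_bipartite(square, 1) is True
assert is_bipartite(triangle, 1) is False
```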
  

Though this section was short, I found it helpful in the follow-up from our class discussion on bipartiteness. I now feel that I have a better grasp on the definition of bipartite (without the use of colors). I would rate this section a 9.

3.5 Connectivity in Directed Graphs

Recall: a directed graph's edges have directions. When representing directed graphs in algorithms, it is necessary to associate two lists with each node: one keeps track of the nodes to which it has edges, and the other maintains the nodes from which it has edges. Fortunately, BFS and DFS do not differ greatly between directed and undirected graphs. In BFS, for example, the first layer is composed of the nodes to which s has an edge, the next layer of the nodes to which those nodes have edges, and so on. Since the algorithm does essentially the same work, the running time is still O(m + n). It is important to note that performing BFS (or DFS) on a directed graph will only find the nodes to which s has paths, not necessarily those that have paths back to s. We say that nodes u and v in a graph G are mutually reachable if there is a path from u to v and a path from v to u in G. We also know that if u and v are mutually reachable and v and w are mutually reachable, then u and w are mutually reachable. The strong component containing a node s is the set of all nodes mutually reachable with s. For any two nodes u and v in a directed graph, we know that their strong components are either identical or disjoint.
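One direct (if not the most efficient) way to test mutual reachability is to run a search from each endpoint, as in this hypothetical Python sketch: u and v are mutually reachable exactly when each appears in the set of nodes reachable from the other.

```python
def reachable(adj, s):
    """Set of nodes reachable from s in a directed graph
    (adj maps node -> list of out-neighbors)."""
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def mutually_reachable(adj, u, v):
    """True iff there is a path from u to v and a path from v to u."""
    return v in reachable(adj, u) and u in reachable(adj, v)

# Hypothetical example: 1 -> 2 -> 3 -> 1 is a directed cycle; 4 hangs off it.
adj = {1: [2], 2: [3], 3: [1, 4], 4: []}
assert mutually_reachable(adj, 1, 3)       # same strong component
assert not mutually_reachable(adj, 1, 4)   # 4 cannot reach back to 1
```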

This section was well-written and straightforward. I would rate it an 8 for understanding, since I do feel that I can take away an understanding of mutual reachability and its relationship to connectivity.

3.6 Directed Acyclic Graphs and Topological Ordering

A directed acyclic graph (DAG) is a directed graph which contains no cycles. Two real-world examples of DAGs are precedence relations and dependencies. A topological ordering of a directed graph G is an ordering of its nodes v1, v2,…, vn such that every edge (vi, vj) has i < j. If G has a topological ordering, then G has no cycles. Conversely, if graph G is a DAG, it has a topological ordering, as shown by the following algorithm, which computes a topological ordering of a graph:

  to compute a topological ordering of G:
  find node v with no incoming edges and order it first
  delete v from G
  recursively compute a topological ordering of G – {v} and append this order after v

The running time of this algorithm is O(n²), because each of the n iterations takes O(n) time to identify a node with no incoming edges and delete it.
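The recursive sketch above can be implemented iteratively by tracking in-degrees, a standard refinement that "deletes" a node by decrementing its out-neighbors' counts rather than rebuilding the graph, bringing the naive O(n²) bound down to O(m + n). The example DAG is hypothetical.

```python
from collections import deque

def topological_order(adj, n):
    """Repeatedly take a node with no incoming edges, as in the sketch
    above. Nodes are 0..n-1; adj maps node -> list of out-neighbors."""
    indegree = [0] * n
    for u in range(n):
        for v in adj.get(u, []):
            indegree[v] += 1
    queue = deque(u for u in range(n) if indegree[u] == 0)
    order = []
    while queue:
        u = queue.popleft()       # u has no remaining incoming edges
        order.append(u)
        for v in adj.get(u, []):  # "delete" u by decrementing in-degrees
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    return order  # contains all n nodes iff the graph is a DAG

# Hypothetical DAG: 0 -> 1 -> 3 and 0 -> 2 -> 3.
adj = {0: [1, 2], 1: [3], 2: [3]}
order = topological_order(adj, 4)
assert order.index(0) < order.index(1) < order.index(3)
assert order.index(2) < order.index(3)
```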

This section was a little bit more convoluted than the others since we have not fully covered DAGs or topological orderings in class yet. I understand the basic concepts, but I think the algorithm to topologically order a graph is complicated. I would rate this section a 6 because I do not feel like the sections in the book which discuss a specific problem rather than a broad application are as clear.