This is an old revision of the document!

Chapter 3: Graphs

Section 1: Basic Definitions and Applications

A graph is a data structure that provides a way of “encoding pairwise relationships among a set of objects”. It contains a set of nodes, which can also be called vertices, and collection of edges, where each edge joins a pair of nodes. If you are creating a graph where edges indicate a “symmetric relationship” between the nodes they connect, you are creating an undirected graph. However, if you want to code asymmetric relationships, you want to use a directed graph, where each edge has a head and a tail node. Graphs are useful for representing a number of things, which include: Transportation networks, communication networks, information networks, social networks, and dependency networks. Transportation networks can be represented by nodes as the cities or particular locations, and the edges representing the paths of travel between the locations. Communication networks utilize the nodes as devices, and the edges as representing the link between them. Information networks represent the webpages as the nodes, and hyperlinks between two webpages as the edges. Social networks represent people as the nodes, and social connections between them as the edges. Dependency networks represent elements as the nodes, and dependencies/requisites as the edges. “One of the fundamental operations in a graph is traversing a sequence of nodes connected by edges”. A simple path is a path in which all vertices are distinct from each other, meaning no vertex is visited more than once. A cycle is a path that begins at a node and ends at the same node where the nodes visited in the path are only visited once. A connected graph is a graph that maintains a path between every pair of vertices. The distance between two nodes is the minimum number of edges that must be traversed to get from one node to another. A graph can be defined as a tree if it is connected and does not contain a cycle. Trees look very similar to the heaps discussed in the previous chapter where each node has a parent and descendants. Rooted trees are trees that have a designated root node. If g is an undirected graph with n nodes, then if it has two of these three characteristics it automatically has the third, and thus means that it is a tree: 1) G is connected, 2) G does not contain a cycle, 3) G has n-1 edges.

Section 2: Graph Connectivity and Graph Traversal

One of the main questions regarding graphs is the problem determining s-t connectivity. This problem looks at a two nodes, s and t, within a graph and attempts to determine whether or not there is a path between them. For smaller graphs, it is very easy to just look at them to attempt to find a path, however the goal is to obtain an algorithm that will solve the problem when dealing with very large graphs. There are two main algorithms that are used to solve this problem, which are breadth-first search (BFS) and depth-first search (DFS). Breadth-first search starts at the designated start node, and explores outwards in all directions, moving along layer by layer. Each layer of nodes is a specific distance away from the start node, and are connected to the start node through nodes contained in the previous layers. All nodes that are one edge away from the start node are included in layer 1. All nodes that are 1 layer away from nodes in layer 1 AND are not included in layer 1 are then included in layer 2. This process repeats until all nodes in the graph are contained within a layer. This search algorithm can be thought of as a “flood”, where we slowly move outwards in all directions passing over nodes. 'For each j >= 1, layer Lj produced by BFS consists of all nodes at distance exactly j from s. There is a path from s to t if and only if t appears in some layer'. One property of a BFS is that it produces a tree as its result, with s as the root node. 'Let T be a breadth-first search tree, let x and y be nodes in T belonging to layers Li and Lj respectively, and let (x, y) be an edge of G. Then i and j differ by at most 1.' 'The set R produced at the end of the BFS algorithm is precisely the connected component of G containing S.' The BFS determines the distance between a node and s by result of what layer the node is located in. The depth-first search takes the approach if exploring a path all the way to the very end, and then backtracking until it finds another unexplored path to follow and repeats the process using a recursive form. Once a node has been explored, it is marked to that the DFS knows where it has already traversed. DFS also produces a tree with s as the root, however the tree has a much different structure. The BFS tree is very wide but short, whereas the DFS tree is very skinny but tall. 'Let T be a depth-first search tree, let x and y be nodes in T, and let (x, y) be an edge of G that is not an edge of T. Then one of x or y is an ancestor of the other.' 'For any two nodes s and t in a graph, their connected components are either identical or disjoint.' That is, they can either be divided into two separate graphs or are connected in some way. This section was very easy to read and understand, given I have prior experience with these algorithms, as well as talking about them extensively in class. I'd give this section a 9/10.

Section 3: Implementing Graph Traversal Using Queues and Stacks

In this chapter, we see that BFS and DFS essentially only differ in that one uses a queue and the other uses a stack. There are two main ways of representing graphs: as adjacency matrices, and as adjacency lists. Graphs have two main inputs, which are nodes and edges. When discussing the run time of algorithms that deal with graphs, we denote run times using the edges and nodes. Adjacency matrices are nxn matrices where a 1 denotes an edge between a row/column, and a 0 means there is no edge. This representation allows is to check in constant time whether a particular edge exists in the graph. However this representation takes O(n^2) space, and takes O(n) to see find all edges incident to a particular node. Adjacency lists word better for 'sparse graphs', which have fewer than n^2 edges. In adjacency lists, there is an array of lists that contain all the edges for each node. Adjacency lists take up O(m+n) space, take O(degree(u)) to see if an edge exists, and takes O(m) to find all the edges incident to a node. 'A queue is a set from which we extract elements in first-in, first-out (FIFO) order (select elements in the same order in which they were added. A stack is a set from which we extract elements in last-in, first-out (LIFO) order (select elements in the opposite order in which they were added). In both structures, you select the first element in the list, however in queues you add to the end of the list and in stacks you add to the front of the list. The adjacency list structure is ideal for the BFS algorithm, as for each node visited you want to see what nodes it is connected to. 'The BFS algorithm runs in O(m+n) if the graph is given by the adjacency list representation.' For the BFS either a queue or a stack can be used. For the DFS, you use a stack to maintain the set of nodes, which maintains its recursive structure. 'The DFS algorithm runs in O(m+n) if the graph is represented by an adjacency matrix.' This section was quite easy to traverse and understand. I would give it an 8/10.

Section 4: Testing Bipartiteness: An Application of Breadth-First Search

'A bipartite graph is a graph where the node set can be partitioned into sets X and Y in such a way that ever edge has one end in X and the other end in Y.' A way to restate this is a graph that can have its nodes split into two colored groups where no node in one group has an edge to another node in that group. Every edge of the graph should have a node of one color on one end and a node of a different color on the other end. One characteristic of a bipartite graph is that it cannot contain an odd cycle. A BFS of an algorithm must have 1 of the following two characteristics: 'there is no edge of G joining two nodes of the same layer', or 'there is an edge of G joining two nodes in the same layer'. If the first condition is true, then the graph can be colored in such a way that the definition of bipartiteness is true, however if the second condition is true then the graph has an odd length cycle and therefore is not bipartite. I have prior experience with graphs and conditions for bipartite graphs, so this section was very easy to understand. Talking about in class before reading this section also made it very easy to follow and understand. I'd give it a 9/10.

Section 5: Connectivity in Directed Graphs

Previously, we have only been considering undirected graphs, where each edge has no particular orientation. In a directed graph, an edge does have direction and goes from one node u to a node v. 'The relationship between u and v is asymmetric, which changes the structure of the graph.' To represent an undirected graph, we use an adjacency list, where each node has a list of nodes to which it has edges, and a list of nodes from which it has edges. Thus the algorithm can go one step forward or one step backward from a particular node. BFS and DFS are the exact same for directed graphs as they were for undirected graphs. BFS has a running time of O(m+n) once again. However, for directed graphs it is possible to have a path from s to t and no path from t to s. You can also find a set of nodes with paths to s by creating a graph of reversed edges, and finding the paths from s to nodes in that graph. A directed graph is strongly connected if for every two nodes u and v, there is path from u to v and a path from v to u. This can also be restated as a graph that every pair of nodes is mutually reachable. 'If u and v are mutually reachable, and v and b are mutually reachable, then u and w are mutually reachable. The strong component of a graph is the set of all v such that s and v are mutually reachable. For any two nodes s and t in a directed graph, their strong components are either identical or disjoint. It is possible to compute the strong components for all nodes in a total time of O(m+n). This section was easy to understand. I would give it an 8/10.

Section 6: Directed Acyclic Graphs and Topological Ordering

'If an undirected graph has no cycles, then it has an extremely simple structure: each of its connected components is a tree.' A directed graph with no cycles is called a directed acyclic graph (DAG). A topological ordering of a graph is an ordering of nodes so that every edge points forward in the ordering from a node of lower numbering to a node of higher numbering. 'If a graph has a topological ordering, then G is a DAG.' 'In every DAG, there is a node with no incoming edges.' The total running time of this algorithm is O(n^2). However, it can be improved to be O(m+n) with a more efficient algorithm. This section was a little more difficult to understand. Once it is gone over in class, it should be more easy to understand. I'd give this section a 7/10.

Table of Contents