3.6 Directed Acyclic Graphs and Topological Ordering


If a directed graph has no cycles, it is called a directed acyclic graph (DAG). DAGs can be used to encode dependency or precedence relations and are widely used in computer science. For instance, suppose we have a set of tasks {1, 2, 3, …, n} to be performed, with the condition that for certain pairs of tasks i and j, task i must be performed before task j. Each task is represented by a node, and there is an edge from i to j if i must be performed before j. The resulting graph must be a DAG, since some task has to be performed first; otherwise we would have a cycle, which would mean that no task could ever be done.

A topological ordering is an order in which the tasks can be performed so that all dependencies are respected. In other words, for every edge (vi, vj) we have i < j: all edges point forward in the ordering. Thus, in a topological ordering, whenever we come to a task vj, all the tasks preceding vj have already been performed. If G has a topological ordering, then G is a DAG. The main problem when dealing with a DAG is finding an efficient algorithm that computes a topological ordering.
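The "all edges point forward" condition is easy to check directly. Here is a minimal sketch (the task set and edges are made-up examples) that verifies a proposed ordering is topological by checking that every edge goes from an earlier position to a later one:

```python
# Hypothetical example: tasks and "i must come before j" dependency edges.
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
order = [1, 2, 3, 4]  # a candidate ordering of the tasks

# An ordering is topological exactly when every edge points forward in it.
position = {v: i for i, v in enumerate(order)}
is_topological = all(position[u] < position[v] for u, v in edges)
print(is_topological)  # True
```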

Designing and Analyzing the Algorithm


The first node v1 in a topological ordering has no incoming edges. In fact, every DAG has a node with no incoming edges. Moreover, if G is a DAG, then G has a topological ordering.

Algorithm

To compute a topological ordering of G:
–> Find a node v with no incoming edges and order it first
–> Delete v from G
–> Recursively compute a topological ordering of G − {v} and append this order after v
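The steps above can be sketched directly, deleting one node per pass. This is a naive version (function and variable names are my own), and each pass rescans the remaining edges, which is what makes it quadratic:

```python
def topological_order(nodes, edges):
    """Naive topological sort: repeatedly pick a node with no incoming
    edges, order it next, and delete it (with its outgoing edges)."""
    remaining = set(nodes)
    remaining_edges = set(edges)
    order = []
    while remaining:
        # Scan for a node with no incoming edge from remaining nodes.
        targets = {v for (_, v) in remaining_edges}
        v = next(x for x in remaining if x not in targets)
        order.append(v)
        remaining.remove(v)
        remaining_edges = {(a, b) for (a, b) in remaining_edges if a != v}
    return order

print(topological_order([1, 2, 3, 4], [(1, 2), (1, 3), (2, 4), (3, 4)]))
```

Each iteration does an O(n + m) scan, and there are n iterations, matching the analysis below.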

Analysis:

–> Identifying a node v with no incoming edges takes O(n) time. Thus, over n iterations, we get O(n²) time.
–> If G contains Θ(n²) edges, this is linear in the size of the input, so the running time is not really bad.
–> But if the number of edges m is much less than n², a running time of O(m + n) is a big improvement over Θ(n²).

To get O(m + n) time:

–> We declare a node to be “active” if it has not yet been deleted by the algorithm, and we maintain two things:
–> for each node w, the number of incoming edges that w has from active nodes;
–> a set S of all active nodes in G that have no incoming edges from other active nodes.

At the start, we initialize both of these with a single pass through the graph, since all of the nodes are active. Thus S initially consists of all nodes in G with no incoming edges.
–> In each iteration, we select a node v from the set S and delete it.
–> After deleting v, we go through all of the nodes w to which v had an edge and subtract one from the number of active incoming edges that we are maintaining for w.

–> If this causes the number of active incoming edges to w to drop to zero, then we add w to the set S.
–> Proceeding in this way, we keep track of nodes eligible for deletion at all times, while spending constant work per edge during the execution of the algorithm.
Thus we get O(m + n) time.
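The bookkeeping above can be sketched as follows (a sketch in my own names, using a queue for S; any container for the eligible nodes would do):

```python
from collections import deque

def topological_order_fast(nodes, edges):
    """O(m + n) topological sort: maintain, for each node, the count of
    incoming edges from active nodes, and a set S of active nodes with
    no such incoming edges."""
    incoming = {v: 0 for v in nodes}   # active incoming-edge counts
    out = {v: [] for v in nodes}       # adjacency lists
    for u, v in edges:
        out[u].append(v)
        incoming[v] += 1
    S = deque(v for v in nodes if incoming[v] == 0)
    order = []
    while S:
        v = S.popleft()                # delete an eligible node
        order.append(v)
        for w in out[v]:               # constant work per edge overall
            incoming[w] -= 1
            if incoming[w] == 0:       # w just became eligible
                S.append(w)
    return order

print(topological_order_fast([1, 2, 3, 4], [(1, 2), (1, 3), (2, 4), (3, 4)]))
```

Initialization is one pass over the nodes and edges, and afterwards each edge is examined exactly once, which gives the O(m + n) bound.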

Since I wrote this after doing Problem set 4, everything made more sense and the section became really interesting. I give it a 9/10.

courses/cs211/winter2012/journals/jeanpaul/chapterthreesectionvi.txt · Last modified: 2012/02/21 01:34 by mugabej