The interval scheduling problem involves a shared resource and a set of requests. Each request wants to use that resource from a certain point in time to a certain ending point in time. When a request is using a resource, no other request can be using the resource during any part of its time slot, meaning that there can be no more than one request scheduled on the resource at any given time.
The interval scheduling problem can be solved with a greedy algorithm. A greedy algorithm is an algorithm that, like an inductive proof, starts with a base case and takes growing steps in which it always makes the best choice at each step in the solution and each step makes progress toward the solution. The greedy algorithm to solve the interval scheduling problem is as follows (based on algorithm on Page 118):
Overall Runtime: O(n*log n)
Initially let R be the set of all requests, and let A be empty
While R is not yet empty
Choose a request i in R that has the smallest finishing time
O(log n)Add request i to A
O(1)Delete all requests from R that are not compatible with request i
EndWhile
Report the set A as the set of accepted requests
Another problem similar to the interval scheduling problem is the task scheduling problem. The task scheduling problem involves a set of tasks that have a set length of time but are flexible in when they are done. Each task has a deadline, however, to indicate when the task should be done by. Every task must be completed and no two tasks can be scheduled at the same time. The goal of the algorithm is to schedule the tasks such that the amount of time that every task is late is minimal (the “lateness” of the schedule).
The task scheduling problem can be solved with an intuitive greedy algorithm. The central concept of the algorithm is to minimize lateness by “being ahead” and scheduling tasks in order of earliest deadline to late deadline. The algorithm is as follows (based on algorithm on Page 124):
Overall Runtime: O(n*log n)
Order the jobs in order of their deadlines
O(n*log n)Assign labels to the ordered jobs such that d_i ≤ … ≤ d_n
O(n)Set the finishing time, f, to be the start time for all jobs
O(1)For each job j_i where 1≤i≤n:
O(n*1)Assign the job i to the time interval from s(i)=f to f(i)=f+t_i
Let f=f+t_i
O(1)End
Report the set of scheduled intervals [s(i), f(i)] for i=1 to i=n
The shortest path problem involves a weighted graph in which, given a start node s, the shortest possible path to any other node, v, in the graph can be known. As can be expected, this problem can be solved with a greedy algorithm. The greedy algorithm grows by adding one node at a time to a set S of nodes whose shortest path distances are all known. The set S starts with the start node s. The distance from it to itself is 0 and the distance to every other node starts out as ∞. In each growing step of the algorithm, a node that is outside of the set S but is directly connected to one of the nodes inside the set S is added to the set. When it is added to the set, the distances of all of the nodes are updated based on the distances calculated so far for all of the new node’s neighbors. This well-known algorithm is called Dijkstra’s Algorithm.
A spanning tree is a tree in which, given a set of nodes V and the non-negative cost of connecting each pair of connectable nodes, create a tree that connects all the nodes together with the least cost possible. A key concept in finding the minimum spanning tree of a graph is that the minimum spanning tree will contain no redundant edges and no cyclic paths. This problem can be solved with one of three algorithms: Kruskal’s Algorithm, Prim’s Algorithm, and the Reverse-Delete algorithm.
Kruskal’s algorithm from the previous section is special in that in order to use the algorithm to solve the problem in an efficient manner, it needs a special data structure. Unlike the general-purpose abstract data types like stack, queues, and lists, the union-find data structure is a specialist structure that is designed to be optimal for this particular algorithm. The union-find data structure has specific functionalities, including finding the name of a set containing a given element and a union operation on two named sets. Its operations deal with keeping track of edges in a way that optimizes Kruskal’s Algorithm.
Instead of previously discussed graph algorithms that provide more details about a graph, clustering is a way to generalize a graph to provide higher-level comprehension of the data represented. A k-cluster of a set of objects with the given “distances” between them, for example, is a graph of those objects with exactly k strongly-connected components or “clusters”. A k-cluster’s spacing is the minimum distance between any pair of points lying in different clusters. A clustering of maximum spacing is a clustering configuration in which the spacing of the graph is maximized.
At the lowest level, all data on a computer is simply represented as a list of the binary numbers 0 and 1. When data is converted from a human-readable format to a format that can be stored and recognized by a computer, each symbol in data is converted into a string of 1’s and 0’s. Instead of representing each symbol as a string of exactly n 1‘s and 0‘s, variable-length encoding schemes allow us to use shorter strings to optimize the amount of data needed to represent the original list of symbols. In order to take advantage of a variable-length encoding scheme without a loss of data, we use prefix codes. A prefix code is a set of encodings in which no code is a prefix of another code. Huffman codes allow us to take a symbol’s frequency of appearance in the medium into account and make shorter-length codes correspond to more frequent symbols in order to effectively compress the data.