"A Combinatorial Approach to Building Navigational Graphs for Dynamic Web Applications" (Wang, Lei, Kacker, Kuhn, Sampath, Lawrence)

The goal of this paper is to effectively model the navigation of a dynamic web application. They divide the problem into two:

1. The page explosion problem–the number of possible pages makes it hard to consider each page individually.

2. The request generation problem–there are many pages that can only be reached when the user has already visited certain other pages.

Their contributions are: 1. An “abstraction scheme” to address the page explosion issue–“pages that are likely to have the same navigation behavior are grouped together” as one node 2. Combine parameter values by using pairwise coverage.

Notably, they do not use user sessions to create the models.

This group splits a URL into two components–the “base” and the “query.” The query is composed of the parameter name-value pairs, and the base is the rest of it. An abstract URL, for them, is the base + parameter names. They consider pages with the same abstract URL equivalent, like the other papers we have been reading.

The Combinatorial Strategy… For a form with k parameters, each of which has a different number of values (d), one cannot try to test every possible permutation. In their testing, “given any two out of the k parameters, [they] ensure that every combination of the two parameters is covered in at least one submission.” An important feature that they note is that all parameter values get tested in 2^n tests, where n is the number of parameters. They use an extended version called In-Parameter-Order, which slowly extends the pairwise test to cover all the parameters

This paper also talks a lot about web crawling and compares the algorithms of web crawlers to the way the navigation model is created. They say, “One common approach used…is to build a pre-defined list of values for the parameters that are frequently encountered,” and go on to write that this was helpful for them as well.

Something I do not understand about the paper is why they talk about using the combinatorial approach and pairwise coverage, yet on page 215 discuss how to get parameter values–ie either from the user as they are needed, or from a pre-approved list. How do these mesh?

To create the navigation model, they essentially perform a non-recursive depth-first search, starting from the home-page of the web application. To maintain state, they undo any state changes when “backing up” in the tree. They also avoid loops by disregarding any duplicate pages (which, they concede, is a limitation, but necessary).

Limitations:

-Pairwise coverage… why does this seem like a good idea?? Parameters are dependent on one another. Also, all parameters do not have the same sets of possible values. Also, with IPO–isn't this extremely inefficient??

-(p 214) They assume that people always start at the home page of a web application, which is certainly not true.

-They “assume that the navigation behavior of a web page…does not depend on specific parameter values” (216). This is definitely not a safe assumption to make! For example, imagine an application that lets users log in, and suppose there is more than one type of user (i.e. 'customer' and 'administrator' or 'seller'). If they assume that parameter values do not make a difference, the pages associated with these types of users might not get tested.

-Another limitation, which they specifically say, is that “the number, the size…, and the specific technologies…of the subject applications prevent a generalization of [their] results.” So, isn't the entire point to generalize???