"Statistical Testing of Web Applications" (Tonella, Ricca)

Summary

This paper's goal is to semi-automatically recreate the realistic test suites for any given web application. They analyze the html code generated by the application and use employ humans to determine all the input values “which cover all relevant navigations.” They also use a reverse engineering tool that they previously developed to “automatical[ly] extract… [an] explicit-state model of the … web application.”

Based on the variables pre-specified by the user, which create equivalence classes, they supply the application with a particular list of input in a separate file. When the user has not identified equivalence classes of input or when the web application is in a different state, for which the user's specifications do not apply,, the authors use a semi-automatic process called page-merging to “simplify the explicit state model” and determine equivalence classes for the input. They have three (decreasingly automatic) criteria for comparing two dynamically-generated html pages:

a) pages that are literally identical are considered the same page

b) pages that have “identical structures but different texts, according to a comparison of the syntax trees of the pages, are considered the same page”

c) pages that “have similar structure, according to a similarity metric, such as the tree edit distance, computed on the syntax trees of the pages, are considered the same.”

Essentially: if the pages are not identical, they look at the syntax with varying degrees of leniency. They also give a summarization of the main phases in statistical testing:

1) Construction of a statistical testing model, based on available user data

2) Test case generation based on the statistics encoded in the testing model (modeled as a Markov chain, called the usage model; the transition probabilities )

3) Test case execution and analysis of execution putout for reliability estimation Their main contribution is in the first step, with the semi-automatic creation of equivalence classes. They also deal with state by exploiting the hidden variables that determine whether the state stays constant.

Limitations:

In general, this paper seems to have some excellent ideas–like state parameters and equivalence classes of parameter values–but it relies far too heavily on humans to make decisions that should be automated.

Why are these authors so intent on creating the most likely user sessions? Shouldn't everything be tested? They never really give a reason for wanting to order the test cases from most-likely to least-likely. They say “complete testing of all paths/interactions is unfeasible for any non-trivial system. Statistical testing aims at focusing on the potions of the system under test that are more frequently accessed, in order to ensure that reliability of the delivered product is high.”
The authors deal with state changes, but do not make it clear how they incorporate these hidden parameters into the models, or how they determine the hidden parameters. Making state a variable is an interesting idea, however, and I think it has a lot of potential, but I think it would be simpler to model it separately.
The authors have a usage model, which incorporates both the HTML requests and the parameter values. As the previous paper we read showed, the combination of parameter values and html sequence makes a model extremely complex. The question is, however, whether this complexity is necessary or whether it is better to separate into different models.
The paper also focuses on the rate of failure, but does not define what a failure is, and passes over the fact that “the test engineer…checks whether the pout is correct” for each page.
It is not feasible to have the equivalence classes created by a human, especially for very complex web applications. Additionally, it is easy for a human to make a mistake.