Summary
This paper's goal is to semi-automatically generate realistic test suites for any given web application. The authors analyze the HTML code generated by the application and employ humans to determine all the input values “which cover all relevant navigations.” They also use a reverse engineering tool that they previously developed to “automatical[ly] extract… [an] explicit-state model of the … web application.”
Based on the variables pre-specified by the user, which create equivalence classes, they supply the application with a particular list of inputs in a separate file. When the user has not identified equivalence classes of input, or when the web application is in a different state to which the user's specifications do not apply, the authors use a semi-automatic process called page-merging to “simplify the explicit state model” and determine equivalence classes for the input. They have three (decreasingly automatic) criteria for comparing two dynamically generated HTML pages:
a) pages that are literally identical are considered the same page
b) pages that have “identical structures but different texts, according to a comparison of the syntax trees of the pages, are considered the same page”
c) pages that “have similar structure, according to a similarity metric, such as the tree edit distance, computed on the syntax trees of the pages, are considered the same.”
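To make the three criteria concrete, here is a minimal sketch of how such a comparison could be layered. It is not the authors' tool: the function names, the sample threshold of 0.8, and the flat tag-sequence representation are all assumptions for illustration, and criterion (c) uses difflib's sequence similarity ratio as a simple stand-in for a true tree edit distance.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects the sequence of start/end tags, ignoring text and attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append("<" + tag + ">")

    def handle_endtag(self, tag):
        self.tags.append("</" + tag + ">")


def structure(html):
    """Return the page's tag sequence (a flattened proxy for its syntax tree)."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return parser.tags


def same_page(a, b, threshold=0.8):
    """Return the first criterion ('a', 'b' or 'c') under which two pages merge, else None.

    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    if a == b:                      # (a) literally identical
        return "a"
    sa, sb = structure(a), structure(b)
    if sa == sb:                    # (b) same structure, different text
        return "b"
    # (c) similar structure; a stand-in metric, NOT a tree edit distance
    if SequenceMatcher(None, sa, sb).ratio() >= threshold:
        return "c"
    return None
```

For example, two cart pages that differ only in the item count would merge under (b), while a cart page with one extra paragraph would merge only under (c), and only if the chosen threshold allows it. That dependence on a threshold is where the human judgment enters.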
Essentially: if the pages are not identical, they compare the syntax trees with increasing degrees of leniency. They also summarize the main phases of statistical testing:
1) Construction of a statistical testing model, based on available user data
2) Test case generation based on the statistics encoded in the testing model (modeled as a Markov chain, called the usage model; the transition probabilities )
3) Test case execution and analysis of execution output for reliability estimation
Their main contribution is in the first step, with the semi-automatic creation of equivalence classes. They also deal with state by exploiting the hidden variables that determine whether the state stays constant.
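Phase 2 can be illustrated with a small sketch of test case generation as a random walk over a Markov chain usage model. The page names, transition probabilities, and the function generate_test_case are hypothetical and not taken from the paper; in the approach reviewed here the probabilities would come from available user data.

```python
import random

# Hypothetical usage model: page -> list of (next_page, transition probability).
USAGE_MODEL = {
    "home": [("search", 0.6), ("login", 0.4)],
    "login": [("home", 1.0)],
    "search": [("results", 1.0)],
    "results": [("search", 0.3), ("exit", 0.7)],
}


def generate_test_case(model, start="home", end="exit", rng=None, max_steps=50):
    """Walk the chain from start, picking each next page by its probability.

    Stops at the end state or after max_steps pages, whichever comes first.
    """
    rng = rng or random.Random()
    path = [start]
    while path[-1] != end and len(path) < max_steps:
        successors, weights = zip(*model[path[-1]])
        path.append(rng.choices(successors, weights=weights, k=1)[0])
    return path


if __name__ == "__main__":
    # Each generated path is one navigation sequence to replay against the application.
    for _ in range(3):
        print(" -> ".join(generate_test_case(USAGE_MODEL)))
```

Frequently used navigations appear in more generated paths, which is what lets the pass/fail results of the suite feed a reliability estimate in phase 3.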
Limitations:
In general, this paper seems to have some excellent ideas–like state parameters and equivalence classes of parameter values–but it relies far too heavily on humans to make decisions that should be automated.