"An Exploration of Statistical Models for Automated Test Case Generation"

Web-based applications are difficult to test, and many test suites have poor coverage and/or quality. The authors' goal is to present a new and improved technique for generating test cases and to investigate the effectiveness of the test suites generated from the various models discussed in the paper. They contribute several modeling methods, based on statistical machine learning techniques, that are accurate and achieve high coverage. These models capture the dynamic nature of web applications.

A common method for testing is to use logged user data to model the dynamic behavior of a web application. This paper expands on that idea: instead of replaying the logged data directly, the authors use machine learning techniques to automatically build models from it. They consider questions about the most and least likely user sessions, the order of navigation through web pages, and others, and generate user sessions according to the statistical distribution represented in the logged data. To determine the probability of each request, they use conditional independence (Markov) assumptions: the probability of the next request depends only on a fixed number of preceding requests. Because fewer prior requests must be tracked, the probabilities can be represented compactly. In a bigram model, only the previous request is needed to estimate the probability of the next request; a trigram model conditions on the two prior requests.
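To make the bigram idea concrete, here is a minimal sketch (my own illustration, not the authors' implementation) of estimating bigram transition probabilities from logged sessions and then sampling a synthetic user session from them; the session log and request names are hypothetical:

<code python>
import random
from collections import defaultdict

# Hypothetical logged user sessions: each is an ordered list of requested pages.
# START and END are artificial tokens marking session boundaries.
logged_sessions = [
    ["START", "home", "browse", "bookDetail", "addToCart", "checkout", "END"],
    ["START", "home", "search", "bookDetail", "END"],
    ["START", "home", "browse", "browse", "bookDetail", "addToCart", "END"],
]

# Count bigram transitions; relative frequencies estimate P(next | previous).
transitions = defaultdict(lambda: defaultdict(int))
for session in logged_sessions:
    for prev, nxt in zip(session, session[1:]):
        transitions[prev][nxt] += 1

def sample_next(prev):
    """Draw the next request from the empirical distribution P(next | prev)."""
    successors = transitions[prev]
    return random.choices(list(successors), weights=list(successors.values()))[0]

def generate_session(max_len=20):
    """Walk the Markov chain from START until END (or a length cap)."""
    session = ["START"]
    while session[-1] != "END" and len(session) < max_len:
        session.append(sample_next(session[-1]))
    return session

print(generate_session())
</code>

A trigram model would work the same way but key the transition table on the pair of the two previous requests instead of just one.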

The researchers evaluated the effectiveness of six variations of the model using five test suites, each containing 200 generated user sessions, run against an example bookstore web application. The conclusion of the study was interesting: the 1-gram model, which looks at the least prior history, produced fewer successful book purchases but had the highest percentage of code coverage. It would seem that by generating test cases essentially at random, this model reached more error cases than the models designed to look more like real user sessions.

The authors discuss several limitations, specifically the fact that only one application was tested and the way in which the user session data was generated. It is quite possible that the 1-gram model would not be the most effective overall if it were tested on a more varied set of applications. I think it would be a good idea to group web applications by type and compare which models work best for each category. For example, web applications where you can browse and buy products should be tested differently than search engines or social networking sites. The authors also mention that users were given a list of suggested activities to perform when generating "random" user sessions, and they worry that this does not represent realistic user behavior. I definitely think we need to consider that there may be other factors besides history that could affect the model. Even the best model achieves only about 55% code coverage, which leaves plenty of room for improvement.

Do you think the 1-gram model would be as effective for other types of web applications? What about a combination of a 1-gram and a higher-gram model to generate a mix of random and realistic user sessions (see the sketch below)? What other factors could be integrated into the model to make it better?
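On the second question, linearly interpolating the two models is one simple possibility. Here is a minimal sketch (my own hypothetical example, not from the paper), where lam controls how much weight is placed on realistic bigram behavior versus random unigram exploration:

<code python>
import random
from collections import defaultdict

# Hypothetical logged sessions, as in the sketch above.
logged_sessions = [
    ["START", "home", "browse", "bookDetail", "addToCart", "checkout", "END"],
    ["START", "home", "search", "bookDetail", "END"],
]

# Unigram counts (overall request frequencies) and bigram transition counts.
unigram = defaultdict(int)
bigram = defaultdict(lambda: defaultdict(int))
for session in logged_sessions:
    for req in session[1:]:
        unigram[req] += 1
    for prev, nxt in zip(session, session[1:]):
        bigram[prev][nxt] += 1

def interpolated_next(prev, lam=0.7):
    """Sample from lam * P_bigram(next | prev) + (1 - lam) * P_unigram(next).

    lam near 1 favors realistic navigation; lam near 0 favors random requests.
    """
    vocab = list(unigram)
    bi_total = sum(bigram[prev].values()) or 1   # avoid division by zero
    uni_total = sum(unigram.values())
    weights = [
        lam * bigram[prev][w] / bi_total + (1 - lam) * unigram[w] / uni_total
        for w in vocab
    ]
    return random.choices(vocab, weights=weights)[0]
</code>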
