Camille's Journal

5 most recent entries:

Update!

Since finishing with our paper reading, we've started to work on actually analysing data. I'm going to be looking into intersession dependencies. We intuit that sessions depend on one another. For example, a user registers on a website in one session and logs in during subsequent sessions. The session with the registration must come first in our test cases in order for logins in subsequent sessions to succeed.
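A minimal sketch of the registration-before-login idea, using invented session and request names (our real sessions come from parsed access logs, not this structure):

```python
# Hypothetical sketch: order test sessions so that each user's
# registration session runs before any of their login sessions.
# The session/request structure here is made up for illustration.

def order_sessions(sessions):
    """Stable sort: sessions containing a 'register' request come first."""
    def has_register(session):
        return any(req["action"] == "register" for req in session["requests"])
    # False sorts before True, so invert: registration sessions sort first.
    return sorted(sessions, key=lambda s: not has_register(s))

sessions = [
    {"user": "alice", "requests": [{"action": "login"}, {"action": "browse"}]},
    {"user": "alice", "requests": [{"action": "register"}]},
]

ordered = order_sessions(sessions)
print([s["requests"][0]["action"] for s in ordered])  # → ['register', 'login']
```

A real version would need to match registrations to logins per user and handle sessions with neither, but the ordering constraint is the same.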

Based on some issues we've already run into, I'm also starting to read a couple more papers for ideas about how we can approach the problem of splitting resource names into individual words. I'm also going to be looking for patterns in resource names based on what part of speech the words are. It should be really interesting!
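As a starting point before reading those papers, a naive splitter handles the easy cases (delimiters and camelCase); the hard case, which is where the papers come in, is concatenated lowercase words that need a dictionary. The example names are invented:

```python
import re

# Naive sketch of splitting resource names into candidate words.
# Handles delimiters (-, _, ., /) and camelCase boundaries only;
# run-together lowercase words ("addtocart") would need a dictionary.

def split_resource_name(name):
    # First split on common delimiters, then on camelCase boundaries.
    parts = re.split(r"[-_./]", name)
    words = []
    for part in parts:
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [w.lower() for w in words if w]

print(split_resource_name("getUserProfile"))   # → ['get', 'user', 'profile']
print(split_resource_name("add-to-cart.jsp"))  # → ['add', 'to', 'cart', 'jsp']
```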

I also completed an ethics course, since one of the sets of logs we'll be looking at is for a web app that has been used/is currently being used by students at our school.

2010/10/22 09:40

Combinatorial Approach Paper Comments

Statement of Problem/Goals

  • Overall Problem
  • Goals
  • Contribution to State of the Art

Technical Approach

  • Key Insights
  • Overall Approach/Strategy

Discussion/Critique

  • How did they evaluate their efforts?
  • Conclusions from evaluation results
  • What application/useful benefit do the researchers/you see for this work?
  • Limitations mentioned
  • Additional limitations
  • Questions
2010/10/07 09:38

Update - First Several Weeks

Week 0 - Read the Sant Paper. Got my journal page on this blog set up (sort of). The group met and discussed the paper, and our specific project. This paper wasn't very hard for me to get through, partly because I saw it two summers ago, but I think I got a lot more out of it this time! I think we had some really productive discussion about what directions this research could take us and what factors might be important for generating good test cases for Web Apps.

Weeks 1 and 2 - Read the Statistical Testing of Web Applications Paper, which was a lot harder to get through! We also started working on the course work (labs) from Sara Sprenkle's spring term Web App Course.

Weeks 3 and 4 - Read another article. Will post notes from that reading soon. Continued work on the labs, but got stuck on a servlets lab. Hope to get working on my specific project later this week, when the issues with these labs are resolved. Will meet again this Thursday.

2010/10/05 23:33

Statistical Testing of Web Applications Paper Comments

Statement of the Problem/Goals:

  • Overall problem

Websites used to be static, but are now dynamically generated. That makes them hard to test, but they're super important to test and get running correctly.

  • Goals

Use a dynamic analysis technique to make testing web applications easier by modeling the application (not necessarily a goal to automate the entire process, though). Be able to “properly model” “both dynamic and static pages.” (p. 104)

  • Contribution to state-of-the-art

The beginnings of a new way to test web applications. Ideas for how to approach this new problem - model the web application (we sort of do this …).

Technical Approach:

  • Key insights

Use access logs that are automatically recorded by the Web server to generate a realistic model of the web app that can be “interpreted as a Markov chain on which statistical testing can be conducted.” (p. 104)

Navigation is probably one of the most important factors for the model.

Statistical testing (maybe not a new insight) – the parts of the application that are used more often should be most thoroughly tested (because it's hard/impossible to test the whole thing).

  • Overall approach/strategy

Use access logs to create a Markov model of the web application based on visited URLs (navigation). Use this to estimate the reliability of the Web app and prioritize the execution of test cases … and to help decide when to stop testing. (p. 115)

Discussion/Critique:

  • How did they evaluate their efforts?

“The number of failures occurring in such test sessions can be used to estimate reliability, since the behavior of users is stochastically reproduced in the test cases.” (I don't actually totally get this, and I think this might be their strategy for evaluating the web app they're trying to test, not their own methods.)

They looked at the number of failures they got when they actually tested the apps with these models. BUT is this actually a good way to evaluate? The input values weren't in their generated model, so how'd they decide what to plug in, and how do they know the chosen values (rather than the model) aren't what's determining the results?

  • Conclusions from evaluation results

It's really hard to test web applications. When you make these Markov models you get an almost fully connected graph. The occurrence of a failure depends on the followed path and inserted inputs … so even though they seem to think their model is a good jumping-off point (which it is), they acknowledge that the values need to be addressed.

  • What application/useful benefit do the researchers/you see for this work?

A way to generate models that make it easier to test web applications. A jumping-off point for making it even easier … they didn't seem to be thinking about the automation of the testing process, but this is definitely a part of what we do.

  • Limitations mentioned

Failure depends on navigation and values (see above). Used a complex web application for one of their case studies, so it's hard to tell how things are working in there.

Back, forward, go-to commands from web browser not considered in edge traversal counts.

  • Additional limitations

Too many manual steps (p. 113).

  • Questions!

p. 103 – Why is advertising the motivation for learning to make testing web applications easier?

p. 116 – “The probability that the user exercises a path not seen during testing is 0.22%” –> This sounds really really good, but is it? I'm not sure exactly what that means.

p. 116 – Why do they start off with such a complex web app? We're still using pretty simple ones … that seems weird to me.

p. 123 – “For Web applications with an internal state having dependencies on the specific path followed, this results in a limited efficacy of the test phase. It is thus preferable to limit as much as possible the dependencies of the internal state on the path followed. Ideally, only the immediate predecessor should affect the internal state of the application.” – If I am reading this correctly, this is hilarious. Do they really expect that people are going to make Web applications that are simpler and not as cool just so that they can be tested more easily?

2010/09/21 23:30

Sant et al. Paper Comments

Statement of the Problem/Goals:

  • Overall problem

Web applications are hard to test, and current automated testing methods don't work well for web apps.

  • Goals

“Our goal is to use web logs to produce models that can be used to generate user sessions.”

Use logged user data to create models of web applications that can be used to generate new user sessions which can be combined into “effective” test suites. Use statistical/probabilistic methods to generate user sessions that are more/less likely to come up in the real world.

  • Contribution to state-of-the-art

“The main contribution of this paper are the design of data and control models that statistically represent the dynamic behavior of users of a web application, the algorithms to automatically construct the models, an approach that utilizes the models for automated test case generation, and a preliminary study evaluating the test cases.”

Instead of using logged user data directly (which was the state-of-the-art), they propose generating test cases/test suites based on the logged user data. This hopefully will result in realistic test cases/test suites that will more effectively test web apps than previously proposed methods.

Technical Approach:

  • Key insights

Use logged user data to generate different test cases. Separate the process into a log parsing step and a model building step. In generating new test cases, take history into consideration. I.e. given the URLs that have already been visited (or the data values that have already been seen), what URLs (or data values) are more/less likely? Separate data and control models for data values and URL sequences respectively. (This is barely mentioned by Sant et al., but is significant to us because we use this insight in our work.)

  • Overall approach/strategy

Collect user data for sample web applications. Parse user data. Create models (Markov models) based on the parsed user data that may include information about the history and possible pairings in data values. (Some models look farther back in history than others. What's the “right” amount?). Generate test cases by taking a “random walk” through the model.
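The "random walk" step can be sketched as a weighted walk over the transition probabilities of a 1-gram (first-order) model. The model and URLs below are illustrative, not from the paper; higher-order models would condition on more than just the last page:

```python
import random

# Sketch: generate one test case (a URL sequence) by a weighted
# random walk over a first-order Markov model. Each step picks the
# next URL in proportion to its observed transition probability.

def random_walk(model, start, end, max_len=20):
    path = [start]
    while path[-1] != end and len(path) < max_len:
        nexts = model[path[-1]]
        urls = list(nexts)
        weights = [nexts[u] for u in urls]
        path.append(random.choices(urls, weights=weights)[0])
    return path

model = {
    "/index": {"/login": 0.5, "/search": 0.5},
    "/login": {"/cart": 1.0},
    "/search": {"/cart": 1.0},
    "/cart": {"/checkout": 1.0},
}
print(random_walk(model, "/index", "/checkout"))
```

The "right amount of history" question is then a question about the model: a walk over a 2-gram model would index transitions by the last two URLs instead of one.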

Discussion/Critique:

  • How did they evaluate their efforts?

They evaluated their efforts by looking at the test cases/test suites generated by their models. They investigated the effectiveness of these test suites based on the rate of coverage for different types of models and the accuracy of the generated tests.

They asked themselves (1) if their models could be used to generate valid user sessions and test suites, (2) how the ordering of page requests within a session affects test case coverage results, (3) how the quality of the data model affects test case coverage results, and (4) if the ordering of user sessions within a test suite matters for validity or coverage.

  • Conclusions from evaluation results

All of their test suite generation methods produce valid test cases (successful book purchases), but (maybe?) fewer valid test cases than the original logged user data (based on successful book purchases). All test suite generation methods have equal coverage after 40 user sessions, but the 1-gram model (less history) achieves good coverage more quickly. Error checking is exercised less quickly when increasing amounts of history are considered, but more valid sequences of requests are produced by models that consider more history. Randomly reordering test cases within a test suite doesn't affect the validity (again, measured by successful book purchases), but does result in quicker coverage. They hypothesize that this is, in part, due to the artificial way in which the application was used … everyone registered within a short period at the beginning of the application's use, so without random reordering the registration pages were covered multiple times before other pages began to be covered.

  • What application/useful benefit do the researchers/you see for this work?

They see this work as a step toward making web applications easier to test. They see their contributions as “the design of data and control models that statistically represent the dynamic behavior of users of a web application, the algorithms to automatically construct the models, an approach that utilizes the models for automated test case generation, and a preliminary study evaluating test cases.” I think the most useful benefit is simply that this method of using logged user data to generate new test cases that more effectively test the web application seems to be promising. Also the ideas about considering history and splitting the test case generation process into a data model and a control model are of particular interest to us.

  • Limitations mentioned

The study was only performed on one application (do the results generalize?). The application was not of very good quality. Users of their sample application (a bookstore application) were provided with a list of suggested activities (do people use real applications differently?).

  • Additional limitations

Could be more realistic. Factors other than history probably affect navigation through a web application and parameter values. Last summer we talked about different aspects of history, parameter interactions, and types of users/specific users. Looking at other ways in which navigation and parameter values are determined should result in more realistic test cases and, potentially, more diverse test cases.

Is the number of successful book purchases really a good measure of the validity of test cases? Is good coverage necessarily an indicator of a good test suite? It seems that the main (important) parts of the application are covered by valid, realistic test cases, but the error pages (like for login without registration) are more quickly covered by invalid/less realistic test cases.

Does the type of web application affect how it's used and, therefore, how best to test it? (I.e. not just the problem that they only had one application, but also the limitation that it's only one type of application.)

User sessions are currently being run sequentially (i.e. userSession1 runs completely before userSession2 begins, etc.). Users interact directly in some applications, but even when they are not interacting directly, something one user does can affect another user. How much does this interaction matter (here vs. in the real world)?

  • Question …

Besides the rate of successful book purchases, how could we determine whether test cases were valid or not? (For some reason this successful book purchases as a way to measure validity is really bothering me right now, but maybe it's no big deal.)

2010/09/04 12:14
webapptesting/journals/cobb.txt · Last modified: 2010/10/08 10:37 by admin
CC Attribution-Noncommercial-Share Alike 4.0 International