Setback and a new direction
For much of the semester, I have been working on separating the resource names into words and looking for patterns in the parts of speech of those words. There have been a few challenges. Not all of the resources are easy to separate into individual words automatically. We looked into AMAP, which would have done a good job of figuring out how to split the resources correctly, but decided that it was probably unnecessary for our purposes (the percentage of resources split incorrectly in our applications was very small).
Then I began to label each word with its part of speech, with the goal of finding patterns in the parts of speech within the resources. This proved difficult because, without context, almost all of the words could be more than one part of speech (ex. Grade can be a verb or a noun). Even in context, it was often hard to determine the part of speech of certain words (ex. login and logout). So I did my best to figure out the correct part of speech in context, and started to look for patterns in the parts of speech within the resources (ex. bookstore/verifyGrade's pattern would be noun verb noun).
Unfortunately, this didn't seem to be very useful. There were some patterns, but there wasn't much we could really learn from any of them. Many of the resources are exactly the same except for one word (ex. bookstore/verifyGrade and bookstore/verifyName differ only in the last word), so they naturally have the same part-of-speech pattern. When we saw a pattern, it was usually a case like this: a whole set of bookstore/verify«NOUN» resources, rather than a bunch of unrelated «NOUN»«VERB»«NOUN» resources like bookstore/verifyGrade, house/cleanSink, earth/saveTrees, etc., which would share a pattern despite being very different resources.
In these cases, we could learn almost as much (if not more) by just looking at the resources, or by just splitting them into words without identifying parts of speech.
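To illustrate the point about splitting without tagging, here is a minimal sketch in Python. It is not the code used in this project: the function names split_resource and group_by_prefix are made up for this example, and the splitting rule (break on "/" and similar separators, then on camelCase boundaries) is an assumption about what the resource names look like. It groups resources that are identical except for their last word, which is the kind of "pattern" described above.

```python
import re
from collections import defaultdict


def split_resource(name):
    """Split a resource name such as 'bookstore/verifyGrade' into lowercase words.

    Assumption: words are separated by '/', '_', '-', '.' or camelCase boundaries.
    """
    words = []
    for segment in re.split(r"[/_\-.]+", name):
        # An acronym run, a (possibly capitalized) lowercase word, or a run of digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", segment))
    return [w.lower() for w in words]


def group_by_prefix(names):
    """Group resources by every word except the last one."""
    groups = defaultdict(list)
    for name in names:
        words = split_resource(name)
        if not words:
            continue  # nothing to group for an empty or separator-only name
        groups[tuple(words[:-1])].append(words[-1])
    return dict(groups)


resources = ["bookstore/verifyGrade", "bookstore/verifyName", "house/cleanSink"]
for prefix, last_words in group_by_prefix(resources).items():
    print("/".join(prefix), "->", last_words)
```

With the example names from above, the two bookstore/verify resources land in one group and house/cleanSink in another, without any part-of-speech information being needed.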
So now I'll be going in a new direction: looking at what we can learn from the resource names themselves and from the navigation from one resource to another.