Auto-Collation of defects from individual inspectors

Despite the success of the current algorithm within ASSIST, it is believed that, even with the suggested improvements, the approach will fall short of the standard required for successful industrial deployment. The current approach can be described as a statistical pattern-matching scheme. Hence, the project plans to introduce a new approach based upon lexical semantic techniques, specifically adapting the Lexical Conceptual Structure representation for the auto-collation problem. Recently, Burstein et al. have shown that lexical semantic techniques can be used to classify (grade) freeform (student) responses to short essay-type questions. In a further paper, Burstein et al. have provided initial experimental results for their approach, using a large set of essays (539 in total, across 3 different domains) drawn from the educational system. The results from the automated system are compared with the grades from expert practitioners, and if the two grades [1] are equal, then the system is considered to have found a correct grade [2].
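
As a minimal illustration of this evaluation criterion, consider the following Python sketch (the function name and grades are invented for illustration, and are not drawn from the original experiments); the automated grade counts as correct only when it exactly equals the expert’s grade:

    def agreement_rate(automated, expert):
        """Fraction of cases where the automated grade (1-6) exactly
        matches the expert practitioner's grade."""
        matches = sum(1 for a, e in zip(automated, expert) if a == e)
        return matches / len(automated)

    # Hypothetical grades for five essays, on the 1-6 scale.
    print(agreement_rate([3, 5, 2, 6, 4], [3, 5, 1, 6, 4]))  # prints 0.8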

For two of the domains, the technique performed extremely well, successfully classifying more than 80% of the test cases. In the third domain, the technique fared relatively poorly, correctly classifying only 48% of the test cases. The authors of the paper believe that the principal reasons for this poor performance are:
(a)   Small conceptual differences between the classifications, which require ‘general real-world’ knowledge to resolve; and

(b)   Defects (or gaps) in the lexicon [3], and misunderstanding of the classifications (which requires domain knowledge) by the authors of the paper.

Unfortunately, the paper provides insufficient detail to understand the relative impacts of the two problems.

Although both problems are real concerns, it is believed that both can be avoided in the inspection scenario, and hence that the other two trials can be considered a truer reflection of the strength of the technique. Further, it is believed that the auto-collation of defect lists is even more amenable to lexical semantic techniques, for the following reasons (a sketch of the resulting matching decision is given after the list):

·        Defects possess additional information, which can be utilised in the comparison process; for example, position within the document and defect type.

·        The process can utilise tools, such as spell checkers and grammar checkers, to ‘tidy up’ the syntactic form of the individual defect lists.

·        Descriptions of defects are always very short pieces of text.

·        Auto-collation asks only whether two items are semantically the same (a binary decision), rather than asking for a measure of similarity (a multi-way decision).

·        Most auto-collation ‘pairs’ will have no semantic similarity.

·        The retention of the occasional duplicate can be accommodated.

·        The process can have some interaction with the moderator, to resolve the occasional highly complex case.
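
To make the matching decision concrete, the following Python sketch combines the points above: cheap metadata filters (position and defect type) reject the many pairs with no similarity, and a placeholder word-overlap comparison stands in for the lexical semantic measure. The thresholds, field names, and overlap measure are illustrative assumptions only, not the project’s actual design:

    from dataclasses import dataclass

    @dataclass
    class Defect:
        line: int          # position within the inspected document
        defect_type: str   # e.g. 'omission', 'ambiguity'
        description: str   # inspector's free-text description

    def same_defect(a, b, max_distance=5, threshold=0.5):
        """Binary decision: do two reported defects describe the same fault?"""
        # Metadata filters first: most pairs share no similarity, so
        # distant or differently typed pairs are rejected outright.
        if abs(a.line - b.line) > max_distance:
            return False
        if a.defect_type != b.defect_type:
            return False
        # Placeholder comparison (word overlap); the real system would
        # apply a lexical semantic measure at this point.
        wa = set(a.description.lower().split())
        wb = set(b.description.lower().split())
        return len(wa & wb) / len(wa | wb) >= threshold

    d1 = Defect(12, 'omission', 'missing range check on input value')
    d2 = Defect(14, 'omission', 'input value range check is missing')
    print(same_defect(d1, d2))  # prints True

Note that the binary form of the decision means a single threshold suffices; no calibrated similarity scale is required.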

In addition to the automated processing, the problem can be simplified by exploiting the fact that the system works within a closed domain (defect descriptions). This can be achieved by introducing sets of standardised terms and expressions, while allowing the inspectors sufficient flexibility to remain productive. This will also help to minimise misinterpretations of defects by other inspectors. This is now an important consideration, as the elimination of the group meeting removes the natural mechanism for clarifying ambiguous descriptions of defects.
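
As an illustration of how such standardised terms might be applied before matching, the following Python sketch rewrites a free-text description using a small synonym table (the table itself is a hypothetical fragment; in practice it would be derived from the organisation’s own defect descriptions):

    CANONICAL = {
        'crash': 'failure', 'abort': 'failure',
        'typo': 'spelling-error', 'mistype': 'spelling-error',
        'var': 'variable', 'param': 'parameter',
    }

    def standardise(description):
        """Rewrite a defect description using the standard terms."""
        return ' '.join(CANONICAL.get(w, w) for w in description.lower().split())

    print(standardise('typo in loop var causes crash'))
    # prints: spelling-error in loop variable causes failure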

Finally, if the basic matching proves successful, then the system could be extended to classify defects by type automatically. This moves the problem closer to the essay-grading experiments of Burstein et al. The principal difference is that the classifications in the essay-grading experiments represent a continuous scale, whereas the classifications for auto-collation form a taxonomy. This classification process could be utilised in several ways; for example, as a crosscheck on the inspector’s classification. Points of difference also provide a measure of inspector performance (vis-à-vis classification), an improvement mechanism (for classification), and a measure of the automated system’s performance. The final output (a ‘corrected’ classification) provides a stable platform for process improvement activities, such as Pareto analysis in a statistical quality assurance program. Hence, this component can implement the idea of defect prevention as a secondary goal of an inspection process.
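
The following Python sketch illustrates two of these uses with invented data: counting points of difference between the inspector’s and the automated classifications, and a simple Pareto ranking over the classifications (defect types ordered by frequency):

    from collections import Counter

    inspector = ['omission', 'ambiguity', 'omission', 'inconsistency', 'omission']
    automated = ['omission', 'omission', 'omission', 'inconsistency', 'ambiguity']

    # Points of difference: a crude measure of (dis)agreement between
    # the inspector and the automated classifier.
    diffs = sum(1 for i, a in zip(inspector, automated) if i != a)
    print(f'{diffs} of {len(inspector)} classifications disagree')  # 2 of 5

    # Pareto analysis: defect types ranked by frequency, most common first.
    for defect_type, count in Counter(inspector).most_common():
        print(defect_type, count)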

Although the auto-collation system is presented within an inspection context, it has much wider application within software engineering, and can be applied to many defect-oriented situations, such as beta testing.


[1] Grades 1-6 are available for selection for both the automated system and the practitioner.

[2] The authors are unable to provide an estimate of the amount of misclassification by the expert practitioners. It should be noted, however, that because the expert grades themselves contain errors, the best achievable agreement for the automated system is undoubtedly less than 100%.

[3] If lexical defects continue to present a problem, example-based [47] or corpus-based [48] techniques could be utilised to tackle this deficiency.