Mutant Operators for XML Schemas


The eXtensible Markup Language (XML) (W3C 2000) is rapidly becoming the premier method for exchanging information across the Internet. The Document Type Definition (DTD) language, traditionally the most common method for describing the structure of XML instance documents, lacks the expressive power needed to describe highly structured data properly. A number of schema languages have been developed to extend these constructs and to allow validity constraints beyond those provided by DTDs to be placed on XML document instances. The most widely adopted of these is the XML Schema language developed by the World Wide Web Consortium (W3C). It has seen the most use in data-centric applications such as e-commerce, Web services, and metadata interchange/harvesting.

XML Schema (W3C 2001) provides a much richer set of structures, types, and constraints for describing data than DTDs do, and is therefore fast becoming the preferred means of defining and validating highly structured XML instance documents.

As the use of XML Schema grows, so does the need for tools to manipulate it. XML Schema is a powerful but new technology without a large base of practical experience, and, as with anything new, it is often difficult to know how to get started when there are no guidelines to show the way. One might be tempted to turn to the XML Schema specification for guidance, but it offers no help on this issue or on any other question of best practice; such matters are outside its scope. Ultimately, the specifics of designing a schema depend on the task at hand.
Although a number of parsers and tools use schemas to validate or analyze XML documents, tools that allow querying and advanced manipulation of the schema documents themselves are still being built. The IBM XML Schema Quality Checker (SQC) is a useful tool for improving the quality of XML Schema documents. It takes as input a schema written in the W3C XML Schema language and diagnoses improper uses of the schema language; where the appropriate correction is not obvious, the diagnostic message may include a suggestion about how to make the fix. Although SQC has obvious value for schema authors who want to make sure their schemas conform strictly to the W3C Recommendation, one of its developers' motivations was to find out whether the W3C XML Schema specifications, as they were being developed, were unambiguous and precise. IBM's newer XML Schema Infoset Model, by contrast, provides a complete model of schemas themselves, including both their concrete representations and the abstract relationships within a schema or set of schemas. With this library, one can easily query the model of a schema for detailed information, update the schema to fix any problems found, and write the schema back out.

Our interest is in using the mutation technique to examine the adequacy of test data for XML Schema documents. An existing mutation system used as-is is unlikely to be sufficient for adequately testing XML Schema documents: such systems were developed without considering the particular features of XML Schema, so they do not model most of the kinds of errors likely to appear in XML Schema documents.
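For reference, the adequacy of a test set in mutation testing is conventionally quantified by its mutation score: the fraction of non-equivalent mutants that the test set kills (that is, distinguishes from the original). The symbols below follow the usual mutation-testing convention and are not notation introduced elsewhere in this paper.

```latex
\mathrm{MS}(T) = \frac{K}{M - E}
\qquad
\begin{aligned}
M &= \text{total number of mutants generated} \\
K &= \text{number of mutants killed by test set } T \\
E &= \text{number of mutants equivalent to the original}
\end{aligned}
```

A test set $T$ is considered mutation-adequate with respect to a given set of mutant operators when $\mathrm{MS}(T) = 1$.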
The effectiveness of mutation testing, as with other fault-based approaches, depends heavily on the types of faults the mutation system is intended to represent, since these determine what is tested and where analysis should be focused. In our view, the flaws that current mutation systems fail to handle adequately are mainly those related to features specific to XML Schema, such as namespaces, inheritance, element and attribute declarations, and type facets. To address this concern, we have developed a method called XML Schema (XSD) Mutation, which specifically targets plausible faults that are likely to arise from XML Schema's unique features when XML Schema documents are designed.
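To make the idea of a facet-targeted fault concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical operator that shifts each `minInclusive`/`maxInclusive` facet by one, and a toy `accepts` check that understands only those two facets. The schema, the operator, and the helper names are illustrative assumptions, not the XSD Mutation operators described in this paper; a real setup would validate instances with a full XML Schema processor.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
BOUND_FACETS = (XS + "minInclusive", XS + "maxInclusive")

# Hypothetical example schema: an integer type restricted to the range 0..120.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="ageType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def bound_facets(root):
    """Return the minInclusive/maxInclusive facet elements in document order."""
    return [el for el in root.iter() if el.tag in BOUND_FACETS]


def facet_mutants(schema_text):
    """Hypothetical mutant operator: shift each bound facet by -1 and by +1.

    Each mutant is a fresh parse of the original schema with exactly one change.
    """
    mutants = []
    for i in range(len(bound_facets(ET.fromstring(schema_text)))):
        for delta in (-1, 1):
            mutant = ET.fromstring(schema_text)
            facet = bound_facets(mutant)[i]
            facet.set("value", str(int(facet.get("value")) + delta))
            mutants.append(mutant)
    return mutants


def accepts(schema_root, value):
    """Toy validator: checks an integer against the bound facets only."""
    for facet in bound_facets(schema_root):
        bound = int(facet.get("value"))
        if facet.tag == XS + "minInclusive" and value < bound:
            return False
        if facet.tag == XS + "maxInclusive" and value > bound:
            return False
    return True


def kill_count(schema_text, test_values):
    """Return (killed, total): a mutant is killed when some test value
    is accepted by the original schema but rejected by the mutant, or vice versa."""
    original = ET.fromstring(schema_text)
    mutants = facet_mutants(schema_text)
    killed = sum(
        any(accepts(m, v) != accepts(original, v) for v in test_values)
        for m in mutants
    )
    return killed, len(mutants)


print(kill_count(SCHEMA, [0, 120]))           # (2, 4)
print(kill_count(SCHEMA, [-1, 0, 120, 121]))  # (4, 4)
```

In this toy setting, test data containing only the boundary values 0 and 120 leaves the mutants with widened bounds (-1 and 121) alive. Adding the just-out-of-range values -1 and 121 kills all four mutants. This is the kind of gap in test data that facet-oriented mutants are meant to expose.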