XQTC: A Static Type-Checker for XQuery

XQTC is the first static type-checker for XQuery based on backward type inference. It uses a novel technique based on a tree logic for representing intermediate portions of the inferred input type, and a mu-calculus satisfiability solver for checking containment between the inferred type and the input type. As a result, XQTC is notably the first type-checker capable of dealing with XPath backward axes.

Research Paper

The main principles of XQTC are explained in the following long paper. This is an extended version that contains all inference rules and auxiliary definitions, as well as more details about the approach:

Using the implementation

The whole system has been implemented in Java and is available as a runnable JAR package. The package can be executed with the following command line and parameters:

  java -jar xqtc.jar input.dtd inputroot sample.xq output.dtd outputroot

where:

The above call will in order: (1) parse the XQuery program, (2) parse the output type, (3) compute the inferred input type, (4) print the corresponding logical formula, (5) parse the input type, (6) perform type-checking, and (7) generate a counter-example (if appropriate).

Notice that the inferred formula can be used with the online solver in particular for checking whether some arbitrary type implies this formula (containment). The online solver provides a predicate type("some.dtd", "root") that automatically converts some type (e.g. here "some.dtd") into its logical representation, so that everything is brought in the same logic. This allows one to formulate and solve even more advanced static analysis problems since any other boolean combination can also be decided (not only containment).

Some Examples Explained

This section presents a set of interesting examples. Their goal is to illustrate some capabilities of the type-checker. Each example is made as simple as possible in order to be human readable and understandable while focusing on a particular feature (or a small set of features). For each example, the corresponding parameters (input type, output type, XQuery program) are given so that tests can be easily reproduced. The result of each test is indicated in the column "type-safety". The result is a link to the full trace generated by the system.

Given Input Type XQuery Expression Expected Output Type Type-Safety
(click to see the trace)
Explanation
<!ELEMENT r (b*,c?,d)>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT d EMPTY>
        <r>{for $y in child::* return <a>{$y}</a>}</r>  
        
<!ELEMENT r (a*)>
<!ELEMENT a (b|d)>
<!ELEMENT b EMPTY>
<!ELEMENT d EMPTY>
Unsafe Here, the type-checker detects that: (1) an element "c" may occur as a child of root element "r" in the input, and that (2) for this element the expression will output an "a" element with an "c" child which is forbidden by the content model of "a" elements expected as output.
<!ELEMENT r (b*,c?,d)>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT d EMPTY>
        <r>{for $y in child::* return <a>{$y}</a>}</r>  
        
<!ELEMENT r (a*)>
<!ELEMENT a (b|c|d)>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
<!ELEMENT d EMPTY>
Safe This is the same example as the previous one except that the output type is modified so that "c" is allowed inside an "a" element.
<!ELEMENT r (b,c,b*)>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>  
        <r>{for $x in child::* return $x} </r>  
        
<!ELEMENT r (b+,c,b*)>
<!ELEMENT b EMPTY>
<!ELEMENT c EMPTY>
Safe An example discussed in the paper by D. Colazzo and C. Sartiani (PPDP'11). We can observe in this case that backward type inference completely overcomes the precision problems met in approaches based on forward type inference. This is because backward type inference may simply copy portions of the output type into the inferred input type. This is a great advantage in terms of complexity since it avoids an exponential case analysis, and only uses a linear copy instead. In addition, for illustration purposes, we provide the trace made by the general backward inferrence rule in charge of the for loop (when the optimized rule for the specific case "for $x in child::* return $x" is removed) for this simple example (in order to show the equivalence, and also to illustrate how the general rule works).

More examples to be added soon...