Thursday, March 04, 2004

Ilya Tumanov (MS) from CF team has just posted on newsgroups really useful and insightful information on how you'd use XML with schema or without it:

 

"...

The first problem is a shortcoming of inference process. Inference is an
attempt to 'guess' schema of the XML file using set of predefined rules.
As any 'guess', this one could be wrong and might produce some unexpected
results.

If your schema is created via inference process, columns are added in order
they've been found (might change in upcoming releases).
In the example below 'Column2' in 'Table1' will be found first, so it will
have lower ordinal than 'Column1'.
Moreover, inference process might create some hidden columns to account for
nested tables parent-child relations, so ordinals are quite unpredictable
with inference.

Let's load this sample using inference:

<?xml version="1.0" encoding="utf-8" ?>
<DS>
 <Table1>
  <Column2>Col2 Data 0 </Column2>
  <Table2>
   <Column3>Col4 Data 0 </Column3>
  </Table2>
 </Table1>
 <Table1>
  <Column1>Col1 Data 1</Column1>
  <Column2>Col2 Data 1</Column2>
  <Table2>
   <Column3>Col4 Data 1</Column3>
  </Table2>
 </Table1>
</DS>


Here's the result:

--------------------------- DataSet ----------------------
DataSet: 'DS'
    ----------------------- Tables -----------------------
    DataTable: 'Table1
        ------------------- Columns ----------------------
        Column: 'Column2'  'String'
        Column: 'Table1_Id'  'Int32' Unique Autoincrement
        Column: 'Column1'  'String'
        ----------------- Child Tables -------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        ----------------- Parent Tables ------------------
        --------------------------------------------------
    DataTable: 'Table2
        ------------------- Columns ----------------------
        Column: 'Column3'  'String'
        Column: 'Table1_Id'  'Int32'
        ----------------- Child Tables -------------------
        ----------------- Parent Tables ------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        --------------------------------------------------

We can see 'Column2' ordinal is zero; 'Column1' ordinal is 2!
And we have an extra column, 'Table1_Id' with ordinal 1 added to act as a
primary/foreign keys in these nested tables.

Unexpected? Sure. This is a reason why inference should _never_ be used.
Instead, you should design the schema you need so there will be no
surprises.

Let's take a look at the saved schema:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
 <xs:element name="DS" msdata:IsDataSet="true">
  <xs:complexType>
   <xs:choice maxOccurs="unbounded">
    <xs:element name="Table1">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="Column2" type="xs:string" minOccurs="0" />
       <xs:element name="Column1" type="xs:string" minOccurs="0" />
       <xs:element name="Table2" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
         <xs:sequence>
          <xs:element name="Column3" type="xs:string" minOccurs="0" />
         </xs:sequence>
        </xs:complexType>
       </xs:element>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
   </xs:choice>
  </xs:complexType>
 </xs:element>
</xs:schema>


Do you see 'Table1_Id' column? Nope. It will be created for you because you
have nested tables, but it will be done after 'Column2' and 'Column1' are
created.
Thus, ordinals will change:

--------------------------- DataSet ----------------------
DataSet: 'DS'
    ----------------------- Tables -----------------------
    DataTable: 'Table1
        ------------------- Columns ----------------------
        Column: 'Column2'  'String'
        Column: 'Column1'  'String'
        Column: 'Table1_Id'  'Int32' Unique Autoincrement
        ----------------- Child Tables -------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        ----------------- Parent Tables ------------------
        --------------------------------------------------
    DataTable: 'Table2
        ------------------- Columns ----------------------
        Column: 'Column3'  'String'
        Column: 'Table1_Id'  'Int32'
        ----------------- Child Tables -------------------
        ----------------- Parent Tables ------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        --------------------------------------------------

Now 'Column1' ordinal is 1, and it used to be 2.

How to avoid these problems? It's easy.. Do not use inference, design
schemas yourself.

Now, to the real problem... It is a known bug introduced in SP2 with
performance optimizations. It is fixed in upcoming V2.

If you have table elements with all columns mapped as attributes (like this
one: <network linktype=0 />) and a table is not a root table, every second
row will be lost.
Possible workarounds:

1. Map at least one column with not null data as element. Primary/Foreign
key is the best candidate.
2. Do not use nested tables, use related tables. It might also save you
some space in XML file, improve loading performance.

Here's a sample schema/data with related tables:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="DS" msdata:IsDataSet="true">
   <xs:complexType>
    <xs:choice maxOccurs="unbounded">
     <xs:element name="Table1">
      <xs:complexType>
       <xs:sequence></xs:sequence>
       <xs:attribute name="Column1" type="xs:string" />
       <xs:attribute name="Column2" type="xs:string" />
       <xs:attribute name="PrimaryKey" type="xs:int" />
      </xs:complexType>
     </xs:element>
     <xs:element name="Table2">
      <xs:complexType>
       <xs:sequence></xs:sequence>
       <xs:attribute name="Colum3" type="xs:string" />
       <xs:attribute name="ForeignKey" type="xs:int" />
      </xs:complexType>
     </xs:element>
    </xs:choice>
   </xs:complexType>
   <xs:key name="DSKey1" msdata:PrimaryKey="true">
    <xs:selector xpath=".//Table1" />
    <xs:field xpath="@PrimaryKey" />
   </xs:key>
   <xs:keyref name="Table1Table2" refer="DSKey1">
    <xs:selector xpath=".//Table2" />
    <xs:field xpath="@ForeignKey" />
   </xs:keyref>
  </xs:element>
 </xs:schema>
 <Table1 Column1="Data11" Column2="Data12" PrimaryKey="1" />
 <Table1 Column1="Data21" Column2="Data22" PrimaryKey="2" />
 <Table2 Colum3="Data31" ForeignKey="1" />
 <Table2 Colum3="Data32" ForeignKey="2" />
</DS>

...

"

 

XML
3/4/2004 9:06:20 PM (GMT Standard Time, UTC+00:00)  #     |