Wednesday, May 12, 2004

I've been following a heated discussions on XML parsing methods that exist in .NET right now: DOM and XmlReader, whereas the first one is the slowest and the second is fast but a more complicated to code against. A must read on this topic is the DonXML's post Waking Up From a DOM Induced Coma who underlines the existing problems with XML parsing and mentions a few other alternatives. All of the problems about XML parsing are relevant to .NetCF as well, therefore they are multiplied due to processing power and memory constraints of the WinCE devices. Don't forget a complete lack of XPath implementation in the current version of CF.

As lucky as we are, Dare Obasanjo and Co., a PM from Microsoft, have just posted another library: XPathReader. And as you can guess, I've immediately pulled the code from the article and tried to compile it against .NetCF... The result of the test was that it runs with a few minor changes to the original code!!! And it is more exiting since we get a some XPath functionlity as well. Here you can see a snapshot of the test program running in the PPC emulator:

Don't you love when something useful comes with the source?

You can download the test and a XPathReaderCF projects from here.

The next step will be to test a comparison performance of this library vs XmlDocument in CF.

5/12/2004 5:30:34 PM (GMT Daylight Time, UTC+01:00)  #     | 
 Thursday, March 04, 2004

Ilya Tumanov (MS) from CF team has just posted on newsgroups really useful and insightful information on how you'd use XML with schema or without it:

 

"...

The first problem is a shortcoming of inference process. Inference is an
attempt to 'guess' schema of the XML file using set of predefined rules.
As any 'guess', this one could be wrong and might produce some unexpected
results.

If your schema is created via inference process, columns are added in order
they've been found (might change in upcoming releases).
In the example below 'Column2' in 'Table1' will be found first, so it will
have lower ordinal than 'Column1'.
Moreover, inference process might create some hidden columns to account for
nested tables parent-child relations, so ordinals are quite unpredictable
with inference.

Let's load this sample using inference:

<?xml version="1.0" encoding="utf-8" ?>
<DS>
 <Table1>
  <Column2>Col2 Data 0 </Column2>
  <Table2>
   <Column3>Col4 Data 0 </Column3>
  </Table2>
 </Table1>
 <Table1>
  <Column1>Col1 Data 1</Column1>
  <Column2>Col2 Data 1</Column2>
  <Table2>
   <Column3>Col4 Data 1</Column3>
  </Table2>
 </Table1>
</DS>


Here's the result:

--------------------------- DataSet ----------------------
DataSet: 'DS'
    ----------------------- Tables -----------------------
    DataTable: 'Table1
        ------------------- Columns ----------------------
        Column: 'Column2'  'String'
        Column: 'Table1_Id'  'Int32' Unique Autoincrement
        Column: 'Column1'  'String'
        ----------------- Child Tables -------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        ----------------- Parent Tables ------------------
        --------------------------------------------------
    DataTable: 'Table2
        ------------------- Columns ----------------------
        Column: 'Column3'  'String'
        Column: 'Table1_Id'  'Int32'
        ----------------- Child Tables -------------------
        ----------------- Parent Tables ------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        --------------------------------------------------

We can see 'Column2' ordinal is zero; 'Column1' ordinal is 2!
And we have an extra column, 'Table1_Id' with ordinal 1 added to act as a
primary/foreign keys in these nested tables.

Unexpected? Sure. This is a reason why inference should _never_ be used.
Instead, you should design the schema you need so there will be no
surprises.

Let's take a look at the saved schema:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
 <xs:element name="DS" msdata:IsDataSet="true">
  <xs:complexType>
   <xs:choice maxOccurs="unbounded">
    <xs:element name="Table1">
     <xs:complexType>
      <xs:sequence>
       <xs:element name="Column2" type="xs:string" minOccurs="0" />
       <xs:element name="Column1" type="xs:string" minOccurs="0" />
       <xs:element name="Table2" minOccurs="0" maxOccurs="unbounded">
        <xs:complexType>
         <xs:sequence>
          <xs:element name="Column3" type="xs:string" minOccurs="0" />
         </xs:sequence>
        </xs:complexType>
       </xs:element>
      </xs:sequence>
     </xs:complexType>
    </xs:element>
   </xs:choice>
  </xs:complexType>
 </xs:element>
</xs:schema>


Do you see 'Table1_Id' column? Nope. It will be created for you because you
have nested tables, but it will be done after 'Column2' and 'Column1' are
created.
Thus, ordinals will change:

--------------------------- DataSet ----------------------
DataSet: 'DS'
    ----------------------- Tables -----------------------
    DataTable: 'Table1
        ------------------- Columns ----------------------
        Column: 'Column2'  'String'
        Column: 'Column1'  'String'
        Column: 'Table1_Id'  'Int32' Unique Autoincrement
        ----------------- Child Tables -------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        ----------------- Parent Tables ------------------
        --------------------------------------------------
    DataTable: 'Table2
        ------------------- Columns ----------------------
        Column: 'Column3'  'String'
        Column: 'Table1_Id'  'Int32'
        ----------------- Child Tables -------------------
        ----------------- Parent Tables ------------------
        Relation 'Table1_Table2': 'Table2'->'Table1, nested
        --------------------------------------------------

Now 'Column1' ordinal is 1, and it used to be 2.

How to avoid these problems? It's easy.. Do not use inference, design
schemas yourself.

Now, to the real problem... It is a known bug introduced in SP2 with
performance optimizations. It is fixed in upcoming V2.

If you have table elements with all columns mapped as attributes (like this
one: <network linktype=0 />) and a table is not a root table, every second
row will be lost.
Possible workarounds:

1. Map at least one column with not null data as element. Primary/Foreign
key is the best candidate.
2. Do not use nested tables, use related tables. It might also save you
some space in XML file, improve loading performance.

Here's a sample schema/data with related tables:

<xs:schema id="DS" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="DS" msdata:IsDataSet="true">
   <xs:complexType>
    <xs:choice maxOccurs="unbounded">
     <xs:element name="Table1">
      <xs:complexType>
       <xs:sequence></xs:sequence>
       <xs:attribute name="Column1" type="xs:string" />
       <xs:attribute name="Column2" type="xs:string" />
       <xs:attribute name="PrimaryKey" type="xs:int" />
      </xs:complexType>
     </xs:element>
     <xs:element name="Table2">
      <xs:complexType>
       <xs:sequence></xs:sequence>
       <xs:attribute name="Colum3" type="xs:string" />
       <xs:attribute name="ForeignKey" type="xs:int" />
      </xs:complexType>
     </xs:element>
    </xs:choice>
   </xs:complexType>
   <xs:key name="DSKey1" msdata:PrimaryKey="true">
    <xs:selector xpath=".//Table1" />
    <xs:field xpath="@PrimaryKey" />
   </xs:key>
   <xs:keyref name="Table1Table2" refer="DSKey1">
    <xs:selector xpath=".//Table2" />
    <xs:field xpath="@ForeignKey" />
   </xs:keyref>
  </xs:element>
 </xs:schema>
 <Table1 Column1="Data11" Column2="Data12" PrimaryKey="1" />
 <Table1 Column1="Data21" Column2="Data22" PrimaryKey="2" />
 <Table2 Colum3="Data31" ForeignKey="1" />
 <Table2 Colum3="Data32" ForeignKey="2" />
</DS>

...

"

 

XML
3/4/2004 9:06:20 PM (GMT Standard Time, UTC+00:00)  #     |