Monday, January 27, 2014

C# and XML

I really didn't expect this one to be much of a challenge. I've already dealt with parsing XML in PHP, JavaScript, and VBScript, and none of them were too bad. It doesn't help that C# has something like five different ways to parse XML, so any search turns up results for all of them, and you have to sort through them to find the ones that match the approach you're actually using.

This is the method I picked, mostly because it's the one I figured out the quickest, but partly because it also works in Mono. The XDocument class looked like the easiest one to use, but it lives in the separate System.Xml.Linq assembly, and I couldn't count on it being available outside of Microsoft's .NET Framework. The XmlDocument class is available everywhere and isn't too hard to work with, so that's the one these notes cover.

XmlDocument lives in the System.Xml namespace, so add using System.Xml; to the top of the file. Next, create a variable of type XmlDocument and use its Load() method to read in the XML file. If the XML is already sitting in a string variable, the LoadXml() method parses it from there instead.
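```csharp
using System.Xml;  // at the top of the file

XmlDocument xdBookList;

xdBookList = new XmlDocument();
xdBookList.Load("Test.XML");  // reads and parses the file; throws if it's missing or malformed
```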
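For the string case, here's a minimal sketch of LoadXml(), assuming the book list used in this post is held in a string constant rather than a file. The LoadXmlDemo, BookListXml, and LoadBookList names are made up for the example:

```csharp
using System;
using System.Xml;

public static class LoadXmlDemo
{
    // Hypothetical sample data: the book list from this post, held in a string
    // instead of a file.
    public const string BookListXml =
        "<BookList>" +
        "<Fiction>" +
        "<Title Genre=\"Adventure\">Treasure Island</Title>" +
        "<Title Genre=\"Horror\">It</Title>" +
        "</Fiction>" +
        "<NonFiction>" +
        "<Title Genre=\"Art Instruction\">Fun With A Pencil</Title>" +
        "<Title Genre=\"Programming\">OpenGL Programming Guide</Title>" +
        "</NonFiction>" +
        "</BookList>";

    // Parses XML text with LoadXml() rather than reading a file with Load().
    public static XmlDocument LoadBookList(string xml)
    {
        XmlDocument xdBookList = new XmlDocument();
        xdBookList.LoadXml(xml); // throws XmlException if the text isn't well-formed
        return xdBookList;
    }

    public static void Main()
    {
        XmlDocument xdBookList = LoadBookList(BookListXml);
        Console.WriteLine("Root tag: " + xdBookList.DocumentElement.Name);
    }
}
```

Either way you end up with the same XmlDocument, so everything after this point works the same whether the XML came from a file or a string.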

Once the XML is loaded, it's time to read some stuff out of it. Just so we have something to read, let's say Test.XML contains this:
```xml
<BookList>
    <Fiction>
        <Title Genre="Adventure">Treasure Island</Title>
        <Title Genre="Horror">It</Title>
    </Fiction>
    <NonFiction>
        <Title Genre="Art Instruction">Fun With A Pencil</Title>
        <Title Genre="Programming">OpenGL Programming Guide</Title>
    </NonFiction>
</BookList>
```

With that loaded, we can start parsing with the DocumentElement property. This is the root tag of the XML; in my example it's the BookList tag. DocumentElement in turn has a ChildNodes property, which is a list of the nodes directly inside it (not their descendants), so a foreach loop can step through those child nodes.
```csharp
foreach (XmlNode xnTag in xdBookList.DocumentElement.ChildNodes) {
    Console.WriteLine("Found tag: " + xnTag.Name);
}
```

As you might guess, the Name property of XmlNode is the name of the tag. With the sample XML, the output is one line per child of BookList: "Found tag: Fiction" followed by "Found tag: NonFiction".

Each XmlNode also has its own ChildNodes property, so putting another foreach loop inside this one will reach the nested children:
```csharp
foreach (XmlNode xnTag in xdBookList.DocumentElement.ChildNodes) {
    Console.WriteLine("Found tag: " + xnTag.Name);

    foreach (XmlNode xnNested in xnTag.ChildNodes) {
        Console.WriteLine(" Found nested tag: " + xnNested.Name);
    }
}
```

Recursion along these lines will let you reach all of the tags in your XML.
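Here's a minimal sketch of that recursion, assuming you only care about element tags. ChildNodes also includes text nodes (the "Treasure Island" inside a Title, for example), so the loop checks NodeType before recursing. The WalkTags name and the two-spaces-per-level indentation are just my choices for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class RecursiveWalkDemo
{
    // Records xnTag and every element nested inside it, depth first,
    // with each name indented by its nesting depth.
    public static void WalkTags(XmlNode xnTag, int depth, List<string> lines)
    {
        lines.Add(new string(' ', depth * 2) + xnTag.Name);

        foreach (XmlNode xnChild in xnTag.ChildNodes)
        {
            // Skip text nodes and anything else that isn't a tag.
            if (xnChild.NodeType == XmlNodeType.Element)
            {
                WalkTags(xnChild, depth + 1, lines);
            }
        }
    }

    public static void Main()
    {
        XmlDocument xdBookList = new XmlDocument();
        xdBookList.LoadXml(
            "<BookList><Fiction><Title Genre=\"Adventure\">Treasure Island</Title></Fiction>" +
            "<NonFiction><Title Genre=\"Programming\">OpenGL Programming Guide</Title></NonFiction></BookList>");

        List<string> lines = new List<string>();
        WalkTags(xdBookList.DocumentElement, 0, lines);
        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

Collecting the names into a list instead of printing inside the recursion keeps the walk reusable; swap the list for whatever processing you actually need.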

If you know the structure of your XML, you can skip some of this exploration. XmlNode objects (including the XmlDocument itself) have a method named SelectNodes() that builds a list of all the nodes matching an XPath expression. With this you can quickly drill down to the tags you're interested in. It returns an XmlNodeList object, which is just a list of the matching XmlNode objects.
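The loops below work through a variable named axnNodes holding the results of a SelectNodes() call. Here's a minimal sketch of how such a list could be built, assuming we're after the fiction titles; the XPath /BookList/Fiction/Title is an assumed query, and the SelectNodesDemo, FictionXPath, and FindNodes names are hypothetical. The second query shows that an XPath predicate can also filter on an attribute:

```csharp
using System;
using System.Xml;

public static class SelectNodesDemo
{
    // Assumed query: every Title tag directly inside Fiction.
    public const string FictionXPath = "/BookList/Fiction/Title";

    // Runs an XPath query against the whole document and returns the matches
    // (an empty XmlNodeList when nothing matches).
    public static XmlNodeList FindNodes(XmlDocument xdBookList, string xpath)
    {
        return xdBookList.SelectNodes(xpath);
    }

    public static void Main()
    {
        XmlDocument xdBookList = new XmlDocument();
        xdBookList.LoadXml(
            "<BookList>" +
            "<Fiction><Title Genre=\"Adventure\">Treasure Island</Title>" +
            "<Title Genre=\"Horror\">It</Title></Fiction>" +
            "<NonFiction><Title Genre=\"Programming\">OpenGL Programming Guide</Title></NonFiction>" +
            "</BookList>");

        XmlNodeList axnNodes = FindNodes(xdBookList, FictionXPath);
        Console.WriteLine("Fiction titles found: " + axnNodes.Count);

        // A predicate matches Title tags anywhere whose Genre attribute is Horror.
        XmlNodeList axnHorror = FindNodes(xdBookList, "//Title[@Genre='Horror']");
        Console.WriteLine("Horror titles found: " + axnHorror.Count);
    }
}
```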

With this route, the name of the tag isn't terribly useful; we already know it, since it was likely part of the XPath. If we use the InnerText property instead, we get the contents of the tag.
```csharp
// axnNodes is the XmlNodeList returned by SelectNodes()
foreach (XmlNode xnFiction in axnNodes) {
    Console.WriteLine("Found fiction title: " + xnFiction.InnerText);
}
```

There are also some attributes set on those tags, and we can use the Attributes property to reach them. The Attributes property is a collection of all the attributes on the given tag; each entry is an XmlAttribute object, which has Name and InnerText properties just like the XmlNode class (XmlAttribute actually derives from XmlNode).
```csharp
foreach (XmlNode xnFiction in axnNodes) {
    Console.WriteLine("Found fiction title: " + xnFiction.InnerText);

    foreach (XmlAttribute xaAttrib in xnFiction.Attributes) {
        Console.WriteLine(" Includes attribute: " + xaAttrib.Name + " with value: " + xaAttrib.InnerText);
    }
}
```

If we know which attribute we're looking for, we can use its name as an index into the Attributes collection and get that value directly.
```csharp
foreach (XmlNode xnFiction in axnNodes) {
    Console.WriteLine("Found fiction title: " + xnFiction.InnerText);

    foreach (XmlAttribute xaAttrib in xnFiction.Attributes) {
        Console.WriteLine(" Includes attribute: " + xaAttrib.Name + " with value: " + xaAttrib.InnerText);
    }

    Console.WriteLine(" Genre is: " + xnFiction.Attributes["Genre"].InnerText);
}
```
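One thing to watch for with the name index: if a tag doesn't have that attribute, Attributes["Genre"] returns null rather than throwing, so tacking .InnerText onto it raises a NullReferenceException. Attributes itself is also null on nodes that can't carry attributes, such as text nodes. Here's a minimal sketch of a safer lookup; GetGenre, the "(none)" fallback, and the untitled second book are all hypothetical:

```csharp
using System;
using System.Xml;

public static class AttributeDemo
{
    // Returns the Genre attribute of xnTitle, or fallback when it's missing.
    public static string GetGenre(XmlNode xnTitle, string fallback)
    {
        // Attributes is null for nodes that can't carry attributes (text, comments).
        XmlAttributeCollection attributes = xnTitle.Attributes;
        if (attributes == null)
        {
            return fallback;
        }

        // The string indexer returns null, not an exception, when the name isn't found.
        XmlAttribute xaGenre = attributes["Genre"];
        return xaGenre == null ? fallback : xaGenre.Value;
    }

    public static void Main()
    {
        // Hypothetical data: the second Title has no Genre attribute.
        XmlDocument xdBookList = new XmlDocument();
        xdBookList.LoadXml(
            "<Fiction><Title Genre=\"Adventure\">Treasure Island</Title>" +
            "<Title>Untitled Draft</Title></Fiction>");

        foreach (XmlNode xnTitle in xdBookList.DocumentElement.ChildNodes)
        {
            Console.WriteLine(xnTitle.InnerText + " genre: " + GetGenre(xnTitle, "(none)"));
        }
    }
}
```

If you already know the node is an element, casting it to XmlElement and calling GetAttribute("Genre") is another option; it returns an empty string when the attribute is missing.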

