Thursday, September 23, 2010

How to parse large XML files efficiently using LINQ

Sometimes we need to parse a very large XML file and populate our custom objects efficiently. Because the file is so large, we don't want to load the whole thing into memory. Using XmlReader directly avoids that, but it takes a lot of code and we have to handle every detail of the parsing ourselves. I was searching for a good solution when I came across this beautiful article: Streaming with LINQ to XML. So I decided to write an example and give it a try. Here it is:

<?xml version="1.0" encoding="utf-8" ?>
<entities xmlns='blah-blah'>
   <vehicles>
      <vehicle>
          <name>toyota</name>
          <model>2008</model>
          <features>
             <feature>Auto</feature>
             <feature>ABS</feature>
          </features>
      </vehicle>
   </vehicles>
</entities>
 
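using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public class VehicleEntity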
{
   public string Name{ get; set;}
   public string Model{ get; set;}
   public List<string> Features{ get; set;}
}
  
public class VehicleParser
{
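    static IEnumerable<XElement> SimpleStreamAxis(string inputUrl, string matchName)
    {
        using (XmlReader reader = XmlReader.Create(inputUrl))
        {
            reader.MoveToContent();
            // XElement.ReadFrom leaves the reader on the node that follows the
            // element it consumed, so Read() is only called when nothing was
            // consumed. Calling Read() unconditionally would skip a matching
            // element that immediately follows another one.
            while (!reader.EOF)
            {
                // Compare the local name so a namespace prefix does not break the match.
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == matchName)
                {
                    XElement el = XElement.ReadFrom(reader) as XElement;
                    if (el != null)
                        yield return el;
                }
                else
                {
                    reader.Read();
                }
            }
            // No explicit Close() needed: the using block disposes the reader.
        }
    }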
  
      
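    private List<VehicleEntity> ProcessVehicles()
    {
        // Server.MapPath assumes an ASP.NET page or control context;
        // elsewhere, pass a file path directly.
        string inputUrl = Server.MapPath("Vehicle.xml");
        IEnumerable<VehicleEntity> query =
            from n in SimpleStreamAxis(inputUrl, "vehicle")
            let features = n.ElementAnyNS("features")
            select new VehicleEntity
            {
                Name = (n.ElementAnyNS("name") != null) ? n.ElementAnyNS("name").Value : "",
                Model = (n.ElementAnyNS("model") != null) ? n.ElementAnyNS("model").Value : "",
                // A vehicle without a <features> element gets an empty list instead of throwing.
                Features = (features != null)
                    ? features.ElementsAnyNS("feature").Select(o => o.Value).ToList()
                    : new List<string>(),
            };

        return query.ToList();
    }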
}
 
//Extension methods to ignore namespace if the xml file has a namespace
public static class Extensions
{
    public static IEnumerable<XElement> ElementsAnyNS(this XElement source, string localName)
    {
        return source.Elements().Where(e => e.Name.LocalName.Equals(localName, StringComparison.OrdinalIgnoreCase));
    }
    public static XElement ElementAnyNS(this XElement source, string localName)
    {
        return source.Elements().Where(e => e.Name.LocalName.Equals(localName, StringComparison.OrdinalIgnoreCase)).FirstOrDefault();
    }
}
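
The extension methods above are needed because the sample file declares a default namespace, so an unqualified lookup such as Element("name") finds nothing. As a rough sketch of the alternative, when the namespace is known in advance it can be passed in through XNamespace instead of being ignored. The class NamespaceDemo and the helper ChildValue are names made up for this illustration, not part of the original post:

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical helper: returns the value of a namespace-qualified child,
    // or an empty string when the child is missing.
    public static string ChildValue(XElement parent, XNamespace ns, string localName)
    {
        XElement child = parent.Element(ns + localName);
        return child != null ? child.Value : "";
    }

    public static void Main()
    {
        XElement vehicle = XElement.Parse(
            "<vehicle xmlns='blah-blah'><name>toyota</name></vehicle>");

        // <name> inherits the default namespace, so the unqualified name misses it.
        Console.WriteLine(vehicle.Element("name") == null);            // True

        // Qualifying the name with the namespace finds the element.
        Console.WriteLine(ChildValue(vehicle, "blah-blah", "name"));   // toyota
    }
}
```

Matching on the local name, as ElementAnyNS does, is more forgiving when the namespace varies between files; the qualified lookup is stricter and avoids picking up same-named elements from other namespaces.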


I tried this method on a very large XML file (250 MB) and it took only about 20 seconds to parse. That is very impressive. Cheers, LINQ!
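
One thing worth keeping in mind with a file this size: the streaming axis yields one element at a time, so if the results are consumed in a foreach loop rather than collected with ToList(), only the current element has to stay in memory. Here is a minimal sketch of that idea. StreamingDemo and StreamElements are hypothetical names, the second vehicle is made-up sample data, and the input is read from a string only so the sketch runs without a file on disk:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Same streaming idea as SimpleStreamAxis, but reading from any TextReader.
    public static IEnumerable<XElement> StreamElements(TextReader input, string localName)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == localName)
                {
                    // ReadFrom consumes the whole element and moves the reader past it,
                    // so Read() must not be called again on this iteration.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        // No whitespace between the two <vehicle> elements, on purpose.
        const string xml =
            "<entities xmlns='blah-blah'><vehicles>" +
            "<vehicle><name>toyota</name><model>2008</model></vehicle>" +
            "<vehicle><name>honda</name><model>2009</model></vehicle>" +
            "</vehicles></entities>";

        int count = 0;
        foreach (XElement vehicle in StreamElements(new StringReader(xml), "vehicle"))
        {
            // Only the current <vehicle> subtree is materialized at this point.
            string name = vehicle.Elements().First(e => e.Name.LocalName == "name").Value;
            Console.WriteLine(name);
            count++;
        }
        Console.WriteLine(count);
    }
}
```

Run as a console program, this prints toyota, honda and then 2. Calling ToList() on the query is still fine when the resulting objects fit comfortably in memory; the loop form is for cases where even the parsed objects are too many to hold at once.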

2 comments:

lauren said...

I was searching for an efficient way to parse a very large XML file and populate custom objects. I read the article that you also mentioned, but an example is always a help, and you provided one neatly. Thanks for sharing. I also tried this method on a very large XML file, 300 MB in size, and it's really very fast.

lingmaaki said...

More about....XML Parsing

Ling
