Parsing XML using DOMDocument and DOMXPath
In many web projects I need to parse XML files of different kinds and shapes. PHP offers the DOMDocument class and the SimpleXML extension to read, parse, query and create XML documents. I personally prefer the DOMDocument class hierarchy to SimpleXML, because it’s more powerful and offers more functionality, especially when it comes to creating XML documents.
To show some possibilities of those classes I will demonstrate how to use them to parse Excel XML data. Here’s a screenshot of a small demo Excel workbook (a fictional price list):
You can save any Excel file in Excel XML format using Save As… and selecting XML Spreadsheet 2003 (*.xml):
After saving it to an XML file, the XML will look like this:
I collapsed the first few elements, because I’m going to focus on parsing the actual Worksheet data in this example.
So let’s say you’re interested in parsing the cell data of this Excel sheet. If you only want to perform simple queries on your XML document, the DOMDocument methods will suffice. In this case, where you only want to get all Cell elements, you could use the getElementsByTagName method, for example:
$xmlFile = 'pricelist.xml';
$domDoc = new DOMDocument();
$domDoc->load($xmlFile);
$cells = $domDoc->getElementsByTagName('Cell');
After this, the variable $cells will contain a DOMNodeList object, which is just a list of DOMNode objects that you can traverse.
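Traversing that list could look roughly like the sketch below. Since pricelist.xml itself isn't shown here, the sketch loads a small inline sample that is purely hypothetical and only loosely modeled on the XML Spreadsheet 2003 format; the variable names are my own.

```php
<?php
// Hypothetical one-row sample standing in for pricelist.xml.
$xml = <<<'XML'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Prices">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Widget</Data></Cell>
        <Cell><Data ss:Type="Number">9.95</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
XML;

$domDoc = new DOMDocument();
$domDoc->loadXML($xml);

$cells = $domDoc->getElementsByTagName('Cell');

// DOMNodeList exposes a length property and positional access via item().
$cellCount = $cells->length;
$firstCell = $cells->item(0);

// The list is also traversable, so foreach works on it directly.
$values = [];
foreach ($cells as $cell) {
    // textContent is the concatenated text of the node and its descendants.
    $values[] = trim($cell->textContent);
}

echo $cellCount . " cells: " . implode(', ', $values) . "\n";
```

With a real file you would call load($xmlFile) instead of loadXML(); the traversal itself stays the same.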
If you want to perform more complicated queries on your XML document you could use XPath. XPath is a query language for XML with a compact path syntax, plus predicates and functions for filtering the nodes that a path selects.
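In PHP’s DOM class hierarchy we have the class DOMXPath to perform XPath queries. For example, to do something similar to the getElementsByTagName code above, but fetching the Data elements inside the cells rather than the Cell elements themselves, we can perform this XPath query: //ss:Data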
The double slash in that query means that it doesn’t matter where in the document the ss:Data element is located: the query simply fetches every ss:Data element in the document. If you want to get specific elements, starting at the document root, you should use the single-slash syntax, like this: /ss:Workbook/ss:Worksheet/ss:Table/ss:Row/ss:Cell/ss:Data.
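Also note that I’m using the ss: prefix to tell XPath to look for elements in that namespace, which is the Microsoft spreadsheet namespace (urn:schemas-microsoft-com:office:spreadsheet) in this case. The prefix is needed because XPath 1.0 has no notion of a default namespace, so an unprefixed query such as //Data would not match these elements.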
Executing a query using DOMXPath would go like this:
$xmlFile = 'pricelist.xml';
$domDoc = new DOMDocument();
$domDoc->load($xmlFile);
$xpath = new DOMXPath($domDoc);
$dataElems = $xpath->query('//ss:Data');
foreach ($dataElems as $curElem) {
    echo $curElem->nodeValue . "\n";
}
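The query above relies on the ss: prefix being declared in the document itself. If you'd rather not depend on that, you can bind the prefix yourself with DOMXPath::registerNamespace. The following is only a sketch under that assumption, again using a hypothetical inline sample instead of the real pricelist.xml; it also runs the double-slash query and the absolute path side by side.

```php
<?php
// Hypothetical two-row sample standing in for pricelist.xml.
$xml = <<<'XML'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Prices">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Product</Data></Cell>
        <Cell><Data ss:Type="String">Price</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">Widget</Data></Cell>
        <Cell><Data ss:Type="Number">9.95</Data></Cell>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
XML;

$domDoc = new DOMDocument();
$domDoc->loadXML($xml);

$xpath = new DOMXPath($domDoc);
// Bind the prefix explicitly to the spreadsheet namespace URI.
$xpath->registerNamespace('ss', 'urn:schemas-microsoft-com:office:spreadsheet');

// Descendant search: every ss:Data element, wherever it sits in the document.
$anywhere = $xpath->query('//ss:Data');

// Absolute path: only ss:Data elements reachable through exactly these steps.
$fromRoot = $xpath->query('/ss:Workbook/ss:Worksheet/ss:Table/ss:Row/ss:Cell/ss:Data');

echo $anywhere->length . " via //, " . $fromRoot->length . " via the absolute path\n";
```

In this small sample both queries select the same elements; they would only differ if a Data element appeared somewhere outside that exact path.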
The cool thing about XPath is that you can use filter queries that are quite advanced. You can specify extra element filters (called predicates in XPath) in square brackets. For example, to get all Row elements with exactly 3 Cell elements that contain a Data element in the above Excel XML, you could use a count filter like this:
//ss:Row[count(ss:Cell/ss:Data)=3]
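$xmlFile = 'pricelist.xml';
$domDoc = new DOMDocument();
$domDoc->load($xmlFile);
$xpath = new DOMXPath($domDoc);
$rows = $xpath->query('//ss:Row[count(ss:Cell/ss:Data)=3]');
foreach ($rows as $curRow) {
    // iterate through the Cell elements of this row
    foreach ($curRow->childNodes as $curCell) {
        // childNodes may also contain whitespace text nodes, so skip non-elements
        if ($curCell->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        // ...
    }
}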
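Instead of walking childNodes, you can also run a second, relative XPath query with the row as context node, which skips the whitespace text nodes altogether. Here is a sketch of that idea; the sample data and the $rows array layout are my own assumptions, not part of the original workbook.

```php
<?php
// Hypothetical sample: two rows with three filled cells, one row with an empty cell.
$xml = <<<'XML'
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Prices">
    <Table>
      <Row>
        <Cell><Data ss:Type="String">Product</Data></Cell>
        <Cell><Data ss:Type="String">Price</Data></Cell>
        <Cell><Data ss:Type="String">Stock</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">Widget</Data></Cell>
        <Cell><Data ss:Type="Number">9.95</Data></Cell>
        <Cell><Data ss:Type="Number">12</Data></Cell>
      </Row>
      <Row>
        <Cell><Data ss:Type="String">Gadget</Data></Cell>
        <Cell><Data ss:Type="Number">24.50</Data></Cell>
        <Cell/>
      </Row>
    </Table>
  </Worksheet>
</Workbook>
XML;

$domDoc = new DOMDocument();
$domDoc->loadXML($xml);

$xpath = new DOMXPath($domDoc);
$xpath->registerNamespace('ss', 'urn:schemas-microsoft-com:office:spreadsheet');

$rows = [];
foreach ($xpath->query('//ss:Row[count(ss:Cell/ss:Data)=3]') as $rowElem) {
    $values = [];
    // A relative path (no leading slash) is evaluated against the context node,
    // which is passed as the second argument to query().
    foreach ($xpath->query('ss:Cell/ss:Data', $rowElem) as $dataElem) {
        $values[] = $dataElem->nodeValue;
    }
    $rows[] = $values;
}

foreach ($rows as $row) {
    echo implode(' | ', $row) . "\n";
}
```

The third row is left out because one of its cells has no Data element, so the count predicate evaluates to 2 rather than 3.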
Basically, these classes make parsing XML a walk in the park once you know how to use them. I hope this brief introduction will help get you going.