XPath Tutorial

Purpose: Use XPath to address data in XML files.

XSLT

XPath is a language for finding information in an XML document. It is designed to address parts of an XML file, such as elements or attributes, and to retrieve their content.

Select Data

This minimal template can select any data from the source file with the help of XPath.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
    <xsl:value-of select="xpath"/>
</xsl:template>
</xsl:stylesheet>

The important part, which specifies the content to be extracted from the XML file, is the following:

...
   <xsl:value-of select="xpath"/>
...

where xpath stands for an expression like /parent_element/child1/child2/../@attribute.
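
To see how such a path is resolved step by step, here is a minimal sketch in Python. It assumes an invented miniature document (not a complete exciting input) and uses the standard-library module xml.etree.ElementTree, which supports only a subset of XPath: paths are written relative to the root element, and an attribute is read with get() instead of a trailing @attribute step. A full XPath engine such as xsltproc accepts the absolute expressions shown in the comments.

```python
import xml.etree.ElementTree as ET

# Invented miniature document, for illustration only.
SAMPLE = """<input>
  <title>Example</title>
  <structure>
    <species speciesfile="Al.xml">
      <atom coord="0.0 0.0 0.0"/>
    </species>
  </structure>
</input>"""

root = ET.fromstring(SAMPLE)  # root is the <input> element

# /input/title : text content of an element
title = root.find("title").text

# /input/structure/species/atom/.. : step down to <atom>, then back up to its parent
parent = root.find("structure/species/atom/..")

# /input/structure/species/@speciesfile : value of an attribute
speciesfile = root.find("structure/species").get("speciesfile")

print(title, parent.tag, speciesfile)
```

Each slash-separated step narrows the selection to child elements of that name, and the ".." step moves back up to the parent of the current node.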

The template is applied with xsltproc:

xsltproc template.xsl input.xml

A short reference of the syntax can be found here.

EXERCISE

Extract the k-point grid, the title, and the name of the first speciesfile from an exciting input.xml file.

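Node Sets

An XPath expression can match several nodes at once. The following template loops over all species elements with xsl:for-each and prints the speciesfile attribute of each one on its own line.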

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:template match="/">
        <xsl:for-each select="/input/structure/species">
            <xsl:value-of select="@speciesfile"/>
            <xsl:text>
</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
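
The same iteration over a node set can be sketched in Python. This is only an illustration under stated assumptions: the sample document is invented, and the standard-library xml.etree.ElementTree module stands in for a full XPath engine, so the path is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Invented sample with two species elements, for illustration only.
SAMPLE = """<input>
  <structure>
    <species speciesfile="Ga.xml"><atom coord="0 0 0"/></species>
    <species speciesfile="As.xml"><atom coord="0.25 0.25 0.25"/></species>
  </structure>
</input>"""

root = ET.fromstring(SAMPLE)

# findall returns every matching element in document order,
# which plays the role of the node set selected by xsl:for-each.
species_nodes = root.findall("structure/species")

# Collect one attribute value per node, as xsl:value-of does inside the loop.
lines = [node.get("speciesfile") for node in species_nodes]

# Join with newlines, mirroring the xsl:text line break in the template.
output = "\n".join(lines) + "\n"
print(output, end="")
```

Inside the loop, the attribute is looked up relative to the current node, just as select="@speciesfile" is evaluated relative to each species element in the template.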

EXERCISE

Transform the atomic positions of an input.xml to a simple text file. You may try to mimic your favorite chemical file format if you have one!

Python

Python has a convenient XML library called lxml, which supports XPath expressions. The following example parses an exciting input file, selects the atom elements, shifts each atomic position by 0.5 along every coordinate, and writes the changed content to a new file.

from lxml import etree as ET

# Parse the input file into an element tree:
doc = ET.parse("input.xml")
# Get instance of root element:
root = doc.getroot()

# Get list of elements named "atom":
atoms = doc.xpath('//atom')

# Loop over all "atom" elements and set a new "coord" attribute:
for atom in atoms:
    x, y, z = (float(value) for value in atom.get('coord').split())
    print([x, y, z])
    atom.set('coord', "%s %s %s" % (x + 0.5, y + 0.5, z + 0.5))

# Write changes to inputgen.xml.
# tostring returns bytes when an encoding is given, so open in binary mode:
with open("inputgen.xml", "wb") as outfile:
    outfile.write(ET.tostring(root,
                              pretty_print=True,
                              xml_declaration=True,
                              encoding='UTF-8'))
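
To try the coordinate transformation without an input file at hand, the core step can be sketched on an in-memory document. This is a hedged variant, not the script above: the sample XML is invented, the helper name shift_coord is hypothetical, and the standard-library xml.etree.ElementTree module is used in place of lxml.

```python
import xml.etree.ElementTree as ET

# Invented sample structure, for illustration only.
SAMPLE = """<input>
  <structure>
    <species speciesfile="Al.xml">
      <atom coord="0.0 0.0 0.0"/>
      <atom coord="0.25 0.25 0.25"/>
    </species>
  </structure>
</input>"""


def shift_coord(coord, delta):
    """Return the coordinate string with delta added to each component."""
    return " ".join(str(float(value) + delta) for value in coord.split())


root = ET.fromstring(SAMPLE)

# iter("atom") visits every <atom> element at any depth, like //atom.
for atom in root.iter("atom"):
    atom.set("coord", shift_coord(atom.get("coord"), 0.5))

shifted = [atom.get("coord") for atom in root.iter("atom")]
print(shifted)
```

Keeping the arithmetic in a small function makes it easy to check the shift on its own before writing any file.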

EXERCISE

Take the script and add code to change the scale attribute of the crystal element.
