Parsing XML in Bash
The other day I was working on a shell script for printing the current weather conditions. There are already plenty of weather scripts out there, and most of them parse html or xml somehow. Here is one that does it with xslt (.tar.gz); this one uses some nasty regular expressions.
However, I wanted to use the National Weather Service's current observations xml feeds. As you can see from the sample feed, the xml is pretty simple. I came up with the following function, which should work to parse data from any simple xml.
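get_data () {
    local tag=$1
    local xml=$2
    # Find the tag in the xml, convert tabs to spaces, remove leading spaces,
    # then remove the opening and closing tags so only the data is left.
    # The variables are quoted so file names containing spaces still work.
    grep "$tag" "$xml" | \
        tr '\011' '\040' | \
        sed -e 's/^[ ]*//' \
            -e 's/^<.*>\([^<].*\)<.*>$/\1/'
}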
Call the function with the tag you want the data from and the location of the xml file, and it will print the data contained in the tag. For example, to print the wind direction from the sample xml, you would call:
get_data \<wind_string\> KSHD.xml
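If you want to try the call without downloading a feed first, here is a minimal sketch. It writes a small hand-made file and runs get_data against it. The file is a hypothetical stand-in for KSHD.xml: the element names follow the ones mentioned in this post, but the root element and the values are made up for illustration.

```shell
#!/bin/bash
# The get_data function from above, repeated so this snippet runs on its own.
get_data () {
    local tag=$1
    local xml=$2
    grep "$tag" "$xml" | \
        tr '\011' '\040' | \
        sed -e 's/^[ ]*//' \
            -e 's/^<.*>\([^<].*\)<.*>$/\1/'
}

# Hypothetical sample file; the values are invented, not real observations.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0"?>
<current_observation>
    <station_id>KSHD</station_id>
    <wind_string>From the West at 5 MPH</wind_string>
    <temperature_string>55 F (13 C)</temperature_string>
</current_observation>
EOF

# Single quotes around the tag do the same job as escaping < and > with backslashes.
wind=$(get_data '<wind_string>' "$sample")
temp=$(get_data '<temperature_string>' "$sample")
echo "Wind: $wind"
echo "Temperature: $temp"
rm -f "$sample"
```

Capturing the output with $( ) is how I would normally use the function inside a larger script, since the value can then be formatted however you like.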
Clearly, the function has some limitations. For instance, there is no concept of parents and children, so you can't specify <station_id> --> <latitude>. It also works line by line, so it expects a tag and its data to sit together on a single line. However, it works fine for simple cases where there is only one <latitude> tag in the entire document. I think the xml parsing function is much more interesting than my weather script, but if you want to check the script out, you can download it here. The git repo also includes a script for getting the 10-day forecast for a US-based location.
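If the same tag does show up under more than one parent, one possible workaround is to cut the document down to the parent's block before grepping. The sketch below is only that, a sketch: get_child_data is a hypothetical helper (it is not part of my weather script), the sample document is made up, and it assumes the parent's opening and closing tags each sit on their own line and carry no attributes. It is still line-based text munging, not real xml parsing.

```shell
#!/bin/bash
# Hypothetical helper: like get_data, but only searches the lines between
# <parent> and </parent>. Takes bare element names, without angle brackets.
get_child_data () {
    local parent=$1
    local tag=$2
    local xml=$3
    # sed -n with an address range prints only the parent's block.
    sed -n "/<$parent>/,/<\/$parent>/p" "$xml" | \
        grep "<$tag>" | \
        tr '\011' '\040' | \
        sed -e 's/^[ ]*//' \
            -e 's/^<.*>\([^<].*\)<.*>$/\1/'
}

# Made-up document in which <latitude> appears under two different parents.
doc=$(mktemp)
cat > "$doc" <<'EOF'
<stations>
  <station>
    <latitude>38.26</latitude>
  </station>
  <airport>
    <latitude>40.00</latitude>
  </airport>
</stations>
EOF

station_lat=$(get_child_data station latitude "$doc")
airport_lat=$(get_child_data airport latitude "$doc")
echo "station: $station_lat"
echo "airport: $airport_lat"
rm -f "$doc"
```

This breaks down quickly with nested or repeated parents, which is where a proper query language becomes the better choice.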
XPath
When dealing with more complicated xml it is easier to use XPath, a query language for selecting nodes from an xml document. Perl's XML::XPath module comes with a command line tool xpath which can be called from shell scripts like so:
$ xpath KSHD.xml "//temperature_string/text()" 2> /dev/null
55 F (13 C)
W3schools has a nice tutorial on XPath.