elblogg

blah, blah, blag, blog

Posts Tagged ‘xml’

the right tool for the job

Wednesday, November 12th, 2008

So, you’re going to parse a webpage to extract some information. For instance, you might want to fetch the tracking information for your last online order and display any changes using Growl, D-Bus notifications or XOSD.
You know regular expressions, so you go to the job with your long-range missiles ready. But wait a minute: you’ll probably solve the problem, but are regular expressions really the right tool?
The pro for regular expressions is that you can use the same tool you always use for parsing jobs, but then again, you don’t learn anything new from this. You might fortify your position as a regex wizard even more, but how about something completely different?

Now. Most webpages are written in HTML, and some even in XHTML. For HTML documents, languages like Python have a built-in parser modeled after the SAX parser. (It’s probably the other way around, the SAX parser is built on the base of the HTML parser…) Most programming languages have good support for XML, so for XHTML documents you can use the HTML parser, a SAX parser or even the XML DOM parsers.

The benefit of doing it this way is that your parser will probably be more robust to minor changes in the webpage. You don’t reinvent the wheel (the best way I’ve found to parse HTML documents using regular expressions is to make a specialized SAX-like parser anyway). Your code will probably still be readable in a year, and others might even be able to understand it. And finally, you learn something new, which might give you a fresh view on a lot of problems.
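To make the idea concrete, here is a minimal sketch of the event-driven (SAX-like) style using Python’s built-in `html.parser` module. The class name `TrackingTableParser`, the helper `parse_events` and the sample markup in `SAMPLE_HTML` are all made up for illustration; a real tracking page will have different markup, so treat this as a starting point rather than a finished parser.

```python
from html.parser import HTMLParser


class TrackingTableParser(HTMLParser):
    """Collect every table row as a list of cell strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None outside <tr>
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Invented sample markup, standing in for a downloaded tracking page.
SAMPLE_HTML = """
<html><body>
<table id="events">
  <tr><th>Time</th><th>Status</th></tr>
  <tr><td>2008-11-10 14:02</td><td>Received at <b>terminal</b></td></tr>
  <tr><td>2008-11-11 08:30</td><td>Out for delivery</td></tr>
</table>
</body></html>
"""


def parse_events(html):
    """Return the data rows of the table, skipping the header row."""
    parser = TrackingTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows[1:]


if __name__ == "__main__":
    for time, status in parse_events(SAMPLE_HTML):
        print(time, "-", status)
```

Note that the nested `<b>` tag in the sample needs no special handling: the parser simply keeps collecting text until the cell closes, which is the kind of minor markup change that tends to break a hand-written regular expression.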

Now back to the original issue: making a parser for the parcel tracking of your postal service. Here’s an example parsing the shipment tracking page of Posten, the Norwegian postal service.

(more…)

My bookshelf

Sunday, January 30th, 2005

I took a picture of my new laptop the other day. It was then I saw how geeky my bookshelf was, containing these books:

  • Web Database Applications with PHP and MySQL
  • Learning XML
  • Learning XSLT
  • Python - how to program
  • MySQL reference manual
  • Programming Languages - concepts and constructs
  • HTML 4.01 Specification
  • Computer Networks
  • A history of modern computing
  • A brief history of the future - The origins of the Internet
  • Learning Java
  • Data Structures and Algorithms in Java
  • Designing with web standards
  • On to Java
  • Java network programming and distributed computing
  • Windows ME annoyances
  • Java som første programmeringsspråk
  • Java software solutions
  • Learning WML and WMLScript
  • Human computer interaction