Static typing gives you a safety net when it comes to typos. It doesn’t, however, give you any protection against logical errors.
You know regular expressions, so you go to the job with your long-range missiles ready. But wait a minute: you’ll probably solve the problem, but are regular expressions really the right tool?
The upside of regular expressions is that you get to use the same tool you always use for parsing jobs, but then again, you don’t learn anything new from it. You might fortify your position as a regex wizard even more, but how about something completely different?
Now. Most web pages are written in HTML, and some even in XHTML. For HTML documents, languages like Python have a built-in parser modeled after the SAX parser. (It’s probably the other way around: the SAX parser is built on the model of the HTML parser…) Most programming languages have good support for XML, so for XHTML documents you can use the HTML parser, a SAX parser, or even an XML DOM parser.
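To give a feel for what the SAX-like style looks like, here is a minimal sketch using Python 3’s built-in html.parser module. It is not the tracking-page parser itself: the class name CellCollector, the helper parse_rows, and the sample table are all made up for illustration, and the sketch assumes the data you want sits in plain table cells.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Hypothetical example: collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        # Called by the parser for every opening tag it encounters.
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text between tags; only kept while we are inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell that was never closed


def parse_rows(html):
    """Feed an HTML string to the collector and return the table rows."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Invented sample markup, not taken from any real tracking page.
SAMPLE = """
<table>
  <tr><td>2024-01-02</td><td>Oslo</td><td>Received</td></tr>
  <tr><td>2024-01-03</td><td>Bergen</td><td>Delivered</td></tr>
</table>
"""

if __name__ == "__main__":
    for row in parse_rows(SAMPLE):
        print(row)
```

The parser calls your handler methods as it walks the document, so all you do is keep a little state about where you are. Nothing here depends on attribute order, whitespace, or line breaks in the markup.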
The benefit of doing it this way is that your parser will probably be more robust against minor changes in the web page. You don’t reinvent the wheel (the best way I’ve found to parse HTML documents using regular expressions is to make a specialized SAX-like parser anyway). Your code will probably still be readable in a year, and others might even be able to understand it. And finally, you learn something new, which might give you a fresh view on a lot of problems.
Now back to the original issue: making a parser for the parcel tracking of your postal service. Here’s an example that parses the shipment tracking page of Posten, the Norwegian postal service.