Thursday, April 5, 2007

Parsing Piles of Plain Text Pretty Simple

Lapis is a very interesting research project: on Monday it saved me hours of writing parsing code, or days of copying and pasting by hand. It's a little tricky to learn the first time, but it designs your parsing expression for you!

http://groups.csail.mit.edu/uid/lapis/

Say you have a list of text names and phone numbers and comments:

Aaron Powers 555-123-1234 (he's a weird one)
John Cleese 01-331-109-091 (he's even weirder)
Bill Clinton 081-101-1010 (no comment)

But they're in plain text, without commas or any other good delimiters, and you want to get them into a spreadsheet. You could try to remember regular expression syntax. Or you could highlight one phone number in Lapis, and it'll highlight all the phone numbers. If it gets one wrong, you just tell it which one was wrong, and it adjusts its selection, figuring out which traits of the text match what you're after.
It can handle much more complicated parsing than this example.
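For comparison, here's roughly what the "remember regular expressions" route looks like for the three sample lines: a minimal Python sketch, not anything Lapis itself produces. The pattern and the helper names `parse_lines` and `to_csv` are my own, and the sketch assumes every line is a name, then a phone number made of hyphen-separated digit groups, then a parenthesized comment.

```python
import csv
import io
import re

# Hypothetical pattern for lines shaped like "Name Phone (comment)".
# Assumes the phone number is digit groups joined by hyphens and that
# the comment is the parenthesized text at the end of the line.
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+(?P<phone>\d+(?:-\d+)+)\s+\((?P<comment>.*)\)\s*$"
)


def parse_lines(text):
    """Return (name, phone, comment) tuples; lines that don't match are skipped."""
    rows = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            rows.append((match["name"], match["phone"], match["comment"]))
    return rows


def to_csv(rows):
    """Render rows as CSV text with a header row, ready for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "phone", "comment"])
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE = """\
Aaron Powers 555-123-1234 (he's a weird one)
John Cleese 01-331-109-091 (he's even weirder)
Bill Clinton 081-101-1010 (no comment)
"""

if __name__ == "__main__":
    print(to_csv(parse_lines(SAMPLE)), end="")
```

Any line that breaks those assumptions (say, a phone number written with spaces) is silently skipped, and fixing that means going back and tweaking the pattern by hand. That's the kind of adjustment Lapis lets you make by pointing at examples instead.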

Thanks, Lapis. I just wish it were integrated into more editors.