Evolving A Simple Twitter to Blog Ruby Program Part 1
In a “quick” and dirty exercise I built a little Ruby program to grab my Twitter posts and collate them into a list for a “tweets of the week” type blog post. It was much more about figuring out how to do it than about the actual output (I’d rather you just follow me on Twitter than rely on me posting my tweets here). Tonight at the FV.rb meeting @dkubb helped me a lot by pointing out some glaring non-Rubyisms, and I thought that going through some of the changes might help others moving from “old school” programming (structured, functional, Perl-y) to “new hotness” programming (object oriented, yields, and awesomeness).
I started out with this code in a gist. Nothing hugely bad: it pulls in either a URL or a file, parses the XML for the status updates, does some HTML replacement on each one (@user, #hash, and URLs get auto-linked), and then spits out some HTML that can be copied and pasted into a blog entry.
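The auto-linking step is the most regex-heavy part of a program like this. The original gist isn’t reproduced here, so what follows is only a minimal sketch of one way to do it: a hypothetical autolink method that runs a single gsub over the tweet, so each URL, @user, or #hash is rewritten exactly once. Linking them in separate passes risks re-linking a # that sits inside an already-generated URL. The twitter.com link formats are assumptions for illustration.

```ruby
# Hypothetical helper, not the code from the original gist.
# One regex with three alternatives: a URL, an @mention, or a #hashtag.
# The (?<!\w) lookbehinds stop "me@example.com" from being treated as a mention.
TOKEN = %r{(https?://\S+)|(?<!\w)@(\w+)|(?<!\w)#(\w+)}

def autolink(text)
  text.gsub(TOKEN) do
    if $1    # a bare URL becomes a link to itself
      %(<a href="#{$1}">#{$1}</a>)
    elsif $2 # @user links to the user's page (assumed URL format)
      %(<a href="http://twitter.com/#{$2}">@#{$2}</a>)
    else     # #hash links to a search (assumed URL format)
      %(<a href="http://twitter.com/search?q=%23#{$3}">##{$3}</a>)
    end
  end
end

puts autolink("Thanks @dkubb, notes at http://example.com/fvrb #ruby")
```

Because the URL alternative is tried first and swallows everything up to the next space, a fragment such as http://example.com/a#b stays a single link instead of sprouting a hashtag link inside it.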
Starting Off
The first step was figuring out how to parse the XML. A bit of googling turned up three possibilities: Hpricot, libxml-ruby, and Nokogiri. The first post I saw noted that libxml-ruby was the fastest, which makes sense since it is a thin binding to the C libxml2 library, so I took a run at that. It wasn’t a great success; the biggest challenge was figuring out how Ruby represented the XML structure, and there was a lot of mucking around in IRB.
ruby-1.8.7-p249 > require 'xml'
 => true
ruby-1.8.7-p249 > parser = XML::Parser.file('twitter.xml')
 => #<LibXML::XML::Parser:0x101170140 @context=#<LibXML::XML::Parser::Context:0x101170168>
ruby-1.8.7-p249 > doc = parser.parse
# snip xml spew to STDOUT
ruby-1.8.7-p249 > doc.class
 => LibXML::XML::Document
# Hmm.... does find work?
ruby-1.8.7-p249 > s = doc.find('/status')
 => #<LibXML::XML::XPath::Object:0x1018ff398>
ruby-1.8.7-p249 > s.methods
# snip list of methods, and searching for what to do
ruby-1.8.7-p249 > s.each { |node| puts node.class }
 => nil
ruby-1.8.7-p249 > s.each { |node| puts node.inspect }
 => nil
# WTF? OK, so what now then?
It was a bit frustrating, though probably mostly because I just didn’t grok how libxml-ruby represented the XML internally; I was thinking of it more like a Perl hash-of-hashes than whatever structure libxml-ruby was actually using. So I moved on. (Ironically, while redoing some of this for this article, I went back and ran the commands again, figuring I would get it now, and failed miserably.)
Next I looked at Hpricot, but the syntax in the readme and examples scared me away.
Starting Progress on the First Iteration
Someone at work suggested that Nokogiri was the way to go, and since parsing a few KB of XML probably wasn’t going to run me into any performance issues, I took a run at it with this. I soon found that a static XML file would be the easiest thing to test against, so I saved twitter.xml in the directory I was running IRB from and played some more.
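Since the program accepts either a URL or a file, one way to keep testing against a saved twitter.xml painless is to hide that choice behind a single method. This is a sketch under assumptions: read_source is a hypothetical name, and the URL branch relies on open-uri’s URI.open (available since Ruby 2.5).

```ruby
require 'open-uri'
require 'tempfile'

# Hypothetical helper: return the raw XML from either a URL or a local file,
# so the same parsing code can run against a live feed or a saved twitter.xml.
def read_source(source)
  if source =~ %r{\Ahttps?://}
    URI.open(source, &:read) # fetch over HTTP(S) via open-uri
  else
    File.read(source)        # read a saved copy from disk
  end
end

# Usage with a throwaway file standing in for twitter.xml (no network needed).
sample = Tempfile.new(['twitter', '.xml'])
sample.write('<statuses><status><text>hi</text></status></statuses>')
sample.close
puts read_source(sample.path)
```

Everything downstream only ever sees a string of XML, so swapping the live feed for a local fixture never touches the parsing code.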
Much better. Next, to find out how to get a list of the statuses:
ruby-1.8.7-p249 > require 'nokogiri'
 => true
ruby-1.8.7-p249 > doc = Nokogiri::XML(File.new('twitter.xml'))
# snip
ruby-1.8.7-p249 > doc.class
 => Nokogiri::XML::Document
ruby-1.8.7-p249 > doc.xpath('/status')
 => []
ruby-1.8.7-p249 > doc.xpath('//status')
# snip lots more xml spew and more testing until...
ruby-1.8.7-p249 > doc.xpath('//status').each { |node| puts node.xpath(".//text").first.content }
# snip lovely output of each of the tweets in the xml file
OK, so now I could call .each() on the result. I had discovered that xpath() basically returns a list of the XML nodes matching that path, and that I could then query each of those nodes in turn, remembering to use the “start from current node” syntax (the ‘.’ represents the current location in the tree, so .//text only searches inside the current status).
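Both XPath lessons from that session can be reproduced without a Twitter feed. The sketch below uses REXML, which ships with Ruby, purely so it runs without installing a gem; the XPath expressions mean the same thing in Nokogiri. The sample XML is invented and assumes the feed’s root element is statuses, which would explain why ‘/status’ came back empty: a single leading slash only looks at the root element itself.

```ruby
require 'rexml/document'

# Invented sample shaped like a timeline: a <statuses> root holding <status> children.
xml = <<~XML
  <statuses>
    <status><id>1</id><text>first tweet</text></status>
    <status><id>2</id><text>second tweet</text></status>
  </statuses>
XML
doc = REXML::Document.new(xml)

# A single leading slash anchors at the document root, whose only element is <statuses>.
absolute = REXML::XPath.match(doc, '/status')
# A double slash searches for <status> elements anywhere in the document.
statuses = REXML::XPath.match(doc, '//status')

# ".//text" starts from the current <status>, so each status yields its own text...
relative = statuses.map { |s| REXML::XPath.first(s, './/text').text }
# ...while "//text" jumps back to the root and always finds the first <text> in the file.
unanchored = statuses.map { |s| REXML::XPath.first(s, '//text').text }

p absolute.size, relative, unanchored
```

Forgetting the leading dot is an easy bug to miss, because the code still runs and simply prints the first tweet over and over.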
The next steps were (relatively) easy: loop through each status, get some information (time, status ID, content, etc.), format that into HTML, find and implement a couple of “convert @user to an HTML link” bits of code I found online, and voilà, the first iteration was complete and working.
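The loop-and-format step can be sketched the same way. Everything specific here is an assumption for illustration: the element names (id, created_at, text, user/screen_name) are assumed to match the Twitter XML timeline format rather than taken from the original gist, format_status, the list-item markup and the status URL pattern are hypothetical, and REXML again stands in for Nokogiri so the example runs without extra gems.

```ruby
require 'rexml/document'
require 'cgi'

# Invented sample; element names are assumed to mirror a Twitter XML timeline.
xml = <<~XML
  <statuses>
    <status>
      <created_at>Tue May 04 22:15:00 +0000 2010</created_at>
      <id>12345</id>
      <text>Heading to FV.rb tonight &amp; bringing code</text>
      <user><screen_name>example_user</screen_name></user>
    </status>
  </statuses>
XML

# Hypothetical formatter: one <li> per status, linking the timestamp to the tweet.
def format_status(status)
  field = ->(path) { REXML::XPath.first(status, path).text }
  user = field.call('./user/screen_name')
  url  = "http://twitter.com/#{user}/status/#{field.call('./id')}" # assumed URL format
  text = CGI.escapeHTML(field.call('./text')) # re-escape &, < and > for the HTML output
  %(<li>#{text} <a href="#{url}">#{field.call('./created_at')}</a></li>)
end

doc = REXML::Document.new(xml)
html = REXML::XPath.match(doc, '//status').map { |s| format_status(s) }
puts "<ul>", html, "</ul>"
```

REXML hands back the tweet text with entities already decoded (so &amp; becomes &), which is why the text is escaped again before it goes into the generated HTML.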
Now With Some Expert Advice
So after reading the “What I wish I had been told a year ago” post, I figured the next stage was to convert it to a class, make it more Ruby-like, and give it some tests. Dan Kubb of DataMapper fame kindly answered my call for help and moved me on to this current version with some helpful advice.
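For readers heading down the same path, here is roughly what “convert it to a class” with yields can look like. This is an illustrative sketch, not the code from this series: TweetList is a hypothetical class, again built on REXML under the same assumed XML layout, that includes Enumerable and yields each tweet’s text from each.

```ruby
require 'rexml/document'

# Hypothetical class: wraps a timeline's XML and exposes the tweets as an Enumerable.
class TweetList
  include Enumerable

  def initialize(xml)
    @doc = REXML::Document.new(xml)
  end

  # Yield each tweet's text; Enumerable builds map, select, first, etc. on top of this.
  def each
    return enum_for(:each) unless block_given?
    REXML::XPath.each(@doc, '//status') do |status|
      yield REXML::XPath.first(status, './text').text
    end
  end
end

tweets = TweetList.new(<<~XML)
  <statuses>
    <status><text>Learning Nokogiri</text></status>
    <status><text>Off to #fvrb</text></status>
  </statuses>
XML

tweets.each { |text| puts text }
p tweets.select { |text| text.include?('#') }
```

Defining a single each and mixing in Enumerable is the usual Ruby idiom for this: callers get map, select and first without any loop bookkeeping, and the class is easy to test because it only needs an XML string.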
I’ll continue this later on this week with Part 2, in which I iterate into more awesomeness!
May 4th, 2010 at 11:08 pm
Very glad you found inspiration from my article to create such a series of tutorials.
I’ll curiously read along..