I’ve been a programmer since 1985 or so. I started off with good old COBOL, then progressed through a long line of procedural languages, from PL/1 up to VB.NET. I’ve never played much with dynamic languages before.
But I read about Ruby and Ruby on Rails, I see the code snippets, I read the blogs… it’s a beautiful language and seems to bring a fresh approach to solving problems.
So I’m going to start playing with Ruby. I have quite a few projects on tap and Ruby seems like an excellent choice. Now, the easiest way to learn a new language is to simply write code in it, look stuff up and basically just learn as you go. A piece of one of the projects I have in mind is to utilize Google Hottrends in a mashup sort of way.
The first step is to read the Hottrends page and store the data. I have SQLite on my laptop, so that’s where the data will be stored.
So, my first actual Ruby program. I’m sure it’s quite rough and a real Ruby guru would do it so much better, but this is for learning and it does work. It simply scrapes Hottrends using an XPath expression and then stores a few pieces of info in a SQLite table (avoiding duplicate terms).
Here’s the code - beautifully concise and readable!
# hottrend-scraper
# 1/31/2008
# Extract anchor text from Google hottrends
# Store all anchor text strings into sqlite db
require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'sqlite3'
@url = "http://www.google.com/trends/hottrends"
@response = ''
begin
  # open-uri RDoc: http://stdlib.rubyonrails.org/libdoc/open-uri/rdoc/index.html
  open(@url, "User-Agent" => "Ruby/#{RUBY_VERSION}",
             "From" => "",
             "Referer" => "") do |f|
    puts "Fetched document: #{f.base_uri}"
    puts "\t Content Type: #{f.content_type}\n"
    puts "\t Charset: #{f.charset}\n"
    puts "\t Content-Encoding: #{f.content_encoding}\n"
    puts "\t Last Modified: #{f.last_modified}\n\n"
    # Save the response body
    @response = f.read
  end
  # open database
  db = SQLite3::Database.new('scrapedata.s3db')
  # Hpricot RDoc: http://code.whytheluckystiff.net/hpricot/
  doc = Hpricot(@response)
  # Pull out hottrends anchor text - stuff into database but don't stuff in duplicates
  (doc/"//td[@class='hotColumn']//table[@class='Z2_list']//td//a").each do |anchor|
    anchortext = anchor.inner_html
    # Bind the value with ? so a quote inside a term cannot break the SQL
    count = db.get_first_value("SELECT COUNT(*) FROM gtrends WHERE scrapetext = ?", anchortext)
    if count.to_i == 0
      puts "INSERTING #{anchortext}"
      db.execute("INSERT INTO gtrends (scrapetext) VALUES (?)", [anchortext])
    end
  end
  db.close
rescue StandardError => e
  # StandardError covers network and database errors without swallowing Ctrl-C
  print e, "\n"
end
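The XPath expression is the part most likely to need tweaking, so it helps to be able to try it without hitting the network. Here is a rough sketch of that, with some loud caveats: it uses REXML (which ships with Ruby) instead of Hpricot, so it only accepts well-formed markup; the sample HTML is made up to mimic the nesting the expression expects and is not the real Hottrends page; and trend_terms is a hypothetical helper name.

```ruby
require 'rexml/document'

# Made-up sample markup that mimics the nesting the scraper's XPath expects;
# the real page may be structured differently.
SAMPLE_HTML = <<~HTML
  <html><body><table><tr>
    <td class="hotColumn">
      <table class="Z2_list">
        <tr><td><a href="#1">first term</a></td></tr>
        <tr><td><a href="#2">second term</a></td></tr>
      </table>
    </td>
    <td class="other"><a href="#3">not a trend</a></td>
  </tr></table></body></html>
HTML

# Hypothetical helper: returns the text of every anchor matched by the XPath.
def trend_terms(markup)
  doc = REXML::Document.new(markup)
  xpath = "//td[@class='hotColumn']//table[@class='Z2_list']//td//a"
  REXML::XPath.match(doc, xpath).map { |a| a.text.strip }
end

puts trend_terms(SAMPLE_HTML)
```

Only the anchors nested inside the hotColumn cell and the Z2_list table should come back; the link in the other cell is there to check that the class filters actually exclude something.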