Playing with Ruby

I’ve been a programmer since 1985 or so. I started off with good old COBOL, then progressed through a long line of procedural languages, from PL/1 up to VB.NET. I’ve never played much with dynamic languages before.

But I read about Ruby and Ruby on Rails, I see the code snippets, I read the blogs… it’s a beautiful language and seems to bring a fresh approach to solving problems.

So I’m going to start playing with Ruby. I have quite a few projects on tap and Ruby seems like an excellent choice. Now, the easiest way to learn a new language is to simply write code in it, look stuff up and basically just learn as you go. One piece of a project I have in mind uses Google Hottrends in a mashup sort of way.

The first step is to read the Hottrends page and store the data. I have SQLite on my laptop so that’s where the data will be stored.

So, my first actual Ruby program. I’m sure it’s quite rough and a real Ruby guru would do it so much better, but this is for learning and it does work. It simply scrapes Hottrends using an XPath expression and then stores a few pieces of info in a SQLite table (without storing duplicate terms).
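To see what the XPath step does on its own, here is a small sketch that runs the same expression against a made-up snippet of markup. The HTML is a hypothetical, simplified stand-in for the Hottrends page (I'm only assuming the real page follows this shape), and it uses REXML rather than Hpricot purely so it runs without installing a gem. REXML needs well-formed markup, which real pages often aren't.

```ruby
require 'rexml/document'

# Hypothetical sample markup: a simplified stand-in for the real page.
SAMPLE = <<~HTML
  <html><body><table><tr>
    <td class="hotColumn">
      <table class="Z2_list">
        <tr><td><a href="/trends?q=super+bowl">super bowl</a></td></tr>
        <tr><td><a href="/trends?q=groundhog+day">groundhog day</a></td></tr>
      </table>
    </td>
    <td class="other"><a href="/about">about</a></td>
  </tr></table></body></html>
HTML

# Same expression as the scraper: anchors inside the Z2_list table
# that sits inside a hotColumn cell.
XPATH = "//td[@class='hotColumn']//table[@class='Z2_list']//td//a"

doc = REXML::Document.new(SAMPLE)
terms = REXML::XPath.match(doc, XPATH).map(&:text)

puts terms.inspect
```

The "about" link is skipped because its cell does not have the hotColumn class, which is the whole point of anchoring the expression on those two class names.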

Here’s the code - beautifully concise and readable!

# hottrend-scraper
# 1/31/2008
# Extract anchor text from Google hottrends
# Store all anchor text strings into sqlite db

require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'sqlite3'

@url = "http://www.google.com/trends/hottrends"
@response = ''

begin
  # open-uri RDoc: http://stdlib.rubyonrails.org/libdoc/open-uri/rdoc/index.html
  # URI.open is needed on Ruby 3.0+, where plain open() no longer accepts URLs
  URI.open(@url, "User-Agent" => "Ruby/#{RUBY_VERSION}",
    "From" => "",
    "Referer" => "") { |f|

    puts "Fetched document: #{f.base_uri}"
    puts "\t Content Type: #{f.content_type}\n"
    puts "\t Charset: #{f.charset}\n"
    puts "\t Content-Encoding: #{f.content_encoding}\n"
    puts "\t Last Modified: #{f.last_modified}\n\n"

    # Save the response body
    @response = f.read
  }

  # open database (the gtrends table with a scrapetext column must already exist)
  db = SQLite3::Database.new( 'scrapedata.s3db' )

  # Hpricot RDoc: http://code.whytheluckystiff.net/hpricot/
  doc = Hpricot(@response)

  # Pull out hottrends anchor text - stuff into database but don't stuff in duplicates
  (doc/"//td[@class='hotColumn']//table[@class='Z2_list']//td//a").each do |anchor|
    anchortext = anchor.inner_html
    # Bind parameters (?) keep quotes in the scraped text from breaking the SQL
    count = db.get_first_value("SELECT COUNT(*) FROM gtrends WHERE scrapetext = ?", anchortext)
    if count.to_i == 0
      puts "INSERTING #{anchortext}"
      db.execute("INSERT INTO gtrends (scrapetext) VALUES (?)", [anchortext])
    end
  end
  db.close
rescue StandardError => e
  print e, "\n"
end