Tuesday, February 21, 2012

Disguising Load Times w Information

I'm writing a script that lets people enter the URL of their writing profile on a particular site. The script then gives them a list of articles that don't meet their own quality specifications and also provides fun facts about their writing. Users can customize some preset specifications. For example, the word count specification, which defaults to 1000 words, can be set to 700 words. The user then sees a list of the articles they've written that are under 700 words. In addition, the script shows users their average word count across their entire account.

I have a few problems with the script. One of the larger ones is that for users with large accounts (500+ articles), the load times are ridiculous. I'd like to disguise the load time by offering information about the articles while the script is running. As it stands, nothing shows up until the script has finished running.

I would like the script to show, on the page: the current average (updated with each article), the number of articles the script still needs to comb through (or at least a percentage complete), and a list of the articles that fail a particular specification, each added as it fails.
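
Here's a minimal sketch of the bookkeeping that kind of display would need. It is not part of the script below: `ProgressTracker`, its method names, and the sample titles and counts are all made up for illustration. The idea is just to keep a running sum and a counter so the average, the percentage complete, and the list of short articles can be read off after every article.

```ruby
# Hypothetical helper, not part of the original script: keeps running
# statistics so they can be displayed after every article.
class ProgressTracker
  attr_reader :processed, :total, :short_articles

  def initialize(total, min_words = 1000)
    @total = total
    @min_words = min_words
    @processed = 0
    @word_sum = 0
    @short_articles = []
  end

  # Records one article and returns self so calls can be chained.
  def add(title, word_count)
    @processed += 1
    @word_sum += word_count
    @short_articles << title if word_count < @min_words
    self
  end

  # Average word count over the articles seen so far.
  def average
    @processed.zero? ? 0.0 : @word_sum.to_f / @processed
  end

  def percent_complete
    @total.zero? ? 100.0 : 100.0 * @processed / @total
  end

  def status_line
    format('%d/%d articles (%.0f%%), average %.1f words, %d under %d words',
           @processed, @total, percent_complete, average,
           @short_articles.size, @min_words)
  end
end

# Usage with made-up titles and word counts.
tracker = ProgressTracker.new(3, 700)
[['First hub', 650], ['Second hub', 1200], ['Third hub', 950]].each do |title, count|
  tracker.add(title, count)
  puts tracker.status_line
end
```

Because the average comes from a running sum, each update costs the same no matter how many articles have already been processed.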

Currently, my script takes ALL the URLs and opens them, puts the words into a hash of arrays, and does the word count calculations from there. Since much of the load time goes to opening URLs and pulling and cleaning text, my current design doesn't really allow for showing information while it loads.

Here is the current script.

#http://www.codegurl.com/2012/02/disguising-load-times-w-information.html
require 'nokogiri'
require 'open-uri'

# Takes a hub username and turns it into the URL of that user's 'latest' listing.
def create_base_url(username)
  "http://#{username}.hubpages.com/hubs/latest"
end
# Returns the URLs of all index pages for the user (10 hubs per page).
def get_index_pages(base_url, username)
  # URI.open is the open-uri entry point on current Ruby versions.
  doc = Nokogiri::HTML(URI.open(base_url))
  range = doc.xpath('//span[@class="range"]').inner_text
  # The third whitespace-separated token of the 'range' string is the number of hubs.
  number_of_hubs = range.split(' ')[2].to_i
  # 10 hubs per page, rounded up so a partial last page is included.
  number_of_index_pages = (number_of_hubs + 9) / 10
  # Builds the page URLs from the last page down to page 1.
  number_of_index_pages.downto(1).map do |page|
    "http://#{username}.hubpages.com/hubs/latest?page=#{page}"
  end
end
# Collects the URL of every hub linked from the given index pages.
def get_hub_urls(index_list)
  hubs = []
  index_list.each do |index_url|
    doc = Nokogiri::HTML(URI.open(index_url))
    doc.xpath('//div[@class="hub_pic"]/a').each do |link|
      hubs << link['href']
    end
  end
  hubs
end
# Opens every hub and returns a hash of page title => text of its text and table modules.
def pull_text(hub_urls)
  hubs = {}
  hub_urls.each do |hub_url|
    doc = Nokogiri::HTML(URI.open(hub_url))
    main_text = doc.xpath('//div[@class="module moduleText color0"]').inner_text
    blue_text = doc.xpath('//div[@class="module moduleText color2"]').inner_text
    grey_text = doc.xpath('//div[@class="module moduleText color1"]').inner_text
    table_text = doc.xpath('//div[@class="module moduleTable color0"]').inner_text
    title = doc.search('title').inner_text
    # Join with spaces so the last word of one module isn't glued to the first word of the next.
    hubs[title] = [main_text, blue_text, grey_text, table_text].join(' ')
  end
  hubs
end
# Turns each hub's text into an array of words, giving a hash of arrays.
def clean_text(hubtxt_hash)
  hubtxt_hash.transform_values do |text|
    # Drop commas, then split on any run of whitespace (spaces or newlines).
    text.delete(',').split
  end
end
puts "Enter HubPages username:"
username = gets.chomp
base_url = create_base_url(username)
index_pages = get_index_pages(base_url, username)
hub_urls = get_hub_urls(index_pages)
text = pull_text(hub_urls)
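
One way the script could report as it goes is to handle one hub at a time and hand each result to a block, instead of collecting everything first. The following is only a sketch under assumptions: `process_hubs`, the `fetch` callable, and the example.com pages are made up for illustration. In the real script, `fetch` would wrap the Nokogiri code from `pull_text`; here it is a stub so the example runs offline.

```ruby
# Hypothetical restructuring: fetch, count and report one hub at a time.
# `fetch` is any callable that takes a URL and returns [title, text].
def process_hubs(hub_urls, fetch, min_words = 1000)
  hub_urls.each_with_index do |url, index|
    title, text = fetch.call(url)
    count = text.split.size
    # Hand the result to the caller's block before fetching the next hub.
    yield({ title: title, words: count, short: count < min_words,
            done: index + 1, total: hub_urls.size })
  end
end

# Stub fetcher with made-up pages standing in for real HTTP requests.
fake_pages = {
  'http://example.com/a' => ['Hub A', 'one two three'],
  'http://example.com/b' => ['Hub B', 'one two three four five']
}
fetch = ->(url) { fake_pages.fetch(url) }

$stdout.sync = true # write each line immediately instead of buffering it

process_hubs(fake_pages.keys, fetch, 4) do |result|
  flag = result[:short] ? ' (under limit)' : ''
  puts "#{result[:done]}/#{result[:total]} #{result[:title]}: " \
       "#{result[:words]} words#{flag}"
end
```

The total running time stays the same, since every page still has to be opened, but the first line of output appears after the first page instead of after the last one.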

Another problem is the way in which I count words. It's way off. I've got to work on that, but I've already got a solution in mind. I just need to implement it.
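
For illustration only, and not necessarily the solution meant above, here is one common way to count words in Ruby: scan for runs of letters and digits rather than splitting on single spaces and commas. `word_count` and `WORD_PATTERN` are made-up names, and treating internal apostrophes and hyphens as part of a word is an assumption about what should count as one word.

```ruby
# Hypothetical helper: a word is a run of letters/digits, optionally joined
# by internal apostrophes or hyphens ("don't" and "well-known" count once each).
WORD_PATTERN = /[[:alnum:]]+(?:['-][[:alnum:]]+)*/

def word_count(text)
  text.scan(WORD_PATTERN).size
end

sample = "Hello,  world!\nDon't split well-known words."
puts word_count(sample) # prints 6
```

Scanning for words sidesteps two things that can skew a split-based count: doubled spaces producing empty entries, and deleted newlines gluing the last word of one line to the first word of the next.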

Edit: Unfortunately, at this time, HubPages is testing several different layout changes. Because of the way in which I wrote the code (picking out bits of CSS), I've decided to hold off on this project. When HubPages calms down with the design changes, I will continue with the project. See you then! (June 2010)
