This weekend saw forward progress on several projects. The progress was less than I hoped for, as I was limited by back and wrist pain.
Userscripts.org got a bunch of spam removed, and I integrated uservoice.com's feedback service. Hopefully this will help with prioritization of what the users want most. I debated using Get Satisfaction but they do much more than just idea collection and voting - solving more problems than I have and resulting in a confusing UI (in my opinion).
I brainstormed with Ian about syncing for Taboo and UI cleanup. Then Jake Dahn pinged me that he knew how to fix a bug with middle clicking not working (it seems that gecko doesn't like a middle click on a span, only on an anchor tag). I have been trying to persuade him to contribute to open source for a while (since I was at Flock), so seeing his first push to github is awesome.
Book Burro, unfortunately, didn't get as much time as I was hoping. A major rework of the extension is done and is waiting on Librarius to be published. Librarius is a catalog of libraries that can be used by Book Burro or a new version of Library Lookup. I'm specifically aiming to fix the problem we hit last time, "the accuracy of those lists has decayed over time, and I'm not able to maintain them."
Taboo is missing a large feature: syncing. We have wanted to implement it since we started coding it. Lately I've thought a lot about the requirements for implementation.
I think the most important requirement is service reliability shouldn't affect non-syncing user interactions with the client. If the syncing service is unavailable, the user's experience should still be pleasant (although the user should probably be informed when they have changes that haven't been pushed to the service yet).
Another requirement is the cost of running the service should be low (with back of envelope costs I get a thousand active users per dollar per month). Addons.mozilla.org reports that we are about to break 20k Active Daily Users (note this doesn't mean they are active users of Taboo), and so I need to support the fraction of users that would use this feature without much cost. I need to make sure whatever algorithm we end up using isn't too expensive, but I have less worry here.
The burning question on my mind is what approach to syncing should I use? Smarts on the server or client? Does anyone know any papers which talk about approaches to syncing, including conflict resolution models (and their tradeoffs)?
I'd like to get an initial version of the syncing service done by the end of May, but it depends on finding a good solution that will scale and provide a good user experience.
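As a rough illustration of the first requirement above (the client stays usable when the service is down, and knows which changes are still unpushed), here is a minimal local-first sketch in Python. It is not Taboo's code and does not answer the conflict resolution question; the class name, the `push` callable and the use of `OSError` to signal an unreachable service are all assumptions made for the example.

```python
class SyncClient:
    """Local-first store: saves always succeed locally; pushing is best-effort."""

    def __init__(self, push):
        self.push = push      # hypothetical callable that sends one change to the service
        self.local = {}       # url -> saved data, always reflects the user's actions
        self.pending = []     # changes the service has not accepted yet

    def save(self, url, data):
        self.local[url] = data            # the user's action never waits on the network
        self.pending.append((url, data))
        self.flush()

    def flush(self):
        """Push queued changes in order and stop at the first failure."""
        while self.pending:
            try:
                self.push(*self.pending[0])
            except OSError:               # assumed signal that the service is unreachable
                break
            self.pending.pop(0)

    def unsynced_count(self):
        """Number of changes to mention to the user as not yet pushed."""
        return len(self.pending)
```

The point of the design is that `save` only touches local state before attempting the network call, so a failing service costs the user nothing except a growing `unsynced_count()`.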
I've managed to rewrite my blog again. This time to appengine using web.py.
I started with the demo app Aaron put together the night appengine was released, and with some pointers from Kragen I was quickly 80% done with the new site. The next 15% involved figuring out how to get reCaptcha and HTML sanitization/cleanup working. Once that was done, it took a few DNS changes and the new site was live.
from urllib import urlencode
from google.appengine.api import urlfetch

# RecaptchaResponse and VERIFY_SERVER come from the reCAPTCHA client library.

def submit(recaptcha_challenge_field,
           recaptcha_response_field,
           private_key,
           remoteip):
    """
    Submits a reCAPTCHA request for verification. Returns RecaptchaResponse
    for the request

    recaptcha_challenge_field -- The value of recaptcha_challenge_field from the form
    recaptcha_response_field -- The value of recaptcha_response_field from the form
    private_key -- your reCAPTCHA private key
    remoteip -- the user's ip address
    """
    # An empty challenge or response can never be a correct solution.
    if not (recaptcha_response_field and recaptcha_challenge_field and
            len(recaptcha_response_field) and len(recaptcha_challenge_field)):
        return RecaptchaResponse(is_valid=False, error_code='incorrect-captcha-sol')

    params = {
        'privatekey': private_key,
        'remoteip': remoteip,
        'challenge': recaptcha_challenge_field,
        'response': recaptcha_response_field,
    }

    # App Engine has no raw sockets, so the request goes through urlfetch.
    result = urlfetch.fetch(
        url="http://%s/verify" % VERIFY_SERVER,
        payload=urlencode(params),
        method=urlfetch.POST,
        headers={
            "Content-type": "application/x-www-form-urlencoded",
            "User-agent": "reCAPTCHA Python/AppEngine"
        }
    )

    if result.status_code == 200:
        # First line is "true" or "false"; the second line is the error code.
        return_values = result.content.splitlines()
        return_code = return_values[0]
        if return_code == "true":
            return RecaptchaResponse(is_valid=True)
        else:
            return RecaptchaResponse(is_valid=False, error_code=return_values[1])

    # Assumption: treat any non-200 reply as the service being unreachable.
    return RecaptchaResponse(is_valid=False, error_code='recaptcha-not-reachable')
Grabbing the remote IP from web.py via web.ctx['ip'] now allows a simple query to the reCAPTCHA service to check if you are human.
For HTML sanitization, I used Beautiful Soup. My sanitization code is run when a comment is added (as sanitizing comments when viewing an article caused appengine CPU utilization warnings). The code is a modification of a django snippet.
First, I only allow absolute URLs that begin with http[s]:// instead of removing javascript: from the urls (since there are other ways to build bad urls).
import re

absolute_url_matcher = re.compile("^https?://")

def url(URI):
    # returns the URI only when it is an absolute http(s) URL, otherwise None
    if absolute_url_matcher.match(URI):
        return URI
...
tag.attrs = [(attr, val) for attr, val in tag.attrs
             if attr in valid_attrs and url(val)]
As comments containing code snippets aren't uncommon, I tweaked how PRE tags are handled:
BeautifulSoup.QUOTE_TAGS['pre'] = None  # don't parse inside of PRE tags
...
if tag.name == 'pre':
    # convert < into &lt;
    tag.replaceWith('<pre>%s</pre>' % tag.contents[0].replace('<', '&lt;'))
Finally I add a BR tag whenever I see two returns to create "paragraphs."
Unfortunately I need to make a few more tweaks as some of the old comments on my blog aren't formatted nicely. I always prefer to store both the user's original input and the sanitized version, so that I can both re-run the conversion and quickly see the offending html if an XSS hole is discovered.
So, why did I do this? I'm a fan of cloud computing, and have used every Amazon Web Service I could find a use/experiment for. While I prefer ruby to python, Google's cloud offering is very enticing, and only by using it can you really know the power/limitations.
Helen Thomas asked that after White House press secretary Dana Perino refused to address the issue of the White House's approval of harsh tactics (aka torture). Think Progressive has footage of the exchange.
Demo Girl is a cool site that has hundreds of screencasts, and I'm honored that she found Taboo useful enough to create a screencast about it.
In her taboo screencast, she covers how it works, including some of the newer features like the dropdown next to the [T] icon, and the hover feature on the grid view.
I'm sorry to say that I expect the screencast to be out of date soon, as I'm finishing a cool "mosaic" view of tabs, as well as resizing on the grid view. :)
I have a few more bugs to fix, but I hope to release the new version this week.
Reading the comments on those posts, I can see I need to do a better job of explaining what taboo is and is not, as many were confused about what it did and why it was useful (or they can just watch Demo Girl's screencast!). There were several people wanting a syncing feature (which has been requested on the google group), and I'm hoping to implement a syncing feature next month. I also like the comment by ma5t3rw1tt about saving tab groups. I need to think about how the interaction would work.
I don't think Mozilla really hate us, they just don't understand what they are doing.
I've been developing extensions for several years now, and if there is one thing that is constant, it is how much of a pain Mozilla and addons.mozilla.org (AMO) can be. This is an expanded version of a comment I left on Mark Finkle's blog.
The Good
After years of not sharing any information about how many installs/users your extension has, the new site gives you that information and some of it is public (showing 4,316 downloads on both the extension listings and details pages).
The Bad
That leads us to the pain section. Users who visit the site have NO idea what the download number means.
The only logical thing to assume is that there have been a total of 4,316 downloads of the extension. But what it actually means is downloads in the last week (as you can see from the information they show on the developers view of the site).
The previous version of the site had both comments (reviews) and a simple threaded forum attached to each extension. The UI wasn't clear and led to confusion as people would use both interchangeably. The two have been combined into a single feature in the new site, threaded reviews, which leads us to the ugly.
The Ugly
I build extensions that I want to use. I publish them to AMO because I hope others will want to use them as well. I want to interact with my users. I want to hear their feedback, both problems and kudos. When moving to the new version of the site AMO deleted all the old forum conversations. I took the time to visit AMO regularly to respond to my users. To talk with them to determine how to fix compatibility issues with Tab Max Plus.
When I said I visited the site regularly, I do mean manually visited my addon's page. AMO didn't and still does not provide feeds or emails to developers of user feedback. Responding to users who have problems requires constant checks.
To make matters worse, Mozilla has an inefficient editorial review process that removes any possibility of interacting with your users. Reviews can take multiple weeks to show up. Hanging out in #addons, I got to see a conversation where an extension developer was asking why his responses to reviews on his extensions weren't showing up. An AMO editor responded that it wasn't personal, that they hadn't checked the queue in a while, and that when the editor checked there were thousands of comments awaiting review.
Communication with your users is impossible. A two sentence conversation would take a month at this rate.
Mozilla's inefficient editorial review doesn't limit itself to just reviews; they also review extensions before they are made public. Releasing a new version can take over a week, during which time users are frustrated as the extension is disabled in the newest beta.
Mozilla's release process for new versions of the beta leaves both developers and users in the dark. Mozilla seems to not know the meaning of a code freeze, as they continue making large changes to the beta after they say the code has frozen. Only after the beta is released may the extension developer upload a new version of the extension to fix any issues and mark it as compatible with the new version (there is no lead time before Mozilla's software updates for Firefox start occurring). Then you must wait up to a week for them to review the new version and post it for users. During this period there is no visibility. Users aren't shown that a new version is being reviewed (although unreviewed extensions are shown when you search the site!), so the users who love your work the most end up writing angry emails asking when you will fix your extension. Developers don't know their place in the queue.
All of this leaves a bad taste in my mouth when Mozilla tells us Update or Fade Away.
The Future
What can Mozilla do to improve? Think about the situation from an extension developer's point of view. Talk to us. Many of the things I pointed out can be treated as "bugs", but creating a checklist to be fixed as such is treating the symptoms. Think about ways to make it a better experience to develop for Mozilla, instead of being a pain in the ass.
AMO and all of the processes around extensions give firefox at large, firefox extension developers and AMO itself a bad reputation due to all the functional and process breakage.
I've been working on an extension for Firefox/Flock for a while now that lets you interact in what I think is a more natural way: S3://
With Yosh's help, we've wrapped access to S3 in a protocol handler. In english this means if you go to s3://bucket/key it will access the key from the bucket, using your AWS credentials if you've set them up.
Features
Creation/deletion of buckets
Uploading multiple files
Uploading files by DND onto s3://bucket/prefix/
Partial listing of files s3://bucket/files_start_with_this
Deleting of files
And more, there is still more to come, but I've waited long enough to publish this.
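To make the s3://bucket/key convention concrete, here is a small Python sketch of how such a URL splits into a bucket and a key (or key prefix). It only illustrates the URL shape; the extension itself is a Firefox protocol handler, and the function name `parse_s3_url` is made up for this example.

```python
from urllib.parse import urlsplit

def parse_s3_url(url):
    """Split s3://bucket/key into (bucket, key); the key may be empty or a prefix."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError("not an s3:// URL: %r" % url)
    # The bucket sits where a hostname normally would; the rest is the key.
    return parts.netloc, parts.path.lstrip("/")
```

With this, `parse_s3_url("s3://bucket/prefix/")` yields the bucket name plus the prefix `prefix/`, which is the form a partial listing or a drag and drop upload target would use.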
A couple of days ago I finally upgraded userscripts.org to Rails 2.0.
After fixing all the broken routes it created (the way named routes are created was tweaked slightly) and replacing old-school pagination with Err's Will Paginate, everything seems to be working. Given that many useful rails plugins only work in Rails 2, I've wanted to upgrade for a long time.
This is nice, but built-in CSRF protection is what made the upgrade a must. I've woken up many nights with detailed nightmares of attackers doing all sorts of evil starting from a CSRF attack...
Sites with clean/guessable urls make both XSS and CSRF easier, since the attacker can easily generate the urls they want to attack. URL paths like /account/delete, /status/update, /script/create, ... are easy targets if you don't have adequate protection.
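The general idea behind token-based CSRF protection can be sketched in a few lines of Python. This is not how Rails implements it; it is a minimal illustration, and `SECRET_KEY`, `csrf_token` and `is_valid_request` are names invented for the example. The server embeds a token derived from the session in every form, and rejects any state-changing request that doesn't echo it back. An attacker can guess /account/delete, but not the token.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = secrets.token_bytes(32)   # hypothetical server-side secret, never sent to clients

def csrf_token(session_id):
    """Token to embed as a hidden field in every form rendered for this session."""
    return hmac.new(SECRET_KEY, session_id.encode("utf-8"), hashlib.sha256).hexdigest()

def is_valid_request(session_id, submitted_token):
    """Accept a POST only if it carries the token issued for this session."""
    expected = csrf_token(session_id).encode("utf-8")
    submitted = (submitted_token or "").encode("utf-8")
    return hmac.compare_digest(expected, submitted)   # constant-time comparison
```

A forged form on another site has no way to read the victim's token, so its POST fails the check even though the browser attaches the victim's cookies.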
Interestingly, even gmail is not immune to messing up and exposing CSRF vulnerabilities. The attacker used a CSRF attack to add a filter to forward mail. The attacker was then able to steal the victim's domain name, since many services (including domain name registrars) use email to verify requests.
While that attack was a targeted attack, mass attacks are possible. Create or steal some interesting content (link bait) for social sites (Digg, Delicious, Reddit, ...), then attempt to add a filter that forwards any email containing the words password, account, ... to a unique account. This crowd largely uses gmail, so it is only a matter of time until your attack yields useful information. Using the iframe trick to hide the referrer would help disguise the attacker....
Google has fixed this flaw.
A Case For Custom Browsers
I have several friends who use several web browsers for different purposes. For instance, Britt likes to use Flock for browsing and social stuff while using Firefox for development work (a common pattern since Firebug makes Firefox painful to browse with). Additionally mac people can use Mailplane, which is a gmail only web browser (with lots of cool customizations such as drag and drop attaching of files, ability to grab screenshots, ...; in general it makes gmail feel like a well integrated desktop app.)
I've long thought that using different browsers for different use cases makes sense, but hadn't thought about it in terms of security. Using different browsers can make you more secure by limiting your exposure.
I use gmail via HTTPS only, which is a pain since google tries to send you to the HTTP url. My entire life lives in my gmail. I couldn't imagine losing my account. I've thought I was doing a decent job at protecting myself... If there are future gmail holes, even if I only visit gmail via HTTPS, I am still vulnerable. All but one of the cookies for mail.google.com are valid for non-HTTPS even if you only visited HTTPS - hence visiting evil.com allows it to do a form post to gmail's http address :(
By using Mailplane (or other single-site web browsers), your cookies/sessions cannot be hijacked. Given how important my gmail account is, it might not be silly to never log into gmail from my regular web browsers.
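The cookie issue above comes down to the Secure attribute: a cookie without it is sent over plain HTTP as well as HTTPS. As a small sketch of what a service can do on its side (this says nothing about how gmail actually sets its cookies, and the function name is invented for the example), Python's standard library can build a session cookie that the browser will only send over HTTPS:

```python
from http.cookies import SimpleCookie

def session_cookie_header(name, value):
    """Build a Set-Cookie value for a session cookie restricted to HTTPS."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["secure"] = True     # never sent over plain HTTP
    cookie[name]["httponly"] = True   # not readable from page scripts
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()
```

Note that the Secure flag only keeps the cookie off plain HTTP; it does not stop a forged request to the HTTPS address, which is why a separate browser still limits exposure.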
And then there's Flash
Services that store session information in Flash (via "flash cookies" - technically "shared objects", which are not stored by the browser but by flash) present a slight wrinkle. Flash cookies are shared between all browsers (on OSX and linux, not sure about windows/ie since it might do plugins differently). While many financial institutions use systems that utilize flash cookies, I'm not aware of any vulnerabilities caused or worsened by flash sharing shared objects between browsers.
When I entered grad school, I decided to work on an open source project that used SDL to learn about graphics/game programming. I found Tux Typing, and soon became a regular contributor. It was fun, and I still hear from folks who used it or whose kids did. Eventually I became the project leader and learned a lot (since then others have taken the torch - since it is open source others came along and just started coding, then adopting it :) ).
Today my asus eee pc ultra-portable linux laptop came in. It is a really cool machine, and it includes Tux Typing (hey Asus - you should send me a free 8G model). What makes it particularly interesting is that my dev machine when I was coding on Tux Type was a Netpliance i-opener.
The specs were awesome: WINChip C6 180MHz, 32MB Dram (SODIMM), 16MB Sandisk Flash On Board, 10" color lcd 800x600 16bpp - of course I upgraded to 128MB ram, and used an external hard drive. It was no speed demon, but it was a fun little device. The Asus feels the same way, except it has a 900 MHz pentium, 512MB ram (soon to be 2GB), 4GB flash (got 8GB SD card coming, plus it takes USB drives), and an 800x480 screen. I cannot wait to get my dev environment going on it. Debian etch net install, followed by a very minimal X environment (wmii, firefox, gvim maybe?), and then git/ruby/... It will be a few weeks before I'll have time to even think about upgrading the OS, so hopefully others will have blazed the trails on getting a good debian environment going.
Perhaps with this slower machine I'll be able to make Taboo fast for Kragen.
WiiMe is a twitter bot. It checks amazon every second to see if there are any Wii's available, and if so will update the status on Twitter. So if you 'follow' the bot, Twitter will send you a message as soon as the bot finds Wii's!
Implementing WiiMe was easy, as I had written much of the code for other projects which integrate with Amazon (Book Burro) and Twitter (Book Finder).
I've not mentioned Book Finder here, but it is a bot: if you send it a direct message, you will get a response with the price range for the book at Amazon. You can send the bot either an ISBN or a string - sending the ISBN is better since it finds the exact book you are looking for.
Using Twitter as a platform for notifications is pretty cool. Twitter solves notifications to phone, IM, and (soon) email - I only have to write the code that is unique to my app.
Already 3 people have got a Wii thanks to WiiMe! (and that was before I had even shared it with anyone)
Taboo, a firefox and flock extension I created with Manish Singh and Ian Fischer, has been released!
Special thanks to Ian, who besides coding a bunch of it, helped push it through addons.mozilla.org (noticing that it was reviewed and submitting it to be added!)
What is Taboo?
Taboo is a solution to a problem I have. I want to be able to save a page for later. While I could bookmark the page (locally or using delicious), I would lose my position in the page. As well, scrolling through tons of bookmarks is not fun.
So with taboo, we save your page for you. We use the session saver code to record your current location in the page. We use the canvas code to take a screenshot. We use sqlite to store all the information so you can search for previously saved taboos.... Then you can close the tab. When you want to return, you can find it with all the other tabs (searching, visual browsing or finding it on the calendar) and click it to go back to it - scrolled to the same position, form fields filled out, the back/forward button reverted to the history when you saved the page.
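To give a feel for the storage side, here is a minimal Python sketch of saving a tab and searching for it later with sqlite. Taboo itself runs inside Firefox, so this is not its code: the table name, the columns (`url`, `title`, `scroll_y`, `saved_at`) and the function names are assumptions for illustration, and the screenshot and session data are left out.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the store; the schema here is hypothetical."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS taboos (
                      url TEXT PRIMARY KEY,
                      title TEXT,
                      scroll_y INTEGER,
                      saved_at TEXT)""")
    return db

def save_tab(db, url, title, scroll_y, saved_at):
    """Record a page and the scroll position, replacing any earlier save."""
    db.execute("INSERT OR REPLACE INTO taboos VALUES (?, ?, ?, ?)",
               (url, title, scroll_y, saved_at))
    db.commit()

def search(db, text):
    """Find saved pages whose title or url contains the text, newest first."""
    pattern = "%" + text + "%"
    rows = db.execute(
        "SELECT url, title, scroll_y FROM taboos "
        "WHERE title LIKE ? OR url LIKE ? ORDER BY saved_at DESC",
        (pattern, pattern))
    return rows.fetchall()
```

Storing the scroll position next to the url is what lets a saved page reopen where you left it, rather than at the top like a bookmark.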
A meta goal I had for taboo was to make it easy to extend. That is why we have both a calendar view and a grid view. The code for both is about 20-40 lines of javascript and then css. So, if you want to build a better view, talk to us (taboo google group), and we will help you do it!
If you are learning erlang, you might run into Bill Clementson's Beer Song, as it is currently on the front page of trapexit.org (the erlang community site).
Having only written a couple of simple erlang programs, I was at first confused by the program. In learning exactly how the program worked, I removed the verse generation code (since for bears, counting is hard enough) and added a bunch of debug statements.
Go read Bill's version, then you can see mine.
-module(bearsong).
-export([start/0]).

start() ->
    clear(),
    lists:foreach(fun spawner/1, lists:seq(0,3)),
    countdown(1),
    countdown(2).

% spawner creates a new process that sends a message with a specific number
spawner(Num) ->
    Pid = self(),
    NewPid = spawn(fun() -> io:format("sending ~p message ~p~n", [Pid, Num]), Pid ! Num end),
    io:format("spawner(~p) ~p spawned ~p~n", [Num, Pid, NewPid]).

% countdown from Num to 0.
% countdown listens for a message with the current value of Num, if
% it doesn't receive it, it will ask spawner to create a new message
% with it, and then recall countdown with the same value.
countdown(Num) ->
    io:format("countdown(~p) ~p ", [Num, self()]),
    receive
        Num ->
            io:format("ok~n"),
            if
                Num > 0 -> countdown(Num-1);
                true -> ok
            end
    after 100 ->
        io:format("timeout~n"),
        spawner(Num),
        countdown(Num)
    end.

% helper function to remove all queued messages
clear() ->
    receive
        _ -> clear()
    after 100 -> ok
    end.
In Bill's version I was confused by his use of spawn in the spawner function. The process being spawned is a function that sends a message to a Pid, the Pid being generated by a self() call in the function. Coming from JavaScript, I had expected self to refer to the context of the function. I didn't expect self() to have the same value in spawner as it did in countdown. Self seems to be the Pid of the process (perhaps implicit) as we see spawner and countdown reporting the same pids.
The second problem I had was why he was using spawn in the first place. Spawn in this instance seems to act like setTimeout(function(){..}, 0) in JavaScript. I was trying to read more into it than delayed execution (combined with the confusion of a shared Pid above.) The program runs the same if you send the message inline instead of spawning a new process to execute the function which sends the message.
While I still don't understand the benefit of spawning new processes to send messages, I do understand how it works. A recursive function is trying to count down, it blocks on receiving a message with the current number, and if it doesn't receive a message, it will ask the spawner to resend a new message with the number so it can receive it and continue.
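The control flow can be mimicked without any processes at all, which is one way to check the reading above. The Python sketch below models the mailbox as a plain list and sends the message inline on a timeout. It is a simplification, not a translation: there are no real timeouts or concurrency, it assumes the four initial messages have all arrived before counting starts, and the names are invented for the example.

```python
def countdown(start, mailbox, log):
    """Count down from start to 0, consuming one matching message per step."""
    num = start
    while True:
        if num in mailbox:            # selective receive: only a message equal to num matches
            mailbox.remove(num)
            log.append(("ok", num))
            if num == 0:
                return
            num -= 1
        else:                         # stands in for the `after 100` timeout branch
            log.append(("timeout", num))
            mailbox.append(num)       # send the message inline instead of spawning

mailbox = [0, 1, 2, 3]                # what the four spawned senders deliver
log = []
countdown(1, mailbox, log)            # finds 1 and 0 waiting
countdown(2, mailbox, log)            # finds 2, then has to resend 1 and 0
```

The second countdown is where the timeouts show up, since the first one already consumed the messages for 1 and 0; the message for 3 is never received and stays in the mailbox.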
I finally borrowed a PPC mac, so I could recompile deSEDG to allow you to drop multiple files to be converted at once.
So, if you have OSX and unluckily bought a Samsung camera that produces SEDG files (which are really DivX files), you can either hexedit the SEDG chars to DIVX, or download my app.
While moving this site back to rails (edge) from a rake based static site, I added HTML validation to the articles model.
The first step is to install libxml-ruby. This can be done via rubygems: gem install libxml-ruby
To validate if a string is valid html, you will need to wrap it inside a div, otherwise you will get: parser error : Extra content at the end of the document
parser = XML::Parser.new
parser.string = "<div>#{html}</div>"
parser.parse
If you run the previous code in an IRB session, parser.parse returns an XML::Document even if the document has problems. If the document has problems, stderr will contain the errors (pointing to them with a caret). In a web app, having the errors go to stderr is probably not what you want. To show the errors to the user, capture them by creating a custom error handler.
parser = XML::Parser.new
parser.string = "<div>#{self.body}</div>"
msgs = []
# collect parser messages instead of letting them go to stderr
XML::Parser.register_error_handler lambda { |msg| msgs << msg }
begin
  parser.parse
rescue Exception => e
  errors.add("body", '<pre>' + msgs.collect { |c| c.gsub('<', '&lt;') }.join + '</pre>')
end
I added a <pre> around the error messages so that they can be presented to the user using the standard helper method error_messages_for. Then, after adding some css to make the errors fixed width, I get useful error reporting on invalid html.
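The same wrap-and-parse trick works in other languages. As a hedged sketch in Python (using the standard library's XML parser rather than libxml, so the error text differs and HTML-only entities such as &nbsp; are rejected; the function name is invented for the example):

```python
import xml.etree.ElementTree as ET

def markup_error(html):
    """Return None if the fragment is well-formed, else the parser's message."""
    try:
        # Wrap in a div so several top-level elements still form one document.
        ET.fromstring("<div>%s</div>" % html)
    except ET.ParseError as err:
        return str(err)
    return None
```

Returning the message instead of printing it plays the role of the custom error handler: the caller decides how to show it to the user.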
I'm at the Computers In Libraries conference. If you want to talk about userscripts, bookburro, or tech related to libraries, find me or email me. I'll be presenting Tuesday 4:15 pm.
For a few months back in 2006, we misconfigured something in our rails/db stack and some improperly encoded data got in. Hence, working on userscripts.org on my laptop with a copy of the real database hasn't been possible. I can do a sql dump, but reloading I get:
% psql uso_dev -f dump.sql
psql:dump.sql:33795: ERROR: invalid byte sequence for encoding "UTF8": 0xfa
HINT: This error can also happen if the byte sequence does not match the encoding
expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY scripts, line 54
To help fix this, I wrote a small rake task I use for building a report of records with encoding issues.
def invalid_encodings(model, fields, deleted = false)
  report = {}
  records = if deleted
    model.find_with_deleted(:all)
  else
    model.find(:all)
  end
  records.each do |record|
    fields.each do |field|
      begin
        # unpack('U') raises on bytes that are not valid UTF-8
        record[field].each_char { |char| char.unpack('U') } unless record[field].blank?
      rescue
        report[record.id] ||= []
        report[record.id] << field
      end
    end
  end
  report
end
If you use acts_as_paranoid, you will need to send true for the third parameter, since the deleted records will still be in the dump and can cause problems if not encoded properly.
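The underlying check is simply "does this byte string decode as UTF-8?". A minimal Python sketch of the same report, assuming records are plain dicts with an `id` key and raw bytes values (that record shape is an assumption for the example, not how ActiveRecord hands data back):

```python
def invalid_encodings(records, fields):
    """Map record id -> list of fields whose bytes are not valid UTF-8."""
    report = {}
    for record in records:
        for field in fields:
            value = record.get(field)
            if not value:              # skip missing and empty values
                continue
            try:
                value.decode("utf-8")
            except UnicodeDecodeError:
                report.setdefault(record["id"], []).append(field)
    return report
```

A lone 0xfa byte, like the one psql complained about, is exactly the kind of value this flags.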
I'm almost able to use the database on my laptop, I've fixed:
While working on bringing userscripts.org into the 21st century, we've added validations that our models were missing.
Unfortunately this has the side effect of making previously valid records invalid (email addresses without an @ sign, missing names, ...).
To help resolve those issues I built a small rake task, which iterates through each record, tests whether it is valid, and if not adds the id to a list.
def invalids(model, verbose = false)
  errors = {}
  records = model.find(:all)
  records.each_with_index do |record, i|
    next if record.valid?
    # group record ids by validation message
    record.errors.full_messages.each do |msg|
      errors[msg] ||= []
      errors[msg] << record.id
    end
    if verbose
      print "#{i} of #{records.length} | "
      print errors.keys.collect { |k| "#{k}: #{errors[k].length}" }.join(', ')
      print "\n"
    end
  end
  errors
end
After running this overnight, I know what to fix on our production user table.
Email has already been taken: 54
Display name has already been taken: 3066
Email can't be blank: 4
Display name can't be blank: 107
And I have a list of the ids for each issue. Once I get these fixed I can move on to more exciting things like fixing tagging, versioning, and ...
Google discontinued their SOAP based search api recently, and bloggers have been going nuts. It's getting a bit old. In Google Maps API team says: Stop It! the author laments that Google imposes a 50k request limit on their geocoding service and acts as if it is a sign of the apocalypse! First Google kills their search API, now they are limiting us! The man is keeping us down!
As someone who used Google's SOAP based search api in several projects I can assure you that it needed either to be killed or to be resurrected. There are many reasons why it is good to have the API available, but no one has admitted that it sucked. It was neglected. It returned Bad Gateway almost as often as it returned search results. It was a true case of bit rot. The person who created it is no longer interested or able to care for it and hasn't been for years.
As for the "evilness" of rate limiting their geocoding service, consider a quick comparison with "the king of web services", Amazon's ECS, whose rate limit is very similar (ecs is the api which lets you get pricing/availability/... of products, not to be confused with EC2 which is their on the fly processing power.) Given that Amazon expects to make money as a result of those queries and Google doesn't, Google's once every 1.73 seconds compared to Amazon's once every second seems very reasonable.
(yeah yeah, the devil's in the details - Amazon's is once every second per IP, if you are going to have a cluster of machines each requiring geocoding, why not set up your own internal geocoding service...)
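The 1.73 second figure is just the 50k limit spread evenly over a day, assuming the limit is counted per day (which the figure implies). A quick Python check, with the function name invented for the example:

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

def seconds_between_requests(requests_per_day):
    """Average spacing if the daily allowance is spread evenly."""
    return SECONDS_PER_DAY / requests_per_day

google_interval = seconds_between_requests(50000)   # 86400 / 50000 = 1.728
```

By the same arithmetic, one request per second works out to 86,400 requests per day, so the two limits are within a factor of two of each other.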
As I said a bit ago I'm going to be focusing on improving userscripts.org. It took a bit of time to get everything going, but with some help from some friends (Jeremy Dunck, Britt Selvitelle, and a new resource by Jeff Lindsay called DevjaVu) I think we are ready to rock and roll!
"Trunk" is currently much improved from what is running - but it isn't ready for prime time yet! Britt has done a lot of heavy lifting to remove a bunch of cruft that had been there since we were learning rails, and removed a lot of the gratuitous ajax. My goal is to get userscripts.org running on trunk by the end of the year, so any help is appreciated: development, creating tickets for issues, testing (or writing testing code), helping with the css, or ...
If you want to get started, you should be running the current rails (perhaps we should move to 1.2?):
svn co http://svn.devjavu.com/userscripts
Update a few config files (database.yml and userscripts.yml)
Wow, it's been a long time since I've touched Tux Typing. It had started to bit rot (due to SDL_ttf blowing up when it tries to render white space). I had been getting regular emails about bugs (especially on mac -- wonder if it is the same issue?) and requests for enhancements.
I also moved tuxtyping to svn. Tux Typing is hosted on sourceforge, so no nice Trac goodness, only viewvc. During the shuffle I took the opportunity to reorganize the files/directories, which of course probably broke every build script besides my hand made makefile... I hope I can get Cal and a few of the other Tux4Kids guys to help fix this :)
My 2006 goal is to get Tux Typing to the point where it can be included on the awesome OLPC project. I'm fully devoting myself for the rest of the year to accomplishing this and similar goals for userscripts.org, book burro, my unreleased "read-write-web" project, CoCoA, ... (got a long list and I'm working through them slowly).
If you have access to an OLPC, it would be awesome if you could check out SVN and let me know if it works (requires sdl libraries):
svn co https://tuxtype.svn.sourceforge.net/svnroot/tuxtype