Scraping for Hinman Box Numbers

Dartmouth’s CS50 “Software Design and Implementation” class is a gift that keeps on giving. Yes, it was basically a full-time job on top of my other coursework, and yes, it kept me from attending as many frat parties as I wanted to last spring. But it also taught me shell scripting, which has proven incredibly useful on numerous occasions.

The latest example: my fraternity recently had a party where we wanted everyone we invited to get physical invitations in their mailboxes. Here’s the shell script that made that possible:

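# Usage: pass a text file with one name per line as the first argument
while IFS= read -r p; do
  # URL-encode spaces so the name can go in the query string
  NAME=$(echo "$p" | sed 's/ /%20/g')
  curl --silent "http://dndlookup.dartmouth.edu/datapage_dartmouth.php?name=${NAME}&fmat=1" | grep -o "HB [0-9]*"
done < "$1"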

The script takes an input text file with one name per line and then scrapes the Dartmouth Name Directory for each person’s Hinman box using curl. I kind of feel like Mark Zuckerberg downloading Harvard’s facebook data every time I run it.
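
For anyone who would rather not do this in shell, here is a rough Ruby sketch of the two interesting steps: building the lookup URL and pulling out the box number. It makes no network request. The helper names and SAMPLE_RESPONSE are made up for illustration, and the real page layout may well differ.

```ruby
require 'erb'

# Base URL copied from the shell script above.
DND_URL = "http://dndlookup.dartmouth.edu/datapage_dartmouth.php"

# Build the lookup URL for one name (hypothetical helper).
# ERB::Util.url_encode turns spaces into %20, like the sed call does.
def dnd_lookup_url(name)
  "#{DND_URL}?name=#{ERB::Util.url_encode(name)}&fmat=1"
end

# Return every "HB <digits>" token in a response body, like grep -o "HB [0-9]*".
def hinman_boxes(body)
  body.scan(/HB [0-9]*/)
end

# Made-up response fragment, only here to exercise the regex.
SAMPLE_RESPONSE = "<td>Jane Doe</td><td>HB 1234</td>"

puts dnd_lookup_url("Jane Doe")
puts hinman_boxes(SAMPLE_RESPONSE).inspect
```

Note that url_encode also escapes characters other than spaces, which the sed one-liner does not do.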

HTML Emails that Work

Successfully branding a web app requires carefully designing every aspect of its interaction with users. If you’ll be sending any emails to your customers, you’ll probably want to send something a little more impressive than simple plain-text. To do this, you’ll have to code an HTML email. This is not an easy task — making a complicated layout look good in a variety of email clients is an order of magnitude more challenging than making a website cross-browser compatible.

The main rule: forget everything you learned about CSS3 best-practices, and go back to how you coded websites ten years ago. If you don’t want to read past this second paragraph, here are the rules of thumb that inspired all the rest of the tips in this post:

  • Keep the total width of the email less than or equal to 600px. You can ensure this by enclosing the whole email in a table with a width attribute of “600”.
  • Use HTML formatting tags instead of CSS whenever possible.
  • All CSS that you do use (and it’s totally legitimate to use CSS for most things, just avoid positioning) must be inline. You can code your email with CSS in <style> tags in the <head> and then run it through Mailchimp’s free automatic CSS inliner tool to make this process a lot easier.
  • All email content must be entirely static — you can’t use any JavaScript to help with layout.

Layout Without the Box Model

You can still achieve pretty complicated layouts without using CSS positioning; it just takes a bit more work to get everything to look good in a variety of email clients. Some HTML emails get around this layout problem by simply sending an email with a large picture that has all of the email content on it. While these emails are easy to create, avoid this temptation. Having all text in your email as actual text will help please spam filters, is much more user friendly, and will help email clients give relevant text previews of your email.

You’ll have to use tables to achieve anything other than very basic layouts. Using cellpadding, align, and valign attributes on table tags, you should be all set. Properly aligning text and images can be difficult and may require image slicing. Make sure that all images have “display: block” CSS applied to them, or else images that are meant to be displayed flush with each other will have a small separation between them in Gmail and Hotmail (for mysterious reasons).

Adjusting space above and below elements should, in most cases, be done with br tags. You can also use the line-height CSS attribute on p tags; just be sure to test this in a variety of email clients because many handle this attribute differently.
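
To make the table approach concrete, here is a small Ruby sketch that generates a 600px layout table with flush images and br-based spacing. The helper names and example.com URLs are placeholders, and this is only one way to structure it.

```ruby
# Image tag with inline display: block so neighboring images sit flush.
def flush_image(src, width, height)
  %(<img src="#{src}" width="#{width}" height="#{height}" style="display: block;">)
end

# Wrap rows of cell HTML in a fixed-width layout table.
# Every row is expected to have the same number of cells.
def email_table(rows, width: 600)
  trs = rows.map do |cells|
    tds = cells.map { |cell| %(<td align="left" valign="top">#{cell}</td>) }
    "<tr>#{tds.join}</tr>"
  end
  %(<table width="#{width}" cellpadding="0" cellspacing="0" border="0">\n#{trs.join("\n")}\n</table>)
end

html = email_table([
  [flush_image("https://example.com/left.png", 300, 100),
   flush_image("https://example.com/right.png", 300, 100)],
  ["Hello there,<br><br>", "Thanks for signing up."]
])
puts html
```

Everything the layout needs lives in attributes or inline styles, so there is nothing for an email client to strip out of the head.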

Styling Links

People tend to respond to default blue text links in emails, so don’t stray from this styling unless you have a good reason. If you do want custom links, be sure to define any custom colors within a font tag that is within the a tag. This is necessary because some email clients (like Gmail) make all links target="_blank", and in the process, strip out any color CSS you’ve added to the link.
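
As a quick illustration of the font-inside-anchor trick, here is a hypothetical Ruby helper that builds such a link. The function name, default color, and URL are assumptions made for the example.

```ruby
require 'erb'

# Build a link whose color lives on an inner <font> tag rather than on the <a>,
# so the color survives even if a client rewrites the anchor itself.
def email_link(url, text, color: "#cc0000")
  safe_url = ERB::Util.html_escape(url)
  safe_text = ERB::Util.html_escape(text)
  %(<a href="#{safe_url}"><font color="#{color}">#{safe_text}</font></a>)
end

puts email_link("https://example.com/reset", "Reset your password")
```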

Sometimes email clients will link text that you don’t want to be linked, such as text that looks like a URL or email addresses. There are two ways to deal with this:

  • You can control the style of the link by explicitly defining the text as a link and styling it appropriately.
  • You can enclose the text in an anchor tag without a href= attribute to make it behave like normal text and prevent it from being automatically linked. This won’t work in all email clients, most notably the iPad/iPhone.

Images

The best way to include images in HTML emails is to host the image somewhere with a publicly-accessible URL, and use that full URL to refer to the image in the src attribute of an img tag. As explained above, all images should have the display: block CSS attribute so that Gmail and Hotmail handle them correctly without adding on extra space between images.

Resizing images isn’t smart in some email clients, i.e. if you just specify a width or a height for the image, it won’t preserve the aspect ratio of the image. If you want to display an image not in its original size, you’ll have to calculate both the width and the height of the image and explicitly write them in HTML. It’s good practice to define the width and the height in both the style attribute and the width and height attributes, because some email clients won’t recognize the width and height CSS definitions. The easiest way I’ve found to determine both the height and the width for an image given only one is to define one and then inspect the element with Chrome to determine the other measurement.

Even if the image will be displayed in its original size, it’s good practice to define the width and height anyway. Many email clients don’t display images by default, and by defining a width and height for the images, the placeholder that the client uses will be the proper size and won’t break your layout. In most email clients, specifying a background-color CSS attribute and a bgcolor img attribute will display a colored block instead of the default image placeholder before the image loads, which can greatly improve the look of your email when images are turned off.
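
If you would rather compute the missing dimension than inspect it in a browser, the arithmetic is simple. The sketch below assumes you know the image’s original pixel size; the helper names, placeholder color, and URL are made up for the example.

```ruby
# Scale an image to a target width while preserving its aspect ratio.
# Returns [width, height] in whole pixels.
def scaled_size(orig_w, orig_h, target_w)
  [target_w, (orig_h * target_w.to_f / orig_w).round]
end

# Emit an img tag that states its size both as HTML attributes and as inline CSS,
# plus a background color to show while images are turned off.
def sized_image(src, orig_w, orig_h, target_w, placeholder: "#eeeeee")
  w, h = scaled_size(orig_w, orig_h, target_w)
  style = "display: block; width: #{w}px; height: #{h}px; background-color: #{placeholder};"
  %(<img src="#{src}" width="#{w}" height="#{h}" bgcolor="#{placeholder}" style="#{style}">)
end

puts sized_image("https://example.com/logo.png", 1200, 800, 300)
```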

Other Tips

  • It’s not currently possible to use fonts that aren’t installed on a user’s computer, i.e. font embedding won’t work
  • Avoid custom characters for <li> elements; it’s not possible to define pseudo-classes inline, and Gmail strips out the text-indent property, making it impossible to space it manually.
  • Use Mailchimp’s inbox inspection tool (not free) to see how your email looks in multiple email clients and OSs

Hunting Bugs or: How I Learned to Stop Worrying and Love git bisect

Getting assignments to write new features on a web app is always way more exciting than looking at Pivotal Tracker and seeing a list of bugs to fix. But I have to admit that probably the most satisfying feeling as a programmer is the feeling you get when you’ve solved a complicated bug in a clean way. I’m not talking about bugs that are caused by syntax errors or accidentally using the wrong methods. I’m talking about the bugs that have seemingly innocuous symptoms, but end up taking you deep into the rabbit hole of your app.

The work here is the hunt — once you’ve actually tracked down what’s causing the problem, it’s generally very easy to fix the issue. When you’re working on a large codebase with multiple contributors for an asynchronous web 3.0 app, this can get really interesting (and frustrating). What do you do when the problem isn’t immediately apparent?

About 25% of the time, you get lucky and Google does all of the work for you — simply copy and paste the error message that the application is giving you or type in a short description of the issue and you’ll be surprised how many people have encountered the same problem and have solved it already. Stack Overflow is a goldmine of bug solutions, and so is the vast ecosystem of tech bloggers (like me) who give you step-by-step instructions on how to solve bugs.

70% of the time, it’s a bit more of a challenge. The bug is specific to the way your team has set up the application and no one has encountered a bug quite like the one you’re currently facing. Google doesn’t help at all. Here’s where bug-hunting becomes much more of an art than a science.

Typically the process is fairly quick if you’ve written the code that you’re debugging — you already know what assumptions you’ve made in developing the feature and what events are firing when, and so strange behavior can often be tracked down fairly quickly. Luckily, you’re using git (right?), so even if you didn’t write the code you’re working with, you can quickly track down who did. Simply run git blame path/to/file in your console and git will output a line-by-line summary of who wrote each line of code.

If that doesn’t get you closer to a solution, the next step is to console.log (or its equivalent) everything. I’ve had the opportunity to see expert programmers hunt down bugs at Art.sy, and that’s exactly what they do. There’s generally no need to use fancy debugging software or set breakpoints or anything complicated like that: just output to the console when events are firing, when methods are being called, and what the values of variables are at specific points in time. Obviously, you need to make educated guesses as to what might be causing the problem, and then figure out whether you’re right by outputting values to the console.

What about the last 5%? It’s reserved for the worst kind of bug. The bug that is impossible to fix. You’ve searched all over the internet and no one seems to ever have had the same problem. The code you’ve written and pored over is perfect and definitely is not the cause of the problem. The project you’re working on has thousands of lines of asynchronous code; literally anything could be going wrong and you have no leads whatsoever. Don’t worry, there’s still hope. git bisect is still your friend.

git bisect works when you know that a feature worked at one point in your app’s history and no longer works in your most recent commit. Don’t know offhand when the feature actually worked? git blame can help here — find out when the feature that’s no longer working was merged into master, and you can hopefully safely assume that the feature was working at that point.

Once you’ve located a good point and a bad point in your code, run git bisect start in your current branch, then mark the two endpoints with git bisect bad (for the current, broken commit) and git bisect good followed by the id of the known-good commit. Since a git repository is basically a sorted list of previous states of your app’s codebase, git can then perform a step-by-step binary search to find the first point in your app’s history where the feature stopped working. At each step, you run either git bisect good or git bisect bad in your console depending on whether the feature works or doesn’t work in the commit that git bisect picks. Eventually, the program will output the commit id of the commit that broke the feature, and git bisect reset takes you back to the branch you started on. It’s an amazing feeling when you’ve been struggling with a bug for a day or two.
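
To see why this takes so few steps, here is a toy Ruby version of the search. It assumes a straight-line history ordered oldest to newest, where the first commit is good, the last is bad, and the feature stays broken once it breaks. Real git history has branches and merges, so treat this as an illustration of the idea, not of git’s implementation. The commit ids are made up.

```ruby
# Find the first commit for which the block returns true (meaning "broken").
# Assumes commits.first is good and commits.last is bad.
def first_bad_commit(commits)
  good = 0
  bad = commits.length - 1
  while bad - good > 1
    mid = (good + bad) / 2
    if yield(commits[mid])
      bad = mid    # the breakage is at mid or earlier
    else
      good = mid   # the breakage is after mid
    end
  end
  commits[bad]
end

commits = %w[a1 b2 c3 d4 e5 f6 g7 h8]  # hypothetical commit ids
checks = 0
culprit = first_bad_commit(commits) do |commit|
  checks += 1
  commits.index(commit) >= 5           # pretend the bug arrived in "f6"
end
puts "#{culprit} found after #{checks} checks"
```

Each check halves the remaining range, so even a history of a thousand commits needs only about ten rounds of testing.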

git bisect isn’t always that simple when you’re running it on an app that has a bunch of dependencies. Here’s how to get around any snags if you’re running git bisect on a rails app:

  • Restart your rails server and run bundle install at each commit that git bisect chooses. You might also need to manually compile assets at each point if something isn’t doing that for you automatically.
  • Make sure your “good” starting commit is the commit that merged the feature you’re debugging into master. Programmers don’t always commit 100% working code, but when it comes time to actually merge a pull request, you can be sure that the app ran smoothly and specs were passing.
  • Sometimes the app won’t run on old commits because paths in old versions of your Gemfile aren’t valid anymore. For example, when I used git bisect on Art.sy’s codebase, a co-worker’s change to his Github username caused multiple bundle install failures. You’ll have to address these issues manually. If you can’t figure out why a gem isn’t installing, just replace the line with the corresponding line in the app’s most recent Gemfile.

Aside from those potential issues, git bisect feels like magic for those epic bugs that refuse to be solved. Once you’ve found the commit that broke the feature, you’re almost certainly 99% of the way to solving the problem.

Embedding API Sandboxes in Documentation

Last week I released a jQuery plugin called API Sandbox to help developers of web apps expose their API in a guided sandbox environment. The usage is simple. On a template, make sure there is a dedicated div element available to place a sandbox and then simply call the apiSandbox function on it.

$("#user").apiSandbox("get","/api/v1/users?user_id=")

This would create a nicely animated sandbox environment with proper fields for the parameters expressed in the path (in this case, one for user_id). API Sandbox supports generic URL parameters at the end of the API path, in addition to symbol wildcards anywhere in the path, preceded by a “:”.
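
To make the two kinds of parameters concrete, here is a small Ruby sketch that lists the fields a path like this implies. It is not the plugin’s code (the plugin is JavaScript); sandbox_params is a hypothetical helper, and the example paths are made up.

```ruby
# List the parameter names implied by an API path:
# ":name" wildcards anywhere in the route, then the keys of the trailing query string.
def sandbox_params(path)
  route, query = path.split("?", 2)
  wildcards = route.scan(/:(\w+)/).flatten
  query_keys = query.to_s.split("&").map { |pair| pair.split("=", 2).first }
  wildcards + query_keys.compact
end

puts sandbox_params("/api/v1/users?user_id=").inspect
puts sandbox_params("/api/v1/users/:id/posts?page=&per=").inspect
```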

There are a bunch of applications for this, and I’m still making improvements to the plugin, but I want to talk about one particularly cool application today. This plugin combined with the latest version of Redcarpet (2.0.0b), a Markdown parser, can allow you to easily embed sandboxes inline with your API documentation.

What’s so cool about the new version of Redcarpet is that instead of simply relying on the plugin author’s interpretation of how Markdown should be transformed into HTML, you can specify your own rendering rules and just let Redcarpet do all of the parsing. To easily embed sandboxes, I chose to override Redcarpet’s default rendering of links and change links to API paths into dynamic sandboxes. I created a subclass of Redcarpet::Render::HTML and created override methods for link and doc_header.

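require 'redcarpet'

# Renders Markdown links normally, except links prefixed with "SANDBOX: ",
# which become placeholder divs for the API Sandbox plugin.
class DocsParser < Redcarpet::Render::HTML
  def link(link, title, content)
    # Guard against nil arguments before calling string methods on them
    link = link || ""
    title = title || ""
    content = content || ""
    if link.include? "SANDBOX: "
      path = link.gsub("SANDBOX: ","")
      # Strip URL punctuation so the path can double as a class name
      id = path.gsub(/[\/?&=\[\]]/,"")
      rendered = "<div id='sandbox'>"
      rendered += content + "<div id='" + path + "' class='sandbox " + id + "'></div>"
      rendered += "</div>"
    else
      rendered = "<a href='" + link + "'>" + content + "</a>"
    end
    rendered
  end

  # Script added to the header of every generated documentation page
  def doc_header
    "<script>new App.Views.Docs({ el: $('#main_content') });</script>"
  end
end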

All I did here was render a div element with the id of sandbox instead of a link when the path has a prefix of SANDBOX:, and then added in a script in the header of all generated documentation pages to load the script that changes the divs into sandboxes (the site I’m working on uses Backbone.js, hence the new App.Views.Docs).

Then you have to write an epic 2 lines of CoffeeScript code to turn all of the links into awesome API sandboxes.

$("div.sandbox").each ->
      $(this).apiSandbox "get", $(this).attr("id")

Not bad at all.

API Sandbox is just the first part of a CoffeeScript/SASS/Ruby API explorer I’ll be releasing over the next couple of weeks, so get excited.

Building Site Navigation with Markdown and Nokogiri

Who knew that Markdown — in my opinion the best text-to-HTML syntax available — could be used for something other than blog posts (or Github readme files)? Turns out that Markdown can be used as a powerful way to make web app navigation really simple to edit, even by non-developers. The task I set out to do was to turn something like this:

* Home
* About
    * About Us
    * API
    * Terms of Use
* Browse
* Etc.

into a nice dropdown menu. Making this markdown text available to a view is a fairly simple task. In Ruby on Rails, if you have a Page model that can hold Markdown text, you simply have to fetch it in the controller and call .to_html.html_safe on it. Also, with the help of some ternary operator elegance, you should make sure that if the page doesn’t exist, the code doesn’t blow up.

c_nav = Page.find_by_slug 'client-nav'
@client_nav = c_nav.nil? ? '' : c_nav.to_html.html_safe

So now, with the addition of a simple line in the view where we dump the contents of @client_nav onto the page, we’ve accomplished displaying a terrible looking unordered list when we could have just hard coded a nice looking menu in Haml. Here comes the fun part.

The obvious first choice for this task was to use jQuery to dynamically add CSS styles to the unordered list to turn it into a nice navigation menu. With the use of jQuery’s addClass method this is fairly simple, and it worked when I implemented it with just a couple lines of code. Unfortunately it resulted in annoying flickering as the page loaded because the unordered list is displayed un-styled for a split second (unacceptable). This could probably be fixed, but why bother doing something on the browser when it can be handled perfectly well on the server? After a couple of painful hours attempting to add classes to the generated HTML using Ruby string manipulation techniques and having little success, I discovered that this problem had already been solved by the creators of the Nokogiri gem.

To begin turning the raw HTML into styled goodness, I first stripped out the opening and closing ul tags manually. I enclosed the code to do this in a begin/rescue/end block because the test suite I’m using (RSpec) didn’t like me using negative numbers for indexes in a string for some reason.

begin
   @client_nav[0..3] = ""
   @client_nav[-5..-1] = ""
rescue
end
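
A possible explanation for the spec failures: when the page doesn’t exist the string is empty, and assigning to a negative range that falls outside the string raises an error. Here is a sketch of an alternative that needs no rescue, assuming the input is a plain String that may or may not be wrapped in a single outer ul. The helper name strip_outer_ul is made up.

```ruby
# Remove one outer <ul>...</ul> wrapper if present; otherwise return the input unchanged.
# Anchoring with \A and \z means nested lists inside are left alone.
def strip_outer_ul(html)
  html.sub(/\A\s*<ul>/, "").sub(/<\/ul>\s*\z/, "")
end

puts strip_outer_ul("<ul><li>Home</li><li>About</li></ul>\n")
puts strip_outer_ul("").inspect
```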

Then I let Nokogiri work its magic.

@client_nav = Nokogiri::HTML::DocumentFragment.parse(@client_nav)
@client_nav.css("li:root").each do |anchor|
   anchor['class'] = 'nav_item'
end
@client_nav.css("li:has(ul)").each do |anchor|
   anchor['class'] = "nav_item more"
end
@client_nav.css("ul:first-child").each do |anchor|
   anchor['class'] = "dropdown"
end
@client_nav = @client_nav.to_html.html_safe

Because the HTML I was passing to Nokogiri is just a fragment of an unordered list, I used the DocumentFragment.parse method to prepare the HTML for Nokogiri to dynamically add CSS. Then, using basic CSS selectors (as well as one really useful one — has() — taken from jQuery), I added the proper classes to the list elements. It worked like a charm.

I’ll leave the CSS magic up to you — suffice it to say that with the proper stylesheets, this can look really great and is infinitely configurable. Why bother with bulky client-side code when Nokogiri provides a just as (perhaps even more) elegant solution?

Tag Clouds in Ruby on Rails

[Figure: My finished tag cloud.]

One of the first projects I got to work on at Art.sy was a tag cloud that provides a quick visual representation of the types of artworks that are in the site’s database. These can be quite visually striking if done correctly, but can be ugly if done poorly. Some of this is a result of design considerations (the underlines on the second tag cloud aren’t helping) which is definitely not my area of expertise, but the algorithm behind a tag cloud is just as important if it is going to be successful.

Just like any good Ruby on Rails/jQuery developer, I first scoured the internet to see if someone had already solved this problem adequately. What I found wasn’t too promising — many of the plugins were using math that would output text in just a couple of preset font sizes, and wouldn’t fully capture the relative frequency of a given tag. I determined that I would have to build it myself, and because at the time I was much more comfortable with Ruby than I was with JavaScript, I decided that I’d build the logic into the backend rather than build a jQuery plugin. I did get help on the algorithm, however. The best one I found was the following (which I’ve translated into Ruby):

weight = (tag.count-minOccurs).to_f/(maxOccurs-minOccurs)
size = minFontSize + ((maxFontSize-minFontSize)*weight).round

For the code above, minFontSize and maxFontSize are constants that you decide beforehand that dictate the font sizes for the least occurring and the most occurring tag. For my purposes, a min font size of 5px and a max font size of 100px worked nicely. minOccurs and maxOccurs need to be determined algorithmically before running the above code (this is quite easy for my setup, as we’ll see). The way this code works is slightly cryptic, but basically it first determines a ratio of how much the tag differs from the lowest tag relative to the entire span of tag counts, and then uses that ratio multiplied by the span of font sizes to determine how much greater the font size should be for a given tag above the minimum font size.
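
Here is the same formula as a standalone Ruby function with a few made-up counts, so you can see the numbers it produces. One addition that is not in the formula above: when every tag has the same count the ratio divides zero by zero, so this sketch falls back to the minimum font size. That fallback is an assumption; pick whatever makes sense for your design.

```ruby
# Map a tag's count onto a font size between min_font and max_font.
def tag_font_size(count, min_occurs, max_occurs, min_font: 5, max_font: 100)
  span = max_occurs - min_occurs
  # Assumed fallback: with no spread in counts, treat every tag as the minimum.
  weight = span.zero? ? 0.0 : (count - min_occurs).to_f / span
  min_font + ((max_font - min_font) * weight).round
end

counts = { "painting" => 10, "sculpture" => 2, "video" => 0 }  # made-up data
min_occurs, max_occurs = counts.values.minmax

counts.each do |name, count|
  puts "#{name}: #{tag_font_size(count, min_occurs, max_occurs)}px"
end
```

The most frequent tag lands exactly on the maximum size, the least frequent on the minimum, and everything else is spaced linearly in between.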

The rest of the code isn’t too difficult, but took a while for me to configure correctly. To start, it’s a little unclear where exactly this code should go. I initially created a tag_cloud method in the Tag model, which I then called in the controller to pass to the tag cloud view, but that doesn’t really make sense because the logic here has to do with font sizes and other visual considerations and thus shouldn’t be in the model. I ended up putting all of the logic in its own method in the tags controller.

def tag_cloud
   tags = Tag.asc(:name)
   if tags.length > 0
      tags_by_count = Tag.desc(:count)
      maxOccurs = tags_by_count.first.count
      minOccurs = tags_by_count.last.count
      # Avoid dividing by zero when every tag has the same count
      span = [maxOccurs - minOccurs, 1].max

      # Get relative size for each of the tags and store it in a hash
      minFontSize = 5
      maxFontSize = 100
      @tag_cloud_hash = Hash.new(0)
      tags.each do |tag|
         weight = (tag.count-minOccurs).to_f/span
         size = minFontSize + ((maxFontSize-minFontSize)*weight).round
         # Skip the smallest tags to keep the cloud readable
         @tag_cloud_hash[tag] = size if size > 7
      end
   end
end

All I’m doing here is storing the tag objects and their calculated sizes in a hash to send to the view. I’m excluding tags with calculated sizes of 7 or less from the hash for purely aesthetic reasons (there were a ton of tags, and having all of them in the tag cloud didn’t look great).

The view code, written in Haml, is even more straightforward.

#tag_cloud
   %h1
      Most Frequently Used Tags
   - if @tag_cloud_hash != nil
      - @tag_cloud_hash.each_pair do |tag,size|
         .cloud_element= content_tag(:a, tag.name.downcase, { :href => "#{tags_path}?tag=#{tag.slug}", :title => "#{tag.count} artworks", :style => "font-size:#{size}px;" })

Actually, the part of this project that took the most time was the stylesheets, mostly because I find CSS in general to be really counter-intuitive. It’s always unclear to me what styles are being inherited when I’m working with large stylesheets. Luckily, I’m writing the styles in SASS which helps a lot. Here’s what I came up with:

#tag_cloud
   display: block
   width: 600px
   margin-left: auto
   margin-right: auto
   h1
      text-align: center
   .cloud_element
      display: inline
      a
         text-decoration: none
         line-height: 110%
         vertical-align: middle

Put all of this together and you get a pretty nice looking tag cloud. There are of course tons of possible improvements, and it still looks nothing like my example of a well done tag cloud, but this is a good start and is a big step up from a lot of the tag clouds I’ve come across online (including the one on this site). If I ever come back to this project to improve it, I’d probably redo it in jQuery rather than keeping the logic on the backend. While writing the code in Ruby was (I think) much more elegant than a potential JavaScript solution, moving to frontend code would open up the doors to tons of fancy effects. And fancy effects are cool.