Tag Clouds in Ruby on Rails

Figure: My finished tag cloud.

One of the first projects I got to work on at Art.sy was a tag cloud that provides a quick visual representation of the types of artworks in the site’s database. Tag clouds can be visually striking when done well and ugly when done poorly. Some of that comes down to design (the underlines on the second tag cloud aren’t helping), which is definitely not my area of expertise, but the algorithm behind a tag cloud matters just as much to its success.

Just like any good Ruby on Rails/jQuery developer, I first scoured the internet to see if someone had already solved this problem adequately. What I found wasn’t too promising — many of the plugins were using math that would output text in just a couple of preset font sizes, and wouldn’t fully capture the relative frequency of a given tag. I determined that I would have to build it myself, and because at the time I was much more comfortable with Ruby than I was with JavaScript, I decided that I’d build the logic into the backend rather than build a jQuery plugin. I did get help on the algorithm, however. The best one I found was the following (which I’ve translated into Ruby):

weight = (tag.count-minOccurs).to_f/(maxOccurs-minOccurs)
size = minFontSize + ((maxFontSize-minFontSize)*weight).round

In the code above, minFontSize and maxFontSize are constants you choose beforehand; they set the font sizes for the least and most frequently occurring tags. For my purposes, a minimum of 5px and a maximum of 100px worked nicely. minOccurs and maxOccurs have to be computed before running the code above (this is quite easy with my setup, as we’ll see). The way the code works is slightly cryptic. First it computes a weight between 0 and 1: how far the tag’s count sits above the lowest count, relative to the full span of counts. Then it multiplies that weight by the span of font sizes to get how much larger than the minimum font size the tag should be.
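
To make that concrete, here is a rough sketch of the same formula wrapped in a standalone method. The method name font_size_for, its parameter names, and the sample counts are made up for illustration, and the guard for the case where every tag has the same count is my own assumption rather than part of the formula I found.

```ruby
# Hypothetical helper: maps a tag's count onto a font size in pixels.
def font_size_for(count, min_occurs, max_occurs, min_font_size = 5, max_font_size = 100)
  span = max_occurs - min_occurs
  # Assumption: if every tag has the same count, use a weight of 0
  # instead of dividing by zero.
  weight = span.zero? ? 0.0 : (count - min_occurs).to_f / span
  min_font_size + ((max_font_size - min_font_size) * weight).round
end

# Made-up counts ranging from 2 to 42 occurrences.
puts font_size_for(2, 2, 42)   # least-used tag: weight 0.0, so the minimum size (5)
puts font_size_for(12, 2, 42)  # weight 0.25: 5 + (95 * 0.25).round = 29
puts font_size_for(42, 2, 42)  # most-used tag: weight 1.0, so the maximum size (100)
```

A tag a quarter of the way through the range of counts ends up a quarter of the way through the range of font sizes, which is exactly the proportionality the preset-size plugins were missing.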

The rest of the code isn’t too difficult, but took a while for me to configure correctly. To start, it’s a little unclear where exactly this code should go. I initially created a tag_cloud method in the Tag model, which I then called in the controller to pass to the tag cloud view, but that doesn’t really make sense because the logic here has to do with font sizes and other visual considerations and thus shouldn’t be in the model. I ended up putting all of the logic in its own method in the tags controller.

def tag_cloud
   tags = Tag.asc(:name)
   if tags.length > 0
      tags_by_count = Tag.desc(:count)
      maxOccurs = tags_by_count.first.count
      minOccurs = tags_by_count.last.count
      span = maxOccurs - minOccurs

      # Get relative size for each of the tags and store it in a hash
      minFontSize = 5
      maxFontSize = 100
      @tag_cloud_hash = Hash.new(0)
      tags.each do |tag|
         # Avoid dividing by zero when every tag has the same count
         weight = span.zero? ? 0.0 : (tag.count-minOccurs).to_f/span
         size = minFontSize + ((maxFontSize-minFontSize)*weight).round
         # Leave the smallest tags out of the cloud
         @tag_cloud_hash[tag] = size if size > 7
      end
   end
end

All I’m doing here is storing the tag objects and their calculated sizes in a hash to send to the view. I’m excluding tags with calculated sizes of 7px or less from the hash for purely aesthetic reasons (there were a ton of tags, and having all of them in the tag cloud didn’t look great).
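
If you want to try the calculation outside of Rails, something like the following plain-Ruby sketch should behave the same way. SampleTag is a stand-in for the real Tag model, and tag_cloud_sizes, its keyword arguments, and the sample data are all hypothetical names I made up for the example.

```ruby
# Stand-in for the Tag model; only the fields the calculation needs.
SampleTag = Struct.new(:name, :count)

# Hypothetical plain-Ruby version of the controller logic.
def tag_cloud_sizes(tags, min_font_size: 5, max_font_size: 100, cutoff: 7)
  return {} if tags.empty?

  min_occurs, max_occurs = tags.map(&:count).minmax
  span = max_occurs - min_occurs

  # Walk the tags in name order and keep only the ones above the cutoff.
  tags.sort_by(&:name).each_with_object({}) do |tag, sizes|
    weight = span.zero? ? 0.0 : (tag.count - min_occurs).to_f / span
    size = min_font_size + ((max_font_size - min_font_size) * weight).round
    sizes[tag] = size if size > cutoff
  end
end

tags = [
  SampleTag.new("sculpture", 42),
  SampleTag.new("abstract", 12),
  SampleTag.new("collage", 2)
]

# "collage" gets the minimum size of 5 and is filtered out by the cutoff.
tag_cloud_sizes(tags).each { |tag, size| puts "#{tag.name}: #{size}px" }
```

Because the method takes an array and returns a hash, it can be exercised without a database or a controller.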

The view code, written in Haml, is even more straightforward.

Most Frequently Used Tags
- if @tag_cloud_hash
   - @tag_cloud_hash.each_pair do |tag, size|
      .cloud_element= content_tag(:a, tag.name.downcase, :href => tags_path(:tag => tag.slug), :title => "#{tag.count} artworks", :style => "font-size:#{size}px;")
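
To give a rough idea of the markup each tag ends up as, here is a plain-Ruby sketch that builds a similar link by hand. It is not what Rails does internally: cloud_link is a hypothetical method, the "/tags" default is my assumption about what tags_path resolves to, and the attribute order that content_tag emits may differ. It also leaves out the wrapping .cloud_element div.

```ruby
require "cgi"

# Hypothetical stand-in for the content_tag call in the view.
# Assumes tags_path resolves to "/tags".
def cloud_link(name, slug, count, size, tags_path = "/tags")
  href  = "#{tags_path}?tag=#{CGI.escape(slug)}"
  title = "#{count} artworks"
  style = "font-size:#{size}px;"
  # Escape the pieces that could contain user-supplied text.
  %(<a href="#{CGI.escapeHTML(href)}" title="#{title}" style="#{style}">#{CGI.escapeHTML(name.downcase)}</a>)
end

puts cloud_link("Abstract", "abstract", 12, 29)
# prints: <a href="/tags?tag=abstract" title="12 artworks" style="font-size:29px;">abstract</a>
```

The font size is the only thing that changes from tag to tag, so everything else about the look of the cloud is left to the stylesheet.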

Actually, the part of this project that took the most time was the stylesheets, mostly because I find CSS in general to be really counter-intuitive. It’s always unclear to me which styles are being inherited when I’m working with large stylesheets. Luckily, I’m writing the styles in SASS, which helps a lot. Here’s what I came up with:

   display: block
   width: 600px
   margin-left: auto
   margin-right: auto
      text-align: center
      display: inline
         text-decoration: none
         line-height: 110%
         vertical-align: middle

Put all of this together and you get a pretty nice-looking tag cloud. There are of course tons of possible improvements, and it still looks nothing like my example of a well-done tag cloud, but this is a good start and a big step up from a lot of the tag clouds I’ve come across online (including the one on this site). If I ever come back to this project to improve it, I’d probably redo it in jQuery rather than keep the logic on the backend. While writing the code in Ruby was (I think) much more elegant than a potential JavaScript solution, moving to frontend code would open the door to tons of fancy effects. And fancy effects are cool.


The Beginning

I’ve been a “liberal arts person” my whole life. For most of my academic career, I have identified English and History as my favorite courses. But oddly enough, Physics turned out to be my most interesting class in high school, and BC Calculus was my favorite class in the Winter term of my freshman year in college. I realized that this interest wasn’t isolated to the last couple of years: computer programming has been a spare-time hobby of mine since sixth grade. It’s taken me 8 years to realize that I actually love applied math, especially when it results in a cool video game or a groundbreaking website. About two months ago, in the middle of the first academic course I’ve ever taken in computer science (Intro to Java, of course), I realized that I had to pursue this discipline more seriously. I had way too much fun at school last term to not continue with computer science.

Since then, I’ve thrown myself headfirst into the tech world. Over the past two months, I’ve created a Twitter account (@mmcnierney), joined Hacker News, and created a GitHub account. There are probably more — I can barely keep track of it all. Oh yeah, there’s this blog too.

The best part of it all is that I’ve had the incredible opportunity to intern this summer at Art.sy, a start-up based in New York City that wants to be the Pandora for the art world and has some serious potential to shake up the fine art industry. Over the past month, I’ve gone from knowing very little about Ruby on Rails and jQuery, not to mention never having heard of git, Backbone.js, Haml, SASS, or MongoDB, to having a fairly solid working knowledge of the “bleeding edge” of web development. Just as exciting has been all I’ve learned about start-up culture from working in General Assembly, which has turned out to be the greatest work environment imaginable, as well as all the lessons I’ve had in managing tech teams from both the founder and the head of engineering. More on that later.

I’m at the very beginning of what I hope will be a long career in this industry, and I want to record all that I’m observing and learning. I’m going to be documenting the programming challenges I’m working on, because the more I learn about open source the more I want to contribute back to the community that has given me so much for free. But I’m also going to be commenting on start-up culture and the more intangible things I’m learning, while talking about the projects I’m working on and what I’m doing during the school year to stay connected with this incredible tech world that I barely knew existed until very recently. I intend for this blog to have something for everyone, so enjoy! I know I will.