
Scraping for Hinman Box Numbers

January 17, 2013

Dartmouth’s CS50 “Software Design and Implementation” class is a gift that keeps on giving. Yes, it was basically a full-time job on top of my other coursework, and yes, it kept me from attending as many frat parties as I wanted to last spring. But it also taught me shell scripting, which has proven incredibly useful on numerous occasions.

The latest example: my fraternity recently had a party where we wanted everyone we invited to get physical invitations in their mailboxes. Here’s the shell script that made that possible:

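# Usage: pass a text file with one name per line as the first argument.
while IFS= read -r p; do
  # URL-encode the spaces in the name
  NAME=$(printf '%s' "$p" | sed 's/ /%20/g')
  # --silent suppresses curl's progress output; grep keeps only the box number
  curl --silent "${NAME}&fmat=1" | grep -o "HB [0-9]*"
done < "$1"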

The script takes an input text file with one name per line and then scrapes the Dartmouth Name Directory for each person’s Hinman box using curl. I kind of feel like Mark Zuckerberg downloading Harvard’s facebook data every time I run it.
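The name-encoding step is easy to try in isolation. Here is a minimal sketch, assuming a hypothetical helper called encode_name that is not part of the original script. Like the sed expression above, it only escapes spaces, so names containing other reserved characters would need proper URL encoding.

```shell
# encode_name is a hypothetical helper for illustration only.
# It mirrors the sed substitution in the script: spaces become %20.
encode_name() {
  printf '%s\n' "$1" | sed 's/ /%20/g'
}

encode_name "Jane Q Doe"   # prints Jane%20Q%20Doe
```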


Why My First App Worked

July 17, 2012

Hint: it wasn’t because of the code.

Just over a full year ago, Jason Laster (@jasonlaster11) approached me at the end of the term with an idea for a project. Dartmouth’s student government was looking for web developers to create an online database of all the organizations on campus, and he had responded to the job offer. Even though I had never worked on a web application before, and knew only simple web programming at the time, I jumped at the opportunity to get involved. If anything, I told myself, this short, 3-week project would give me a leg up on Ruby on Rails before I started my internship at that summer.

A year after that initial conversation, we launched DGD — short for the Dartmouth Group Directory — to campus, and the launch was more successful than any of us could have imagined. Just a week after we sent the initial email out to all undergraduates, over half of the student body had visited the site and were spending upwards of 5 to 10 minutes browsing each group’s pages, a statistic that is considered outstanding in the fast-paced world of the internet.

By the end of it all, I had learned a whole lot more than simply how to code a database-driven website. While I won’t pretend that getting undergraduate students to use a free service is that similar to launching a real product, the biggest lessons learned ended up being about marketing. Surprisingly enough, what ended up actually getting real users had little to do with how perfect our code was, and much more about conscious design decisions that impacted our final site and how it was presented.

Solve One Real Problem

Soon after agreeing to the project, I learned that Jason didn’t take the job just for the money or for the technical challenge. It turns out that he had actually tried a similar project in his freshman year, except that it was much more ambitious. It was intended to be a full-fledged social network for Dartmouth groups, where each group had a page, and students could follow group updates and have profiles of their own. But even after making strides in addressing the technical challenges behind creating such a service, it never took off. He eventually scrapped the project, thinking that there simply wasn’t a need for an app like he created.

We learned with the success of DGD that there certainly was a need for a group database. But even with a need, a lack of focused design prevented DGD’s predecessor from being successful. It simply tried to do too many things at once without clearly defining exactly what it was intended to solve. On the other hand, when we first discussed DGD, we were very clear about the goals of the site and boiled our ideas down to their essentials. DGD would be nothing more (and nothing less) than a searchable database of organizations, each with a text description.

But of course web apps are typically much more complicated than that. Even DGD has grown far from that initial goal. The key rule is that complexity can’t come before functional simplicity. Now, each group in DGD gets its own page, complete with a full HTML what-you-see-is-what-you-get page editor. But that feature did not come before we had simple group descriptions fully functional. You can still see the remnants of that decision in the code base: even though we now think of groups as having pages, the model that deals with the pages is still called Description. (Yes, we should probably get around to changing that.) Designing a huge web app around a complicated and poorly defined problem is destined to lose momentum and result in a disjointed product. A simple solution to a simple yet real problem that can be iterated on is a much more sustainable model.

Find Out Who Matters

As much as this was a painful realization, it has become very clear to me that the general public does not care about or trust the lone developer. Even if you are an expert coder and designer, and you’ve defined and solved a very real and important problem, simply throwing your app on Heroku and posting a link to it on Hacker News and Reddit has a very low chance of driving significant and lasting traffic to your app. There are certainly exceptions to this rule. But the simple fact is that the average person doesn’t know you. Their lives are going pretty well as it is without your service, and who are you to tell them otherwise? You’re a complete stranger.

If someone they know and trust is telling them to use a service, however, it’s a completely different story. As it turned out, being partnered with Dartmouth’s student government was essential to getting people to use and contribute to our app. Not only did Student Assembly have the ability to send out emails to all undergraduates advertising DGD, but also they had the authority of a ruling campus body to add legitimacy to our project. We found another strategic relationship with the webmaster at Dartmouth — luckily Jason knew the webmaster well and was able to secure the domain for our app, but only after a month of negotiating.

Get Personal

Looking at a Google Analytics dashboard, it’s hard to fully comprehend that each and every one of the total page visits was an individual sitting at his or her computer and experiencing your product firsthand. Even for a small web app like DGD, with weekly visits now settling in the hundreds, it would be impossible to connect personally with every visitor. But you cannot neglect the human side of your visitors. Your app has a life outside of the closed world that you create — people talk about it, have opinions about it, and have other things they’d rather be doing than using it. People would prefer to interact with other people over a soulless machine, so you have to make an effort to not lose the human element.

To address this, we first made real world rewards for using the app. DGD is a community directory, so the number of pages and the quality of those pages depends entirely on people’s willingness to write about the groups in which they participate. People don’t really like writing essays. But they do like gelato, especially the best gelato in the United States. So we offered $10 Morano Gelato gift cards to the top 10 contributors to the site after the first week, and got over 50% of our content from that marketing push.

There are also many tools available for interacting personally with your users. We chose Intercom and loved it. It allowed us to track the email addresses of the Dartmouth students signing in, and send them welcome emails suggesting that they add content. It’s easy to overdo emailing your users, but we got a lot of positive feedback about the few emails we did send. People are often surprised that you even have a record of them visiting your site, and respond well to personal messages thanking them for visiting and for contributing content. Of course it helped that the emails were coming from a fellow Dartmouth student, and people might be less receptive to random emails from developers. But Intercom allows you to send your users messages through the app, which can be much less intrusive and is a great way to keep connected.

Be Your Best User

No amount of automated integration tests can make up for actually using your site on a regular basis. I’m far from the first person to point this out: see this awesome post from the engineering blog and this short yet sweet post from the creator of Forrst. Actually becoming a user of your site will allow you to see how all of the separate features you’re working on come together to create a hopefully coherent whole experience. It’s important to try your best to think like an average user, rather than a developer, however. Initially, our page editor was built around Markdown, a clear choice for those already familiar with blogging syntaxes. But we soon realized that the average user would much prefer to use a simpler HTML page builder with a live preview of the results. (We used wysihtml5 and loved it.)

Sites with user generated content have another reason for their founders to be active — if not the best — users, at least in the sites’ early stages. Sites like these have a classic chicken-and-egg problem: the content needs to come from the average citizens of the internet, but people are less likely to contribute to a site that doesn’t already have a lot of content and readers. You can have exponential growth in users and contributors, but not without a starting spark of content to get people interested. That content has to come from you and your team. We seeded DGD with content from organizations’ existing websites scattered across the internet, and as a result provided a great service from day one. The first contributions were from organizations that edited their already existing pages, and soon after, groups that never had a web presence started creating pages.

This is far from an exhaustive list. Nevertheless, these four elements are essential best practices we discovered through making and marketing DGD and should be on your mind as early as possible during the design stage. We spend a lot of time learning how to engineer fantastic solutions to the world’s problems, but these efforts can fall flat if you forget the less tangible human element that governs whether or not people will realize just how great your product is.

Hacking is Cool: The True Lives of Computer Science Majors at Liberal Arts Colleges

July 1, 2012

Sudikoff, the computer science building at Dartmouth College, whose basement is filled later than any fraternity’s, has a way of making people go slightly insane. Maybe it’s the grueling lab assignments. Maybe it’s the long hours without sunlight. Maybe it’s the frustration of being told by a Teacher’s Assistant that you have to completely rewrite your implementation of breadth-first search, as I would be told at 4 am one late night.

Whatever it is, the building now known as Sudikoff has not changed much since it was first built in the late 1800’s. The bathrooms complete with showers hint at its last incarnation: the building used to be a mental ward for a now long-gone hospital. Today, “the Koff” houses the crazies that decide to major in computer science at a liberal arts college.

The stereotypical computer science nerd is male, wears thick-rimmed glasses and pocket protectors, and can speak shell script to the Linux kernel better than he can speak to girls. I’ve met this stereotype many times, and many of them are great people that I consider my good friends. But at Dartmouth, I have seen jocks from the South, preppy kids from New England boarding schools, and hipsters from California join the ranks of the outwardly eccentric in high-level computer science classes.

Regardless of their outward appearance or the initial impressions they are capable of giving, computer science majors at Dartmouth are distinct from the rest of campus. All of the students that get through the intro courses and still want more share a passion bordering on obsessiveness for building things with computers, and their “inner nerd” invariably seeps out of their academic lives.

“Obsessiveness” is definitely the correct word to describe my long history with programming. My fascination with computers has been with me my entire life – some of my earliest memories are of me playing Reader Rabbit games and dreaming up stories for my own video games. The problem was that, while I was an adept consumer of computer content and I had some conception that to make video games you had to type code into a computer, I had no idea where to actually type the code. Simply opening up Microsoft Word and typing instructions didn’t work. (Trust me, I’ve tried.)

Then came sixth grade computer class. Most of the year was spent bringing my deficient typing speed up to 30 words per minute, but the last project of the year involved building a website about a famous person that shares your birthday. I picked Harry Houdini.

I was so excited that I spent about 15 hours and learned the basics of two computer languages for an assignment that everyone else in the class finished using a website builder in two 45 minute class periods. While most people’s sites were nothing more than a white background with blocks of text largely copied from Wikipedia, mine was awesome. It was ominously black with red links. It had a rotating and fading spooky animation of the words “Harry Houdini” at the top. It had a section with three different Harry Houdini-related games accessible only if you knew the password. It represented everything that defined how the Internet looked nine years ago and makes the web-savvy of 2012 groan. I was incredibly proud of it.

The Harry Houdini website turned out to be just the tip of the iceberg. I remember many late nights spent trying to get a picture of a blue ball to bounce around my screen. I would give up in frustration and get into bed when it wasn’t working, only to have a sudden flash of insight and jump up to the computer to give it another shot. The “Tests and Random Websites” folder on my old laptop has about 50 different files, including a game where you have to drop a ball from one moving platform to another, a calculator program, and a clone of Frogger.

In high school, the games I made got more complicated. My two favorites were a lunar lander game, where the user flies a spacecraft through various obstacles, and a game called “Ant” that I emailed to my entire class where the user is an ant that has to collect leaves and twigs while avoiding a horde of bees.

Throughout the trials and tribulations of my early years as a programmer, I never thought what I was doing was a legitimate academic discipline that I might pursue in college. Making the ant game did manage to teach me the basics of trigonometry long before we covered it in math class. But programming was just a hobby. All I really wanted to do was tell stories of my own creation through video games and have a cool personal website that I could show off to people. Because of that mindset, computer people would call me a “hacker,” and they wouldn’t mean that I was skilled in accessing people’s personal, password-protected data in the way that most people use the term. They would use the term’s original meaning: a hacker is someone who builds things (or “hacks” them together) using computers.

My story, though strange, is far from unique. If you keep track of Hacker News, you’ll read a similar origin story once a week. People do not learn to program because they love code. They do it because they love the end result. Oftentimes they want to build a better video game or website. Sometimes they want to build a bot for the MMORPG they’ve spent way too much time playing. Nevertheless, every programmer that I’ve met recalls the wonder and excitement they felt when they first started to work with computers, and looking at their eyes, it’s easy to see that they still haven’t fully lost that sense.

Even at Dartmouth, in the more high-browed academic world of “computer scientists,” the hacker ethos in its original sense is still very much alive. Even computer science professors love the playfully creative, and sometimes devious, nature of hackers. One of my professors is fond of telling a story about a student who, during a lecture, figured out how to remotely access the professor’s screen and began writing “offensive” messages for the whole class to see. When the student confessed to the crime a couple of terms later, the professor was more impressed than angry. He would go on to judge that student’s thesis, and while the professor pretends that he sabotaged the student’s thesis and “got the last laugh,” he called the student a “genius.”

Dartmouth hackers sometimes switch into what appears to be a foreign language. While editing an issue of the school paper, I had a conversation with the paper’s technical director about the inner workings of their website. We were discussing how to make a headline span the entire top portion of if it was especially important. What we actually said was: “yeah you’re right that PHP is close to C, but Ruby is way more expressive, and gives you really modular code running on Rails compared to Zend or WordPress” and “that’s easy with MVC, you’ll just have the same model, but you’ll call a different layout in the controller” and “can you add my handle to the git repo?”

“Look at them in their natural habitat,” laughed the editor-in-chief, who was seeing the hacker side of me for the first time.

Hackers can get into arguments about seemingly trivial things, like how to write code efficiently. The more code you write, the more the little extra keystrokes start to get on your nerves – clicking on File, then Save, switching between windows, even using the mouse at all. Many hackers prefer to use primitive text editors that were made before computers had mouses so that their hands never have to leave the “left pointer on f, right pointer on j” typing position. In the text editor known as vim, you hit Escape if you want to issue a command, such as typing “j” to move the cursor down one line of text, and you press “i” if you want to actually type text at the position of the cursor. That’s overly complicated for the average user, but a lifesaver for the hacker.

Watching a fellow hacker code inefficiently can be painful. During my second college computer science course, I went to office hours to get help on an assignment, and as the TA watched me work with my computer, he became progressively more flustered. Finally, he stopped me, went on a rant about how Sublime Text 2 was a better text editor than the one I was using, and refused to help me further until I downloaded it. When I complained that I had already paid for my text editor and did not want to spend another $50, he said my argument did not make any sense according to fundamental principles of economics.

“If you’ve already paid for Justin Bieber tickets and you have the opportunity to buy tickets to a better concert, do you go to the Bieber concert, or do you buy the other tickets?” he asked.

I had no choice but to concede that he was right.

His reaction, however, did bring out one of the less amiable traits of hackers: while they are typically right, they are frequently stubbornly and arrogantly right. In a discipline where there are infinite ways to solve a given problem but usually only one elegant way, seeing “bad” code often triggers hacker disdain. What exactly is elegant code? That’s a question on which many millions of words of tech blog posts have been spilt. In general, it means code that is written in short (Facebook requires that lines of code never exceed 80 characters), expressive lines. The function of the line of code should be almost immediately clear to a proficient reader.

What is certain is that code that does not meet this elusive criterion of elegance can often be mocked or immediately written off by good programmers. In that same meeting with the TA, when I was not immediately able to see the right way to solve my problems, he just told me that I must be brain-dead from lack of sleep. He prescribed a nap followed by caffeine. (I will admit that this was helpful.)

I am guilty of this arrogance as well. When I’ve tried to help a fellow student with a computer science problem, and they don’t immediately understand what I am saying, I have to muster all of my strength not to burst out laughing.

I remember having trouble explaining an implementation of a doubly-linked list to a fellow classmate in my intro to computer science course. The ten lines of code were intimidating when looked at as a block, and filled with odd syntax, but when taken slowly, line-by-line, every line has a purpose and makes sense. After drawing numerous diagrams, going through each line multiple times, and explaining the overall concept more than once, I burst out laughing. She was not pleased, to say the least. But to me, the arrogant hacker who already understood the problem, it seemed as simple as understanding her next sentence: “You’re not very good at explaining this stuff.”

While many hackers will wave their arms and say “well, and then there’s some magic” as they explain how some code works, they are just being lazy. Computer science is not magic. Every last character in a line of code has meaning, and there are no concepts that you “just have to memorize,” as a Chemistry major once told me as he was describing how awful Organic Chemistry is. In programming, everything can be explained down to the level of transistors transferring electricity.

Those few who actually understand computer science feel a sense of geek-superiority over the rest of the Dartmouth campus. Certainly we anticipate a quick payoff for our hours of work. At 4 am in the basement of Sudikoff, I asked the TA – who, due to both dedication and his being paid by the hour, was still in Sudikoff with the ten students trying to finish the assignment – whether this late night would typify the rest of my time at Dartmouth. With a smile that said “you don’t even know how right you are,” he nodded.

“It’s three years of your life for a guaranteed six-figure salary,” he said. “It’s worth it.”

Even the computer science professors are arrogant, if a bit facetiously. “What did the computer science major say to the art history major?” my professor asked on the first day of class. He paused, then answered, smirking, “Can I get fries with my burger?”

Often enough, computer science majors get so caught up in this mindset that they find themselves unable to perform in classes that are not math or engineering-based. A friend of mine, who is planning on double majoring in computer science and engineering sciences, told me that he tries to minimize the time he spends in classes he deems useless. He told me recently that focusing on his international studies class in the previous term proved impossible.

“I just kept thinking, what am I doing here? And so I stopped doing the readings,” he explained.

But that attitude is changing, as a more diverse crowd of people flocks to the major. What used to feel like an exclusive club has very recently begun to broaden to include people who are also interested in other disciplines. One girl I talked to decided to stop majoring in computer science and instead major in creative writing, but still kept computer science as her minor. These two disciplines seem to epitomize the unbreachable left-brain right-brain split, and yet she told me she felt they complemented each other nicely.

The numbers speak for themselves: one of my professors spent the whole first lecture trying to scare us into dropping the class because the enrollment was three times too high this term. Yet the room was still packed for the next class session.

“Mission: Scare people away from this class = fail,” a friend sitting next to me scribbled in his notebook.

A hacker friend at an internet start up I worked at has a theory on why interest in computer science has skyrocketed over the past year: he blames The Social Network, the 2010 blockbuster film that glorifies the college years of hacker-extraordinaire Mark Zuckerberg, the founder of Facebook. While the movie probably is not solely to blame, it is true that computer programming has suddenly become cool. With the wild success of Facebook, Twitter, Foursquare, Instagram, and countless other companies founded and run by hackers, dropping out of school to be a tech entrepreneur seems to be slowly replacing corporate recruiting as the most sought-after outcome of an undergraduate education.

Right now, many Dartmouth hackers that saw themselves as a special breed aren’t too happy with the influx of new majors. If just anyone can hack, then it’s not quite as exclusive as we thought. Time will tell whether those lured by the get-rich-quick promises of hacking will endure the suffering required of computer science majors.

Back in the basement of Sudikoff, I put these grand thoughts in the back of my mind as I frantically tried to finish a computer program that could, given any actor’s name, print out the shortest distance between that actor and Kevin Bacon, using only their co-stars and the movies they have been in. That’s a pretty difficult problem, made even more difficult by the fact that it needed to be written in the esoteric programming language called Haskell. It’s not easy to describe why Haskell is so challenging – suffice it to say that it edges out “moments of inertia” from AP physics as the most difficult concept I’ve ever had to hold in my brain. The assignment was due the next day.

At 5 am, the TA announced that those of us that had entered the cramped lab at 9 pm and were still around had just completed a full 9-to-5 workday. So had he.

“Congratulations,” he said.

A couple months later, as I walked into Sudikoff yet again, I knew that I had easily another 15 hours hunched over my Macbook before my latest lab assignment could be submitted. I thanked myself for learning that starting early is not only recommended, but essential to surviving a computer science major.

I settled into the desk chair and heard the familiar whirr of computer fans and the sound of typing and the buzzing of the electric lights above my head. I was actually looking forward to those 15 hours. Maybe I really had gone a little insane.

HTML Emails that Work

June 30, 2012

Successfully branding a web app requires carefully designing every aspect of its interaction with users. If you’ll be sending any emails to your customers, you’ll probably want to send something a little more impressive than simple plain-text. To do this, you’ll have to code an HTML email. This is not an easy task — making a complicated layout look good in a variety of email clients is an order of magnitude more challenging than making a website cross-browser compatible.

The main rule: forget everything you learned about CSS3 best-practices, and go back to how you coded websites ten years ago. If you don’t want to read past this second paragraph, here are the rules of thumb that inspired all the rest of the tips in this post:

  • Keep the total width of the email less than or equal to 600px. You can ensure this by enclosing the whole email in a table with a width attribute of “600”.
  • Use HTML formatting tags instead of CSS whenever possible.
  • All CSS that you do use (and it’s totally legitimate to use CSS for most things, just avoid positioning) must be inline. You can code your email with CSS in <style> tags in the <head> and then run it through Mailchimp’s free automatic CSS inliner tool to make this process a lot easier.
  • All email content must be entirely static — you can’t use any Javascript to help with layout.

Layout Without the Box Model

You can still achieve pretty complicated layouts without using CSS positioning; it just takes a bit more work to get everything to look good in a variety of email clients. Some HTML emails get around this layout problem by simply sending an email with a large picture that has all of the email content on it. While these emails are easy to create, avoid this temptation. Having all text in your email as actual text will help please spam filters, is much more user friendly, and will help email clients give relevant text previews of your email.

You’ll have to use tables to achieve anything other than very basic layouts. Using cellpadding, align, and valign attributes on table tags, you should be all set. Properly aligning text and images can be difficult and may require image slicing. Make sure that all images have “display: block” CSS applied to them, or else images that are meant to be displayed flush with each other will have a small separation between them in Gmail and Hotmail (for mysterious reasons).
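To make the table approach concrete, here is a rough sketch of a two-column layout, printed from a shell heredoc so it can be redirected into a file for testing. The helper name render_email, the example.com image URL, the pixel sizes, and the cellspacing and border attributes are all assumptions for illustration, not something from a real campaign.

```shell
# render_email is a hypothetical helper: it prints a bare-bones two-column
# layout built from a single 600-wide table. The image URL and sizes are placeholders.
render_email() {
  cat <<'EOF'
<table width="600" cellpadding="0" cellspacing="0" border="0" align="center">
  <tr>
    <td width="200" valign="top">
      <img src="https://example.com/logo.png" width="200" height="80"
           style="display: block;" alt="Logo">
    </td>
    <td width="400" valign="top" align="left">
      Body text goes here as real text, not as part of an image.
    </td>
  </tr>
</table>
EOF
}

# Print the markup; redirect to a file (render_email > email.html) to preview it.
render_email
```

The column widths add up to the 600 on the outer table, and the image carries display: block inline so it sits flush against its neighbors.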

Adjusting space above and below elements should, in most cases, be done with br tags. You can also use the line-height CSS property on p tags; just be sure to test this in a variety of email clients because many handle this property differently.

Styling Links

People tend to respond to default blue text links in emails, so don’t stray from this styling unless you have a good reason. If you do want custom links, be sure to define any custom colors within a font tag that is within the a tag. This is necessary because some email clients (like Gmail) make all links target=”_blank”, and in the process, strip out any color CSS you’ve added to the link.
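A small sketch of the font-inside-anchor pattern described above. The helper name styled_link, the URL, and the color value are assumptions chosen for illustration.

```shell
# styled_link is a hypothetical helper.
# $1 = URL, $2 = link text, $3 = color (for example "#cc0000")
styled_link() {
  # The color lives on a font tag inside the anchor, so it survives
  # clients that rewrite the anchor tag itself.
  printf '<a href="%s"><font color="%s">%s</font></a>\n' "$1" "$3" "$2"
}

styled_link "https://example.com/groups" "Browse groups" "#cc0000"
```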

Sometimes email clients will link text that you don’t want to be linked, such as text that looks like a URL or email addresses. There are two ways to deal with this:

  • You can control the style of the link by explicitly defining the text as a link and styling it appropriately.
  • You can enclose the text in an anchor tag without a href= attribute to make it behave like normal text and prevent it from being automatically linked. This won’t work in all email clients, most notably the iPad/iPhone.


The best way to include images in HTML emails is to host the image somewhere with a publicly-accessible URL, and use that full URL to refer to the image in the src attribute of an img tag. As explained above, all images should have the display: block CSS attribute so that Gmail and Hotmail handle them correctly without adding on extra space between images.

Resizing images isn’t smart in some email clients, i.e. if you just specify a width or a height for the image, it won’t preserve the aspect ratio of the image. If you want to display an image not in its original size, you’ll have to calculate both the width and the height of the image and explicitly write them in HTML. It’s good practice to define the width and the height in both the style attribute and the width and height attributes, because some email clients won’t recognize the width and height CSS definitions. The easiest way I’ve found to determine both the height and the width for an image given only one is to define one and then inspect the element with Chrome to determine the other measurement.
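If you would rather not open Chrome for the arithmetic, the missing dimension is just the original height scaled by the ratio of the widths. Below is a minimal sketch, assuming a hypothetical helper named scaled_img_tag and a placeholder image URL. It rounds to the nearest pixel and writes the size in both the HTML attributes and the inline style.

```shell
# scaled_img_tag is a hypothetical helper.
# $1 = image URL, $2 = original width, $3 = original height, $4 = target width
scaled_img_tag() {
  # Integer arithmetic: scale the height by target/original width,
  # adding half the divisor first so the result rounds to the nearest pixel.
  new_h=$(( ($3 * $4 + $2 / 2) / $2 ))
  printf '<img src="%s" width="%s" height="%s" style="display: block; width: %spx; height: %spx;">\n' \
    "$1" "$4" "$new_h" "$4" "$new_h"
}

# A 1200x800 image shown 600 wide comes out 400 tall.
scaled_img_tag "https://example.com/banner.png" 1200 800 600
```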

Even if the image will be displayed in its original size, it’s good practice to define the width and height anyway. Many email clients don’t display images by default, and by defining a width and height for the images, the placeholder that the client uses will be the proper size and won’t break your layout. In most email clients, specifying a background-color CSS attribute and a bgcolor img attribute will display a colored block instead of the default image placeholder before the image loads, which can greatly improve the look of your email when images are turned off.

Other Tips

  • It’s not currently possible to use fonts that aren’t installed on a user’s computer, i.e. font embedding won’t work
  • Avoid custom characters for <li> elements; it’s not possible to define pseudo-classes inline, and Gmail strips out the text-indent property, making it impossible to space it manually.
  • Use Mailchimp’s inbox inspection tool (not free) to see how your email looks in multiple email clients and OSs

Hunting Bugs or: How I Learned to Stop Worrying and Love git bisect

June 30, 2012

Getting assignments to write new features on a web app is always way more exciting than looking at Pivotal Tracker and seeing a list of bugs to fix. But I have to admit that probably the most satisfying feeling as a programmer is the feeling you get when you’ve solved a complicated bug in a clean way. I’m not talking about bugs that are caused by syntax errors or accidentally using the wrong methods. I’m talking about the bugs that have seemingly innocuous symptoms, but end up taking you deep into the rabbit hole of your app.

The work here is the hunt — once you’ve actually tracked down what’s causing the problem, it’s generally very easy to fix the issue. When you’re working on a large codebase with multiple contributors for an asynchronous web 3.0 app, this can get really interesting (and frustrating). What do you do when the problem isn’t immediately apparent?

About 25% of the time, you get lucky and Google does all of the work for you — simply copy and paste the error message that the application is giving you or type in a short description of the issue and you’ll be surprised how many people have encountered the same problem and have solved it already. Stack Overflow is a goldmine of bug solutions, and so is the vast ecosystem of tech bloggers (like me) who give you step-by-step instructions on how to solve bugs.

70% of the time, it’s a bit more of a challenge. The bug is specific to the way your team has set up the application and no one has encountered a bug quite like the one you’re currently facing. Google doesn’t help at all. Here’s where bug-hunting becomes much more of an art than a science.

Typically the process is fairly quick if you’ve written the code that you’re debugging — you already know what assumptions you’ve made in developing the feature and what events are firing when, and so strange behavior can often be tracked down fairly quickly. Luckily, you’re using git (right?), so even if you didn’t write the code you’re working with, you can quickly track down who did. Simply run git blame path/to/file in your console and git will output a line-by-line summary of who wrote each line of code.
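On a long file, the raw git blame output can be a lot to scan. As a rough sketch, the output of git blame --line-porcelain (which repeats the author header for every line) can be tallied per author in a few lines of Ruby. The method name lines_per_author is my own, and the parsing assumes the usual porcelain layout where header lines begin with "author " and source lines begin with a tab.

```ruby
# Counts how many lines each author is blamed for, given the text
# produced by `git blame --line-porcelain path/to/file`.
def lines_per_author(porcelain)
  counts = Hash.new(0)
  porcelain.each_line do |line|
    # "author-mail", "author-time" etc. use a hyphen, so they don't match.
    match = line.match(/\Aauthor (.+)$/)
    counts[match[1]] += 1 if match
  end
  counts
end

# Usage, from inside a repository (not run here):
#   puts lines_per_author(`git blame --line-porcelain path/to/file`)
```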

If that doesn’t get you closer to a solution, the next step is to console.log (or its equivalent) everything. I’ve had the opportunity to see expert programmers hunt down bugs at, and that’s exactly what they do. There’s generally no need to use fancy debugging software or set breakpoints or anything complicated like that: just output to the console when events are firing, when methods are being called, and what the values of variables are at specific points in time. Obviously, you need to make educated guesses as to what might be causing the problem, and then check whether you’re right by outputting values to the console.

What about the last 5%? It’s reserved for the worst kind of bug. The bug that is impossible to fix. You’ve searched all over the internet and no one seems to ever have had the same problem. The code you’ve written and pored over is perfect and definitely is not the cause of the problem. The project you’re working on has thousands of lines of asynchronous code; literally anything could be going wrong and you have no leads whatsoever. Don’t worry, there’s still hope. git bisect is still your friend.

git bisect works when you know that a feature worked at one point in your app’s history and no longer works in your most recent commit. Don’t know offhand when the feature actually worked? git blame can help here — find out when the feature that’s no longer working was merged into master, and you can hopefully safely assume that the feature was working at that point.

Once you’ve located a good point and a bad point in your code, run git bisect start in your current branch. Then, since a git repository is basically a sorted list of previous states of your app’s codebase, git can perform a step-by-step binary search to find the first point in your app’s history where the feature stopped working. At each step, you run either git bisect good or git bisect bad in your console depending on whether the feature works or doesn’t work in the commit that git bisect picks. Eventually, the program will output the commit id of the commit that broke the feature. It’s an amazing feeling when you’ve been struggling with a bug for a day or two.
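To make the binary search concrete, here’s a toy model in Ruby. It is not how git implements bisect (real history is a graph, not a flat list); it assumes a simple oldest-to-newest array of commit ids where the first is known good and the last is known bad. The method name first_bad_commit and the commit ids are invented.

```ruby
# Finds the first bad commit in an oldest-to-newest list.
# The block plays the role of you typing "git bisect good" (true)
# or "git bisect bad" (false) for the commit being tested.
def first_bad_commit(commits)
  good = 0                   # index of a commit known to work
  bad = commits.length - 1   # index of a commit known to be broken
  while bad - good > 1
    mid = (good + bad) / 2
    if yield(commits[mid])
      good = mid
    else
      bad = mid
    end
  end
  commits[bad]
end

history = %w[a1 b2 c3 d4 e5 f6 g7 h8]
broken_from = history.index("f6")   # pretend f6 introduced the bug
puts first_bad_commit(history) { |c| history.index(c) < broken_from }
```

With eight commits this takes three checks instead of up to six, and the savings grow quickly: a thousand commits need only about ten.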

git bisect isn’t always that simple when you’re running it on an app that has a bunch of dependencies. Here’s how to get around any snags if you’re running git bisect on a rails app:

  • Restart your rails server and run bundle install at each commit that git bisect chooses. You might also need to manually compile assets at each point if something isn’t doing that for you automatically.
  • Make sure your “good” starting commit is the commit that merged the feature you’re debugging into master. Programmers don’t always commit 100% working code, but when it comes time to actually merge a pull request, you can be sure that the app ran smoothly and specs were passing.
  • Sometimes the app won’t run on old commits because paths in old versions of your Gemfile aren’t valid anymore. For example, when I used git bisect on’s codebase, a co-worker’s change to his Github username caused multiple bundle install failures. You’ll have to address these issues manually. If you can’t figure out why a gem isn’t installing, just replace the line with the corresponding line in the app’s most recent Gemfile.
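If the bug can be caught by a spec, the manual steps above can in principle be handed to git bisect run, which runs a script at every step and reads its exit status: 0 means good, 125 means “skip this commit”, and other codes from 1 to 127 mean bad. This isn’t what I did (I checked each commit by hand), so treat the following as a sketch; the method name bisect_exit_code, the file name check.rb, and the spec path are hypothetical.

```ruby
# Maps the outcome of one bisect step to the exit status git bisect run expects.
def bisect_exit_code(install_ok, spec_ok)
  return 125 unless install_ok   # dependencies won't install: skip this commit
  spec_ok ? 0 : 1                # 0 = good, 1 = bad
end

# Hypothetical check.rb body, run as `git bisect run ruby check.rb`:
#   install_ok = system("bundle install --quiet")
#   spec_ok = install_ok && system("bundle exec rspec spec/some_feature_spec.rb")
#   exit bisect_exit_code(install_ok, spec_ok)
```

Skipping commits whose Gemfile no longer installs sidesteps the third snag above, at the cost of a less precise answer if the culprit is among the skipped commits.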

Aside from those potential issues, git bisect feels like magic for those epic bugs that refuse to be solved. Once you’ve found the commit that broke the feature, you’re almost certainly 99% of the way to solving the problem.

Embedding API Sandboxes in Documentation

August 18, 2011

Last week I released a jQuery plugin called API Sandbox to help developers of web apps expose their API in a guided sandbox environment. The usage is simple. On a template, make sure there is a dedicated div element available to place a sandbox and then simply call the apiSandbox function on it.


This would create a nicely animated sandbox environment with proper fields for the parameters expressed in the path (in this case, one for user_id). API Sandbox supports generic URL parameters at the end of the API path, in addition to symbol wildcards anywhere in the path, preceded by a “:”.

There are a bunch of applications for this, and I’m still making improvements to the plugin, but I want to talk about one particularly cool application today. This plugin combined with the latest version of Redcarpet (2.0.0b), a Markdown parser, can allow you to easily embed sandboxes inline with your API documentation.

What’s so cool about the new version of Redcarpet is that instead of simply relying on the plugin author’s interpretation of how Markdown should be transformed into HTML, you can specify your own rendering rules and just let Redcarpet do all of the parsing. To easily embed sandboxes, I chose to override Redcarpet’s default rendering of links and change links to API paths into dynamic sandboxes. I created a subclass of Redcarpet::Render::HTML and created override methods for link and doc_header.

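require 'redcarpet'

# Custom renderer: links whose target starts with "SANDBOX: " become sandbox divs.
class DocsParser < Redcarpet::Render::HTML
  def link(link, title, content)
    link = link || ""
    title = title || ""
    content = content || ""
    if link.include? "SANDBOX: "
      path = link.gsub("SANDBOX: ","")
      # Strip URL punctuation so the path can double as a CSS class name.
      id = path.gsub(/[\/?&=\[\]]/,"")
      rendered = "<div id='sandbox'>"
      rendered += content + "<div id='" + path + "' class='sandbox " + id + "'></div>"
      rendered += "</div>"
    else
      # Ordinary links are rendered as plain anchors.
      rendered = "<a href='" + link + "'>" + content + "</a>"
    end
    rendered
  end

  # Injected at the top of every generated documentation page.
  def doc_header
    "<script>new App.Views.Docs({ el: $('#main_content') });</script>"
  end
end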

All I did here was render a div element with the id of sandbox instead of a link when the path has a prefix of SANDBOX:, and then added in a script in the header of all generated documentation pages to load the script that changes the divs into sandboxes (the site I’m working on uses Backbone.js, hence the new App.Views.Docs).
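To see what that produces for a given link, here’s a stand-alone copy of the SANDBOX branch that runs without Redcarpet installed. The method name sandbox_markup and the sample path and link text are just for illustration.

```ruby
# Mirrors the SANDBOX branch of DocsParser#link as a plain method.
def sandbox_markup(link, content)
  path = link.gsub("SANDBOX: ", "")
  id = path.gsub(/[\/?&=\[\]]/, "")   # drop /, ?, &, =, [ and ]
  "<div id='sandbox'>#{content}<div id='#{path}' class='sandbox #{id}'></div></div>"
end

puts sandbox_markup("SANDBOX: /users/:user_id", "Fetch a user")
# prints:
# <div id='sandbox'>Fetch a user<div id='/users/:user_id' class='sandbox users:user_id'></div></div>
```

Note that the inner div’s id holds the raw API path, which is what the client-side script later reads back out.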

Then you have to write an epic 2 lines of CoffeeScript code to turn all of the links into awesome API sandboxes.

$("div.sandbox").each ->
      $(this).apiSandbox "get", $(this).attr("id")

Not bad at all.

API Sandbox is just the first part of a CoffeeScript/SASS/Ruby API explorer I’ll be releasing over the next couple of weeks, so get excited.

Building Site Navigation with Markdown and Nokogiri

August 2, 2011

Who knew that Markdown — in my opinion the best text-to-HTML syntax available — could be used for something other than blog posts (or Github readme files)? Turns out that Markdown can be used as a powerful way to make web app navigation really simple to edit, even by non-developers. The task I set out to do was to turn something like this:

* Home
* About
    * About Us
    * API
    * Terms of Use
* Browse
* Etc.

into a nice dropdown menu. Making this markdown text available to a view is a fairly simple task. In Ruby on Rails, if you have a Page model that can hold Markdown text, you simply have to fetch it in the controller and call .to_html.html_safe on it. Also, with the help of some ternary operator elegance, you should make sure that if the page doesn’t exist, the code doesn’t blow up.

c_nav = Page.find_by_slug 'client-nav'
@client_nav = c_nav.nil? ? '' : c_nav.to_html.html_safe

So now, with the addition of a simple line in the view where we dump the contents of @client_nav onto the page, we’ve accomplished displaying a terrible looking unordered list when we could have just hard coded a nice looking menu in Haml. Here comes the fun part.

The obvious first choice for this task was to use jQuery to dynamically add CSS styles to the unordered list to turn it into a nice navigation menu. With the use of jQuery’s addClass method this is fairly simple, and it worked when I implemented it with just a couple lines of code. Unfortunately it resulted in annoying flickering as the page loaded because the unordered list is displayed un-styled for a split second (unacceptable). This could probably be fixed, but why bother doing something on the browser when it can be handled perfectly well on the server? After a couple of painful hours attempting to add classes to the generated HTML using Ruby string manipulation techniques and having little success, I discovered that this problem had already been solved by the creators of the Nokogiri gem.

To begin turning the raw HTML into styled goodness, I first stripped out the opening and closing ul tags manually. I enclosed the code to do this in a begin/rescue/end block because the test suite I’m using (RSpec) didn’t like me using negative numbers for indexes in a string for some reason.

   @client_nav[0..3] = ""
   @client_nav[-5..-1] = ""
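The fixed offsets above depend on the string starting with exactly "<ul>" and ending with exactly "</ul>". A slightly more forgiving alternative (not what I used, and only a sketch) is to anchor a pair of regular expressions to the start and end of the string; the method name strip_outer_ul is made up.

```ruby
# Removes only the outermost <ul>...</ul> wrapper, tolerating
# surrounding whitespace such as a trailing newline.
def strip_outer_ul(html)
  html.sub(/\A\s*<ul>\s*/, "").sub(/\s*<\/ul>\s*\z/, "")
end

puts strip_outer_ul("<ul>\n<li>Home</li>\n<li>About<ul><li>API</li></ul></li>\n</ul>\n")
```

Nested lists are left alone because only a closing tag at the very end of the string matches the second pattern.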

Then I let Nokogiri work its magic.

@client_nav = Nokogiri::HTML::DocumentFragment.parse(@client_nav)
@client_nav.css("li:root").each do |anchor|
   anchor['class'] = 'nav_item'
end
# List items that contain a nested list get the extra "more" class.
@client_nav.css("li:has(ul)").each do |anchor|
   anchor['class'] = "nav_item more"
end
@client_nav.css("ul:first-child").each do |anchor|
   anchor['class'] = "dropdown"
end
# Serialize back to a string the view can render.
@client_nav = @client_nav.to_html.html_safe

Because the HTML I was passing to Nokogiri is just a fragment of an unordered list, I used the DocumentFragment.parse method to prepare the HTML for Nokogiri to dynamically add CSS. Then, using basic CSS selectors (as well as one really useful one — has() — taken from jQuery), I added the proper classes to the list elements. It worked like a charm.

I’ll leave the CSS magic up to you. Suffice it to say that with the proper stylesheets, this can look really great and is infinitely configurable. Why bother with bulky client-side code when Nokogiri provides a solution that is just as elegant, perhaps even more so?