
URL Shortening Services

I always used to wonder how and why URL shortening services work. I also wrote earlier about what a DDoS attack on a URL shortening service would lead to.

As it turns out, a URL shortening service is no magic: either we use a hash function to compute a hash of the long URL and present that as the short URL, or we use a random number to generate a key and present that as the short URL. In both cases there will be collisions, and they need to be resolved. Also, in both cases there needs to be a reverse lookup table that maps the key back to the long URL. But given that the keys are short and more ordered, indices can be constructed to do the reverse lookup efficiently.
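To make the hash-based variant concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration, not how any particular service works: the `Shortener` class, the 6-character base-62 keys, the choice of SHA-256, and the trick of salting the input with an attempt counter to resolve collisions. An in-memory dict stands in for the reverse lookup table.

```python
import hashlib
import string

# 62 URL-safe characters: 0-9, a-z, A-Z (an assumed alphabet).
ALPHABET = string.digits + string.ascii_letters


def encode_base62(number, length):
    """Return the `length` low-order base-62 digits of `number` as a string."""
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, 62)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


class Shortener:
    """Hypothetical in-memory shortener; a real service would use a database."""

    def __init__(self, key_length=6):
        self.key_length = key_length
        self.table = {}  # reverse lookup table: short key -> long URL

    def shorten(self, long_url):
        attempt = 0
        while True:
            # Salt the URL with the attempt counter so that a collision
            # produces a different hash on the next try.
            salted = f"{attempt}:{long_url}".encode("utf-8")
            digest = hashlib.sha256(salted).digest()
            key = encode_base62(int.from_bytes(digest[:8], "big"), self.key_length)

            existing = self.table.get(key)
            if existing is None:
                self.table[key] = long_url  # free slot: claim it
                return key
            if existing == long_url:
                return key  # same URL shortened before: reuse its key
            attempt += 1  # collision with a different URL: try again

    def expand(self, key):
        """Reverse lookup: return the long URL, or None if the key is unknown."""
        return self.table.get(key)
```

Usage would look like `key = Shortener().shorten("https://example.com/some/long/path")` followed by `expand(key)` to get the long URL back. The random-number variant would only change how `key` is produced; the collision check and the lookup table stay the same.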

I learnt about 301 redirects today (to some extent) and their effects on SEO. There are 301 redirects and 302 redirects. The former is a permanent redirect, while the latter is only a temporary redirect, hinting that the source URL may well come back to serve the content. Matt Cutts has a more detailed explanation of the same.

What implications does this have for search engines? Imagine that one fine day the crawler stumbles upon a shortened URL. The crawler indexes the contents (let's say the gist of it is X and that this URL has a page rank of 1 😉 ). Now, in future, when a user searches for content X, the search engine will show this link in first place. For a user (and a search engine), URLs are the key to contents. Now, should the search engine show the actual long URL or the shortened URL? If it shows the shortened URL and the URL shortening service goes out of business the next day, then the first link is broken, which is bad for the user and worse for the search engine.

Now back to 301 and 302 redirects: if the URL shortening service redirects to the long URL with a 302, it means that the redirection is only temporary and that the source URL will come back alive, serving the content itself. If it is a 301, it tells the search engine that the source URL never serves the content and only the destination URL does. Here the search engine can decide to index the content under the destination URL rather than the source URL. What a search engine does in the 302 case, Matt Cutts explains better. In either case, I think the decision about which URL to choose is simple: for 301 redirects, pick the destination URL (the source URL may slowly vanish), and for 302 redirects, pick the source URL (the destination URL may slowly vanish as the contents come back to the source URL).
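The rule of thumb above fits in a few lines of Python. This is only a sketch of my own simple rule, with a made-up function name `url_to_index` and made-up example URLs; it is not a description of what any real search engine does.

```python
def url_to_index(source_url, destination_url, status_code):
    """Pick which URL to index under, following the simple rule in this post.

    301 (permanent): the source never serves the content, so use the destination.
    302 (temporary): the source is expected to come back, so keep the source.
    """
    if status_code == 301:
        return destination_url
    if status_code == 302:
        return source_url
    # Other status codes are outside the scope of this sketch.
    raise ValueError(f"not a 301/302 redirect: {status_code}")


# Hypothetical example: a shortener that answers with a 301.
chosen = url_to_index("http://sho.rt/abc123", "http://example.com/long/article", 301)
print(chosen)  # http://example.com/long/article
```

With a 301-based shortener, the short URL drops out of the picture and the index points at the long URL, so a shortener going out of business would not break the search result.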

The guy from explained that his URL shortening service uses 301 redirects and so does not meddle with SEO.

One thing to learn from all this: use 301/302 redirects as appropriate instead of just removing the old links and putting new links in their place, because otherwise things will break elsewhere.
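On the serving side, that advice could look something like the sketch below. The two tables, their paths, and the `respond` function are hypothetical; a real site would normally configure this in its web server or framework rather than hand-roll it.

```python
from http import HTTPStatus

# Hypothetical redirect tables for a site that has moved some pages.
MOVED_PERMANENTLY = {"/old-post": "/blog/new-post"}  # old link is gone for good
MOVED_TEMPORARILY = {"/shop": "/shop-maintenance"}   # old link will come back


def respond(path):
    """Return (status code, headers) for a request path.

    Moved pages answer with a redirect and a Location header instead of
    simply disappearing, so existing links and bookmarks keep working.
    """
    if path in MOVED_PERMANENTLY:
        return int(HTTPStatus.MOVED_PERMANENTLY), {"Location": MOVED_PERMANENTLY[path]}
    if path in MOVED_TEMPORARILY:
        return int(HTTPStatus.FOUND), {"Location": MOVED_TEMPORARILY[path]}
    # Anything else is treated as a normal page in this sketch.
    return int(HTTPStatus.OK), {}
```

Here `respond("/old-post")` yields a 301 pointing at `/blog/new-post`, while `respond("/shop")` yields a 302, which signals that `/shop` itself is expected to serve content again later.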