there are numbers of url shortening services available these days, including the good old tinyurl and something really short like u.nu. now when you get the short url shortened by using any of these services, you dont know where your browser is taking you! so if you are interested to figure out the original url hiding behind these short url, you need to have a little knowledge on how these services actually work. if you go to any of these short urls, they tell your browser “HTTP 30X: Object has moved” HTTP HEADER (optionally, some does it, some doesn’t) and then asks your browser to move to the original url using “Location” in HTTP HEADER. so all you have to do is just get the HTTP HEADER out first (PHP and Curl is pretty good at doing this, heh heh) and then parse the “Location” parameter from it.
lets see how that works in code
[sourcecode lang=”php”]
< ?php
$url = "http://tinyurl.com/2dfmty";
$ch = curl_init($url);
curl_setopt($ch,CURLOPT_HEADER,true);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,false);
$data = curl_exec($ch);
$pdata = http_parse_headers($data);
echo "Short URL: {$url}
“;
echo “Original URL: {$pdata[‘Location’]}”;
function http_parse_headers( $header )
{
$retVal = array();
$fields = explode(“\r\n”, preg_replace(‘/\x0D\x0A[\x09\x20]+/’, ‘ ‘, $header));
foreach( $fields as $field ) {
if( preg_match(‘/([^:]+): (.+)/m’, $field, $match) ) {
$match[1] = preg_replace(‘/(?< =^|[\x09\x20\x2D])./e', 'strtoupper("")', strtolower(trim($match[1])));
if( isset($retVal[$match[1]]) ) {
$retVal[$match[1]] = array($retVal[$match[1]], $match[2]);
} else {
$retVal[$match[1]] = trim($match[2]);
}
}
}
return $retVal;
}
?>
[/sourcecode]
now you see that the output of this code is
[sourcecode lang=”HTML”]
Short URL: http://tinyurl.com/2dfmty
Original URL: http://ghill.customer.netspace.net.au/embiggen/
[/sourcecode]
pretty interesting huh? if you analyze the full headers for each of these services you will find that most of them are using PHP in backend with Apache. only http://u.nu is using mod_rails (hence RoR) and bit.ly uses nginx 🙂
have fun in expanding!