expanding short url to original url using PHP and CURL

there are a number of url shortening services available these days, including the good old tinyurl and really short ones like u.nu. when you get a url shortened by any of these services, you don’t know where your browser is being taken! so if you want to find out the original url hiding behind a short url, you need a little knowledge of how these services actually work. when you go to one of these short urls, the service answers your browser with an “HTTP 30X: Object has moved” status line (optionally; some do it, some don’t) and then asks your browser to move to the original url using the “Location” HTTP header. so all you have to do is fetch the HTTP headers first (PHP and cURL are pretty good at this, heh heh) and then parse the “Location” value out of them.

let’s see how that works in code

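<?php
$url = "http://tinyurl.com/2dfmty";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first redirect
$data = curl_exec($ch);
$pdata = http_parse_headers($data);
echo "Short URL: {$url}<br/>";
echo "Original URL: {$pdata['Location']}";


// parse a raw header block into an array keyed by header name (e.g. "Content-Type")
function http_parse_headers( $header )
    {
        $retVal = array();
        // unfold continuation lines, then split into one field per line
        $fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $header));
        foreach( $fields as $field ) {
            if( preg_match('/([^:]+): (.+)/m', $field, $match) ) {
                // normalize the name: "content-type" becomes "Content-Type"
                // (a callback replaces the old /e modifier, which PHP 7 removed)
                $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function( $m ) {
                    return strtoupper($m[0]);
                }, strtolower(trim($match[1])));
                if( isset($retVal[$match[1]]) ) {
                    // repeated header: keep every value in one flat array
                    $retVal[$match[1]] = array_merge((array) $retVal[$match[1]], array(trim($match[2])));
                } else {
                    $retVal[$match[1]] = trim($match[2]);
                }
            }
        }
        return $retVal;
    }
?>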

now you see that the output of this code is

Short URL: http://tinyurl.com/2dfmty 
Original URL: http://ghill.customer.netspace.net.au/embiggen/
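
the description at the top says a shortener answers with a 30x status plus a “Location” header. a minimal sketch of that check, working on a raw header string so it can be tried without any network access, might look like this. the function name redirect_target() and the sample response are made up for illustration; they are not part of the code above.

```php
<?php
// Return the redirect target from a raw HTTP response header block,
// or null when the response is not a 30x redirect with a Location header.
// (redirect_target is a hypothetical helper name, not a built-in.)
function redirect_target($rawHeaders)
{
    // the status line looks like "HTTP/1.1 301 Moved Permanently"
    if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $rawHeaders, $status)) {
        return null;
    }
    $code = (int) $status[1];
    if ($code < 300 || $code > 399) {
        return null; // not a redirect, so there is nothing to expand
    }
    // header names are case-insensitive, hence the "i" flag
    if (preg_match('/^Location:\s*(\S+)/mi', $rawHeaders, $location)) {
        return $location[1];
    }
    return null;
}

// a made-up response, shaped like what a shortener might send
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://example.com/long/page\r\n"
        . "Content-Type: text/html\r\n\r\n";
echo redirect_target($sample), "\n";
```

checking the status code first means a normal 200 page is not mistaken for a redirect.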

pretty interesting huh? if you analyze the full headers for each of these services, you will find that most of them use PHP in the backend with Apache. only http://u.nu uses mod_rails (hence RoR), and bit.ly uses nginx :)
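
if you want to peek at the backend yourself, the hints usually sit in the “Server” and “X-Powered-By” headers (a service is free to hide or fake them, so treat the result as a guess). here is a small sketch that pulls them out of a raw header string. backend_hints() is a made-up helper name and the sample headers are invented, not captured from any real service.

```php
<?php
// Collect the headers that usually reveal the server software.
// (backend_hints is a hypothetical helper name, not a built-in.)
function backend_hints($rawHeaders)
{
    $hints = array();
    foreach (array('Server', 'X-Powered-By') as $name) {
        // match "Name: value" on its own line, ignoring case and a trailing \r
        $pattern = '/^' . preg_quote($name, '/') . ':[ \t]*([^\r\n]+?)[ \t]*\r?$/mi';
        if (preg_match($pattern, $rawHeaders, $m)) {
            $hints[$name] = $m[1];
        }
    }
    return $hints;
}

// invented headers, for illustration only
$sampleHeaders = "HTTP/1.1 301 Moved Permanently\r\n"
               . "Server: Apache\r\n"
               . "X-Powered-By: PHP/5.2.6\r\n"
               . "Location: http://example.com/\r\n\r\n";
print_r(backend_hints($sampleHeaders));
```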

have fun expanding!

54 thoughts on “expanding short url to original url using PHP and CURL”

  1. Very cool using Curl! Alternatively you can use:

    $url = "http://tinyurl.com/2dfmty";
    $realLocation = get_headers($url,1);
    echo $realLocation['Location'];

    If you just want the url. But there definitely is a lot that can be done with curl. Thanks! =)

  2. @iDayDream, that’s also cool, and even easier when you don’t have curl available in your hosting account! i didn’t notice the get_headers() function before!! thanks :)

  3. Thanks guru for another interesting post, I knew that they are using php header location like function to do this type of job, now it’s more clear to all of us. :D

    @iDayDream thanks for the short way, I didn’t know the get_headers() function before. This function really impressed me a lot.

  4. Hasin bhai,
    I was getting the following error while I was just running your code.
    Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (?< at offset 3 in C:\xampp\htdocs\lifeundersun.php on line 18

  5. Too bad this doesn’t always work… Like on digg.com shorteners!!! Stupid framed crap. Also if you do it a lot on tr.im they will ip block you. At least that’s my experience.

  6. Another way is to send only a HEAD request and parse the headers. I don’t know how to send a HEAD request with cURL, but it can be done using the stream_* functions.

  7. This code shows error like:

    Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (?< at offset 3 in C:\xampp\htdocs\test\short.php on line 18

    can any one help me quickly

  8. I get Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (?< at offset 3 in /var/www/html/dev01/test2.php on line 21

  9. i am getting this error
    Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /home/designer/public_html/curl.php on line 11
    HTTP error: 301. What to do?

  10. Simpler:


    function expand_shortlink($url) {
    $headers = get_headers($url,1);
    if (!empty($headers['Location'])) {
    $headers['Location'] = (array) $headers['Location'];
    $url = array_pop($headers['Location']);
    }
    return $url;
    }

    This code handles multiple levels of redirection as well.

  11. Note that using get_headers() is a lot slower than using cURL, nearly twice as slow. get_headers() uses GET instead of sending a HEAD request, which is what cURL does when you pass the option. You may also want to consider following redirects (another option in cURL), because some short URLs can be shortened again by other services, and who knows what other redirects there are. You may also wish to set a timeout in the cURL options for safety.

    If anyone is considering using this in a batch capacity… Also note that in my tests, resolving about 4,000 links took about an hour and a half. So keep in mind how intensive this process is because of all the dns resolving. Note that there are caching options with cURL that I’m not sure you benefit from with get_headers().

    I would always use cURL, though if all you’re looking for is the URL you may wish to simplify the regex to something like:

    if(preg_match_all('/Location:\s(.+?\s)/i', $headers, $matches)) {
    $url = trim($matches[1][count($matches[1]) - 1]);
    }

    This accounts for possible redirects…Note if you follow them, the cURL is going to output all the headers and you’ll be after that last Location: xxxxx value.

  12. function get_expand_url_api($url){
    //keep the API address in its own variable so the original $url is not overwritten
    $apiUrl= "http://api.longurl.org/v2/expand?url=".urlencode($url)."&response-code=1&format=php";
    //get the contents from the site by file_get_contents.
    $response= @file_get_contents($apiUrl);
    if ($response===false){
    return '';
    }
    $data= unserialize($response);
    //print_r($data); die();

    if (is_array($data) && $data['response-code']==200){
    return empty($data['long-url'])? $url : $data['long-url'];
    }
    else {
    return '';
    }
    }
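
several of the comments above ask about HEAD requests, following a chain of redirects, and timeouts. a rough sketch that puts those together with cURL could look like the code below. expand_with_head() is a made-up name, and the limits (5 redirects, 10 seconds) are arbitrary assumptions. some shorteners may answer HEAD differently from GET, and on hosts where CURLOPT_FOLLOWLOCATION is disabled (see comment 9) this will not work as written.

```php
<?php
// Expand a short URL with a HEAD request, following the redirect chain.
// Returns the final URL, or null when the URL is rejected or the request fails.
// (expand_with_head is a hypothetical helper name, not a built-in.)
function expand_with_head($url, $timeoutSeconds = 10)
{
    // only http(s) URLs are attempted; anything else is rejected
    // before the network is touched
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);         // send HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // do not print the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow every redirect
    curl_setopt($ch, CURLOPT_MAXREDIRS, 5);         // arbitrary cap on the chain length
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);

    if (curl_exec($ch) === false) {
        return null; // DNS failure, timeout, too many redirects, etc.
    }
    // the URL cURL ended up at after the last redirect
    return curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}

// usage (needs network access, so it is left commented out):
// echo expand_with_head("http://tinyurl.com/2dfmty");
```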
