
expanding short url to original url using PHP and CURL

there are a number of url shortening services available these days, including the good old tinyurl and really short ones like u.nu. when you get a url shortened by any of these services, you don't know where your browser is being taken! so if you want to find the original url hiding behind a short url, you need to know a little about how these services actually work.

if you go to any of these short urls, the service replies with an “HTTP 30X: Object has moved” status line (the wording is optional: some send it, some don't) and then asks your browser to move to the original url using the “Location” field in the HTTP header. so all you have to do is get the HTTP header first (PHP and cURL are pretty good at this, heh heh) and then parse the “Location” field from it.

let's see how that works in code

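[sourcecode lang="php"]
<?php
$url = "http://tinyurl.com/2dfmty";

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);          // include the response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response, don't follow the redirect
$data = curl_exec($ch);

$pdata = http_parse_headers($data);
echo "Short URL: {$url}\n";
echo "Original URL: {$pdata['Location']}";

// parse a raw header block into an array keyed by normalized header name
function http_parse_headers( $header )
{
    $retVal = array();
    // unfold continuation lines, then split into individual header lines
    $fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $header));
    foreach( $fields as $field ) {
        if( preg_match('/([^:]+): (.+)/m', $field, $match) ) {
            // normalize the name: "content-type" becomes "Content-Type"
            // (a callback replaces the /e modifier, which PHP 7 removed)
            $match[1] = preg_replace_callback(
                '/(?<=^|[\x09\x20\x2D])./',
                function( $m ) { return strtoupper($m[0]); },
                strtolower(trim($match[1]))
            );
            if( isset($retVal[$match[1]]) ) {
                $retVal[$match[1]] = array($retVal[$match[1]], $match[2]);
            } else {
                $retVal[$match[1]] = trim($match[2]);
            }
        }
    }
    return $retVal;
}
?>
[/sourcecode]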

the output of this code is:
[sourcecode lang="HTML"]
Short URL: http://tinyurl.com/2dfmty
Original URL: http://ghill.customer.netspace.net.au/embiggen/
[/sourcecode]

pretty interesting huh? if you analyze the full headers from each of these services you will find that most of them run PHP on the backend with Apache. only http://u.nu uses mod_rails (hence RoR) and bit.ly uses nginx 🙂
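to check which server software a shortener reports, you can read the Server header from the same response. here is a minimal sketch, assuming the raw header text is already in hand (for example from curl_exec() with CURLOPT_HEADER enabled). the function name server_software() and the sample headers are made up for illustration, they are not captured from a real service:

```php
<?php
// Hypothetical helper: pull the value of the "Server" header out of a raw
// HTTP header block. Returns null when the header is absent.
function server_software(string $rawHeaders): ?string
{
    // Header names are case-insensitive; /m lets ^ and $ match per line.
    if (preg_match('/^Server:[ \t]*(.*?)[ \t]*\r?$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}

// Illustrative input only, not a real response.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Server: nginx\r\n"
        . "Location: http://example.com/\r\n\r\n";

echo server_software($sample), "\n";
```

keep in mind that a server is free to omit or fake this header, so treat the result as a hint only.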

have fun expanding!

54 replies on “expanding short url to original url using PHP and CURL”

Very cool using Curl! Alternatively you can use:

$url = "http://tinyurl.com/2dfmty";
$realLocation = get_headers($url, 1);
echo $realLocation['Location'];

If you just want the url. But there definitely is a lot that can be done with curl. Thanks! =)

@iDayDream – that’s also cool, and even easier when you don't have curl available in your hosting account! i didn’t notice the get_headers() function before!! thanks 🙂

Thanks guru for another interesting post, I knew that they are using php header location like function to do this type of job, now it’s more clear to all of us. 😀

@iDayDream thanks for the short way, I didn’t know the get_headers() function before. This function really impressed me a lot.

Hasin bhai,
I was getting the following error while I was just running your code.
Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (?< at offset 3 in C:xampphtdocslifeundersun.php on line 18

This code shows error like:

Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (?< at offset 3 in C:xampphtdocstestshort.php on line 18

can any one help me quickly

I get Warning: preg_replace() [function.preg-replace]: Compilation failed: unrecognized character after (?< at offset 3 in /var/www/html/dev01/test2.php on line 21

i am getting this error
Warning: curl_setopt() [function.curl-setopt]: CURLOPT_FOLLOWLOCATION cannot be activated when in safe_mode or an open_basedir is set in /home/designer/public_html/curl.php on line 11
HTTP error: 301. What to do?
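One possible way around that warning is to leave CURLOPT_FOLLOWLOCATION off and follow the Location hops by hand. The sketch below is only an illustration: fetch_location() and expand_manually() are hypothetical names, the hop limit and timeout are arbitrary, and relative Location values are not resolved against the current URL.

```php
<?php
// Hypothetical helper: request only the headers of $url without letting cURL
// follow redirects, and return the Location value (or null if there is none).
function fetch_location(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the output
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, skip the body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return instead of printing
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);           // arbitrary 10 second limit
    $headers = curl_exec($ch);

    if ($headers !== false
        && preg_match('/^Location:[ \t]*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Follow redirects one hop at a time, up to $maxHops.
// $fetch is any callable that maps a URL to its Location target or null.
function expand_manually(string $url, callable $fetch, int $maxHops = 5): string
{
    for ($i = 0; $i < $maxHops; $i++) {
        $next = $fetch($url);
        if ($next === null) {
            break;  // no further redirect: $url is the final address
        }
        $url = $next;
    }
    return $url;
}

// Usage (needs network access):
// echo expand_manually('http://tinyurl.com/2dfmty', 'fetch_location');
```

Passing the fetcher in as a callable keeps the loop itself free of network code, so it can be tried out with a fake lookup table first.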

Simpler:


function expand_shortlink($url) {
    $headers = get_headers($url, 1);
    if (!empty($headers['Location'])) {
        $headers['Location'] = (array) $headers['Location'];
        $url = array_pop($headers['Location']);
    }
    return $url;
}

This code handles multiple levels of redirection as well.

Note that using get_headers() is a lot slower than using cURL, nearly twice as slow. get_headers() uses GET instead of sending a HEAD request, which is what cURL does when you pass the option for it. You may also want to consider following redirects (another option in cURL), because some short URLs can be shortened again by other services, and who knows what other redirects there are. You may also wish to set a timeout option in cURL for safety.
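Those three suggestions (HEAD request, following redirects, a timeout) could be combined roughly like this. It is a sketch only: expand_with_curl() is a made-up name, the timeout and hop limit are arbitrary defaults, and no timing claims are attached to it.

```php
<?php
// Hypothetical helper: resolve a short URL with a HEAD request and let cURL
// follow the redirect chain. Falls back to the input URL on any failure.
function expand_with_curl(string $url, int $timeout = 10, int $maxHops = 5): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,      // send HEAD instead of GET
        CURLOPT_RETURNTRANSFER => true,      // do not print the response
        CURLOPT_FOLLOWLOCATION => true,      // follow 30x redirects
        CURLOPT_MAXREDIRS      => $maxHops,  // give up after a few hops
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds to wait for a connection
        CURLOPT_TIMEOUT        => $timeout,  // seconds for the whole request
    ]);
    $ok    = curl_exec($ch);
    // After following redirects, this is the last URL cURL ended up at.
    $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);

    return ($ok !== false && is_string($final) && $final !== '') ? $final : $url;
}

// Usage (needs network access):
// echo expand_with_curl('http://tinyurl.com/2dfmty');
```

Some servers answer HEAD requests differently from GET, so falling back to a normal GET for the failures may be worth considering.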

If anyone is considering using this in a batch capacity, note that in my tests, resolving about 4,000 links took about an hour and a half. So keep in mind how intensive this process is because of all the DNS resolving. Note that there are caching options with cURL that I’m not sure you benefit from with get_headers().
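For batch runs, two cheap things to try are resolving each distinct URL only once and reusing a single cURL handle, so libcurl can keep its DNS cache and open connections between lookups. The sketch below assumes exactly that and nothing more: expand_batch() and make_curl_resolver() are hypothetical names, the 300 second DNS cache lifetime is an arbitrary pick, and how much time it saves has not been measured here.

```php
<?php
// Hypothetical batch helper: resolve each distinct URL only once and reuse
// the answer for duplicates. $resolve maps a URL to its expanded form.
function expand_batch(array $urls, callable $resolve): array
{
    $cache  = [];
    $result = [];
    foreach ($urls as $url) {
        if (!array_key_exists($url, $cache)) {
            $cache[$url] = $resolve($url);
        }
        $result[] = $cache[$url];
    }
    return $result;
}

// One possible resolver: a single cURL handle shared by every lookup.
function make_curl_resolver(int $timeout = 10): callable
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_NOBODY            => true,      // HEAD request
        CURLOPT_RETURNTRANSFER    => true,      // do not print the response
        CURLOPT_FOLLOWLOCATION    => true,      // follow 30x redirects
        CURLOPT_MAXREDIRS         => 5,         // arbitrary hop limit
        CURLOPT_TIMEOUT           => $timeout,  // seconds for the whole request
        CURLOPT_DNS_CACHE_TIMEOUT => 300,       // keep DNS answers for 5 minutes
    ]);

    return function (string $url) use ($ch): string {
        curl_setopt($ch, CURLOPT_URL, $url);
        $ok    = curl_exec($ch);
        $final = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
        return ($ok !== false && is_string($final) && $final !== '') ? $final : $url;
    };
}

// Usage (needs network access):
// $long = expand_batch($shortUrls, make_curl_resolver());
```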

I would always use cURL, though if all you’re looking for is the URL you may wish to simplify the regex to something like:

if (preg_match_all('/Location:\s(.+?\s)/i', $headers, $matches)) {
    $url = trim($matches[1][count($matches[1]) - 1]);
}

This accounts for multiple redirects. Note that if you follow them, cURL outputs the headers of every hop, and the one you’re after is the last Location: xxxxx value.

function get_expand_url_api($url) {
    // build the API request; a separate variable keeps the caller's $url intact
    $apiUrl = "http://api.longurl.org/v2/expand?url=" . urlencode($url) . "&response-code=1&format=php";
    // get the contents from the site by file_get_contents.
    $response = @file_get_contents($apiUrl);
    if ($response === false) {
        return '';
    }
    $data = unserialize($response);
    //print_r($data); die();

    if (is_array($data) && isset($data['response-code']) && $data['response-code'] == 200) {
        return empty($data['long-url']) ? $url : $data['long-url'];
    }
    else {
        return '';
    }
}
