Creating Thumbnails of Web Pages Using the WebThumb API

Joshua Eichorn, author of “Understanding AJAX” and a well-known PHP developer, recently released WebThumb, a site that creates thumbnails of web pages. The whole system is written in C and uses the Mozilla engine to render web pages into images. Shortly after launching WebThumb, he released an API so that developers can create thumbnails from their own applications. The API is very simple to use. In this article we will see how to use the WebThumb API from PHP to create thumbnails of web pages.

Using the WebThumb API, you generate a thumbnail in three steps. First, you place a request containing the URL. Once the request is accepted, WebThumb stores it in a queue, so you do not get the thumbnail instantly (fetching and rendering a URL takes time, so thumbnails cannot be generated in real time). In the second step, you check whether your thumbnail has been generated or is still in the queue. Once the job is complete, you proceed to the third step, where you request the thumbnail itself. Before jumping into the code, let’s take a look at the API format.

Step 1: Place a request
To request a thumbnail for a URL, post an XML message to the WebThumb server in the following format:

<webthumb>
	<apikey>apikey here</apikey>
	<request>
		<url>webthumb.bluga.net</url>
	</request>
</webthumb>

You may have noticed that we have not discussed the API key yet. What is it, and where do you get one? Simply register with WebThumb at http://webthumb.bluga.net and go to your user page, where your API key is listed. You can also supply optional width and height parameters alongside the URL:

<webthumb>
	<apikey>apikey here</apikey>
	<request>
		<url>webthumb.bluga.net</url>
		<width>800</width>
		<height>600</height>
	</request>
</webthumb>
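
If the URL contains characters such as "&", pasting it straight into the XML produces an invalid message. The following is a minimal sketch of building the request with escaping. The function name buildThumbnailRequest() and the example URL are made up for illustration and are not part of the WebThumb API.

```php
<?php
// Hypothetical helper (not part of the WebThumb API): builds the request XML
// and escapes values so characters such as "&" in a URL keep the XML valid.
function buildThumbnailRequest(string $apiKey, string $url, ?int $width = null, ?int $height = null): string
{
    $escape = function (string $value): string {
        return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    };

    $request = '<url>' . $escape($url) . '</url>';
    if ($width !== null) {
        $request .= '<width>' . $width . '</width>';
    }
    if ($height !== null) {
        $request .= '<height>' . $height . '</height>';
    }

    return '<webthumb>'
        . '<apikey>' . $escape($apiKey) . '</apikey>'
        . '<request>' . $request . '</request>'
        . '</webthumb>';
}

// Example URL is illustrative only.
$xml = buildThumbnailRequest('apikey here', 'http://example.com/?a=1&b=2', 800, 600);
echo $xml, "\n";
```

The width and height elements are only emitted when values are passed, which matches the optional parameters described above.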

If your request is successful, you will get a response in the following format, which contains the job ID of your request (the text content of the job element). You need this job ID for the next steps, so save it.

<webthumb>
	<jobs>
		<job estimate='20' time='2006-08-30 09:39:30' url='http://webthumb.bluga.net'>wt44f5bf42a0e4a</job>
	</jobs>
</webthumb>
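
As a quick illustration, this response can be read with PHP's SimpleXML extension. The sketch below parses the sample response shown above rather than a live one, and the variable names are only for illustration.

```php
<?php
// Sample response copied from the format shown above.
$response = <<<XML
<webthumb>
	<jobs>
		<job estimate='20' time='2006-08-30 09:39:30' url='http://webthumb.bluga.net'>wt44f5bf42a0e4a</job>
	</jobs>
</webthumb>
XML;

$sxml = simplexml_load_string($response);
if ($sxml === false || !isset($sxml->jobs->job)) {
    die("Unexpected response\n");
}

$job = $sxml->jobs->job[0];
$jobId = (string) $job;             // the element text is the job ID
$estimate = (int) $job['estimate']; // attributes are read with array syntax

echo "Job ID: {$jobId}, estimate: {$estimate}\n";
```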

Step 2: Checking the Job Status
After placing a successful request, we need to check the status of our job. To check the status, send a request in the following format containing the job ID:

<webthumb>
	<apikey>apikey here</apikey>
	<status>
		<job>wt44f5bf42a0e4a</job>
	</status>
</webthumb>

The WebThumb server will respond in the following format:

<webthumb>
	<jobStatus>
		<status id='wt44f5bf42a0e4a' submissionTime='2006-08-30 09:39:30'
		           browserWidth='1024' browserHeight='768'
	 	           pickup='http://webthumb.bluga.net/data/0e4a/wt44f5bf42a0e4a.zip'
		           completionTime='2006-08-30 09:39:38'>Complete</status>
	</jobStatus>
</webthumb>

When the job is finished, the status text in this response is “Complete”. In that case the pickup attribute also gives you the download URL of a zip file containing the thumbnail of your requested URL in four different sizes.
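
Here is a small sketch of how such a status response could be checked. The helper name parseJobStatus() is hypothetical, and the sample XML is a shortened copy of the response above.

```php
<?php
// Hypothetical helper: reports whether the job is finished and, if so,
// the pickup URL of the zip file.
function parseJobStatus(string $xml): array
{
    $sxml = simplexml_load_string($xml);
    if ($sxml === false || !isset($sxml->jobStatus->status)) {
        throw new RuntimeException('Unexpected status response');
    }

    $status = $sxml->jobStatus->status;
    $complete = trim((string) $status) === 'Complete';

    return array(
        'complete' => $complete,
        'pickup'   => $complete ? (string) $status['pickup'] : null,
    );
}

// Shortened copy of the sample response shown above.
$sample = "<webthumb><jobStatus><status id='wt44f5bf42a0e4a' "
    . "pickup='http://webthumb.bluga.net/data/0e4a/wt44f5bf42a0e4a.zip'>Complete</status>"
    . "</jobStatus></webthumb>";

$result = parseJobStatus($sample);
echo $result['complete'] ? "Ready: {$result['pickup']}\n" : "Still in the queue\n";
```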

Step 3: Download the thumbnail as an Image
In this step you request your thumbnail as an image. WebThumb can serve the thumbnail in four sizes: “small”, “medium”, “medium2”, and “large”. The request has the following format:

<webthumb>
	<apikey>apikey here</apikey>
	<fetch>
		<job>wt44f5bf42a0e4a</job>
		<size>small</size>
	</fetch>
</webthumb>

You specify the desired size in this request. WebThumb responds with the binary image data rather than XML, so you need to handle this response differently from the others.
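
Because the response is raw bytes, it can be worth checking that you really received an image before sending it on to a browser. The sketch below assumes the thumbnails are JPEG files, which is an assumption on my part based on the image/jpeg header used later in this article. The helper name looksLikeJpeg() is hypothetical.

```php
<?php
// Hypothetical helper: true when the data starts with the JPEG
// signature bytes FF D8 FF. Assumes the thumbnails are JPEG files.
function looksLikeJpeg(string $data): bool
{
    return strncmp($data, "\xFF\xD8\xFF", 3) === 0;
}

// Fake data standing in for a real fetch response.
$fakeImage = "\xFF\xD8\xFF\xE0" . 'rest of the image bytes';
$notAnImage = '<webthumb>something else</webthumb>';

echo looksLikeJpeg($fakeImage) ? "image\n" : "not an image\n";
echo looksLikeJpeg($notAnImage) ? "image\n" : "not an image\n";
```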

Getting your hands Dirty
Now that we have covered the basic structure of the WebThumb API, we will create a PHP class that places these requests and processes the responses. Our class will have the following methods:

requestThumbnail() – to place the first request
requestStatus() – to check the status
getThumbnail() – to download the thumbnail

Before writing the class, let’s see how to send an XML request. Take a look at the following routine, which we will use throughout our code to place requests.

private function _executeCurlRequest($request)
{
	$_session = curl_init();
	curl_setopt($_session, CURLOPT_URL, $this->_request_uri); // URL to post to
	curl_setopt($_session, CURLOPT_POST, 1);                  // send as HTTP POST
	curl_setopt($_session, CURLOPT_POSTFIELDS, $request);     // the XML message
	curl_setopt($_session, CURLOPT_HTTPHEADER, array('Content-Type: application/xml'));
	curl_setopt($_session, CURLOPT_RETURNTRANSFER, true);     // return the response instead of printing it
	$_response = curl_exec($_session);
	return $_response;
}
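
The routine above returns false when the transfer fails, which is easy to miss. As a sketch only, here is a standalone variant with a timeout and basic error handling. The function name postXml() and the 30 second default are my own choices, not something WebThumb prescribes.

```php
<?php
// Hypothetical standalone variant of the routine above: adds a timeout and
// throws an exception when cURL reports a failure.
function postXml(string $uri, string $xml, int $timeoutSeconds = 30): string
{
    $session = curl_init();
    curl_setopt($session, CURLOPT_URL, $uri);
    curl_setopt($session, CURLOPT_POST, true);
    curl_setopt($session, CURLOPT_POSTFIELDS, $xml);
    curl_setopt($session, CURLOPT_HTTPHEADER, array('Content-Type: application/xml'));
    curl_setopt($session, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($session, CURLOPT_TIMEOUT, $timeoutSeconds);

    $response = curl_exec($session);
    if ($response === false) {
        // curl_error() describes what went wrong (DNS failure, timeout, ...).
        throw new RuntimeException('cURL request failed: ' . curl_error($session));
    }

    return $response;
}

// Usage (not executed here because it needs network access):
// $response = postXml('http://webthumb.bluga.net/api.php', $requestXml);
```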

Let’s code our class.

Listing 1: Webthumb.class.php

<?php
/**
 *
 * This class generates thumbnail of any URL with the help of WebThumb API created by Joshua Eichorn
 *
 * @package 		WebThumb
 * @author 		Hasin Hayder [http://hasin.wordpress.com]
 * @copyright 	LGPL
 * @example 		usage.php
 * @since 		3rd September, 2006
 */
class WebThumb
{
	private $_api;
	private $_request_uri = "http://webthumb.bluga.net/api.php";

	/**
	 * just constructor
	 *
	 * @param string $api the API key from webthumb.bluga.net
	 */
	public function __construct($api=null)
	{
		$this->_api = $api;
	}

	/**
	 * manually enter the api key
	 *
	 * @param string $api the API key from webthumb.bluga.net
	 */
	public function setApi($api)
	{
		$this->_api = $api;
	}



	/**
	 * request a thumbnail
	 *
	 * @param string $url the URL of the page to make a thumbnail of
	 * @param integer $width optional width of the thumbnail
	 * @param integer $height optional height of the thumbnail
	 * @return array list of jobs, each with id, estimate, time and url
	 */
	public function requestThumbnail($url = "", $width= "", $height = "")
	{

		$requests = "<url>{$url}</url>";
		// append (.=) so the optional elements do not overwrite the URL
		if (!empty($width))
			$requests .= "<width>{$width}</width>";
		if (!empty($height))
			$requests .= "<height>{$height}</height>";

		$requests = "<request>".$requests."</request>";

		$_request = "<webthumb>
						 	<apikey>{$this->_api}</apikey>
						 	{$requests}
						 </webthumb>";

		$_response = $this->_executeCurlRequest($_request);

		$_sxml = simplexml_load_string($_response);

		$_jobs = array();
		foreach($_sxml->jobs->job as $job)
		{
			$_job = array("id"=>$job."",
			"estimate"=>$job['estimate']."",
			"time"=>$job['time']."",
			"url"=>$job['url']."");
			$_jobs[] = $_job;
		}
		return $_jobs;

	}

	/**
	 * return job status of a finished job
	 *
	 * @param string $job_id the job id returned by requestThumbnail() method
	 * @return array JobStatus with id, submissionTime, browserHeight, browserWidth, download URL, status and completionTime
	 */
	public function requestStatus($job_id)
	{
		$_request = "<webthumb>
							 <apikey>{$this->_api}</apikey>
							 <status>
							 	<job>{$job_id}</job>
							 </status>
						 </webthumb>";
		$_response = $this->_executeCurlRequest($_request);
		$_sxml = simplexml_load_string($_response);
		$status = $_sxml->jobStatus->status;
		$_status = array("id"=>$status['id']."",
		"submissionTime"=>$status['submissionTime']."",
		"browserWidth"=>$status['browserWidth']."",
		"browserHeight"=>$status['browserHeight']."",
		"pickup"=>$status['pickup']."",
		"status"=>$status."",
		"completionTime"=>$status['completionTime']."");
		return $_status;
	}

	/**
	 * return the thumbnail in preferred size.
	 *
	 * @param string $job_id the job id returned by requestThumbnail() method
	 * @param string $size one of five values: "small", "medium", "medium2", "large" or "zip"
	 * @return string binary data
	 */
	public function getThumbnail($job_id, $size)
	{
		$_request = "<webthumb>
							 <apikey>{$this->_api}</apikey>
							 <fetch>
								<job>{$job_id}</job>
								<size>{$size}</size>
							 </fetch>
						 </webthumb>";

		$_response = $this->_executeCurlRequest($_request);
		return $_response;
	}

	/**
	 * execute each request via CURL
	 *
	 * @access private
	 * @param string $request Request in XML Format
	 * @return string Raw response body (XML, or binary data for fetch requests)
	 */
	private function _executeCurlRequest($request)
	{
		$_session = curl_init();
		curl_setopt($_session, CURLOPT_URL, $this->_request_uri); // set url to post to
		curl_setopt ($_session, CURLOPT_POST, 1);
		curl_setopt ($_session, CURLOPT_POSTFIELDS, $request);
		curl_setopt($_session, CURLOPT_HTTPHEADER, array( 'Content-Type: application/xml'));
		curl_setopt($_session, CURLOPT_RETURNTRANSFER, true);
		$_response = curl_exec($_session);
		return $_response;
	}

}
?>

Using our Class
Now it’s time to see the result. Here is how we can use this class.

Listing 2: usage.php
<html>
	<head><title>Webpage Thumbnail generator</title></head>
	<body>
		<form method="POST">
			Please type the URL with "http://":<br/>
			<input type="text" name="url"><br/>
			<input type="submit" value="Generate"/>
		</form>
	</body>
</html>

<?php
error_reporting(0);
if (!empty($_POST['url']))
{
include("Webthumb.class.php");
$wb= new WebThumb();
$wb->setApi("your-webthumb-api");
$job = $wb->requestThumbnail($_POST['url']);

$job_id = $job[0]['id'];

while (true)
{
	$job_status = $wb->requestStatus($job_id);
	//print_r($job_status);
	//echo "<hr>";
	$status = $job_status['status'];
	if ($status=="Complete")
		break;
	else
	{
		sleep(5);
		continue;
	}
}

echo "<img src = 'img.php?job_id={$job_id}' >";
}
?>
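
The while (true) loop above keeps polling forever if the job never reaches “Complete”. A bounded version could look like the sketch below. The helper waitForCompletion() and its default values are hypothetical, and a stand-in callback replaces the real status call so the example runs without network access.

```php
<?php
// Hypothetical helper: calls $checkStatus until it returns "Complete"
// or the maximum number of attempts is reached.
function waitForCompletion(callable $checkStatus, int $maxAttempts = 12, int $delaySeconds = 5): bool
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($checkStatus() === 'Complete') {
            return true;
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds); // wait before polling again
        }
    }
    return false;
}

// Stand-in for a real status call: reports "Complete" on the third poll.
// The "Queued" text is a placeholder, not a documented WebThumb status.
$calls = 0;
$fakeStatus = function () use (&$calls) {
    $calls++;
    return $calls >= 3 ? 'Complete' : 'Queued';
};

$done = waitForCompletion($fakeStatus, 5, 0);
echo $done ? "Finished after {$calls} polls\n" : "Gave up after {$calls} polls\n";
```

With the class from Listing 1, the callback would call $wb->requestStatus($job_id) and return the 'status' entry of the resulting array.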

As mentioned earlier, WebThumb returns the image data directly, so we need to handle it differently. That is why we create another file that outputs this binary data as an image.

Listing 3: img.php
<?php
error_reporting(0);
header("Content-type: image/jpeg");
include("Webthumb.class.php");
$wb= new WebThumb();
$wb->setApi("your-webthumb-api");
echo $wb->getThumbnail($_GET['job_id'], "large");
?>
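
Note that img.php passes $_GET['job_id'] straight into the XML request. A cautious script could validate it first. The sketch below assumes job IDs consist only of letters and digits, as in the sample ID wt44f5bf42a0e4a. That is my assumption, not something the WebThumb documentation states, and the helper name isPlausibleJobId() is hypothetical.

```php
<?php
// Hypothetical helper: accepts only short alphanumeric strings.
// ASSUMPTION: real job IDs look like the sample "wt44f5bf42a0e4a".
function isPlausibleJobId($value): bool
{
    return is_string($value) && preg_match('/\A[A-Za-z0-9]{1,64}\z/', $value) === 1;
}

var_dump(isPlausibleJobId('wt44f5bf42a0e4a'));
var_dump(isPlausibleJobId('</job><job>other'));
```

In img.php, the check would run before getThumbnail() is called, and the script would stop if it fails.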

That’s it.

Last minute talk
There are other thumbnail services available, such as BrowserCam, but they don’t expose an API like WebThumb does. On the other hand, as paid services they offer thumbnails rendered by many different browsers, whereas WebThumb currently offers thumbnails rendered only by the Mozilla engine. WebThumb’s source code is also available for free, and this FOSS policy is the main reason I chose it. I also like its straightforward API structure. Kudos to Joshua Eichorn for creating an amazing service like WebThumb and for making it free.