
Creating Thumbnail of WebPages using WebThumb API

Joshua Eichorn, the author of “Understanding AJAX” and a renowned PHP developer, recently released WebThumb, a site that creates thumbnails of web pages. The whole system is written in C and uses the Mozilla engine to render web pages into images. Shortly after publishing WebThumb, he released an API so that developers can create thumbnails from their own applications. The API is very simple to use. In this article we are going to discuss how to use the WebThumb API from PHP to create thumbnails of web pages.

Using the WebThumb API, you generate a thumbnail in three steps. First, you place a request containing the URL. If the request is successful, WebThumb stores it in a queue, so you do not get the thumbnail instantly (fetching and rendering a URL takes time, so real-time generation is not possible). In the second step, you check whether your thumbnail has been generated or is still in the queue. Once the job is complete, you proceed to the third step, where you download the thumbnail. Before jumping into the code, let’s take a look at the API format.

Step 1: Place a request
To request a thumbnail for a URL, post an XML message to the WebThumb server in the following format:

<webthumb>
	<apikey>apikey here</apikey>
	<request>
		<url>webthumb.bluga.net</url>
	</request>
</webthumb>

You may wonder what the apikey is, since we have not discussed it yet, and where you can obtain one. Simple: just register with WebThumb at http://webthumb.bluga.net and go to your user page. You will find your API key listed there. You can also supply optional width and height parameters alongside the url.

<webthumb>
	<apikey>apikey here</apikey>
	<request>
		<url>webthumb.bluga.net</url>
		<width>800</width>
		<height>600</height>
	</request>
</webthumb>
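
A URL often contains characters such as & that are not valid inside XML as they stand. The following is a minimal sketch of how the request above could be assembled with escaping. The function name buildThumbnailRequest() is hypothetical and not part of the WebThumb API, and whether the server cares about the order of the optional elements is not something this article confirms.

```php
<?php
// Hypothetical helper (not part of WebThumb): builds the request XML shown
// above and escapes values so that characters like & stay valid XML.
function buildThumbnailRequest($apiKey, $url, $width = null, $height = null)
{
    $request = '<url>' . htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</url>';
    if (!empty($width)) {
        $request .= '<width>' . (int) $width . '</width>';
    }
    if (!empty($height)) {
        $request .= '<height>' . (int) $height . '</height>';
    }

    return '<webthumb>'
        . '<apikey>' . htmlspecialchars($apiKey, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</apikey>'
        . '<request>' . $request . '</request>'
        . '</webthumb>';
}

// Example call with a URL that contains an ampersand.
echo buildThumbnailRequest('apikey here', 'http://example.com/?a=1&b=2', 800, 600);
```

The width and height are cast to integers so that nothing other than a number can end up in those elements.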

If your request is successful, you will get a response in the following format, which contains the job id of your request. You need this job id for the next steps, so save it.

<webthumb>
	<jobs>
		<job estimate='20' time='2006-08-30 09:39:30' url='http://webthumb.bluga.net'>wt44f5bf42a0e4a</job>
	</jobs>
</webthumb>
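
As a small sketch of how the job id can be pulled out of such a response, the snippet below parses the sample response above with SimpleXML. The helper name extractJobId() is an assumption made for illustration; what the server returns for a failed request is not covered here, so the function simply returns null for anything it cannot read.

```php
<?php
// Sample response copied from the article; a real one comes from the server.
$response = <<<'XML'
<webthumb>
	<jobs>
		<job estimate='20' time='2006-08-30 09:39:30' url='http://webthumb.bluga.net'>wt44f5bf42a0e4a</job>
	</jobs>
</webthumb>
XML;

// Hypothetical helper: returns the first job id, or null if the response
// is not valid XML or contains no <job> element.
function extractJobId($response)
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($response);
    if ($xml === false || !isset($xml->jobs->job[0])) {
        return null;
    }
    return trim((string) $xml->jobs->job[0]);
}

echo extractJobId($response), "\n";
```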

Step 2: Checking the Job Status
After placing a successful request, we need to check the status of our job. To check the status, make a request in the following format containing the job id:

<webthumb>
	<apikey>apikey here</apikey>
	<status>
		<job>wt44f5bf42a0e4a</job>
	</status>
</webthumb>

The WebThumb server will respond to your request in the following format:

<webthumb>
	<jobStatus>
		<status id='wt44f5bf42a0e4a' submissionTime='2006-08-30 09:39:30'
		           browserWidth='1024' browserHeight='768'
	 	           pickup='http://webthumb.bluga.net/data/0e4a/wt44f5bf42a0e4a.zip'
		           completionTime='2006-08-30 09:39:38'>Complete</status>
	</jobStatus>
</webthumb>

If the job has finished, the status in this response is “Complete”, and the pickup attribute contains the download URL of a zip file with the thumbnail of your requested URL in four different sizes.
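
The status check can be sketched as follows, using the sample response above. The helper name pickupUrlIfComplete() is hypothetical, and the only status text this sketch relies on is the “Complete” value shown in the sample; any other value is treated as not finished.

```php
<?php
// Sample status response copied from the article.
$response = <<<'XML'
<webthumb>
	<jobStatus>
		<status id='wt44f5bf42a0e4a' submissionTime='2006-08-30 09:39:30'
		        browserWidth='1024' browserHeight='768'
		        pickup='http://webthumb.bluga.net/data/0e4a/wt44f5bf42a0e4a.zip'
		        completionTime='2006-08-30 09:39:38'>Complete</status>
	</jobStatus>
</webthumb>
XML;

// Hypothetical helper: returns the pickup URL once the job is complete,
// or null while the job is unfinished or the response is unusable.
function pickupUrlIfComplete($response)
{
    libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    $xml = simplexml_load_string($response);
    if ($xml === false || !isset($xml->jobStatus->status)) {
        return null;
    }
    $status = $xml->jobStatus->status;
    if (trim((string) $status) !== 'Complete') {
        return null;
    }
    return (string) $status['pickup'];
}

echo pickupUrlIfComplete($response), "\n";
```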

Step 3: Download the thumbnail as an Image
In this step you request your thumbnail as an image. WebThumb can serve your thumbnail in four different sizes: “small”, “medium”, “medium2” and “large”. Make the request in the following format:

<webthumb>
	<apikey>apikey here</apikey>
	<fetch>
		<job>wt44f5bf42a0e4a</job>
		<size>small</size>
	</fetch>
</webthumb>

You specify your desired size in this request. WebThumb responds with the binary image data rather than XML, so you need to process this response a bit differently from the others.
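
Since the size is a fixed set of names, it can be checked before the request is sent. Here is a minimal sketch under that assumption; buildFetchRequest() is a hypothetical name, and “zip” is included only because the class docblock later in this article lists it as a fifth accepted value.

```php
<?php
// Hypothetical helper: builds the fetch request shown above and rejects
// size names that are not mentioned in the article.
function buildFetchRequest($apiKey, $jobId, $size = 'small')
{
    $allowed = array('small', 'medium', 'medium2', 'large', 'zip');
    if (!in_array($size, $allowed, true)) {
        throw new InvalidArgumentException("Unknown thumbnail size: {$size}");
    }

    return '<webthumb>'
        . '<apikey>' . htmlspecialchars($apiKey, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</apikey>'
        . '<fetch>'
        . '<job>' . htmlspecialchars($jobId, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</job>'
        . '<size>' . $size . '</size>'
        . '</fetch>'
        . '</webthumb>';
}

echo buildFetchRequest('apikey here', 'wt44f5bf42a0e4a', 'small');
```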

Getting your hands Dirty
So far we have discussed the basic structure of the WebThumb API. Now we will create a PHP class that places all these requests and processes the responses. Our class will feature the following methods:

requestThumbnail() – To place the first request
requestStatus() – to check the status
getThumbnail() – to download the thumbnail

Before writing our class, let’s see how to post an XML request. Take a look at the following routine. We will use this function throughout our code to place requests.

private function _executeCurlRequest($request)
{
	$_session = curl_init();
	curl_setopt($_session, CURLOPT_URL, $this->_request_uri); // URL to post to
	curl_setopt($_session, CURLOPT_POST, 1);
	curl_setopt($_session, CURLOPT_POSTFIELDS, $request); // the XML message
	curl_setopt($_session, CURLOPT_HTTPHEADER, array('Content-Type: application/xml'));
	curl_setopt($_session, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
	$_response = curl_exec($_session);
	return $_response;
}

Let’s code our class.

Listing 1: Webthumb.class.php

<?php
/**
 *
 * This class generates thumbnail of any URL with the help of WebThumb API created by Joshua Eichorn
 *
 * @package 		WebThumb
 * @author 		Hasin Hayder [http://hasin.wordpress.com]
 * @copyright 	LGPL
 * @example 		usage.php
 * @since 		3rd September, 2006
 */
class WebThumb
{
	private $_api;
	private $_request_uri = "http://webthumb.bluga.net/api.php";

	/**
	 * just constructor
	 *
	 * @param string $api the api key from webthumb.bluga.net
	 */
	public function __construct($api=null)
	{
		$this->_api = $api;
	}

	/**
	 * manually enter the api key
	 *
	 * @param string $api the api key from webthumb.bluga.net
	 */
	public function setApi($api)
	{
		$this->_api = $api;
	}



	/**
	 * request a thumbnail
	 *
	 * @param string $url the URL to create a thumbnail of
	 * @param integer $width optional width of the thumbnail
	 * @param integer $height optional height of the thumbnail
	 * @return array list of jobs, each with id, estimated time, starting time and url
	 */
	public function requestThumbnail($url = "", $width= "", $height = "")
	{

		$requests = "<url>{$url}</url>";
		// append the optional elements instead of overwriting the <url> element
		if (!empty($height))
			$requests .= "<height>{$height}</height>";
		if (!empty($width))
			$requests .= "<width>{$width}</width>";

		$requests = "<request>".$requests."</request>";

		$_request = "<webthumb>
						 	<apikey>{$this->_api}</apikey>
						 	{$requests}
						 </webthumb>";

		$_response = $this->_executeCurlRequest($_request);

		$_sxml = simplexml_load_string($_response);

		$_jobs = array();
		foreach($_sxml->jobs->job as $job)
		{
			$_job = array("id"=>$job."",
			"estimate"=>$job['estimate']."",
			"time"=>$job['time']."",
			"url"=>$job['url']."");
			$_jobs[] = $_job;
		}
		return $_jobs;

	}

	/**
	 * return job status of a finished job
	 *
	 * @param string $job_id the job id returned by requestThumbnail() method
	 * @return array JobStatus with id, submissionTime, browserHeight, browserWidth, download URL, status and completionTime
	 */
	public function requestStatus($job_id)
	{
		$_request = "<webthumb>
							 <apikey>{$this->_api}</apikey>
							 <status>
							 	<job>{$job_id}</job>
							 </status>
						 </webthumb>";
		$_response = $this->_executeCurlRequest($_request);
		$_sxml = simplexml_load_string($_response);
		$status = $_sxml->jobStatus->status;
		$_status = array("id"=>$status['id']."",
		"submissionTime"=>$status['submissionTime']."",
		"browserWidth"=>$status['browserWidth']."",
		"browserHeight"=>$status['browserHeight']."",
		"pickup"=>$status['pickup']."",
		"status"=>$status."",
		"completionTime"=>$status['completionTime']."");
		return $_status;
	}

	/**
	 * return the thumbnail in preferred size.
	 *
	 * @param string $job_id the job id returned by requestThumbnail() method
	 * @param string $size one of five values: "small", "medium", "medium2", "large" or "zip"
	 * @return string binary data
	 */
	public function getThumbnail($job_id, $size)
	{
		$_request = "<webthumb>
							 <apikey>{$this->_api}</apikey>
							 <fetch>
								<job>{$job_id}</job>
								<size>{$size}</size>
							 </fetch>
						 </webthumb>";

		$_response = $this->_executeCurlRequest($_request);
		return $_response;
	}

	/**
	 * execute each request via CURL
	 *
	 * @access private
	 * @param string $request Request in XML Format
	 * @return string Response in XML Format
	 */
	private function _executeCurlRequest($request)
	{
		$_session = curl_init();
		curl_setopt($_session, CURLOPT_URL, $this->_request_uri); // set url to post to
		curl_setopt ($_session, CURLOPT_POST, 1);
		curl_setopt ($_session, CURLOPT_POSTFIELDS, $request);
		curl_setopt($_session, CURLOPT_HTTPHEADER, array( 'Content-Type: application/xml'));
		curl_setopt($_session, CURLOPT_RETURNTRANSFER, true);
		$_response = curl_exec($_session);
		return $_response;
	}

}
?>

Using our Class
Now it’s time to see the result. Let’s see how we can use this class.

Listing 2: usage.php
<html>
	<head>
		<title>Webpage Thumbnail generator</title>
	</head>
	<body>
		<form method="POST">
			Please type the URL with "http://":<br/>
			<input type="text" name="url"><br/>
			<input type="submit" value="Generate"/>
		</form>
	</body>
</html>

<?php
error_reporting(0);
if (!empty($_POST['url']))
{
include("Webthumb.class.php");
$wb= new WebThumb();
$wb->setApi("your-webthumb-api");
$job = $wb->requestThumbnail($_POST['url']);

$job_id = $job[0]['id'];

while (true)
{
	$job_status = $wb->requestStatus($job_id);
	//print_r($job_status);
	//echo "<hr>";
	$status = $job_status['status'];
	if ($status=="Complete")
		break;
	else
	{
		sleep(5);
		continue;
	}
}

echo "<img src = 'img.php?job_id={$job_id}' >";
}
?>
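
The polling loop in Listing 2 keeps running for as long as the job never reaches “Complete”. A bounded variant could look like the sketch below. The helper waitForJob() and its default limits are assumptions chosen for illustration; the callback stands in for a call such as $wb->requestStatus($job_id).

```php
<?php
// Hypothetical helper: polls a status callback until it reports "Complete"
// or the attempt limit is reached. $checkStatus must return an array with a
// 'status' key, as requestStatus() in Listing 1 does.
function waitForJob($checkStatus, $maxAttempts = 12, $delaySeconds = 5)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $jobStatus = call_user_func($checkStatus);
        if (isset($jobStatus['status']) && $jobStatus['status'] === 'Complete') {
            return $jobStatus;
        }
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds); // wait before asking again
        }
    }
    return null; // gave up: the job did not complete in time
}

// Possible use with the class from Listing 1 (left commented out because it
// needs a real API key and network access):
// $result = waitForJob(function () use ($wb, $job_id) {
//     return $wb->requestStatus($job_id);
// });
```

With the defaults above the script gives up after roughly a minute, and the caller can show an error message when null comes back.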

As mentioned earlier, WebThumb returns the image data directly, so we need to process it differently. That’s why we create another file that fetches the binary data and sends it to the browser as an image.

Listing 3: img.php
<?php
error_reporting(0);
header("Content-type: image/jpeg");
include("Webthumb.class.php");
$wb= new WebThumb();
$wb->setApi("your-webthumb-api");
echo $wb->getThumbnail($_GET['job_id'], "large");
?>
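
Listing 3 passes $_GET['job_id'] straight into the XML request. One way to guard against unexpected input is to check the value first, as sketched below. The pattern is an assumption based only on the sample id in this article (“wt” followed by hexadecimal characters) and the function name isPlausibleJobId() is hypothetical; adjust the pattern if the service uses another format.

```php
<?php
// Hypothetical helper. Assumption: job ids look like the sample in the
// article, i.e. "wt" followed by lowercase hexadecimal characters.
function isPlausibleJobId($jobId)
{
    return is_string($jobId) && preg_match('/^wt[0-9a-f]+\z/', $jobId) === 1;
}

// Example check before calling getThumbnail() in img.php:
$candidate = 'wt44f5bf42a0e4a';
if (isPlausibleJobId($candidate)) {
    echo "job id looks fine\n";
}
```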

That’s it.

Last minute talk
There are also other thumbnail services available, such as BrowserCam, but they don’t expose an API like WebThumb does. On the other hand, as paid services they offer thumbnails rendered in many different browsers, whereas WebThumb currently offers thumbnails rendered only by the Mozilla engine. WebThumb’s source code is also available for free, and this FOSS policy is the main reason I chose it. I also like its simple, no-nonsense API structure. Kudos to Joshua Eichorn for creating an amazing service like WebThumb and for making it free.

42 replies on “Creating Thumbnail of WebPages using WebThumb API”

I just wanted to say thanks to Hasin for writing this wrapper; it’s great.

On Chris’s comment: yes, it’s a possibility, and there are two options. One is to get a PDF plugin working on my setup (though I’m not sure whether that works). The other would be to integrate a PDF library into the rendering code. I’ll take a look at setting up a PDF plugin since that’s pretty low effort.

Hi, I’m a Frenchy 😉 Thanks for your code, it is very useful. I had been searching for a long time for this kind of code, to get thumbnails just by putting a URL in a form field. I’ve done everything you described to generate thumbnails.
I put usage.php, Webthumb.class.php and img.php on my FTP server.
I got my API key from the user section of http://webthumb.bluga.net/.
I put this API key into your script in usage.php and img.php: $wb->setApi("myapinumber….");
Then I launch usage.php, enter a URL, and nothing happens…
Did I make a mistake? Can you tell me?
Thanks a lot, and thanks for your very clever script.

Alternative for non-php5 and curl-disabled hosts (most webservers on this planet) :

The HTTP class from Manuel Lemos can replace some of the methods used by CURL extension :

http://www.phpclasses.org/browse/package/3.html

The Ister simplexml44 can replace the PHP5-only SimpleXML extension (needs EXPAT):

http://www.ister.org/code/simplexml44/index.xhtml

Some reality :

http://www.nexen.net/images/stories/phpversion/200605/evolution.milieu.en.mini.png

@wayne: no, because getThumbnail() already returns the binary content. All you need to do is store the content in a file. Use the following code:

file_put_contents($filename, $wb->getThumbnail($_GET['job_id'], "large"));

That’s it.

I tried WebThumb on my localhost and it works fine, but when I uploaded it and tested it on a web server (with the mode of img.php changed to 777), there was no response and no thumbnail was created. Could you please help with this, sir Hasin?

This will be highly appreciated. Thank you very much.

I noticed, after enabling error_reporting, that the private and public declarations were not recognized, so I removed them. Then I got an error on simplexml_load_string() because this function does not exist in the class itself. Is this a new PHP function? If so, could you provide the code for it? If not, I will try to write something that mimics this function, once I figure out how 🙂

Great script!

Is there a way to make this get multiple images (from multiple URLs at a time)? Or feed it a list of URLs?
And also to save them with filename domain1.com.jpeg, domain2.com.jpeg etc?

Maybe I can just pass them the $url variable as $filename for that?

Seems to me these changes would make it very useful for lots of people

Thanks
John

Good job!

Hasin, there’s a simple bug:

if (!empty($height))
$requests = "<height>{$height}</height>";
if (!empty($width))
$requests = "<width>{$height}</width>";

should be…

if (!empty($height))
$requests .= "<height>{$height}</height>";
if (!empty($width))
$requests .= "<width>{$height}</width>";

thanks!

I mean…

if (!empty($height))
$requests .= "<height>{$height}</height>";
if (!empty($width))
$requests .= "<width>{$width}</width>";

Great work.
I need to save the file, because that will save a lot of resources, and even time, when using this script.

To save the content, can I use the GD library? I tried file_put_contents("", "") but it did not work.
