Web scraping in a smart way, making a “Today in History” object in PHP

12 thoughts on “Web scraping in a smart way, making a “Today in History” object in PHP”

  1. In line with Aurelian’s comments, using DOM/XPath could really save you some hassle. With XPath you can query on the attributes of given tags and so on, which makes your script a bit less dependent on the page content (you can’t get zero dependence, but it minimizes it). You’d probably find your script a bit shorter, too.

    Another suggestion would be to cache the results using Zend_Cache or PEAR’s Cache_Lite.

  2. @Tony, Aurelian

    True. And sometimes DOM is more efficient than scraping like this. But did you ever try to write a scraper that parses all the LinkedIn pages after simulating a POST? Or some of the popular job sites? Then you will understand the real pain. In those cases, this approach works best.

    And yeah, I forgot to mention caching. We should cache the page to avoid DoS-ing that service. Thanks.

  3. Hi there Hasin. Thanks for posting the script, as I’ve been searching for an example like it to scrape a specific element within a page. Since it was written two years ago, though, all I get when I run it now is the date, with nothing else echoed out. I think Scopesys has changed its page markup; apart from that, the code seems to work fine.
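
The DOM/XPath and caching suggestions in the comments above can be sketched roughly as follows. This is only an illustration, not the post's original script: the HTML snippet, the `events` and `year` class names, and the helper names `extractEvents` and `cachedFetch` are all assumptions made for the example, and a plain file cache stands in for Zend_Cache or Cache_Lite.

```php
<?php
// Hypothetical markup standing in for a "Today in History" page. The real
// page structure is not shown in this thread, so the class names are assumptions.
$html = <<<HTML
<html><body>
  <div class="nav"><ul><li>Home</li></ul></div>
  <ul class="events">
    <li><span class="year">1969</span> Apollo 11 lands on the Moon</li>
    <li><span class="year">1976</span> Viking 1 lands on Mars</li>
  </ul>
</body></html>
HTML;

// Pull year/text pairs out of the page by querying on attributes,
// so unrelated layout changes elsewhere on the page do not matter.
function extractEvents(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $events = [];
    foreach ($xpath->query('//ul[@class="events"]/li') as $li) {
        $year = trim($xpath->evaluate('string(span[@class="year"])', $li));
        // Collapse whitespace, then drop the leading year from the item text.
        $text = trim(preg_replace('/\s+/', ' ', $li->textContent));
        $events[] = ['year' => $year, 'text' => trim(substr($text, strlen($year)))];
    }
    return $events;
}

// Minimal file cache: reuse the stored copy while it is younger than
// $ttlSeconds, otherwise call $fetch and store its result.
function cachedFetch(string $cacheFile, int $ttlSeconds, callable $fetch): string
{
    clearstatcache(true, $cacheFile);
    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttlSeconds) {
        return file_get_contents($cacheFile);
    }
    $body = $fetch();
    file_put_contents($cacheFile, $body, LOCK_EX);
    return $body;
}

// Usage: the closure returns the sample markup here; in a real script it
// would perform the HTTP request for the remote page.
$cacheFile = sys_get_temp_dir() . '/today_in_history_demo.html';
$page = cachedFetch($cacheFile, 3600, function () use ($html) {
    return $html;
});
foreach (extractEvents($page) as $event) {
    echo $event['year'], ': ', $event['text'], PHP_EOL;
}
```

If the site changes its markup, as the last comment suspects Scopesys did, only the two XPath expressions should need updating, and the cache keeps repeated runs from hitting the remote service on every request.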
