Edmonds Commerce Blog

Freelance PHP Ecommerce and SEO Developer in the UK

Building Spiders: Grab Data, Post Forms and Interact with Web Sites Automatically

February 14th, 2008
Tags: curl, firefox, php, programming, spidering

One of the most useful and powerful things you can do with PHP is to write a program that simulates a web browser: it can grab data, post data to forms and generally interact with other web sites automatically.

For PHP to work like this, the CURL extension must be installed and enabled. The CURL library handles all of the interaction with the remote site, and PHP is my scripting language of choice for driving it.
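If you are not sure whether CURL is enabled on your server, a quick check along these lines should tell you. This is only a sketch: the function name `curl_is_available` is made up for illustration, while `extension_loaded()`, `function_exists()` and `curl_version()` are standard PHP functions.

```php
<?php
// Hypothetical helper: true when the CURL extension is loaded in this PHP build.
function curl_is_available(): bool
{
    return extension_loaded('curl') && function_exists('curl_init');
}

if (curl_is_available()) {
    // curl_version() returns an array describing the linked libcurl.
    $info = curl_version();
    echo 'CURL is available, libcurl version ' . $info['version'] . PHP_EOL;
} else {
    echo 'CURL is not available; install or enable the curl extension.' . PHP_EOL;
}
```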

A simple CURL function looks like this:

PHP:
    function curl($url) {
        $timeout = 300; // seconds before CURL gives up on this page
        $go = curl_init();
        curl_setopt($go, CURLOPT_URL, $url);
        curl_setopt($go, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
        curl_setopt($go, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
        curl_setopt($go, CURLOPT_TIMEOUT, $timeout);
        $page = curl_exec($go); // the page HTML, or false on failure
        curl_close($go);
        return $page;
    }

When called, this function returns the entire HTML of the page at the $url specified, which you can then echo or process further.
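In practice a request can fail (timeout, DNS problem, server error), so it may be worth checking the result before using it. Here is a rough sketch of one way to do that. The name `fetch_page`, the 30 second default and the decision to treat HTTP status 400 and above as a failure are my assumptions for illustration, not part of the function above.

```php
<?php
// Hypothetical helper: fetches a page like the curl() function above,
// but returns null and logs a message when something goes wrong.
function fetch_page(string $url, int $timeout = 30): ?string
{
    $go = curl_init();
    curl_setopt($go, CURLOPT_URL, $url);
    curl_setopt($go, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($go, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($go, CURLOPT_TIMEOUT, $timeout);

    $page = curl_exec($go); // string on success, false on failure
    if ($page === false) {
        // curl_error() describes what went wrong (timeout, bad protocol, ...).
        error_log('CURL request failed: ' . curl_error($go));
        return null;
    }

    // The HTTP status code shows whether the server sent back an error page.
    $status = curl_getinfo($go, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        error_log("Server answered with HTTP status $status for $url");
        return null;
    }

    return $page; // the handle is released when $go goes out of scope
}
```

Calling `fetch_page('http://example.com/')` (a placeholder address) would then give you either the HTML as a string or null, which is easy to test for before running any regular expressions on it.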

To grab data from this page and insert it into a database (for example, when spidering a supplier's web site for product information to add to your own site), we use regular expressions to find what we are looking for and then insert the matches into the database.

So, for example, if we wanted to grab the product title and we knew that it was wrapped in an h1 tag with the class "product_title", we could use this regexp to grab it:

PHP:
    $page = curl($url);

    // Keep the pattern on one line: line breaks inside the quotes would become part of the pattern
    $pattern = '%<h1 class="product_title">(.+?)</h1>%i';

    preg_match($pattern, $page, $matches);

    print_r($matches); // we can see the entire array of matches and choose which we want to insert into the database
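To round this off, here is a sketch of picking the captured title out of the matches and storing it. Everything named here is an assumption for illustration: the helper `extract_product_title`, the sample HTML, the `products` table and the in-memory SQLite database stand in for your real page and database. The pattern is based on the one above, written on one line, with the "s" modifier added so a title split over several lines still matches.

```php
<?php
// Hypothetical helper: pull the first product title out of a page of HTML.
function extract_product_title(string $html): ?string
{
    $pattern = '%<h1 class="product_title">(.+?)</h1>%is';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null; // no title found on this page
    }
    // $matches[0] is the whole tag, $matches[1] is the first captured group.
    return trim(html_entity_decode(strip_tags($matches[1])));
}

// Sample markup standing in for a page fetched with CURL.
$page = '<html><body><h1 class="product_title"> Blue &amp; Green Widget </h1></body></html>';
$title = extract_product_title($page);
echo $title . PHP_EOL;

// Store the title with a prepared statement so the scraped text is never
// pasted straight into the SQL (table and column names are made up).
if ($title !== null && extension_loaded('pdo_sqlite')) {
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT)');
    $insert = $db->prepare('INSERT INTO products (title) VALUES (:title)');
    $insert->execute([':title' => $title]);
    echo $db->query('SELECT COUNT(*) FROM products')->fetchColumn() . " row stored" . PHP_EOL;
}
```

With MySQL you would only need to change the PDO connection string; the prepared statement stays the same.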

We can also POST data to web sites using CURL. This lets us do all kinds of things, including grabbing data that is only displayed after a form has been submitted. Here is an example CURL POST function:

PHP:
    function curl_post($url, $post_data) {
        $timeout = 300; // seconds before CURL gives up on this page
        $go = curl_init();
        curl_setopt($go, CURLOPT_URL, $url);
        curl_setopt($go, CURLOPT_RETURNTRANSFER, 1); // return the page as a string instead of printing it
        curl_setopt($go, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
        curl_setopt($go, CURLOPT_TIMEOUT, $timeout);
        // now for the post section
        curl_setopt($go, CURLOPT_POST, true);
        curl_setopt($go, CURLOPT_POSTFIELDS, $post_data);
        $page = curl_exec($go); // the response HTML, or false on failure
        curl_close($go);
        return $page;
    }
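The post data is normally a URL-encoded string of name=value pairs joined with "&". Rather than building that by hand, PHP's `http_build_query()` can do it from an array. A small sketch follows. The helper name `build_post_string` and the field names are invented for illustration, so use whatever names the real form expects.

```php
<?php
// Hypothetical helper: turn an array of form fields into a URL-encoded POST string.
function build_post_string(array $fields): string
{
    // http_build_query() URL-encodes each name and value; the third
    // argument makes sure the pairs are joined with a plain "&".
    return http_build_query($fields, '', '&');
}

// Field names here are made up for the example.
$post_data = build_post_string([
    'search' => 'blue widget',
    'page'   => 2,
]);

echo $post_data . PHP_EOL; // search=blue+widget&page=2
```

The resulting string can be passed as the second argument to a function like curl_post() above. CURLOPT_POSTFIELDS also accepts an array directly, but CURL then sends the request as multipart/form-data rather than as a URL-encoded form, which some sites will not accept.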

It can be tricky to figure out exactly what data should be in the post string. Luckily there is an incredibly handy addon for Firefox that helps with this: Live HTTP Headers.

This addon lets you see exactly what is going on between your browser and the web site you are visiting. This can quickly and easily give you the information you need to replicate the same behaviour with your CURL script.
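Once you can see the headers your browser sends, you can copy the important ones into your script. The sketch below shows the general idea using standard CURL options for the user agent, referer, cookies and extra headers. The helper name `browser_like_options`, the user agent string, the URLs and the cookie file location are all placeholders I have made up, so swap in the values you actually see in the addon.

```php
<?php
// Hypothetical helper: CURL options that mimic what a browser sends.
function browser_like_options(string $user_agent, string $referer, string $cookie_file): array
{
    return [
        CURLOPT_USERAGENT  => $user_agent,
        CURLOPT_REFERER    => $referer,
        CURLOPT_COOKIEJAR  => $cookie_file, // cookies are saved here when the handle is released
        CURLOPT_COOKIEFILE => $cookie_file, // cookies are read from here on later requests
        CURLOPT_HTTPHEADER => ['Accept-Language: en-gb,en;q=0.5'],
    ];
}

// Placeholder values; copy the real ones from the headers you captured.
$options = browser_like_options(
    'Mozilla/5.0 (compatible; ExampleSpider/1.0)',
    'http://example.com/',
    sys_get_temp_dir() . '/spider_cookies.txt'
);

$go = curl_init('http://example.com/login');
// curl_setopt_array() applies all of the options in one call.
$applied = curl_setopt_array($go, $options);
echo $applied ? "Options applied\n" : "Could not apply options\n";
```

Sharing one cookie file between requests is what lets a script log in with one POST and then fetch pages that are only visible to logged-in users.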

Edmonds Commerce specialise in working with PHP and CURL. If you have any spidering, screen scraping or other application that requires PHP to actively interact with other web sites - get in touch today to see how we can help you benefit from this incredibly powerful technique.

Related Resources

http://www.phpfour.com/blog/2008/01/20/php-http-class/

http://www.phpclasses.org/browse/package/1988.html

http://www.phpit.net/article/using-curl-php/

http://skeymedia.com/intro-to-curl-with-php/
