Snoopy-1.2.4-4

Category: Web development
Development tool: PHP
File size: 27KB
Downloads: 9
Upload date: 2013-01-28 16:31:59
Uploader: xxlian1
Description: Snoopy, a PHP web-scraping (page collection) class, with method code

File list:
Snoopy-1.2.4 (0, 2008-10-22)
Snoopy-1.2.4\AUTHORS (242, 2008-10-22)
Snoopy-1.2.4\TODO (264, 2008-10-22)
Snoopy-1.2.4\Snoopy.class.php (37815, 2008-10-22)
Snoopy-1.2.4\INSTALL (99, 2008-10-22)
Snoopy-1.2.4\FAQ (880, 2008-10-22)
Snoopy-1.2.4\COPYING.lib (24389, 2008-10-22)
Snoopy-1.2.4\ChangeLog (4105, 2008-10-22)
Snoopy-1.2.4\NEWS (2265, 2008-10-22)
PHP采集利器:Snoopy_试用心得.txt (6301, 2013-01-28) ["PHP scraping power tool: Snoopy, hands-on notes"]

NAME:

	Snoopy - the PHP net client v1.2.4

SYNOPSIS:

	include "Snoopy.class.php";
	$snoopy = new Snoopy;

	$snoopy->fetchtext("http://www.php.net/");
	print $snoopy->results;

	$snoopy->fetchlinks("http://www.phpbuilder.com/");
	print $snoopy->results;

	$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";

	$submit_vars["q"] = "amiga";
	$submit_vars["submit"] = "Search!";
	$submit_vars["searchhost"] = "Altavista";

	$snoopy->submit($submit_url,$submit_vars);
	print $snoopy->results;

	$snoopy->maxframes=5;
	$snoopy->fetch("http://www.ispi.net/");
	echo "<PRE>\n";
	echo htmlentities($snoopy->results[0]);
	echo htmlentities($snoopy->results[1]);
	echo htmlentities($snoopy->results[2]);
	echo "</PRE>\n";

	$snoopy->fetchform("http://www.altavista.com");
	print $snoopy->results;

DESCRIPTION:

	What is Snoopy?

	Snoopy is a PHP class that simulates a web browser. It automates the
	task of retrieving web page content and posting forms, for example.

	Some of Snoopy's features:

	* easily fetch the contents of a web page
	* easily fetch the text from a web page (strip html tags)
	* easily fetch the links from a web page
	* supports proxy hosts
	* supports basic user/pass authentication
	* supports setting user_agent, referer, cookies and header content
	* supports browser redirects, and controlled depth of redirects
	* expands fetched links to fully qualified URLs (default)
	* easily submit form data and retrieve the results
	* supports following html frames (added v0.92)
	* supports passing cookies on redirects (added v0.92)

REQUIREMENTS:

	Snoopy requires PHP with PCRE (Perl Compatible Regular Expressions),
	which should be PHP 3.0.9 and up. For read timeout support, it requires
	PHP 4 Beta 4 or later. Snoopy was developed and tested with PHP 3.0.12.

CLASS METHODS:

	fetch($URI)
	-----------

	This is the method used for fetching the contents of a web page.
	$URI is the fully qualified URL of the page to fetch.
	The results of the fetch are stored in $this->results.
	If you are fetching frames, then $this->results
	contains each frame fetched in an array.

	fetchtext($URI)
	---------------

	This behaves exactly like fetch() except that it only returns
	the text from the page, stripping out html tags and other
	irrelevant data.

	fetchform($URI)
	---------------

	This behaves exactly like fetch() except that it only returns
	the form elements from the page, stripping out html tags and other
	irrelevant data.

	fetchlinks($URI)
	----------------

	This behaves exactly like fetch() except that it only returns
	the links from the page. By default, relative links are
	converted to their fully qualified URL form.

	submit($URI,$formvars)
	----------------------

	This submits a form to the specified $URI.
	$formvars is an array of the form variables to pass.

	submittext($URI,$formvars)
	--------------------------

	This behaves exactly like submit() except that it only returns
	the text from the page, stripping out html tags and other
	irrelevant data.

	submitlinks($URI,$formvars)
	---------------------------

	This behaves exactly like submit() except that it only returns
	the links from the page. By default, relative links are
	converted to their fully qualified URL form.

CLASS VARIABLES: (default values in parentheses)

	$host           the host to connect to
	$port           the port to connect to
	$proxy_host     the proxy host to use, if any
	$proxy_port     the proxy port to use, if any
	$agent          the user agent to masquerade as (Snoopy v0.1)
	$referer        referer information to pass, if any
	$cookies        cookies to pass, if any
	$rawheaders     other header info to pass, if any
	$maxredirs      maximum redirects to allow; 0 = none allowed (5)
	$offsiteok      whether or not to allow redirects off-site (true)
	$expandlinks    whether or not to expand links to fully qualified URLs (true)
	$user           authentication username, if any
	$pass           authentication password, if any
	$accept         HTTP accept types (image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*)
	$error          where errors are sent, if any
	$response_code  response code returned from server
	$headers        headers returned from server
	$maxlength      max return data length
	$read_timeout   timeout on read operations (requires PHP 4 Beta 4+);
	                set to 0 to disable timeouts
	$timed_out      true if a read operation timed out (requires PHP 4 Beta 4+)
	$maxframes      number of frames we will follow
	$status         HTTP status of fetch
	$temp_dir       temp directory that the webserver can write to (/tmp)
	$curl_path      system path to cURL binary; set to false if none

EXAMPLES:

	Example: fetch a web page and display the return headers and
	the contents of the page (html-escaped):

	include "Snoopy.class.php";
	$snoopy = new Snoopy;

	$snoopy->user = "joe";
	$snoopy->pass = "bloe";

	if($snoopy->fetch("http://www.slashdot.org/"))
	{
		echo "response code: ".$snoopy->response_code."<br>\n";
		foreach($snoopy->headers as $key => $val)
			echo $key.": ".$val."<br>\n";
		echo "<p>\n";

		echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
	}
	else
		echo "error fetching document: ".$snoopy->error."\n";

	Example: submit a form and print out the result headers
	and html-escaped page:

	include "Snoopy.class.php";
	$snoopy = new Snoopy;

	$submit_url = "http://lnk.ispi.net/texis/scripts/msearch/netsearch.html";

	$submit_vars["q"] = "amiga";
	$submit_vars["submit"] = "Search!";
	$submit_vars["searchhost"] = "Altavista";

	if($snoopy->submit($submit_url,$submit_vars))
	{
		foreach($snoopy->headers as $key => $val)
			echo $key.": ".$val."<br>\n";
		echo "<p>\n";

		echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
	}
	else
		echo "error fetching document: ".$snoopy->error."\n";

	Example: showing the functionality of all the variables:

	include "Snoopy.class.php";
	$snoopy = new Snoopy;

	$snoopy->proxy_host = "my.proxy.host";
	$snoopy->proxy_port = "8080";

	$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows ***)";
	$snoopy->referer = "http://www.microsnot.com/";

	$snoopy->cookies["SessionID"] = "238472834723489";
	$snoopy->cookies["favoriteColor"] = "RED";

	$snoopy->rawheaders["Pragma"] = "no-cache";

	$snoopy->maxredirs = 2;
	$snoopy->offsiteok = false;
	$snoopy->expandlinks = false;

	$snoopy->user = "joe";
	$snoopy->pass = "bloe";

	if($snoopy->fetchtext("http://www.phpbuilder.com"))
	{
		foreach($snoopy->headers as $key => $val)
			echo $key.": ".$val."<br>\n";
		echo "<p>\n";

		echo "<PRE>".htmlspecialchars($snoopy->results)."</PRE>\n";
	}
	else
		echo "error fetching document: ".$snoopy->error."\n";

	Example: fetch framed content and display the results:

	include "Snoopy.class.php";
	$snoopy = new Snoopy;

	$snoopy->maxframes = 5;

	if($snoopy->fetch("http://www.ispi.net/"))
	{
		echo "<PRE>".htmlspecialchars($snoopy->results[0])."</PRE>\n";
		echo "<PRE>".htmlspecialchars($snoopy->results[1])."</PRE>\n";
		echo "<PRE>".htmlspecialchars($snoopy->results[2])."</PRE>\n";
	}
	else
		echo "error fetching document: ".$snoopy->error."\n";

COPYRIGHT:

	Copyright(c) 1999,2000 ispi. All rights reserved.
	This software is released under the GNU General Public License.
	Please read the disclaimer at the top of the Snoopy.class.php file.

THANKS:

	Special Thanks to:
	Peter Sorger       help fixing a redirect bug
	Andrei Zmievski    implementing time out functionality
	Patric Sandelin    help with fetchform debugging
	Carmelo            misc bug fixes with frames
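The README says fetchtext() and submittext() return "the text from the page, stripping out html tags and other irrelevant data". As a rough, self-contained illustration of that idea only, the sketch below uses PHP built-ins (`preg_replace()`, `strip_tags()`, `html_entity_decode()`). The function name `page_text()` is hypothetical, this is not how Snoopy.class.php itself does it, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper, NOT part of Snoopy: approximates "return only the text
// from the page" with PHP built-ins.
function page_text(string $html): string
{
    // Drop <script> and <style> blocks entirely; their contents are not page text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove all remaining tags.
    $text = strip_tags($html);
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces and trim the ends.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Usage: prints "Fish & chips"
echo page_text('<html><head><style>p{}</style></head><body><p>Fish &amp; chips</p><script>x()</script></body></html>'), "\n";
```

Removing script and style blocks before strip_tags() matters because strip_tags() keeps the text between those tags.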
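With $expandlinks left at its default (true), the README says relative links are "converted to their fully qualified URL form". The sketch below shows one simplified way to do that kind of resolution against the page's own URL. The function `expand_link()` is a hypothetical name, not Snoopy's code. It assumes the base is an absolute URL with a host, handles absolute, scheme-relative (`//host`), root-relative (`/path`) and page-relative links, and resolves `.` and `..` segments. It does not implement every RFC 3986 rule (for example, query-only links like `?q=1`). Requires PHP 7+.

```php
<?php
// Hypothetical, simplified link expansion (not Snoopy's implementation).
// Assumes $base is an absolute URL such as "http://host/dir/page.html".
function expand_link(string $base, string $link): string
{
    // Already absolute (http:, https:, mailto:, ...): leave unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $link)) {
        return $link;
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    // Scheme-relative link: reuse the base scheme.
    if (strncmp($link, '//', 2) === 0) {
        return $scheme . ':' . $link;
    }
    $origin = $scheme . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    if ($link !== '' && $link[0] === '/') {
        $path = $link;                      // root-relative
    } else {
        // Page-relative: keep the base path up to and including its last slash.
        $basePath = $b['path'] ?? '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;
    }
    // Resolve "." and ".." segments; never climb above the root.
    $out = [];
    foreach (explode('/', $path) as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    return $origin . implode('/', $out);
}

// Usage: prints "http://www.phpbuilder.com/docs/b.html"
echo expand_link('http://www.phpbuilder.com/docs/a/index.php', '../b.html'), "\n";
```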
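submit($URI,$formvars) takes "an array of the form variables to pass". For readers wondering what such an array becomes on the wire, the sketch below encodes the README's own $submit_vars as an `application/x-www-form-urlencoded` body with PHP's built-in `http_build_query()`. Whether Snoopy builds its POST body with exactly this function or encoding is an assumption; the point is only the key=value&key=value shape with URL-encoded values.

```php
<?php
// Sketch only: encode the README's form variables the way an HTML form POST
// would send them (application/x-www-form-urlencoded). Not Snoopy's own code.
$submit_vars = [];
$submit_vars["q"] = "amiga";
$submit_vars["submit"] = "Search!";
$submit_vars["searchhost"] = "Altavista";

// Pass "&" explicitly so the result does not depend on arg_separator.output.
$body = http_build_query($submit_vars, '', '&');

echo $body, "\n"; // prints q=amiga&submit=Search%21&searchhost=Altavista
```

Note that "!" in "Search!" is percent-encoded as %21, which is why form values should always go through an encoder rather than being concatenated by hand.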
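The $maxredirs ("maximum redirects to allow; 0 = none allowed") and $offsiteok ("whether or not to allow redirects off-site") variables together decide whether a redirect is followed. The sketch below is one hypothetical way to express that decision; the function `may_follow_redirect()` and its parameters are illustrative names, not Snoopy's API. It assumes "off-site" means a different host name (compared case-insensitively) and treats a Location without a host, such as `/login`, as on-site.

```php
<?php
// Hypothetical redirect policy check modelled on $maxredirs / $offsiteok.
// $redirectsSoFar counts redirects already followed for this request.
function may_follow_redirect(string $currentUrl, string $location,
                             int $redirectsSoFar, int $maxredirs, bool $offsiteok): bool
{
    // Depth limit: with $maxredirs = 0, no redirect is ever followed.
    if ($redirectsSoFar >= $maxredirs) {
        return false;
    }
    if ($offsiteok) {
        return true;
    }
    $from = parse_url($currentUrl, PHP_URL_HOST);
    $to   = parse_url($location, PHP_URL_HOST);
    if ($to === null) {
        return true;    // relative Location such as "/login": same host
    }
    // parse_url() returns false for badly malformed URLs; refuse those.
    return is_string($to) && is_string($from) && strcasecmp($from, $to) === 0;
}

var_dump(may_follow_redirect('http://www.php.net/', 'http://example.com/', 0, 5, false)); // bool(false)
```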
