Scraping Yahoo data with PHP
2008-07-12 19:07
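<?php
// Fetch the Yahoo Finance quote page for one symbol and echo the quote block.
function getYahooQuote($stockSymbol = "CCR")
{
         $targetURL = "http://finance.yahoo.com/q?s=$stockSymbol&d=t"; // URL of the page to scrape
         $fd = fopen($targetURL, "r"); // needs allow_url_fopen enabled in php.ini
         if (!$fd) return; // the page could not be opened
         $capturedHTML = ""; // accumulated HTML fragment
         $lineCount = 0; // lines read so far (used only by the debug output)
         $stopExtract = 0;  
         $startExtract = 0;  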
         while (!feof($fd))  
         {
             $buffer = fgets($fd, 4096);
                 //echo trim($buffer)."\n";  
             if (strstr($buffer, "rowspan=3"))
             {
                 //echo "extract started at line #$lineCount\n";  
                 $startExtract = 1;  
             }     
             if ($startExtract && !$stopExtract)     
             {
         
                 if (strstr($buffer, "<a"))  
                 {

                     $startPos = strrpos($buffer, "<");
                     $buffer = substr($buffer, $startPos);
                 }
                 //$text = trim(strip_tags($buffer));
                 //echo trim($buffer)."\n";  
             
                 $buffer = str_replace(array("\r", "\n"), " ", $buffer); // replace line breaks with spaces
                 if (strstr($buffer, "http://ichart.yahoo.com/v?s=$stockSymbol"))  
                 {
                     //echo "ichart found!";
                     $stopExtract = 1;  
                 }
                 $capturedHTML .= $buffer;     
         
             }

             if ($startExtract && strstr($buffer, "<br>"))
             {
                  $stopExtract = 1;  
                 //echo "extract stopped at line #$lineCount\n";          
                 echo $capturedHTML;
                 break;
             }
             $lineCount++;
         }
         fclose($fd);
     }
     
     // Example usage: fetch the quote block for several symbols
     $symbols = array('CCR', 'IIXL','SAPE','WBVN' );
     $symbolCount = count($symbols);
     for ($i=0; $i< $symbolCount; $i++)
     {
         echo "$symbols[$i]<br>";
         getYahooQuote("$symbols[$i]");
     }
     ?>
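
The listing above boils down to a two-flag scan: start capturing at the first line that contains a start marker, and stop at the first line that contains a stop marker. Below is a minimal sketch of that idea that works on an array of lines rather than a live URL, so it can be tried offline. The function name `extractBetweenMarkers` and the sample lines are made up for illustration and are not part of the original script. The sketch also leaves out the `<a` trimming and the ichart stop condition.

```php
<?php
// Hypothetical helper (not from the original post): join the lines from the
// first one containing $startMarker through the first one, at or after it,
// that contains $stopMarker. Line breaks are replaced by single spaces.
function extractBetweenMarkers(array $lines, $startMarker, $stopMarker)
{
    $captured = "";
    $started = false;
    foreach ($lines as $line) {
        if (!$started && strpos($line, $startMarker) !== false) {
            $started = true; // start marker seen: begin capturing
        }
        if ($started) {
            $captured .= rtrim($line, "\r\n") . " ";
            if (strpos($line, $stopMarker) !== false) {
                break; // stop marker seen: finish after this line
            }
        }
    }
    return trim($captured);
}

// Made-up sample input standing in for a downloaded page.
$sampleLines = array(
    "<html><body>\n",
    "<td rowspan=3>CCR</td>\n",
    "<td>12.34</td>\n",
    "<td>+0.56</td><br>\n",
    "<p>footer</p>\n",
);

echo extractBetweenMarkers($sampleLines, "rowspan=3", "<br>"), "\n";
```

Returning the fragment instead of echoing it inside the loop lets the caller decide whether to print it, store it, or pass it through `strip_tags()`.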
