[UPHPU] Web page data extraction

Jay Newhouse jay at newhousenetwork.net
Wed Jan 16 22:30:36 MST 2008


You might also want to try Snoopy.  It lets you log in to sites and 
retrieve the HTML they return.  I find it's a little easier to implement 
than cURL.

http://sourceforge.net/projects/snoopy/
http://www.jonasjohn.de/snippets/php/snoopy-example.htm
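
For comparison, here is a rough sketch of the plain cURL route, using PHP's
curl extension rather than Snoopy.  It logs in with a POST, keeps the session
cookie in memory, fetches a second page, and pulls the text out of every
table cell with DOMDocument.  The function names (fetch_after_login,
extract_table_cells), the URLs, and the form field names are all made up for
illustration; a real site will have its own login form and markup, so treat
this as a starting point only.

```php
<?php
// Sketch only: function names, URLs and form fields are hypothetical.

/**
 * POST the login form, keep the session cookie, then GET a protected page.
 * Returns the HTML of $pageUrl; throws RuntimeException on transport errors.
 */
function fetch_after_login(string $loginUrl, array $credentials, string $pageUrl): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow the redirect most logins issue
        CURLOPT_COOKIEFILE     => '',    // empty string = in-memory cookie engine
    ]);

    // Step 1: submit the login form.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($credentials));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // Step 2: fetch the page with the data, reusing the same handle (and cookies).
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }
    return $html;
}

/**
 * Return the trimmed text of every <td> in the given HTML, in document order.
 */
function extract_table_cells(string $html): array
{
    if ($html === '') {
        return [];
    }
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $cells = [];
    foreach ($doc->getElementsByTagName('td') as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}
```

You would call fetch_after_login() with something like
'https://example.com/login', an array such as
['username' => '...', 'password' => '...'], and the URL of the data page, then
pass the returned HTML to extract_table_cells().  Note this runs on the
server, so the user's credentials go through your app instead of their own
browser session.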

Jay


----- Original Message ----- 
From: "Mike Mackrory" <mike at echovue.com>
To: <boucha at gmail.com>
Cc: <uphpu at uphpu.org>
Sent: Friday, January 11, 2008 11:49 AM
Subject: Re: [UPHPU] Web page data extraction


> Thanks guys!  I'll have to give this a whirl!
>
> On Jan 11, 2008 11:02 AM, Mike Mackrory <mike at echovue.com> wrote:
>> I have an interesting question.
>>
>> I wrote an Access application a year or two ago that I'm looking at
>> rewriting as a web app. One thing I'm not sure I can move over to a
>> web app is a tool I put together to let the users extract data from
>> web pages.
>>
>> In the Access app, I open a browser window, they can log into the
>> secure site, find the page with the data they need, then click a
>> button, and the program takes the HTML source, parses out the
>> necessary info, and loads it into the local database.
>>
>> Does anyone know if this is possible to do using PHP or JavaScript?
>> Using an IFrame would be perfect, but since the site they want to
>> extract the info from is on a different domain, this doesn't appear
>> to be possible. Anyone have any ideas of how I could do this? The big
>> obstacle is just finding a way to get the source code of the web page
>> being viewed.
>>
>> Thanks
>>
>> Mike
>>
>
> You can get the source of the page by using fopen. 
> http://us.php.net/fopen
>
> And like Wade said, you can use cURL to handle the logging in and
> stuff.  http://us.php.net/curl
>
> Dave
>
> _______________________________________________
>
> UPHPU mailing list
> UPHPU at uphpu.org
> http://uphpu.org/mailman/listinfo/uphpu
> IRC: #uphpu on irc.freenode.net
> 


