[UPHPU] Web site scraping

Walt Haas haas at xmission.com
Thu Sep 25 09:40:14 MDT 2008


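If you just want a mirror of the site, any Linux distro should include 
wget.  "wget -r http://example.com" will recursively download the 
example.com site, following links up to five levels deep by default; 
"wget -m http://example.com" removes the depth limit for a full mirror.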

If you want a tool to select and reformat parts of a page, XSLT might be 
worth a look.  It's a functional programming language, a style that is 
unfamiliar to many, but it is powerful and worth learning.
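
Since you mentioned XPath: XSLT selects nodes with the same kind of XPath 
expressions that PHP's DOMXPath class accepts, so the selection step can be 
tried in plain PHP before committing to a stylesheet.  What follows is only 
a rough sketch, not a finished scraper.  The function name extract_links() 
and the sample markup are made up for illustration, and it assumes PHP 7 or 
later with the standard DOM extension enabled.

```php
<?php
// Hypothetical helper: return the text and href of every link in a page.
function extract_links(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely well formed, so keep libxml's parse
    // warnings out of the output and discard them afterwards.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // '//a[@href]' matches every <a> element that has an href attribute.
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = [
            'text' => trim($a->textContent),
            'href' => $a->getAttribute('href'),
        ];
    }
    return $links;
}

// Made-up sample page; a real scraper would fetch the HTML first,
// for example with file_get_contents() or curl.
$sample = '<html><body><h1>News</h1>'
        . '<a href="/a.html">First</a> <a href="/b.html">Second</a>'
        . '</body></html>';

foreach (extract_links($sample) as $link) {
    echo $link['text'], ' -> ', $link['href'], "\n";
}
```

Changing the XPath expression is all it takes to pull out different 
elements, and the same expression can later go into the select attribute 
of an XSLT template if you want the reformatting done there as well.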

-- Walt

Nathan Lane wrote:
> I want to make what in effect is a website scraper using PHP, but it isn't
> obvious how this would best be done. I've tried using DOMDocument and I'm
> not sure if that's the best option or not. I'd really like to use something
> where I could use XPath to get the elements out that I want. Recently I
> wrote a similar program in C# that I call HttpAnalyzer. Could I just use
> that with PHP (i.e. call it from PHP) to get what I'm looking for? Any
> suggestions?


