[UPHPU] Extracting templates from web pages

Richard K Miller richardkmiller at gmail.com
Wed Apr 2 12:28:27 MDT 2008


On Apr 2, 2008, at 12:11 PM, MilesTogoe wrote:
> Richard K Miller wrote:
>> Adrian Holovaty (creator of ChicagoCrime.org and Django) has a  
>> Python script called templatemaker[1][2], which in theory would do  
>> what I want. You feed it a bunch of similar web pages and it  
>> produces a template with "holes" where the data was different  
>> across each web page. In practice, it's too granular; it doesn't  
>> recognize HTML. It looks at every character. I don't care about spaces between  
>> tags. I only care about substantial content differences across  
>> pages. Everything else can be moved to the template.
> Sounds like your excuse to step up and move to Python & Django! :)

I'm sure Python and Django have plenty of other virtues, but  
templatemaker didn't work for me. Its engine is written in C and  
probably needs to be modified to recognize HTML and ignore whitespace,  
but that's outside of my area of expertise.
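
To make "recognize HTML and ignore whitespace" concrete, here is a rough sketch in Python (the language templatemaker's script is written in). It is not templatemaker's algorithm or API. The names tokenize, make_template, and HOLE are made up for illustration, the regex is a crude stand-in for a real HTML parser, and the comparison uses the standard library's difflib instead of the C engine. The idea is to compare whole tags and text runs rather than individual characters, and to drop whitespace-only text between tags before comparing.

```python
import re
from difflib import SequenceMatcher

# Hypothetical marker for a "hole" in the template (not templatemaker's syntax).
HOLE = "{{ hole }}"

# Crude tokenizer: a token is either a whole tag or a run of text between tags.
# This is an assumption for illustration; it does not handle comments, scripts,
# or a ">" inside an attribute value.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html):
    """Split HTML into tag and text tokens, ignoring whitespace differences."""
    tokens = []
    for raw in TOKEN_RE.findall(html):
        token = " ".join(raw.split())  # collapse runs of whitespace
        if token:                      # drop whitespace-only text between tags
            tokens.append(token)
    return tokens


def make_template(page_a, page_b):
    """Return the tokens shared by both pages, with HOLE where they differ."""
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    template = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            template.extend(a[i1:i2])
        elif not template or template[-1] != HOLE:
            # Any insert, delete, or replace becomes a single hole.
            template.append(HOLE)
    return template


page_a = "<html><body>\n  <h1>Hello</h1>\n  <p>First page</p>\n</body></html>"
page_b = "<html> <body><h1>Hello</h1><p>Second  page</p></body> </html>"
print("".join(make_template(page_a, page_b)))
```

With these two sample pages, only the paragraph text becomes a hole; the spacing differences between tags are ignored. Handling more than two pages would mean folding each additional page into the template in turn, which this sketch does not attempt.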


