Farid Hajji

Perl: Einführung, Anwendungen, Referenz (2/e) [Support Site]

Farid Hajji: Perl - Einführung, Anwendungen, Referenz
2nd, updated and expanded edition
Addison-Wesley Longman, ISBN 3-8273-1535-2

Example program

Tie/URL.pm
# Tie/URL.pm -- TIEHASH class for fetching URLs via the LWP library.

package Tie::URL;
$VERSION = "0.01";

use LWP::UserAgent;
use Tie::Hash;

use vars qw(@ISA);
@ISA = qw(Tie::StdHash);

use constant USERAGENT  => 'TIE-URL/0.1';
use constant RTIMEOUT   => 60; # seconds: default request timeout

sub TIEHASH {
    my $class  = shift;
    my %params = (
        UserAgentName => USERAGENT,
        Timeout       => RTIMEOUT,
        @_,
    );

    # To be on the safe side, this parameter is required!
    die "Tie::URL: Must specify UserMail Parameter!\n"
        unless exists $params{'UserMail'};

    my $ho = bless({}, $class);

    $ho->{'_ua'} = LWP::UserAgent->new();
    $ho->{'_ua'}->agent($params{'UserAgentName'});
    $ho->{'_ua'}->from($params{'UserMail'});

    # The two special parameters ProxyOff and CacheOff
    # turn off the proxy and the cache.
    unless (exists $params{'ProxyOff'}) {
        $ho->{'_ua'}->proxy('http', $params{'ProxyURL'})
            if defined $params{'ProxyURL'};
        $ho->{'_ua'}->no_proxy($params{'ProxyNO'})
            if defined $params{'ProxyNO'};
    }
    unless (exists $params{'CacheOff'}) {
        $ho->{'_cache'} = 1;
    }

    # If a timeout is given, it should be honored.
    if (exists $params{'Timeout'}) {
        $ho->{'_ua'}->timeout($params{'Timeout'});
    }

    return $ho;
}

sub FETCH {
    my $self = shift;
    my $url  = shift;

    # First check whether the URL is already in the cache...
    return $self->{$url} if exists $self->{$url};

    # Now fetch the URL
    my $resp = $self->{'_ua'}->request(HTTP::Request->new('GET', $url));
    my $cont = $resp->is_success() ? $resp->content()
                                   : $resp->error_as_HTML();

    # Store in the cache if caching is enabled.
    $self->{$url} = $cont if exists $self->{'_cache'};

    # Return the value
    return $cont;
}

sub STORE {
    warn "Sorry, HTTP PUT/POST Methods not yet implemented!\n";
}

1;
__END__

=head1 NAME

Tie::URL - Tie Hashes to URLs

=head1 SYNOPSIS

    use Tie::URL;

    tie %url, 'Tie::URL', UserMail => 'user@somewhere.org';

    tie %url, 'Tie::URL', UserMail => 'user@somewhere.org',
                          Timeout  => $time_to_wait,
                          CacheOff => 1,
                          ProxyOff => 1;

    tie %url, 'Tie::URL', UserMail => 'user@somewhere.org',
                          ProxyURL => 'http://proxy.isp.org:8080/',
                          ProxyNO  => 'isp.org';

=head1 DESCRIPTION

This module provides a TIEHASH class that ties a hash to URLs, using
the LWP::UserAgent module as a backend to fetch them.
Once a hash has been tied to this class, accessing a URL is simple:
read the value of the hash, using the wanted URL as the key. The
value is the content returned by the server for that URL or, if the
request fails, an error message formatted as HTML.

Responses are normally cached in memory inside the tied hash, so that
subsequent reads of the same URL are not propagated to the server.
The cache can be turned off by adding the CacheOff parameter to the
tie() call.

tie() accepts the following parameters:

=over

=item UserMail

The mail address of the user running this program.
This is the only mandatory parameter.

=item Timeout

The time to wait for the server to reply, in seconds. Defaults
to RTIMEOUT (60 seconds). You may wish to set this to a higher value
on heavily loaded networks or servers.

=item CacheOff

If this parameter is specified, repeated requests to a specific URL
are also repeatedly sent to the server. No cache is maintained in
memory.

When enabled, the cache currently keeps all responses in memory,
regardless of their Expires: HTTP header. This could change in the
future.

=item ProxyOff

A proxy server is used as soon as you set the ProxyURL parameter
and, optionally, the ProxyNO parameter. Adding ProxyOff turns the
use of proxy servers off.

=item ProxyURL

The URL of the proxy server to use for http requests. This is
sometimes necessary to pass through firewalls or to benefit from the
caching proxy of your ISP, thus reducing waiting time and conserving
network bandwidth.

=item ProxyNO

Do not use the proxy server for the domain specified by this
parameter.

=back

keys() and related hash functions return the contents of the
cache, which could be empty if the cache has been disabled with
CacheOff. Note that keys() won't return all the URLs worldwide :-)

=head1 KNOWN BUGS

Caching should be smarter and honor the Expires: HTTP header provided
by the responding server.

Support for Cookies is currently disabled, but could be added in
the TIEHASH constructor, as LWP::UserAgent supports cookie jars.

Writing a value to a tied hash only results in a warning being issued.
An HTTP POST or even HTTP PUT request is NOT issued to the server.
This could be implemented in the future.

keys(), each() and values() also return the special keys _ua and
_cache, which are not URLs but an internal object and an internal
marker, respectively.

=head1 AUTHOR

Farid Hajji <farid.hajji@ob.kamp.net>

This module is copylefted under the same terms as Perl. See the
GNU copyleft version 2 or the artistic license.

=cut
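
The documentation above describes reading URLs through the tied hash, the in-memory cache, and the internal keys that keys() reports. The following is a minimal usage sketch of that behaviour, not part of the original listing. It assumes that Tie::URL is installed on @INC together with libwww-perl and URI, and it uses a temporary local file addressed through a file: URL (an assumption made here so that the sketch runs without network access). The variable names are hypothetical.

```perl
#!/usr/bin/perl
# Usage sketch for Tie::URL (assumes Tie::URL, LWP and URI are installed).
use strict;
use warnings;

use File::Temp qw(tempfile);
use URI::file;
use Tie::URL;

# A local file stands in for a web page so the sketch runs offline.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first version\n";
close $fh or die "close: $!";
my $key = URI::file->new($path)->as_string;    # a file: URL for $path

# Default behaviour: the first read is fetched, later reads hit the cache.
tie my %cached, 'Tie::URL', UserMail => 'user@somewhere.org';
my $first = $cached{$key};

# Change the content behind the URL.
open my $out, '>', $path or die "open: $!";
print {$out} "second version\n";
close $out or die "close: $!";

my $again = $cached{$key};                     # still the cached copy

# With CacheOff every read goes back to the source.
tie my %fresh, 'Tie::URL', UserMail => 'user@somewhere.org',
                           CacheOff => 1;
my $current = $fresh{$key};

# keys() also reports the internal entries _ua and _cache; skip them.
my @urls = grep { !/^_(?:ua|cache)$/ } keys %cached;

print "first:   $first";
print "again:   $again";
print "current: $current";
print "cached URLs: @urls\n";
```

Filtering the internal keys in the caller is only a workaround for the limitation listed under KNOWN BUGS; a fix inside the module would have to keep the cache apart from the object's own fields.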
   



Last modified: $Date: 2004/06/16 22:19:40 $