Introduction to Perl

The "Practical Extraction and Report Language"

The "Pathologically Eclectic Rubbish Lister"

Perl

  • Creator: Larry Wall
  • Introduced: 1987
  • Open source
  • Comes standard on many UNIX/Linux systems
  • macOS users are encouraged to install Perl with perlbrew.
  • Windows installer obtainable from http://strawberryperl.com and http://activestate.com
  • Originally developed for text record manipulation
  • Used for system administration, web development, network programming, system exploit testing

Perl Uses

  • CGI (web application) programming
    • Written in Perl (or was written in Perl originally): Slash, Bugzilla, TWiki, Movable Type
      • Wikipedia originally used a Wiki engine (CGI program) written in Perl called UseModWiki.
      • Perl CGI-type programs typically communicate with database backends
    • bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com and imdb.com have all used Perl extensively at one point.
  • Large-scale text processing for report generation
    • Has found past use in finance and bioinformatics fields for its ability to handle large data sets

Perl Online Resources

Perl Execution

  • Programs typically given .pl extension
  • Executed with perl -w prog.pl
  • Or use shell script type line at top of Perl script
    • #!/path/to/perl -w
    • and make the program executable with chmod +x prog.pl
  • Using the “-w” is encouraged for debugging.
    • -w issues warnings that would otherwise not be issued.

Perl Syntax (highlights)

  • A Perl motto: “There is more than one way to do it.”
    • Perl designed with this idea in mind
  • print "Hello, world!\n"; # semicolon is mandatory like C/C++/Java/others
    • Comments begin with #, as in shell scripts, Tcl and older languages.
  • print " 0.25" * 4, "\n"; # output: 1
    • " 0.25" is a string, but it is converted to a float and multiplied by 4; the result is printed, followed by a line break ("\n")
    • automatic conversion of scalars for "contextual polymorphism"
    • leading space is ignored
  • Math operators are the familiar set from C/C++/Java
  • String concatenation (.) and repetition (x) operators
    • print "Ba" . "na" x 4, "\n"; # output: Banananana

More Perl Syntax

variables

  • Always use $ in front of scalar variable names.
$name = "fred"; # variables are dynamically typed
print "My name is ", $name, "\n";

variable scope

  • Variables are global unless declared with my $variable.
    • Variables declared with my are also called lexical variables (not to be confused with Perl's separate local keyword).
$name = "fred"; # This is global $name.
          
# Insert a block
{
   my $name = "joe"; 
      # This is local $name because of 'my'.
   print "Block local \$name is $name\n";
      # Using double quotes allows $name interpolation as
      # part of the print statement. The backslashed $ 
      # (\$name) suppresses variable interpolation.
}

print "Global \$name is ", $name, "\n";
  • Some advocate using the my keyword for all variables, including global vars.
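The scoping rules above can be checked with a short script. This is a minimal sketch, assuming use strict and use warnings (which the earlier snippets do not use), with the global declared through our so that it is accepted under strict. The show_name helper and the @seen array are hypothetical names added for illustration.

```perl
use strict;
use warnings;

our $name = "fred";               # package (global) variable
my @seen;                         # records what each scope sees

sub show_name { return $name }    # always reads the global $name

{
    my $name = "joe";             # lexical: visible only inside this block
    push @seen, $name;            # the block sees "joe"
    push @seen, show_name();      # the subroutine still sees "fred"
}

push @seen, $name;                # after the block: the global again
print join( ", ", @seen ), "\n";  # prints: joe, fred, fred
```

The lexical $name hides the global one only for code written inside the block; a subroutine defined outside the block keeps reading the global.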

standard input

  • Perl standard input with <STDIN>
print "Please enter something interesting: \n";
$comment = <STDIN>;
          
print "You entered: $comment\n";
  • Standard input can be from the keyboard or redirected from a file or from another program.
    • Example of redirecting stdin from a file:
$ perl -w stdin_comment.pl < comment.txt
  • Examples of redirecting stdin from a program:
$ echo "The ripest fruit falls first." | perl -w stdin_comment.pl

# If you have the fortune command installed ...
$ fortune | perl -w stdin_comment.pl
  • <STDIN> reads up to and including the newline character
  • The newline could be the user hitting <Enter> after inputting something or the newline at the end of a line in a text file.

chomp() and chop()

  • The Perl <STDIN> behavior of including newlines often needs to be accounted for.
print "Enter a five letter word guess, preferably \"Yoink\": ";
$userguess = <STDIN>;
chomp($userguess);
   # Removes trailing newline character only.
   # Test this without the chomp statement.
            
$secretword = "Yoink";
            
print "The result of the comparison: ", $userguess eq $secretword, "\n";
   # String comparison with 'eq' operator;
   # returns empty string if strings not equal;
   # returns 1 if strings equal.
  • chop() function removes last character of string, whether it's a newline or not.
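The difference between chomp() and chop() shows up when the string has no trailing newline. The following is a small sketch using fixed strings instead of <STDIN> so that it runs without input; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $with_newline    = "Yoink\n";
my $without_newline = "Yoink";

chomp( my $a1 = $with_newline );     # removes the trailing "\n"
chomp( my $a2 = $without_newline );  # nothing to remove

my $b1 = $with_newline;
chop($b1);                           # removes the "\n"
my $b2 = $without_newline;
chop($b2);                           # removes the final "k"

print "chomp: [$a1] [$a2]\n";        # prints: chomp: [Yoink] [Yoink]
print "chop:  [$b1] [$b2]\n";        # prints: chop:  [Yoink] [Yoin]
```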

string functions

  • length(): returns length of a string
print "Enter a string: ";
my $inpString = <STDIN>;
chomp($inpString); # Must do this, else length will return length + 1.
print "$inpString is ", length($inpString), " chars long.\n";
  • lc(): converts all characters in a string to lowercase.
my $string = "Hello, World!";
my $lowercase = lc($string);
print "$lowercase\n";
  • uc(): converts all characters in a string to uppercase.
my $string = "Hello, World!";
my $uppercase = uc($string);
print "$uppercase\n";
  • index(): returns the position of the first occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = index($string, $substring);
print "$position\n";  # Output: 6
  • rindex(): returns the position of the last occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = rindex($string, $substring);
print "$position\n";  # Output: 17
  • substr(): extracts a substring from a string.
my $string = "Hello, World!";
my $substring = substr($string, 7, 5);
   # Starting at position 7, extract 5 characters
print "$substring\n";  # Output: World

The die() controlled exit function

  • Commonly seen is the Perl function, die().
print "Enter a string to pass to die(): ";
chomp($string = <STDIN>);
            
die($string); # Outputs to STDERR, not STDOUT

print "This will not be printed.";
  • die() appends the Perl program name and the line number of the die() statement to its message if the string argument does not end in a newline; hence the chomp() statement above.

selection statements, familiar

  • Familiar (C/C++/Java) syntax
if ( $number != 0 ) {
   $result = 100 / $number;
}

if ( $password eq $guess ) {
   print "Pass, friend.\n";
} else {
   die "Go away, imposter!";
}

selection statements, elsif

if ( $password eq $guess ) {
   print "Pass, friend.\n";
} elsif ( $password eq "Meh" ) {
   print "Meh!\n";
} else {
   die "Go away, imposter!";
}

selection statements, unless

# Assuming $a is boolean
if ( not $a ) {
   print "\$a is not true\n";
}

# can also be expressed...

unless ( $a ) {
   print "\$a is not true\n";
}
  • Choose the syntax that best fits your own thought patterns
    • “There's more than one way to do it.”

reverse selection statements

  • Normal
if ($number == 0) {
   die "Can't divide by 0";
}
  • Equivalent
die "Can't divide by 0" if $number == 0;
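The statement-modifier form also works with unless and with return. Below is a minimal sketch; safe_divide is a hypothetical helper, not part of the course material, that returns undef instead of dying when the divisor is zero.

```perl
use strict;
use warnings;

# Hypothetical helper: returns undef for a zero divisor.
sub safe_divide {
    my ( $numerator, $divisor ) = @_;
    return undef if $divisor == 0;     # modifier form of if
    return $numerator / $divisor;
}

my $result = safe_divide( 100, 4 );
print "100 / 4 = $result\n" if defined $result;   # prints: 100 / 4 = 25
print "Can't divide by 0\n" unless defined safe_divide( 100, 0 );
```

The modifier form reads well for short guard conditions; longer bodies are usually clearer in the block form.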

repetition structures

  • while, until, for, foreach, do..while, do..until
    • while, for, do..while are C/C++/Java standard
until ( $countdown <= 0 ) {
   print "Counting down: $countdown\n";
   $countdown--;
}

# "for each number in the list 1 through 10"
foreach $number ( 1 .. 10 ) {
   print "The number is: $number\n";
}

while (<STDIN>)

  • while ( $var = <STDIN> ) reads from standard input until end of file (control-d)
    • Reads a line at a time into $var
while ( $var = <STDIN> ) {
   print $var;
	  # Print each line of standard input.
}
  • Can shorten to the following because it's such a common operation
    • $_ below is the default Perl variable
while ( <STDIN> ) {
   print $_;
}

lists

  • Create Perl lists with ()
print( "Hello, ", "world", "\n" );
   # 3 strings in a list being passed to print function

print( 123, 456, 789 );

foreach $number ( 1 .. 10 ) {
   # ( 1 .. 10 ) creates the list of numbers from 1 to 10
}
  • Create Perl lists with qw (quote words)
qw/hello world good bye/
   # creates a 4 word list

accessing list values

  • Use square brackets
print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] );
   # output: mustard (count from zero)

my $month = 3;

print qw(
   Jan Feb Mar
   Apr May Jun
   Jul Aug Sep
   Oct Nov Dec
)[ $month ];
   # output: Apr

accessing list "slices"

  • Get more than one list value at a time
my $mone;
my $mtwo;
( $mone, $mtwo ) = ( 1, 3 );


my $m1;
my $m2;
my $m3;

( $m1, $m2, $m3 ) = qw(
                      Jan Feb Mar
                      Apr May Jun
                      Jul Aug Sep
                      Oct Nov Dec
                      )[ 2..4 ];

print $m1." ".$m2." ".$m3;

arrays

  • Arrays are just named lists.
  • Whole arrays are called @arrayName (start with '@')
    • Individual array elements (scalars) are accessed as $arrayName[ subscript ]
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
print @days, "\n";
   # Output: MonTueWedThuFriSatSun
print "@days\n";
   # Output: Mon Tue Wed Thu Fri Sat Sun
   # By enclosing in "", have "stringified" @days
print $days[ 6 ], "\n"; 
# @days[ 6 ] can also be used, but see the warning at http://tiny.cc/f4t3vz

array size in $#

  • $#arrayName is special Perl syntax
    • contains the last subscript (or index) of @arrayName, so the number of elements is $#arrayName + 1
  • Looping through an array
my $i = 0;
while ( $i <= $#arrayName ) {
   # Do something with $arrayName[ $i ] here
   $i++;
}
  • or
for ( my $i = 0; $i <= $#arrayName; $i++ ) {
   # Do something with $arrayName[ $i ] here
}
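The relationship between the last index and the element count can be sketched as follows, reusing the @days array from the earlier example. Using scalar() to get the count and a negative subscript to reach the last element are standard Perl features that the notes above do not cover.

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri Sat Sun);

my $last_index = $#days;           # index of the last element
my $count      = scalar(@days);    # number of elements

print "Last index: $last_index\n";              # prints: Last index: 6
print "Element count: $count\n";                # prints: Element count: 7
print "Last element: ", $days[$#days], "\n";    # prints: Last element: Sun
print "Also last: ", $days[-1], "\n";           # prints: Also last: Sun
```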

arrays with foreach

foreach my $i ( @arrayName ) {
   # Does not use $#.
   # Do something with $i here.
}

accessing array slices

my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my @longweekend = @days[ 4..6 ];
   # Note use of @ instead of $ before days.
print "@longweekend\n";
   # Output: Fri Sat Sun

array functions

  • reverse()
my @count = ( 1..5 );

foreach $each ( reverse( @count ) ) {
  print "$each...\n";
  sleep 1;
}
  • sort()
my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted = sort @unsorted;
  • end or top functions: push(), pop()
my $hand;
my @pile = ( "letter", "newspaper", "bill", "notepad" );
print "You pick up something off the top of the pile.\n";
$hand = pop @pile;
   # "notepad" is removed from end (top) of @pile
print "You now have a $hand in your hand,\n",
      "and the pile contains:\n@pile\n";

print "You now put something on your pile.\n";
push @pile, "statement";
   # "statement" is added to the end (top) of @pile
print "Now the pile contains:\n@pile\n";
  • beginning or bottom functions: shift(), unshift()
my @array = (); # nothing in array
unshift @array, "first";
print "Array is now: @array\n";
unshift @array, "second", "third";
print "Array is now: @array\n";
shift @array;
print "Array is now: @array\n";
   # unshift adds elements to the front, shift removes the first element
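One point worth checking with sort() is that it compares elements as strings by default. The sketch below, with an illustrative @numbers array, contrasts the default with a numeric comparison block using the <=> operator.

```perl
use strict;
use warnings;

my @numbers = ( 10, 9, 100, 1 );

my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric, ascending
my @descending = sort { $b <=> $a } @numbers;   # numeric, descending

print "@as_strings\n";   # prints: 1 10 100 9
print "@as_numbers\n";   # prints: 1 9 10 100
print "@descending\n";   # prints: 100 10 9 1
```

Inside the block, $a and $b are the two elements being compared; they are set by sort itself and do not need to be declared.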

hashes

  • hashes or associative arrays
    • can be thought of as unordered arrays, using keys instead of array subscripts
  • Whole hashes are called %hashname (start with '%').
%where = (
   Gary => "Dallas",
   Lucy => "Austin",
   Ian  => "Houston",
   Samantha => "Seattle"
);
  • To the left of the => are the hash keys; to the right of the => are the associated hash values.
    • hash keys must be unique

hash element access

  • Individual hash elements are accessed with $hashname{key}
my $who = "Ian";

my %where = (
   Gary => "Piscataway",
   Lucy => "Hackensack",
   Ian  => "Mahwah",
   Samantha => "Hoboken"
);

print "Gary lives in ", $where{Gary}, "\n";
print "$who lives in $where{$who}\n";

hash element adding/deleting

my %where = (
   Gary => "Piscataway",
   Lucy => "Hackensack",
   Ian  => "Mahwah",
   Samantha => "Hoboken"
);

$where{Eva} = "Howell";
   # We added a Eva => "Howell" key/value pair to the
   # %where hash

delete $where{Gary};
   # We deleted the Gary => "Piscataway" key/value pair

hash functions, iteration

  • Can't use while, for or foreach to directly iterate through hashes
    • Perl provides keys(), values() and each()
# use 'keys' function to iterate through hash keys
foreach $who ( keys %where ) {
   print "$who lives in $where{$who}\n";
}


# use 'values' function to iterate through hash values
foreach $town ( values %where ) {
   print "someone lives in $town\n";
}


# use 'each' function to iterate through hash key/value
# pairs
my ($name, $town);
   # an assignable list of variables
while ( ($name, $town) = each %where ) {
   print "$name lives in $town\n";
}
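Because hashes are unordered, keys() may return the keys in any order. When a predictable order is needed, a common approach is to sort the keys first. A minimal sketch using the %where hash from above (the @lines array is added only for illustration):

```perl
use strict;
use warnings;

my %where = (
    Gary     => "Piscataway",
    Lucy     => "Hackensack",
    Ian      => "Mahwah",
    Samantha => "Hoboken",
);

my @lines;
foreach my $who ( sort keys %where ) {    # keys in alphabetical order
    push @lines, "$who lives in $where{$who}";
}
print "$_\n" foreach @lines;
# prints the lines for Gary, Ian, Lucy, Samantha, in that order
```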

hash functions, key existence

  • Use exists() hash function to check if a key exists in a hash
print "Gary exists in the hash!\n" if exists $where{Gary};
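exists() tests for the key, not for the value. The sketch below, which assumes a Lucy entry whose value is undef purely for illustration, shows how exists() and defined() can disagree.

```perl
use strict;
use warnings;

my %where = ( Gary => "Piscataway", Lucy => undef );

my $lucy_exists  = exists  $where{Lucy} ? 1 : 0;   # key is present
my $lucy_defined = defined $where{Lucy} ? 1 : 0;   # but its value is undef
my $eva_exists   = exists  $where{Eva}  ? 1 : 0;   # no such key

print "$lucy_exists $lucy_defined $eva_exists\n";  # prints: 1 0 0
```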

subroutines

  • subroutines are functions
  • define subroutine:
sub example_subroutine {
  ...
  # subroutine body
  ...
}
  • define and call subroutine
greet();

sub greet {
   print "Hello, World!\n";
}

subroutines, arguments

  • arguments are passed through another special Perl var, @_
    • notice that @_ is a special array
greet( "Jim", "Bob", "Russ" );
   # There isn't a set number of function arguments or a
   # function "prototype" to speak of

sub greet {
   foreach my $arg ( @_ ) {
	  print "Hello $arg!\n";
   }

   print "You're first, $_[ 0 ].\n";
   print "You're second, $_[ 1 ].\n";
   print "You're last, $_[ 2 ].\n";
}

subroutines, returns

  • Use return
    • Can return a list of values.
my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
print "($len1, $len2, $len3)\n";

sub greet {
   foreach my $arg ( @_ ) {
	  print "Hello $arg!\n";
   }

   return (length($_[0]), length($_[1]), length($_[2]));
	  # return a list of 3 ints
}
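Indexing @_ directly works, but many Perl programs first copy the arguments into named lexical variables. The following is a sketch of that idiom; full_name and name_lengths are hypothetical subroutines written for this example.

```perl
use strict;
use warnings;

# Copy @_ into named variables at the top of the subroutine.
sub full_name {
    my ( $first, $last ) = @_;
    return "$first $last";
}

# Accept any number of arguments and return a list.
sub name_lengths {
    my @names = @_;
    return map { length($_) } @names;    # one length per name
}

my $full    = full_name( "Larry", "Wall" );
my @lengths = name_lengths( "Jim", "Bob", "Russ" );

print "$full\n";       # prints: Larry Wall
print "@lengths\n";    # prints: 3 3 4
```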


Perl and Regular Expressions

Perl and Regular Expressions (match, search/replace)

  • Regular expression matching with the =~ operator
$var =~ m/regular expression/
   # boolean (true if match, false if no match)
   # can omit the "m": $var =~ /regular expression/
   
$var !~ m/regular expression/
   # true if not a match, false if a match
  • Substitution using regular expressions
$var =~ s/search re/replacement string/

my $name = "Joseph";
$name =~ s/[sph]//;  # delete first instance of s, p or h
print "$name\n";
$name =~ s/[sph]//g; # g -> delete every instance of s, p or h
print "$name\n";
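Matches can also capture parts of the string. The sketch below assumes a colon-separated sample line and uses parentheses to capture two fields into $1 and $2; it also uses the return value of s///g, which is the number of substitutions made.

```perl
use strict;
use warnings;

my $line = "jchung:x:1032:51";
my ( $user, $uid ) = ( "", "" );

# Parentheses capture parts of the match into $1, $2, ...
if ( $line =~ /^(\w+):[^:]*:(\d+):/ ) {
    ( $user, $uid ) = ( $1, $2 );
}
print "user=$user uid=$uid\n";       # prints: user=jchung uid=1032

# With the g modifier, s/// returns how many replacements were made.
my $name  = "Joseph";
my $count = ( $name =~ s/[sph]//g );
print "$name ($count removed)\n";    # prints: Joe (3 removed)
```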

Perl and Regular Expressions (split)

  • split() function splits strings using a regular expression as delimiter
    • with no arguments, split splits $_ on whitespace, so calling split by itself is equivalent to doing
split ' ', $_;
   # split the default variable on runs of whitespace;
   # behaves like /\s+/ except that leading whitespace
   # is ignored
  • The split regexp delimiter is usually something simple
my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;
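Once the line is split, the fields can be used like any other array. This is a short sketch continuing the $passwd example; the variable names $login, $home and $shell are assumptions chosen to match the usual meaning of those passwd fields.

```perl
use strict;
use warnings;

my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;

my $field_count = scalar(@fields);
my ( $login, $home, $shell ) = @fields[ 0, 5, 6 ];   # array slice

print "$field_count fields\n";              # prints: 7 fields
print "$login uses $shell in $home\n";
    # prints: jchung uses /bin/bash in /home/jchung

# split ' ' also discards leading whitespace.
my @words = split ' ', "  The ripest   fruit ";
print scalar(@words), " words\n";           # prints: 3 words
```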

Perl and Regular Expressions (join)

  • join() function joins elements of a list using a specified delimiter string
my $last = "Jones";
my $first = "Bob";
my $name = join ", ", ($last, $first);

Multiple Input Files with <>

  • We have used the <STDIN> statement and < filename redirection to process files such as cs498roster
  • Suppose we want to be able to process files with a command like
$ perl -w perlex5.pl cs498roster cs598roster
  • One way to accomplish this is with the <> (so-called diamond)
while (<>) {
	print "text read: $_";
}
  • <> checks to see if the program (the Perl script) was invoked with command line arguments (file name or multiple file names)
    • If so, it reads the file(s) one at a time, one line at a time.

Perl Command Line Arguments w/ @ARGV

  • The special Perl array @ARGV contains the command line arguments that a Perl script was invoked with.
    • <> gets file names from @ARGV by default
foreach my $arg ( @ARGV ) {
	print $arg;
}

or

for ( my $i = 0; $i <= $#ARGV; $i++ ) {
	print $ARGV[ $i ];
}

or

print "$_" foreach @ARGV; # Perl-speak

Perl Command Line Arguments w/ shift @ARGV

  • Command line arguments aren't always file names.
    • Here, we're trying to get a help message from perlex6.pl by calling it with the --help command line argument
      • --help becomes the first element of @ARGV, $ARGV[ 0 ]
$ perl -w perlex6.pl --help
  • Sometimes, command line arguments modify how a program will work.
    • Here, we're trying to make perlex6.pl sort rosters by last name:
$ perl -w perlex6.pl --last cs498roster cs598roster
  • To remove an argument from @ARGV, shift it.
my $arg0 = shift @ARGV;
if ( $arg0 =~ /last/ ) ...
   # Process all non-file command line args before we
   # get to while (<>)

while (<>) { ...
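One way to handle several leading options is to keep shifting while the first argument looks like an option. The following is a sketch only: parse_args, the usage text and the error message are hypothetical, and the subroutine works on a copy of the argument list so that it can be tried without real command line arguments.

```perl
use strict;
use warnings;

# Hypothetical option parser: returns a flag and the remaining file names.
sub parse_args {
    my @args = @_;
    my $sort_by_last = 0;
    while ( @args && $args[0] =~ /^--/ ) {
        my $option = shift @args;           # remove the option
        if ( $option eq '--last' ) {
            $sort_by_last = 1;
        } elsif ( $option eq '--help' ) {
            print "usage: perlex6.pl [--last] file ...\n";
        } else {
            die "Unknown option: $option\n";
        }
    }
    return ( $sort_by_last, @args );
}

my ( $sort_by_last, @files ) =
    parse_args( '--last', 'cs498roster', 'cs598roster' );
print "sort by last name: $sort_by_last; files: @files\n";
# prints: sort by last name: 1; files: cs498roster cs598roster
```

In a real script, the call would be parse_args(@ARGV), and the remaining file names would be assigned back to @ARGV before the while (<>) loop so that <> reads only the files.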

Perl References

  • Perl references hold the locations of other pieces of data
    • The backslash “\” is used to create references.
    • Here, $hash_r becomes a reference to the memory location of %hash:
my %hash = ( 
	apple => "crab",
	pear => "asian"
	);
my $hash_r = \%hash;
  • Dereferencing
    • Use {} around reference name.
my %hash2 = %{$hash_r};
	# Set new %hash2 contents from the hash that is
	# referenced by $hash_r
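Individual elements can be reached through a reference without copying the whole hash. A minimal sketch using the %hash and $hash_r from above; the plum entry is made up for the example.

```perl
use strict;
use warnings;

my %hash = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;

my $one    = $hash_r->{apple};         # arrow syntax for one element
my $same   = ${$hash_r}{apple};        # equivalent block syntax
my @fruits = sort keys %{$hash_r};     # dereference the whole hash

$hash_r->{plum} = "damson";            # changes %hash itself, not a copy
my $count = keys %hash;                # number of keys in scalar context

print "$one $same\n";                  # prints: crab crab
print "@fruits\n";                     # prints: apple pear
print "$count keys\n";                 # prints: 3 keys
```

Unlike the %hash2 copy above, changes made through $hash_r are visible in the original hash.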

Perl Objects

  • “…an object can be anything – it really depends on what your application is. … If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object.”
  • In Perl, what we see as an object is simply a reference…
    • In fact, you can convert any ordinary reference into an object simply by using the (Perl) bless() function.
    • Typically, however, objects are represented as references to a hash

Perl Objects and Modules Example

  • Perl Classes are often called Modules
    • Here, we'll use the Net::FTP module (example only; rockhopper does not run an anonymous ftp server):
use strict;
use Net::FTP;
   # This requires that FTP.pm be stored somewhere on the
   # local system that Perl searches through for modules.

my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
   or die "Couldn't connect: $@\n";
	  # new() is a method of class Net::FTP; it's the
	  # constructor. $ftp is our FTP session object.

$ftp->login("anonymous");
   # Net::FTP->login() method
$ftp->cwd("/");
$ftp->get("index.php");
$ftp->quit();
   # quit() sends the FTP QUIT command and closes the connection

Perl Modules and @INC

  • @INC contains the system file system paths that the Perl interpreter looks through for modules such as FTP.pm.
$ perl -V
...
@INC:
	/etc/perl
	/usr/local/lib/perl/5.8.8
	/usr/local/share/perl/5.8.8
	/usr/lib/perl5
	/usr/share/perl5
	/usr/lib/perl/5.8
	/usr/share/perl/5.8
	/usr/local/lib/site_perl
	.

Perl Modules and CPAN

  • A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
  • The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
  • The CPAN module naming hierarchy places modules in categories such as Sort::Fields, Sort::Versions or subcategories such as LWP::Protocol::http
    • On disk, the modules would look like …/Sort/Fields.pm, …/Sort/Versions.pm
    • For the LWP::Protocol::http module, the full path to the module might be /usr/share/perl5/LWP/Protocol/http.pm on a Linux system.
  • How to know if a Perl module is installed?
    • You can run a Perl “one-liner” program using the “-e” option to check if a module is installed, e.g.,
# If the following runs with no errors, it means the LWP::Simple module is installed.
perl -e "use LWP::Simple"

Perl Web Automation

  • Retry Perl Exercise 7 using a Web automation module such as LWP::Simple or WWW::Mechanize.

Perl and System Commands

  • For portability, use Perl equivalents to system commands, such as Perl's chdir() function instead of cd.
    • NOTE: This is applicable to all programming languages that can call system commands.
  • But if using system commands is a necessity, use system()
system( '/usr/games/fortune' );
  • Command substitution is done using backticks ``
    • Sometimes we want to capture the output of system commands to use in a Perl script.
my $sysdate = `date`;
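Two details are easy to miss: backticks return the command's output with its trailing newline, and system() returns the command's wait status, not its output. The sketch below assumes a Unix-like system where the echo, true and sh commands are available.

```perl
use strict;
use warnings;

# Backticks capture standard output, including the trailing newline.
my $greeting = `echo hello`;
chomp($greeting);                      # strip the newline, as with <STDIN>

# system() returns the wait status; 0 means the command succeeded.
my $ok_status = system('true');

# The command's exit code is in the high byte of $?.
system( 'sh', '-c', 'exit 3' );
my $exit_code = $? >> 8;

print "captured: [$greeting]\n";       # prints: captured: [hello]
print "true returned: $ok_status\n";   # prints: true returned: 0
print "exit code: $exit_code\n";       # prints: exit code: 3
```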
cs498gpl/introduction_to_perl.txt · Last modified: 2024/02/09 19:31 by jchung
