====== Introduction to Perl ======
==== The "Practical Extraction and Report Language" ====
==== The "Pathologically Eclectic Rubbish Lister" ====
===== Perl =====
* Creator: Larry Wall
* Introduced: 1987
* Open source
* Comes standard on many UNIX/Linux systems
* macOS users are encouraged to use [[https://perlbrew.pl|PerlBrew]] to install and manage Perl.
* Windows installer obtainable from http://strawberryperl.com and http://activestate.com
* [[https://learn.perl.org/installing/windows.html|Strawberry Perl]] is recommended.
* Originally developed for text record manipulation
* Used for system administration, web development, network programming, system exploit testing
===== Perl Uses =====
* CGI (web application) programming
* Written in Perl (or was written in Perl originally): [[http://www.slashcode.com/ | Slash]], Bugzilla, TWiki, Movable Type
* Wikipedia originally used a Wiki engine (CGI program) written in Perl called [[https://wikiless.tiekoetter.com/wiki/UseModWiki|UseModWiki]].
* Perl CGI-type programs typically communicate with database backends
* bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com and imdb.com have all used Perl extensively at one point.
* Large-scale text processing for report generation
* Has found past use in finance and [[http://shop.oreilly.com/product/9780596000806.do|bioinformatics]] fields for its ability to handle large data sets
===== Perl Online Resources =====
* [[http://www.perl.org | Perl 5 Official site]]
* [[http://www.perl6.org/ | Perl 6 Official site]] (Perl 6 was renamed Raku in 2019)
* [[http://learn.perl.org/|learn.perl.org]]
* [[http://www.perl.org/books/library.html|Perl.org]] online library
* [[http://www.ebb.org/PickingUpPerl/|Picking Up Perl]] (online book)
===== Perl Execution =====
* Programs typically given .pl extension
* Executed with //perl -w prog.pl//
* Or use shell script type line at top of Perl script
* #!/path/to/perl -w
* and make the program executable with //chmod +x prog.pl//
* Using the //"-w"// is encouraged for debugging.
* //-w// issues //warnings// that would otherwise not be issued.
===== Perl Syntax (highlights) =====
* A Perl motto: "There is more than one way to do it."
* Perl designed with this idea in mind
* print "Hello, world!\n"; # semicolon is **mandatory** like C/C++/Java/others
* Comments begin with #, as in shell scripts, Tcl and older languages.
* print " 0.25" * 4, "\n"; # output: 1
* " 0.25" is a string, but it is converted to a number and multiplied by 4; the result is then printed, followed by a line break ("\n")
* automatic conversion of [[https://perldoc.perl.org/perldata|scalars]] for "[[https://perlmaven.com/automatic-value-conversion-or-casting-in-perl|contextual polymorphism]]"
* leading space is ignored
* Math operators are the familiar set from C/C++/Java
* String concatenation (.) and repetition (x) operators
* print "Ba" . "na" x 4, "\n"; # output: Banananana
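* A minimal sketch of the automatic string/number conversion and the string operators described above. The variable names are illustrative assumptions, not part of the original examples:

```perl
use strict;
use warnings;

my $count = "10";            # a string that looks like a number

my $sum    = $count + 5;     # numeric context: the string is used as 10
my $joined = $count . 5;     # string context: the number 5 is used as "5"
my $rule   = "-" x 20;       # repetition: a string of 20 dashes

print "$sum\n";              # 15
print "$joined\n";           # 105
print "$rule\n";
```

* The operator, not the variable, decides whether a scalar is treated as a number or as a string.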
===== More Perl Syntax =====
==== variables ====
* Use $ all the time in front of scalar variable names (arrays and hashes use @ and %).
$name = "fred"; # variables are dynamically typed
print "My name is ", $name, "\n";
==== variable scope ====
* Variables are global if not declared local with ''my $variable''.
* local Perl variables also called //lexical//
$name = "fred"; # This is global $name.
# Insert a block
{
my $name = "joe";
# This is local $name because of 'my'.
print "Block local \$name is $name\n";
# Using double quotes allows $name interpolation as
# part of the print statement. The backslashed $
# (\$name) suppresses variable interpolation.
}
print "Global \$name is ", $name, "\n";
* Some advocate using the ''my'' keyword for all variables, including global vars.
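* A short sketch of what "''my'' everywhere" looks like in practice, assuming the script opts in to ''use strict'' (which makes undeclared variables a compile-time error):

```perl
use strict;      # undeclared variables now stop the program at compile time
use warnings;

my $name = "fred";            # file-scoped lexical, visible to the end of the file

{
    my $name = "joe";         # a separate variable that hides the outer $name
    print "Inside the block: $name\n";     # joe
}

print "Outside the block: $name\n";        # fred
```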
==== standard input ====
* Perl standard input with //<STDIN>//
print "Please enter something interesting: \n";
$comment = <STDIN>;
print "You entered: $comment\n";
* Standard input can be from the keyboard or redirected from a file or from another program.
* Example of redirecting stdin from a file:
$ perl -w stdin_comment.pl < comment.txt
* Examples of redirecting stdin from a program:
$ echo "The ripest fruit falls first." | perl -w stdin_comment.pl
# If you have the fortune command installed ...
$ fortune | perl -w stdin_comment.pl
* ''<STDIN>'' reads //up to and including// the newline character
* The newline could be the user hitting <Enter> after inputting something or the newline at the end of a line in a text file.
* See [[cs370:cs_370_-_unix_shells_and_shell_scripting#standard_file_descriptors|standard file descriptors (CS-370)]]
==== chomp() and chop() ====
* The Perl behavior of including newlines often needs to be accounted for.
print "Enter a five letter word guess, preferably \"Yoink\": ";
$userguess = <STDIN>;
chomp($userguess);
# Removes trailing newline character only.
# Test this without the chomp statement.
$secretword = "Yoink";
print "The result of the comparison: ", $userguess eq $secretword, "\n";
# String comparison with 'eq' operator;
# returns empty string if strings not equal;
# returns 1 if strings equal.
* //chop()// function removes last character of string, whether it's a newline or not.
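* The difference between the two functions can be sketched on a fixed string (used here instead of keyboard input so the result is predictable):

```perl
use strict;
use warnings;

my $line = "Yoink\n";

my $removed = chomp($line);   # chomp returns the number of characters removed
print "chomp removed $removed character(s): [$line]\n";    # [Yoink]

$removed = chomp($line);      # no trailing newline is left, so nothing is removed
print "second chomp removed $removed: [$line]\n";           # [Yoink]

my $last = chop($line);       # chop always removes (and returns) the last character
print "chop removed '$last': [$line]\n";                    # [Yoin]
```

* Because //chomp()// is harmless when there is no newline, it is the safer default for cleaning up input lines.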
==== string functions ====
* length(): returns length of a string
print "Enter a string: ";
my $inpString = <STDIN>;
chomp($inpString); # Must do this, else length will return length + 1.
print "$inpString is ", length($inpString), " chars long.\n";
* lc(): converts all characters in a string to lowercase.
my $string = "Hello, World!";
my $lowercase = lc($string);
print "$lowercase\n";
* uc(): converts all characters in a string to uppercase.
my $string = "Hello, World!";
my $uppercase = uc($string);
print "$uppercase\n";
* index(): returns the position of the first occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = index($string, $substring);
print "$position\n"; # Output: 6
* rindex(): returns the position of the last occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = rindex($string, $substring);
print "$position\n"; # Output: 17
* substr(): extracts a substring from a string.
my $string = "Hello, World!";
my $substring = substr($string, 7, 5);
# Starting at position 7, extract 5 characters
print "$substring\n"; # Output: World
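* The string functions above are often combined. A small sketch, using a made-up address as sample data, that finds a separator with //index()// and cuts around it with //substr()//:

```perl
use strict;
use warnings;

my $address = 'larry@example.com';     # hypothetical sample data

my $at = index($address, '@');         # position of the first '@', or -1 if absent
die "No \@ found in $address\n" if $at < 0;

my $user   = substr($address, 0, $at);     # everything before the '@'
my $domain = substr($address, $at + 1);    # omitting the length takes the rest

print uc($user), " at ", lc($domain), "\n";    # LARRY at example.com
```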
==== The die() controlled exit function ====
* Commonly seen is the Perl function, //die()//.
print "Enter a string to pass to die(): ";
chomp($string = <STDIN>);
die($string); # Outputs to STDERR, not STDOUT
print "This will not be printed.";
* die() outputs the Perl program name and the line number of the die() statement if the string argument **does not** end in newline, thus, the //chomp()// statement above.
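* The newline rule can be observed without ending the program. This sketch assumes the use of ''eval { ... }'', which traps a //die()// and leaves its message in the special variable ''$@'':

```perl
use strict;
use warnings;

# No trailing newline: Perl appends " at <script> line <N>." to the message.
eval { die "Something failed" };
my $with_location = $@;
print "Without newline: $with_location";

# Trailing newline: the message is left exactly as written.
eval { die "Something failed\n" };
my $plain = $@;
print "With newline:    $plain";
```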
==== selection statements, familiar ====
* Familiar (C/C++/Java) syntax
if ( $number != 0 ) {
$result = 100 / $number;
}
if ( $password eq $guess ) {
print "Pass, friend.\n";
} else {
die "Go away, imposter!";
}
==== selection statements, elsif ====
if ( $password eq $guess ) {
print "Pass, friend.\n";
} elsif ( $password eq "Meh" ) {
print "Meh!\n";
} else {
die "Go away, imposter!";
}
==== selection statements, unless ====
# Assuming $a is boolean
if ( not $a ) {
print "\$a is not true\n";
}
# can also be expressed...
unless ( $a ) {
print "\$a is not true\n";
}
* Choose the syntax that best fits your own thought patterns
* "There's more than one way to do it."
==== reverse selection statements ====
* Normal
if ($number == 0) {
die "Can't divide by 0";
}
* Equivalent
die "Can't divide by 0" if $number == 0;
==== repetition structures ====
* while, until, for, foreach, do..while, do..until
* while, for, do..while are C/C++/Java standard
until ( $countdown <= 0 ) {
print "Counting down: $countdown\n";
$countdown--;
}
# "for each number in the list 1 through 10"
foreach $number ( 1 .. 10 ) {
print "The number is: $number\n";
}
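* The //do..while// and //do..until// forms listed above are not shown in the examples; a minimal sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $countdown = 3;

# do..while: the body runs once before the condition is first tested.
do {
    print "Counting down: $countdown\n";
    $countdown--;
} while ( $countdown > 0 );

# do..until: the same idea, but it loops until the condition becomes true.
my $total = 0;
my $n     = 0;
do {
    $n++;
    $total += $n;
} until ( $n >= 4 );

print "Sum of 1..4 is $total\n";    # 10
```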
==== while (<STDIN>) ====
* //while ( $var = <STDIN> )// reads from standard input until end of file (control-d)
* Reads a line at a time into $var
while ( $var = <STDIN> ) {
print $var;
# Print each line of standard input.
}
* Can shorten to the following because it's such a common operation
* $_ below is the //default Perl variable//
while ( <STDIN> ) {
print $_;
}
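* The same line-at-a-time loop works on any filehandle. In this sketch an in-memory filehandle stands in for standard input (an assumption made so the example runs without a redirected file):

```perl
use strict;
use warnings;

# Assumption: $text plays the role of the redirected input file.
my $text = "first line\nsecond line\nthird line\n";
open( my $in, '<', \$text ) or die "Cannot open in-memory file: $!\n";

my $count = 0;
while ( my $line = <$in> ) {     # stops when the read returns undef at end of file
    chomp($line);
    $count++;
    print "$count: $line\n";
}
close($in);

print "Read $count lines\n";     # Read 3 lines
```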
==== lists ====
* Create Perl lists with ()
print( "Hello, ", "world", "\n" );
# 3 strings in a list being passed to print function
print( 123, 456, 789 );
foreach $number ( 1 .. 10 ) {
# ( 1 .. 10 ) creates the list of numbers from 1 to 10
}
* Create Perl lists with //qw// (quote words)
qw/hello world good bye/
# creates a 4 word list
==== accessing list values ====
* Use square brackets
print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] );
# output: mustard (count from zero)
my $month = 3;
print qw(
Jan Feb Mar
Apr May Jun
Jul Aug Sep
Oct Nov Dec
)[ $month ];
# output: Apr
==== accessing list "slices" ====
* Get more than one list value at a time
my $mone;
my $mtwo;
( $mone, $mtwo ) = ( 1, 3 );
my $m1;
my $m2;
my $m3;
( $m1, $m2, $m3 ) = qw(
Jan Feb Mar
Apr May Jun
Jul Aug Sep
Oct Nov Dec
)[ 2..4 ];
print $m1." ".$m2." ".$m3;
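* Two further uses of list assignment and slices, sketched with illustrative values:

```perl
use strict;
use warnings;

# Swap two variables with a list assignment; no temporary variable is needed.
my ( $first, $second ) = ( "salt", "pepper" );
( $first, $second ) = ( $second, $first );
print "$first $second\n";                       # pepper salt

# A slice can take any list of subscripts, including negative ones
# that count back from the end of the list.
my ( $start, $end ) = ( qw(Mon Tue Wed Thu Fri Sat Sun) )[ 0, -1 ];
print "$start to $end\n";                       # Mon to Sun
```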
==== arrays ====
* Arrays are just //named// lists.
* Whole arrays are called //@//arrayName (start with '@')
* Individual array elements (//scalars//) are accessed as //$arrayName[ subscript ]//
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
print @days, "\n";
# Output: MonTueWedThuFriSatSun
print "@days\n";
# Output: Mon Tue Wed Thu Fri Sat Sun
# By enclosing in "", have "stringified" @days
print $days[ 6 ], "\n";
# @days[ 6 ] can also be used, but see the warning at https://stackoverflow.com/a/53732305
==== array size in $# ====
* //$#arrayName// is special Perl syntax
* contains the last subscript (or index) of //@arrayName//, which is one less than the array size
* Looping through an array
my $i = 0;
while ( $i <= $#arrayName ) {
# Do something with $arrayName[ $i ] here
$i++;
}
* or
for ( my $i = 0; $i <= $#arrayName; $i++ ) {
# Do something with $arrayName[ $i ] here
}
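* The last index and the array size differ by one. A short sketch comparing the two:

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri Sat Sun);

my $last_index = $#days;          # 6, the subscript of the last element
my $size       = scalar(@days);   # 7, an array in scalar context gives its length

print "Last index: $last_index, size: $size\n";
print "Last day: ", $days[$#days], " (also ", $days[-1], ")\n";    # Sun (also Sun)

# For an empty array the last index is -1, so loops of the form
# "$i <= $#array" simply never run.
my @empty = ();
print "Empty array: last index ", $#empty, ", size ", scalar(@empty), "\n";
```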
==== arrays with foreach ====
foreach my $i ( @arrayName ) {
# Does not use $#.
# Do something with $i here.
}
==== accessing array slices ====
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my @longweekend = @days[ 4..6 ];
# Note use of @ instead of $ before days.
print "@longweekend\n";
# Output: Fri Sat Sun
==== array functions ====
* reverse()
my @count = ( 1..5 );
foreach $each ( reverse( @count ) ) {
print "$each...\n";
sleep 1;
}
* sort()
my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted = sort @unsorted;
* //end// or //top// functions: push(), pop()
my $hand;
my @pile = ( "letter", "newspaper", "bill", "notepad" );
print "You pick up something off the top of the pile.\n";
$hand = pop @pile;
# "notepad" is removed from end (top) of @pile
print "You now have a $hand in your hand,\n",
"and the pile contains:\n@pile\n";
print "You now put something on your pile.\n";
push @pile, "statement";
# "statement" is added to the end (top) of @pile
print "Now the pile contains:\n@pile\n";
* //beginning// or //bottom// functions: shift(), unshift()
my @array = (); # nothing in array
unshift @array, "first";
print "Array is now: @array\n";
unshift @array, "second", "third";
print "Array is now: @array\n";
shift @array;
print "Array is now: @array\n";
# //unshift// adds elements, //shift// deletes elements
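* //sort()// compares elements as strings by default, which surprises people when the list holds numbers. A sketch of the usual fix, a comparison block using the numeric ''<=>'' operator:

```perl
use strict;
use warnings;

my @scores = ( 10, 9, 100, 1 );

my @as_strings = sort @scores;                   # default: string comparison
my @ascending  = sort { $a <=> $b } @scores;     # numeric, smallest first
my @descending = sort { $b <=> $a } @scores;     # numeric, largest first

print "@as_strings\n";    # 1 10 100 9
print "@ascending\n";     # 1 9 10 100
print "@descending\n";    # 100 10 9 1
```

* ''$a'' and ''$b'' are the two elements being compared; //sort// sets them for each comparison.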
==== hashes ====
* //hashes// or //associative arrays//
* can be thought of as //unordered arrays//, using //keys// instead of array subscripts
* Whole hashes are called //%//hashname (start with '%').
%where = (
Gary => "Dallas",
Lucy => "Austin",
Ian => "Houston",
Samantha => "Seattle"
);
* To the left of the //=>// are the hash //keys//, to the right of //=>// are the associated hash values.
* hash keys must be unique
==== hash element access ====
* Individual hash elements are accessed with $hashname{key}
my $who = "Ian";
my %where = (
Gary => "Piscataway",
Lucy => "Hackensack",
Ian => "Mahwah",
Samantha => "Hoboken"
);
print "Gary lives in ", $where{Gary}, "\n";
print "$who lives in $where{$who}\n";
==== hash element adding/deleting ====
my %where = (
Gary => "Piscataway",
Lucy => "Hackensack",
Ian => "Mahwah",
Samantha => "Hoboken"
);
$where{Eva} = "Howell";
# We added a Eva => "Howell" key/value pair to the
# %where hash
delete $where{Gary};
# We deleted the Gary => "Piscataway" key/value pair
==== hash functions, iteration ====
* Can't use //while//, //for// or //foreach// to directly iterate through hashes
* Perl provides //keys()//, //values()// and //each()//
# use 'keys' function to iterate through hash keys
foreach $who ( keys %where ) {
print "$who lives in $where{$who}\n";
}
# use 'values' function to iterate through hash values
foreach $town ( values %where ) {
print "someone lives in $town\n";
}
# use 'each' function to iterate through hash key/value
# pairs
my ($name, $town);
# an assignable list of variables
while ( ($name, $town) = each %where ) {
print "$name lives in $town\n";
}
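* Hash order is not predictable, so output that must be stable usually sorts the keys first. A sketch using the same sample data as above:

```perl
use strict;
use warnings;

my %where = (
    Gary     => "Piscataway",
    Lucy     => "Hackensack",
    Ian      => "Mahwah",
    Samantha => "Hoboken",
);

# keys() returns the keys in no particular order, so sort them
# when the output order matters.
foreach my $who ( sort keys %where ) {
    print "$who lives in $where{$who}\n";
}

my $count = keys %where;     # keys in scalar context gives the number of pairs
print "There are $count people in the hash.\n";    # 4
```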
==== hash functions, key existence ====
* Use //exists()// hash function to check if a key exists in a hash
print "Gary exists in the hash!\n" if exists $where{Gary};
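* //exists()// asks whether a key is present; it does not look at the value. A sketch contrasting it with //defined()//, using one key whose value is deliberately left undefined:

```perl
use strict;
use warnings;

my %where = (
    Gary => "Piscataway",
    Lucy => undef,          # key is present, but no town is recorded
);

foreach my $who ( qw(Gary Lucy Eva) ) {
    if ( !exists $where{$who} ) {
        print "$who is not in the hash\n";
    }
    elsif ( !defined $where{$who} ) {
        print "$who is in the hash, but has no town\n";
    }
    else {
        print "$who lives in $where{$who}\n";
    }
}
```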
==== subroutines ====
* subroutines are functions
* define subroutine:
sub example_subroutine {
...
# subroutine body
...
}
* define and call subroutine
greet();
sub greet {
print "Hello, World!\n";
}
==== subroutines, arguments ====
* arguments are passed through another special Perl var, //@_//
* notice that //@_// is a special //array//
greet( "Jim", "Bob", "Russ" );
# There isn't a set number of function arguments or a
# function "prototype" to speak of
sub greet {
foreach my $arg ( @_ ) {
print "Hello $arg!\n";
}
print "You're first, $_[ 0 ].\n";
print "You're second, $_[ 1 ].\n";
print "You're last, $_[ 2 ].\n";
}
==== subroutines, returns ====
* Use //return//
* Can return a list of values.
my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
print "($len1, $len2, $len3)\n";
sub greet {
foreach my $arg ( @_ ) {
print "Hello $arg!\n";
}
return (length($_[0]), length($_[1]), length($_[2]));
# return a list of 3 ints
}
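* In practice, arguments are usually copied out of //@_// into named variables on the first line of the subroutine. A sketch with two hypothetical helpers, ''full_name'' and ''min_max'':

```perl
use strict;
use warnings;

# full_name() and min_max() are hypothetical helpers for illustration.
sub full_name {
    my ( $first, $last ) = @_;     # copy the arguments into named variables
    return "$last, $first";
}

sub min_max {
    my @sorted = sort { $a <=> $b } @_;      # numeric sort of all arguments
    return ( $sorted[0], $sorted[-1] );      # return a list of two values
}

print full_name( "Bob", "Jones" ), "\n";       # Jones, Bob

my ( $min, $max ) = min_max( 7, 3, 12, 5 );
print "min=$min max=$max\n";                   # min=3 max=12
```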
----
----
===== Perl and Regular Expressions =====
* For anything beyond trivial text processing with Perl and other languages, you need a basic understanding of //regular expressions// (regex).
* See [[cs370:cs_370_-_regular_expressions | CS 370 intro to regular expressions]] for a start.
* Also, see links:
* http://linuxreviews.org/beginner/tao_of_regular_expressions
* =>=>=>=>=>=> https://regexone.com <=<=<=<=<=<=
* =>=>=>=>=>=> https://perldoc.perl.org/perlrequick <=<=<=<=<=<=
* http://en.wikipedia.org/wiki/Regular_expressions
* [[http://docs.google.com/viewer?url=http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf|Regex chapter in "Beginning Perl"]]
* [[https://perldoc.perl.org/perlre|Official Perl regular expressions documentation]]
* Interactive regex testers:
* http://www.regexr.com/
* http://regexpal.com/
===== Perl and Regular Expressions (match, search/replace) =====
* Regular expression matching with the //=~// operator
$var =~ m/regular expression/
# boolean (true if match, false if no match)
# can omit the "m": $var =~ /regular expression/
$var !~ m/regular expression/
# true if not a match, false if a match
* Substitution using regular expressions
$var =~ s/search regex/replacement string/
my $name = "Joseph";
$name =~ s/[sph]//; # delete first instance of s, p or h
print "$name\n"; # output: Joeph
$name =~ s/[sph]//g; # g -> delete every remaining instance of s, p or h
print "$name\n"; # output: Joe
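* Matching is often used to pull pieces out of a string, not just to test it. A sketch of capture groups and two common modifiers, using a made-up "user:uid" record:

```perl
use strict;
use warnings;

my $entry = "jchung:1032";     # hypothetical "user:uid" record

# Parentheses capture the matched text into $1, $2, ...
if ( $entry =~ /^(\w+):(\d+)$/ ) {
    my ( $user, $uid ) = ( $1, $2 );
    print "user=$user uid=$uid\n";        # user=jchung uid=1032
}

# The i modifier makes the match case-insensitive.
print "Found Perl\n" if "I like PERL" =~ /perl/i;

# s///g returns the number of substitutions it made.
my $text  = "a-b-c";
my $count = ( $text =~ s/-/+/g );
print "$text ($count replacements)\n";    # a+b+c (2 replacements)
```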
===== Perl and Regular Expressions (split) =====
* //split()// function splits strings using a regular expression as delimiter
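* with no arguments, //split// splits $_ on whitespace after skipping any leading whitespace, so calling split by itself is equivalent to doing
split ' ', $_;
# split the default variable on runs of whitespace,
# ignoring leading whitespace (plain /\s+/ would instead
# produce an empty first field)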
* The //split// regexp delimiter is usually something simple
my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;
===== Perl and Regular Expressions (join) =====
* //join()// function joins elements of a list using a specified delimiter string
my $last = "Jones";
my $first = "Bob";
my $name = join ", ", ($last, $first);
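* //split()// and //join()// are natural opposites. A sketch that takes the password entry from the split example apart and puts it back together:

```perl
use strict;
use warnings;

my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";

my @fields = split /:/, $passwd;
print "Login: $fields[0], shell: $fields[-1]\n";     # Login: jchung, shell: /bin/bash
print "Number of fields: ", scalar(@fields), "\n";   # 7

# join() reverses the split when given the same delimiter string.
my $rebuilt = join ":", @fields;
print "Round trip OK\n" if $rebuilt eq $passwd;

# A third argument to split() limits the number of fields returned.
my ( $login, $rest ) = split /:/, $passwd, 2;
print "$login | $rest\n";
```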
===== Multiple Input Files with <> =====
* We have used the //<STDIN>// statement and //< filename// redirection to process files such as //cs498roster//
* Suppose we want to be able to process files with a command like
$ perl -w perlex5.pl cs498roster cs598roster
* One way to accomplish this is with the //<>// (so-called //diamond//)
while (<>) {
print "text read: $_";
}
* //<>// checks to see if the program (the Perl script) was invoked with command line arguments (file name or multiple file names)
* If so, it reads the file(s) one at a time, one line at a time.
* If not, it reads from standard input instead.
===== Perl Command Line Arguments w/ @ARGV =====
* The special Perl array //@ARGV// contains the command line arguments that a Perl script was invoked with.
* //<>// gets file names from //@ARGV// by default
foreach my $arg ( @ARGV ) {
print $arg;
}
or
for ( my $i = 0; $i <= $#ARGV; $i++ ) {
print $ARGV[ $i ];
}
or
print "$_" foreach @ARGV; # Perl-speak
===== Perl Command Line Arguments w/ shift @ARGV =====
* Command line arguments aren't always file names.
* Here, we're trying to get a help message from ''perlex6.pl'' by calling it with the ''%%--%%help'' command line argument
* ''%%--%%help'' becomes the first element of ''@ARGV'', ''$ARGV[ 0 ]''
$ perl -w perlex6.pl --help
* Sometimes, command line arguments modify how a program will work.
* Here, we're trying to make ''perlex6.pl'' sort rosters by last name:
$ perl -w perlex6.pl --last cs498roster cs598roster
* To remove an argument from ''@ARGV'', ''shift'' it.
my $arg0 = shift @ARGV;
if ( $arg0 =~ /last/ ) ...
# Process all non-file command line args before we
# get to //while (<>)//
while (<>) { ...
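* One way the option handling above could be fleshed out. This is a sketch under stated assumptions: ''@ARGV'' is set by hand so the example runs on its own, and the usage message text is invented for illustration.

```perl
use strict;
use warnings;

# Assumption: @ARGV is filled in by hand here; normally Perl sets it
# from the real command line.
@ARGV = ( "--last", "cs498roster", "cs598roster" );

my $sort_by_last = 0;

# Shift off leading options (anything starting with "--") until the
# first file name is reached.
while ( @ARGV and $ARGV[0] =~ /^--/ ) {
    my $option = shift @ARGV;
    if ( $option eq "--help" ) {
        print "Usage: perlex6.pl [--last] roster ...\n";   # hypothetical message
        exit 0;
    }
    elsif ( $option eq "--last" ) {
        $sort_by_last = 1;
    }
    else {
        die "Unknown option: $option\n";
    }
}

print "Sort by last name: ", ( $sort_by_last ? "yes" : "no" ), "\n";    # yes
print "Files left for <>: @ARGV\n";    # cs498roster cs598roster
```

* Only file names remain in ''@ARGV'' after the loop, so a following //while (<>)// reads just the rosters.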
===== Perl References =====
* Perl references hold the locations of other pieces of data
* The backslash "\" is used to create references.
* Here, //$hash_r// becomes a reference to the memory location of //%hash//:
my %hash = (
apple => "crab",
pear => "asian"
);
my $hash_r = \%hash;
* Dereferencing
* Use {} around reference name.
my %hash2 = %{$hash_r};
# Set new %hash2 contents from the hash that is
# referenced by $hash_r
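* Copying the whole hash is rarely needed; single elements can be reached through the reference. A sketch of the two equivalent notations (the ''plum'' entry and the ''@days'' array are illustrative additions):

```perl
use strict;
use warnings;

my %hash = (
    apple => "crab",
    pear  => "asian",
);
my $hash_r = \%hash;

# Two equivalent ways to reach a single element through the reference.
my $via_braces = ${$hash_r}{apple};
my $via_arrow  = $hash_r->{pear};
print "$via_braces $via_arrow\n";             # crab asian

# The reference points at the original data, so a change made through
# it is visible in %hash as well.
$hash_r->{plum} = "damson";
print join( ", ", sort keys %hash ), "\n";    # apple, pear, plum

# Array references work the same way, with [] instead of {}.
my @days   = qw(Mon Tue Wed);
my $days_r = \@days;
print "$days_r->[0] is one of ", scalar( @{$days_r} ), " days\n";    # Mon is one of 3 days
```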
===== Perl Objects =====
* "...an object can be anything %%--%% it really depends on what your application is. ... If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object."
* In Perl, what we see as an object is simply a reference...
* In fact, you can convert any ordinary reference //into// an object simply by using the (Perl) //bless()// function.
* Typically, however, objects are represented as references to a //hash//
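* A minimal sketch of a hash-based object. ''Counter'' is a hypothetical class written inline for illustration; real classes normally live in their own ''.pm'' file.

```perl
use strict;
use warnings;

# Counter is a hypothetical class, defined inline for illustration.
package Counter;

sub new {
    my ( $class, %args ) = @_;
    my $self = { count => $args{start} || 0 };    # a plain hash reference
    return bless $self, $class;                   # now it is a Counter object
}

sub increment {
    my ($self) = @_;
    $self->{count}++;
    return $self;
}

sub count {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $counter = Counter->new( start => 5 );
$counter->increment;
$counter->increment;

print "Count is ", $counter->count, "\n";       # Count is 7
print "Object is a ", ref($counter), "\n";      # Object is a Counter
```

* A method call such as ''$counter->increment'' passes the object itself as the first element of //@_//.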
===== Perl Objects and Modules Example =====
* Perl Classes are often called //Modules//
* Here, we'll use the //Net::FTP// module (example only; rockhopper does not run an anonymous ftp server):
use strict;
use Net::FTP;
# This requires that FTP.pm be stored somewhere on the
# local system that Perl searches through for modules.
my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
or die "Couldn't connect: $@\n";
# new() is a method of class Net::FTP; it's the
# constructor. $ftp is our FTP session object.
$ftp->login("anonymous");
# Net::FTP->login() method
$ftp->cwd("/");
$ftp->get("index.php");
$ftp->quit();
# quit() sends the FTP QUIT command and closes the connection
===== Perl Modules and @INC =====
* //@INC// contains the system file system paths that the Perl interpreter looks through for modules such as FTP.pm.
$ perl -V
...
@INC:
/etc/perl
/usr/local/lib/perl/5.8.8
/usr/local/share/perl/5.8.8
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.
===== Perl Modules and CPAN =====
* A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
* The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
* Searchable at http://search.cpan.org (now redirects to https://metacpan.org)
* The CPAN module naming hierarchy places modules in categories such as ''Sort::Fields'', ''Sort::Versions'' or subcategories such as ''LWP::Protocol::http''
* On disk, the modules would look like ''.../Sort/Fields.pm'', ''.../Sort/Versions.pm''
* For the ''LWP::Protocol::http'' module, the full path to the module might be ''/usr/share/perl5/LWP/Protocol/http.pm'' on a Linux system.
* How to know if a Perl module is installed?
* You can run a Perl "one-liner" program using the "-e" option to check if a module is installed, e.g.,
# If the following runs with no errors, it means the LWP::Simple module is installed.
perl -e "use LWP::Simple"
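* The same check can be made from inside a script. This sketch assumes a hypothetical helper, ''module_installed'', that tries to load the module inside ''eval'' so a missing module does not end the program:

```perl
use strict;
use warnings;

# module_installed() is a hypothetical helper, not part of Perl itself.
sub module_installed {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Sort::Fields -> Sort/Fields.pm
    return eval { require $file; 1 } ? 1 : 0;    # require searches @INC for the file
}

# List::Util ships with Perl; the second name is deliberately made up.
foreach my $module ( "List::Util", "No::Such::Module::Here" ) {
    my $status = module_installed($module) ? "installed" : "not installed";
    print "$module is $status\n";
}
```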
===== Perl Web Automation =====
* Perl modules are available for Web automation
* Well-known modules are the LWP ("Library for WWW in Perl") group of modules and WWW::Mechanize.
* LWP - https://www.perl.com/pub/2002/08/20/perlandlwp.html/
* WWW::Mechanize
* https://www.perlmonks.org/?node_id=1037506
* http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod
* Old article that describes Perl web automation: http://rockhopper.monmouth.edu/~jchung/docs/4915-1108-turoff.pdf
* Retry Perl Exercise 7 using a Web automation module such as //LWP::Simple or WWW::Mechanize//.
===== Perl and System Commands =====
* For portability, use Perl equivalents to system commands, such as Perl's //chdir()// function instead of //cd//.
* NOTE: This is applicable to all programming languages that can call system commands.
* But if using system commands is a necessity, use //system()//
system( '/usr/games/fortune' );
* Command substitution is done using backticks ``
* Sometimes we want to capture the output of system commands to use in a Perl script.
my $sysdate = `date`;
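* Captured output keeps its trailing newline, so it is usually passed through //chomp()//. A sketch that assumes an ''echo'' command is available (as it is in common Unix shells and the Windows command interpreter):

```perl
use strict;
use warnings;

# Assumption: "echo" exists on this system.
my $output = `echo hello`;
chomp($output);                      # backticks keep the trailing newline
print "Captured: [$output]\n";

# $? holds the command's exit status; 0 conventionally means success.
print "Command succeeded\n" if $? == 0;

# In list context, backticks return one element per line of output.
my @lines = `echo one && echo two`;
print "Got ", scalar(@lines), " lines\n";
```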