====== Introduction to Perl ======

==== The "Practical Extraction and Report Language" ====

==== The "Pathologically Eclectic Rubbish Lister" ====

===== Perl =====

  * Creator: Larry Wall
  * Introduced: 1987
  * Open source
  * Comes standard on many UNIX/Linux systems
  * macOS users are encouraged to install Perl with [[https://perlbrew.pl|PerlBrew]].
  * Windows installers are available from http://strawberryperl.com and http://activestate.com
    * [[https://learn.perl.org/installing/windows.html|Strawberry Perl]] is recommended.
  * Originally developed for text record manipulation
  * Used for system administration, web development, network programming, system exploit testing

===== Perl Uses =====

  * CGI (web application) programming
    * Written in Perl (or originally written in Perl): [[http://www.slashcode.com/ | Slash]], Bugzilla, TWiki, Movable Type
    * Wikipedia originally used a wiki engine (a CGI program) written in Perl called [[https://wikiless.tiekoetter.com/wiki/UseModWiki|UseModWiki]].
    * Perl CGI-type programs typically communicate with database backends
    * bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com and imdb.com have all used Perl extensively at one point.
  * Large-scale text processing for report generation
    * Has found past use in finance and [[http://shop.oreilly.com/product/9780596000806.do|bioinformatics]] for its ability to handle large data sets

===== Perl Online Resources =====

  * [[http://www.perl.org | Perl 5 Official site]]
  * [[http://www.perl6.org/ | Perl 6 Official site]] (Perl 6 has since been renamed Raku)
  * [[http://learn.perl.org/|learn.perl.org]]
  * [[http://www.perl.org/books/library.html|Perl.org]] online library
  * [[http://www.ebb.org/PickingUpPerl/|Picking Up Perl]] (online book)

===== Perl Execution =====

  * Programs are typically given a .pl extension
  * Executed with //perl -w prog.pl//
  * Or use a shell-script-style interpreter line at the top of the Perl script
    * #!/path/to/perl -w
    * and make the program executable with //chmod +x prog.pl//
  * Using //-w// is encouraged for debugging.
    * //-w// issues //warnings// that would otherwise not be issued.

===== Perl Syntax (highlights) =====

  * A Perl motto: "There is more than one way to do it."
    * Perl was designed with this idea in mind
  * print "Hello, world!\n"; # semicolon is **mandatory**, as in C/C++/Java and others
  * Comments begin with #, as in shell scripts, Tcl and older languages.
  * print " 0.25" * 4, "\n"; # output: 1
    * " 0.25" is a string, but it is converted to a float and multiplied by 4; the product is then printed, followed by a line break ("\n")
    * automatic conversion of [[https://perldoc.perl.org/perldata|scalars]] for "[[https://perlmaven.com/automatic-value-conversion-or-casting-in-perl|contextual polymorphism]]"
    * leading whitespace is ignored
  * Math operators are the familiar set from C/C++/Java
  * String concatenation (.) and repetition (x) operators
    * print "Ba" . "na" x 4, "\n"; # output: Banananana

===== More Perl Syntax =====

==== variables ====

  * Always use $ in front of scalar variable names, both when assigning and when reading.

  $name = "fred"; # variables are dynamically typed
  print "My name is ", $name, "\n";

==== variable scope ====

  * Variables are global unless declared with ''my $variable''.
    * Variables declared with ''my'' are local to the enclosing block and are also called //lexical// variables

  $name = "fred"; # This is global $name.
  # Insert a block
  {
      my $name = "joe"; # This is local $name because of 'my'.
      print "Block local \$name is $name\n";
      # Using double quotes allows $name interpolation as
      # part of the print statement. The backslashed $
      # (\$name) suppresses variable interpolation.
  }
  print "Global \$name is ", $name, "\n";

  * Some advocate using the ''my'' keyword for all variables, including those declared at the top level of a script.

==== standard input ====

  * Perl reads standard input with //<STDIN>//

  print "Please enter something interesting: \n";
  $comment = <STDIN>;
  print "You entered: $comment\n";

  * Standard input can be from the keyboard or redirected from a file or from another program.
    * Example of redirecting stdin from a file:

  $ perl -w stdin_comment.pl < comment.txt

    * Examples of redirecting stdin from a program:

  $ echo "The ripest fruit falls first." | perl -w stdin_comment.pl
  # If you have the fortune command installed ...
  $ fortune | perl -w stdin_comment.pl

  * ''<STDIN>'' reads //up to and including// the newline character
    * The newline could come from the user pressing Enter after typing something, or from the end of a line in a text file.
  * See [[cs370:cs_370_-_unix_shells_and_shell_scripting#standard_file_descriptors|standard file descriptors (CS-370)]]

==== chomp() and chop() ====

  * The newline that Perl keeps at the end of each input line often needs to be accounted for.

  print "Enter a five letter word guess, preferably \"Yoink\": ";
  $userguess = <STDIN>;
  chomp($userguess); # Removes trailing newline character only.
                     # Test this without the chomp statement.
  $secretword = "Yoink";
  print "The result of the comparison: ", $userguess eq $secretword, "\n";
  # String comparison with 'eq' operator;
  # returns empty string if strings not equal;
  # returns 1 if strings equal.

  * //chop()// removes the last character of a string, whether it's a newline or not.

==== string functions ====

  * length(): returns the length of a string

  print "Enter a string: ";
  my $inpString = <STDIN>;
  chomp($inpString); # Must do this, else length will return length + 1.
  print "$inpString is ", length($inpString), " chars long.\n";

  * lc(): converts all characters in a string to lowercase.

  my $string = "Hello, World!";
  my $lowercase = lc($string);
  print "$lowercase\n";

  * uc(): converts all characters in a string to uppercase.

  my $string = "Hello, World!";
  my $uppercase = uc($string);
  print "$uppercase\n";

  * index(): returns the position of the first occurrence of a substring within a string.

  my $string = "Larry Wall Larry Wall";
  my $substring = "Wall";
  my $position = index($string, $substring);
  print "$position\n"; # Output: 6

  * rindex(): returns the position of the last occurrence of a substring within a string.
  my $string = "Larry Wall Larry Wall";
  my $substring = "Wall";
  my $position = rindex($string, $substring);
  print "$position\n"; # Output: 17

  * substr(): extracts a substring from a string.

  my $string = "Hello, World!";
  my $substring = substr($string, 7, 5); # Starting at position 7, extract 5 characters
  print "$substring\n"; # Output: World

==== The die() controlled exit function ====

  * The Perl function //die()// is commonly seen.

  print "Enter a string to pass to die(): ";
  chomp($string = <STDIN>);
  die($string); # Outputs to STDERR, not STDOUT
  print "This will not be printed.";

  * die() appends the Perl program name and the line number of the die() statement to its message if the string argument **does not** end in a newline; hence the //chomp()// statement above.

==== selection statements, familiar ====

  * Familiar (C/C++/Java syntax)

  if ( $number != 0 ) {
      $result = 100 / $number;
  }

  if ( $password eq $guess ) {
      print "Pass, friend.\n";
  } else {
      die "Go away, imposter!";
  }

==== selection statements, elsif ====

  if ( $password eq $guess ) {
      print "Pass, friend.\n";
  } elsif ( $password eq "Meh" ) {
      print "Meh!\n";
  } else {
      die "Go away, imposter!";
  }

==== selection statements, unless ====

  # Assuming $a is boolean
  if ( not $a ) {
      print "\$a is not true\n";
  }
  # can also be expressed...
  unless ( $a ) {
      print "\$a is not true\n";
  }

  * Choose the syntax that best fits your own thought patterns
    * "There's more than one way to do it."

==== reverse selection statements ====

  * Normal

  if ($number == 0) {
      die "Can't divide by 0";
  }

  * Equivalent

  die "Can't divide by 0" if $number == 0;

==== repetition structures ====

  * while, until, for, foreach, do..while, do..until
    * while, for, do..while are C/C++/Java standard

  until ( $countdown <= 0 ) {
      print "Counting down: $countdown\n";
      $countdown--;
  }

  # "for each number in the list 1 through 10"
  foreach $number ( 1 .. 10 ) {
      print "The number is: $number\n";
  }

==== while (<STDIN>) ====

  * //while ( $var = <STDIN> )// reads from standard input until end of file (control-d)
    * Reads a line at a time into $var

  while ( $var = <STDIN> ) {
      print $var; # Print each line of standard input.
  }

  * Can shorten to the following because it's such a common operation
    * $_ below is the //default Perl variable//

  while ( <STDIN> ) {
      print $_;
  }

==== lists ====

  * Create Perl lists with ()

  print( "Hello, ", "world", "\n" ); # 3 strings in a list being passed to print function
  print( 123, 456, 789 );
  foreach $number ( 1 .. 10 ) { # ( 1 .. 10 ) creates the list of numbers from 1 to 10

  * Create Perl lists with //qw// (quote words)

  qw/hello world good bye/ # creates a 4 word list

==== accessing list values ====

  * Use square brackets

  print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] ); # output: mustard (count from zero)

  my $month = 3;
  print qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )[ $month ]; # output: Apr

==== accessing list "slices" ====

  * Get more than one list value at a time

  my $mone; my $mtwo;
  ( $mone, $mtwo ) = ( 1, 3 );

  my $m1; my $m2; my $m3;
  ( $m1, $m2, $m3 ) = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )[ 2..4 ];
  print $m1." ".$m2." ".$m3; # output: Mar Apr May

==== arrays ====

  * Arrays are just //named// lists.
    * Whole arrays are called //@//arrayName (start with '@')
    * Individual array elements (//scalars//) are accessed as //$arrayName[ subscript ]//

  my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
  print @days, "\n"; # Output: MonTueWedThuFriSatSun
  print "@days\n"; # Output: Mon Tue Wed Thu Fri Sat Sun
                   # By enclosing in "", have "stringified" @days
  print $days[ 6 ], "\n"; # @days[ 6 ] can also be used, but see the warning at http://tiny.cc/f4t3vz

==== array size in $# ====

  * //$#arrayName// is special Perl syntax
    * it gives the last subscript (or index) of the array, so the array holds //$#arrayName + 1// elements
  * Looping through an array

  my $i = 0;
  while ( $i <= $#arrayName ) {
      # Do something with $arrayName[ $i ] here
      $i++;
  }

  * or

  for ( my $i = 0; $i <= $#arrayName; $i++ ) {
      # Do something with $arrayName[ $i ] here
  }

==== arrays with foreach ====

  foreach my $i ( @arrayName ) { # Does not use $#.
      # Do something with $i here.
  }

==== accessing array slices ====

  my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
  my @longweekend = @days[ 4..6 ]; # Note use of @ instead of $ before days.
  print "@longweekend\n"; # Output: Fri Sat Sun

==== array functions ====

  * reverse()

  my @count = ( 1..5 );
  foreach my $each ( reverse( @count ) ) {
      print "$each...\n";
      sleep 1;
  }

  * sort()

  my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
  my @sorted = sort @unsorted;

  * //end// or //top// functions: push(), pop()

  my $hand;
  my @pile = ( "letter", "newspaper", "bill", "notepad" );
  print "You pick up something off the top of the pile.\n";
  $hand = pop @pile; # "notepad" is removed from end (top) of @pile
  print "You now have a $hand in your hand,\n",
        "and the pile contains:\n@pile\n";
  print "You now put something on your pile.\n";
  push @pile, "statement"; # "statement" is added to the end (top) of @pile
  print "Now the pile contains:\n@pile\n";

  * //beginning// or //bottom// functions: shift(), unshift()

  my @array = (); # nothing in array
  unshift @array, "first";
  print "Array is now: @array\n"; # first
  unshift @array, "second", "third";
  print "Array is now: @array\n"; # second third first
  shift @array;
  print "Array is now: @array\n"; # third first
  # unshift adds elements at the beginning, shift removes the first element

==== hashes ====

  * //hashes// or //associative arrays//
    * can be thought of as //unordered arrays//, using //keys// instead of array subscripts
  * Whole hashes are called //%//hashname (start with '%').

  %where = (
      Gary     => "Dallas",
      Lucy     => "Austin",
      Ian      => "Houston",
      Samantha => "Seattle"
  );

  * To the left of the //=>// are the hash //keys//; to the right of //=>// are the associated hash values.
    * hash keys must be unique

==== hash element access ====

  * Individual hash elements are accessed with $hashname{key}

  my $who = "Ian";
  my %where = (
      Gary     => "Piscataway",
      Lucy     => "Hackensack",
      Ian      => "Mahwah",
      Samantha => "Hoboken"
  );
  print "Gary lives in ", $where{Gary}, "\n";
  print "$who lives in $where{$who}\n";

==== hash element adding/deleting ====

  my %where = (
      Gary     => "Piscataway",
      Lucy     => "Hackensack",
      Ian      => "Mahwah",
      Samantha => "Hoboken"
  );
  $where{Eva} = "Howell";
  # We added an Eva => "Howell" key/value pair to the
  # %where hash
  delete $where{Gary};
  # We deleted the Gary => "Piscataway" key/value pair

==== hash functions, iteration ====

  * Can't use //while//, //for// or //foreach// to directly iterate through hashes
    * Perl provides //keys()//, //values()// and //each()//

  # use 'keys' function to iterate through hash keys
  foreach $who ( keys %where ) {
      print "$who lives in $where{$who}\n";
  }

  # use 'values' function to iterate through hash values
  foreach $town ( values %where ) {
      print "someone lives in $town\n";
  }

  # use 'each' function to iterate through hash key/value
  # pairs
  my ($name, $town); # an assignable list of variables
  while ( ($name, $town) = each %where ) {
      print "$name lives in $town\n";
  }

==== hash functions, key existence ====

  * Use the //exists()// hash function to check if a key exists in a hash

  print "Gary exists in the hash!\n" if exists $where{Gary};

==== subroutines ====

  * subroutines are functions
  * define subroutine:

  sub example_subroutine {
      ...
      # subroutine body
      ...
  }

  * define and call subroutine

  greet();

  sub greet {
      print "Hello, World!\n";
  }

==== subroutines, arguments ====

  * arguments are passed through another special Perl var, //@_//
    * notice that //@_// is a special //array//

  greet( "Jim", "Bob", "Russ" );
  # There isn't a set number of function arguments or a
  # function "prototype" to speak of

  sub greet {
      foreach my $arg ( @_ ) {
          print "Hello $arg!\n";
      }
      print "You're first, $_[ 0 ].\n";
      print "You're second, $_[ 1 ].\n";
      print "You're last, $_[ 2 ].\n";
  }

==== subroutines, returns ====

  * Use //return//
    * Can return a list of values.

  my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
  print "($len1, $len2, $len3)\n";

  sub greet {
      foreach my $arg ( @_ ) {
          print "Hello $arg!\n";
      }
      return (length($_[0]), length($_[1]), length($_[2])); # return a list of 3 ints
  }

----

===== Perl and Regular Expressions =====

  * For anything beyond trivial text processing with Perl and other languages, you need a basic understanding of //regular expressions// (regex).
    * See [[cs370:cs_370_-_regular_expressions | CS 370 intro to regular expressions]] for a start.
    * See links:
      * http://linuxreviews.org/beginner/tao_of_regular_expressions
      * =>=>=>=>=>=> https://perldoc.perl.org/perlrequick <=<=<=<=<=<=
      * http://en.wikipedia.org/wiki/Regular_expressions
      * [[http://docs.google.com/viewer?url=http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf|Regex chapter in "Beginning Perl"]]
      * [[https://perldoc.perl.org/perlre|Official Perl regular expressions documentation]]
    * Interactive regex testers:
      * http://www.regexr.com/
      * http://regexpal.com/

===== Perl and Regular Expressions (match, search/replace) =====

  * Regular expression matching with the //=~// operator

  $var =~ m/regular expression/ # boolean (true if match, false if no match)
                                # can omit the "m": $var =~ /regular expression/
  $var !~ m/regular expression/ # true if not a match, false if a match

  * Substitution using regular expressions (the search part is a regular expression, the replacement part is a string)

  $var =~ s/search re/replacement string/

  my $name = "Joseph";
  $name =~ s/[sph]//; # delete first instance of s, p or h
  print "$name\n"; # output: Joeph
  $name =~ s/[sph]//g; # g -> delete every remaining instance of s, p or h
  print "$name\n"; # output: Joe

===== Perl and Regular Expressions (split) =====

  * //split()// function splits strings using a regular expression as delimiter
    * with no arguments, split breaks the default variable on runs of whitespace and skips any leading whitespace, so calling split by itself is nearly equivalent to doing

  split /\s+/, $_; # split the default variable, using one or more
                   # whitespaces

  * The //split// regexp delimiter is usually something simple

  my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
  my @fields = split /:/, $passwd;

===== Perl and Regular Expressions (join) =====

  * //join()// function joins elements of a list using a specified delimiter string

  my $last = "Jones";
  my $first = "Bob";
  my $name = join ", ", ($last, $first); # "Jones, Bob"

===== Multiple Input Files with <> =====

  * We have used the //<STDIN>// statement and //< filename// redirection to process files such as //cs498roster//
  * Suppose we want to be able to process files with a command like

  $ perl -w perlex5.pl cs498roster cs598roster

  * One way to accomplish this is with the //<>// (so-called //diamond//) operator

  while (<>) {
      print "text read: $_";
  }

  * //<>// checks to see if the program (the Perl script) was invoked with command line arguments (file name or multiple file names)
    * If so, it reads the file(s) one at a time, one line at a time.
    * If not, it reads from standard input.

===== Perl Command Line Arguments w/ @ARGV =====

  * The special Perl array //@ARGV// contains the command line arguments that a Perl script was invoked with.
    * //<>// gets file names from //@ARGV// by default

  foreach my $arg ( @ARGV ) {
      print $arg;
  }

or

  for ( my $i = 0; $i <= $#ARGV; $i++ ) {
      print $ARGV[ $i ];
  }

or

  print "$_" foreach @ARGV; # Perl-speak

===== Perl Command Line Arguments w/ shift @ARGV =====

  * Command line arguments aren't always file names.
    * Here, we're trying to get a help message from //perlex6.pl// by calling it with the //--help// command line argument
    * //--help// becomes the first element of //@ARGV//, //$ARGV[ 0 ]//

  $ perl -w perlex6.pl --help

  * Sometimes, command line arguments modify how a program will work.
    * Here, we're trying to make //perlex6.pl// sort rosters by last name:

  $ perl -w perlex6.pl --last cs498roster cs598roster

  * To remove an argument from //@ARGV//, //shift// it.

  my $arg0 = shift @ARGV;
  if ( $arg0 =~ /last/ ) ...
  # Process all non-file command line args before we
  # get to while (<>)
  while (<>) {
  ...
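The option handling outlined above can be sketched as a small complete script. This is only an illustration under stated assumptions: the helper name ''take_options'' is hypothetical, the roster file format is not shown in these notes, and sorting whole lines merely stands in for "sort by last name".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: shift leading "--option" arguments off @ARGV and
# return them as a hash, leaving only file names behind for <>.
sub take_options {
    my %opt;
    while ( @ARGV and $ARGV[0] =~ /^--(\w+)$/ ) {
        $opt{$1} = 1;    # remember the option name, e.g. "last"
        shift @ARGV;     # remove it so <> never treats it as a file
    }
    return %opt;
}

my %opt = take_options();

if ( $opt{help} ) {
    print "usage: $0 [--last] file ...\n";
    exit 0;
}

# Read input only when file names remain, so this sketch never waits
# on the keyboard when run without arguments.
if ( @ARGV ) {
    my @lines = <>;                       # all lines of all named files
    @lines = sort @lines if $opt{last};   # assumption: plain line sort
    print @lines;
}
```

Saved as, say, //perlex6.pl//, it would be run exactly like the commands shown earlier in this section; any argument that does not look like //--word// is left in //@ARGV// for //<>// to open.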
===== Perl References =====

  * Perl references hold the locations of other pieces of data
  * The backslash "\" is used to create references.
    * Here, //$hash_r// becomes a reference to the memory location of //%hash//:

  my %hash = ( apple => "crab", pear => "asian" );
  my $hash_r = \%hash;

  * Dereferencing
    * Use {} around the reference name.

  my %hash2 = %{$hash_r};
  # Set new %hash2 contents from the hash that is
  # referenced by $hash_r

===== Perl Objects =====

  * "...an object can be anything -- it really depends on what your application is. ... If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object."
  * In Perl, what we see as an object is simply a reference...
    * In fact, you can convert any ordinary reference //into// an object simply by using the (Perl) //bless()// function.
  * Typically, however, objects are represented as references to a //hash//

===== Perl Objects and Modules Example =====

  * Perl classes are usually packaged as //modules//
  * Here, we'll use the //Net::FTP// module (example only; rockhopper does not run an anonymous ftp server):

  use strict;
  use Net::FTP;
  # This requires that FTP.pm be stored somewhere on the
  # local system that Perl searches through for modules.

  my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
      or die "Couldn't connect: $@\n";
  # new() is a method of class Net::FTP; it's the
  # constructor. $ftp is our FTP session object.

  $ftp->login("anonymous"); # Net::FTP->login() method
  $ftp->cwd("/");
  $ftp->get("index.php");
  $ftp->close();

===== Perl Modules and @INC =====

  * //@INC// contains the file system paths that the Perl interpreter searches for modules such as FTP.pm.

  $ perl -V
  ...
  @INC:
      /etc/perl
      /usr/local/lib/perl/5.8.8
      /usr/local/share/perl/5.8.8
      /usr/lib/perl5
      /usr/share/perl5
      /usr/lib/perl/5.8
      /usr/share/perl/5.8
      /usr/local/lib/site_perl
      .
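The same search path can be inspected from inside a script. The following is a minimal sketch, not part of the original notes: the helper name ''find_module'' is hypothetical, and it relies only on the fact that a module name such as ''Net::FTP'' corresponds to the relative file path ''Net/FTP.pm'' under one of the //@INC// directories.

```perl
use strict;
use warnings;

# Print each directory Perl searches for modules, one per line.
print "$_\n" foreach @INC;

# Hypothetical helper: return the full path of the first file under
# @INC that matches a module name, or nothing if it is not found.
sub find_module {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Net::FTP -> Net/FTP.pm
    foreach my $dir ( @INC ) {
        next if ref $dir;                        # skip non-directory entries (code hooks)
        return "$dir/$file" if -e "$dir/$file";
    }
    return;
}

my $path = find_module("strict");
print "strict.pm found at: $path\n" if defined $path;
```

Since ''strict'' ships with Perl itself, the last line should print a path on any working installation; the directories listed will differ from system to system.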
===== Perl Modules and CPAN =====

  * A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
  * The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
    * Searchable at http://search.cpan.org
  * The CPAN module naming hierarchy places modules in categories such as ''Sort::Fields'', ''Sort::Versions'' or subcategories such as ''LWP::Protocol::http''
    * On disk, the modules would look like ''.../Sort/Fields.pm'', ''.../Sort/Versions.pm''
    * For the ''LWP::Protocol::http'' module, the full path to the module might be ''/usr/share/perl5/LWP/Protocol/http.pm'' on a Linux system.
  * How to know if a Perl module is installed?
    * You can run a Perl "one-liner" program using the "-e" option to check if a module is installed, e.g.,

  # If the following runs with no errors, it means the LWP::Simple module is installed.
  perl -e "use LWP::Simple"

===== Perl Web Automation =====

  * Perl modules are available for Web automation
  * Well-known modules are the LWP ("Library for WWW in Perl") group of modules and WWW::Mechanize.
    * LWP - https://www.perl.com/pub/2002/08/20/perlandlwp.html/
    * WWW::Mechanize
      * https://www.perlmonks.org/?node_id=1037506
      * http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod
  * Old article that describes Perl web automation: http://rockhopper.monmouth.edu/~jchung/docs/4915-1108-turoff.pdf
  * Retry Perl Exercise 7 using a Web automation module such as //LWP::Simple// or //WWW::Mechanize//.

===== Perl and System Commands =====

  * For portability, use Perl equivalents to system commands, such as Perl's //chdir()// function instead of //cd//.
    * NOTE: This is applicable to all programming languages that can call system commands.
  * But if using system commands is a necessity, use //system()//

  system( '/usr/games/fortune' );

  * Command substitution is done using backticks ``
    * Sometimes we want to capture the output of system commands to use in a Perl script.

  my $sysdate = `date`;
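The two mechanisms above can be put side by side in a short sketch. It assumes a UNIX-like system where the //date// command is on the PATH; the variable names other than //$sysdate// are made up for illustration.

```perl
use strict;
use warnings;

# Backticks capture the command's output, including its trailing newline.
my $sysdate = `date`;
chomp $sysdate;
print "The date is: $sysdate\n";

# system() runs the command without capturing its output and returns
# its wait status; 0 means the command ran and succeeded.
my $status = system( 'date' );
warn "date did not run cleanly (status $status)\n" if $status != 0;

# The portable alternative: Perl's own localtime() needs no external
# command. In scalar context it returns a human-readable date string.
my $perldate = localtime();
print "Perl says: $perldate\n";
```

The last part illustrates the portability advice: where Perl has a built-in equivalent, the script keeps working on systems that lack the external command.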