====== Introduction to Perl ======

==== The "Practical Extraction and Report Language" ====

==== The "Pathologically Eclectic Rubbish Lister" ====

===== Perl =====

  * Creator: Larry Wall
  * Introduced: 1987
  * Open source
  * Comes standard on many UNIX/Linux systems
  * macOS users are encouraged to use [[https://perlbrew.pl|perlbrew]] to install and manage their own Perl.
  * Windows installers are available from http://strawberryperl.com and http://activestate.com
    * [[https://learn.perl.org/installing/windows.html|Strawberry Perl]] is recommended.
  * Originally developed for text record manipulation
  * Used for system administration, web development, network programming, system exploit testing

===== Perl Uses =====

  * CGI (web application) programming
     * Written in Perl (at least originally): [[http://www.slashcode.com/ | Slash]], Bugzilla, TWiki, Movable Type
       * Wikipedia originally used a wiki engine (a CGI program) written in Perl called [[https://wikiless.tiekoetter.com/wiki/UseModWiki|UseModWiki]].
       * Perl CGI-type programs typically communicate with database backends
     * bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com and imdb.com have all used Perl extensively at one point.
  * Large-scale text processing for report generation
    * Has found past use in finance and [[http://shop.oreilly.com/product/9780596000806.do|bioinformatics]] fields for its ability to handle large data sets
===== Perl Online Resources =====

  * [[http://www.perl.org | Perl 5 Official site]]
  * [[http://www.perl6.org/ | Raku (formerly Perl 6) official site]]

  * [[http://learn.perl.org/|learn.perl.org]]
  * [[http://www.perl.org/books/library.html|Perl.org]] online library
  * [[http://www.ebb.org/PickingUpPerl/|Picking Up Perl]] (online book)


===== Perl Execution =====

  * Programs typically given .pl extension
  * Executed with //perl -w prog.pl//
  * Or put a "shebang" line at the top of the Perl script:
    * ''#!/path/to/perl -w''
    * and make the program executable with //chmod +x prog.pl//, then run it as //./prog.pl//
  * Using //-w// is encouraged, especially for debugging.
    * //-w// issues //warnings// about questionable code (e.g., use of uninitialized variables) that would otherwise pass silently.
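A minimal sketch of a script that uses the shebang line described above. It assumes perl is installed at ''/usr/bin/perl'' (''which perl'' shows the actual path on your system); the file name //hello.pl// is only an example:

<code perl>
#!/usr/bin/perl -w
# hello.pl: a minimal script (assumes perl lives at /usr/bin/perl)

my $greeting = "Hello from Perl";
print "$greeting\n";
</code>

After //chmod +x hello.pl//, running //./hello.pl// should print //Hello from Perl//; //perl -w hello.pl// works as well.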

===== Perl Syntax (highlights) =====

  * A Perl motto:  "There is more than one way to do it."
    * Perl designed with this idea in mind

  * print "Hello, world!\n";    # semicolon is **mandatory** like C/C++/Java/others
    * Comments begin with #, as in shell scripts, Tcl and older languages.

  * print " 0.25" * 4, "\n";    # output: 1
    * " 0.25" is a string, but the * operator converts it to the number 0.25 and multiplies it by 4; print then outputs the result followed by a line break ("\n")
    * automatic conversion of [[https://perldoc.perl.org/perldata|scalars]] for "[[https://perlmaven.com/automatic-value-conversion-or-casting-in-perl|contextual polymorphism]]"
    * leading space is ignored

  * Math operators are the familiar set from C/C++/Java

  * String concatenation (.) and repetition (x) operators
    * print "Ba" . "na" x 4, "\n";   # output: Banananana
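The one-liners above can be combined into a small runnable script. This sketch adds ''use strict;'' and ''use warnings;'' (a lexically scoped alternative to //-w//, plus stricter variable checking) and stores each result in a variable so it can be printed or inspected:

<code perl>
use strict;
use warnings;

# A string that looks like a number is converted when a math operator needs a number
my $quarter = " 0.25";           # leading whitespace is allowed
my $product = $quarter * 4;      # numeric context: 0.25 * 4
print $product, "\n";            # prints 1

# '.' concatenates and 'x' repeats; 'x' binds more tightly than '.'
my $fruit = "Ba" . "na" x 4;
print $fruit, "\n";              # prints Banananana

# The number converts back to a string when concatenated
my $label = "Result: " . $product;
print $label, "\n";              # prints Result: 1
</code>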


===== More Perl Syntax =====

==== variables ====

  * Scalar (single-value) variable names always start with $; arrays (@) and hashes (%) are covered below.

<code>
$name = "fred"; # variables are dynamically typed
print "My name is ", $name, "\n";
</code>

==== variable scope ====

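  * Variables are global (package variables) unless declared with ''my $variable'', which limits them to the enclosing block.
    * ''my'' variables are also called //lexical// variables (not to be confused with Perl's ''local'' keyword, which does something different).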

<code>
$name = "fred"; # This is global $name.
          
# Insert a block
{
   my $name = "joe"; 
      # This is local $name because of 'my'.
   print "Block local \$name is $name\n";
      # Using double quotes allows $name interpolation as
      # part of the print statement. The backslashed $ 
      # (\$name) suppresses variable interpolation.
}

print "Global \$name is ", $name, "\n";
</code>

  * Many Perl programmers declare //every// variable with ''my'', including file-level variables; the ''use strict'' pragma makes such declarations mandatory.

==== standard input ====

  * Perl standard input with //<STDIN>//

<code>
print "Please enter something interesting: \n";
$comment = <STDIN>;
          
print "You entered: $comment\n";
</code>

  * Standard input can be from the keyboard or redirected from a file or from another program.
    * Example of redirecting stdin from a file:

<code>
$ perl -w stdin_comment.pl < comment.txt
</code>

    * Examples of redirecting stdin from a program:

<code>
$ echo "The ripest fruit falls first." | perl -w stdin_comment.pl

# If you have the fortune command installed ...
$ fortune | perl -w stdin_comment.pl
</code>

  * ''<STDIN>'' reads //up to and including// the newline character

    * The newline could be the user hitting <Enter> after inputting something or the newline at the end of a line in a text file.

  * See [[cs370:cs_370_-_unix_shells_and_shell_scripting#standard_file_descriptors|standard file descriptors (CS-370)]]

==== chomp() and chop() ====

  * The Perl <STDIN> behavior of including newlines often needs to be accounted for.

<code>
print "Enter a five letter word guess, preferably \"Yoink\": ";
$userguess = <STDIN>;
chomp($userguess);
   # Removes trailing newline character only.
   # Test this without the chomp statement.
            
$secretword = "Yoink";
            
print "The result of the comparison: ", $userguess eq $secretword, "\n";
   # String comparison with 'eq' operator;
   # returns empty string if strings not equal;
   # returns 1 if strings equal.
</code>

  * //chop()// function removes last character of string, whether it's a newline or not.
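A short sketch contrasting the two functions on the same input. The string ''"Yoink\n"'' stands in for a line read from //<STDIN>//; the difference only appears once the newline is gone:

<code perl>
use strict;
use warnings;

my $line = "Yoink\n";             # as if just read from <STDIN>

my $chomped = $line;
chomp($chomped);                  # removes the trailing newline only
my $chopped = $line;
chop($chopped);                   # removes the last character: here, "\n"
print "[$chomped] [$chopped]\n";  # prints [Yoink] [Yoink]

# Call each again: now there is no newline left
chomp($chomped);                  # nothing to remove; string unchanged
chop($chopped);                   # removes the final "k" anyway
print "[$chomped] [$chopped]\n";  # prints [Yoink] [Yoin]
</code>

This is why //chomp()// is usually the safer choice for cleaning up input.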

==== string functions ====

  * length(): returns length of a string

<code>
print "Enter a string: ";
my $inpString = <STDIN>;
chomp($inpString); # Must do this, else length will return length + 1.
print "$inpString is ", length($inpString), " chars long.\n";
</code>

  * lc(): converts all characters in a string to lowercase.

<code>
my $string = "Hello, World!";
my $lowercase = lc($string);
print "$lowercase\n";
</code>

  * uc(): converts all characters in a string to uppercase.

<code>
my $string = "Hello, World!";
my $uppercase = uc($string);
print "$uppercase\n";
</code>

  * index(): returns the position of the first occurrence of a substring within a string.

<code>
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = index($string, $substring);
print "$position\n";  # Output: 6
</code>

  * rindex(): returns the position of the last occurrence of a substring within a string.

<code>
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = rindex($string, $substring);
print "$position\n";  # Output: 17
</code>

  * substr(): extracts a substring from a string.

<code>
my $string = "Hello, World!";
my $substring = substr($string, 7, 5);
   # Starting at position 7, extract 5 characters
print "$substring\n";  # Output: World
</code>

==== The die() controlled exit function ====

  * The //die()// function, commonly seen in Perl programs, prints a message to STDERR and ends the program.

<code>
print "Enter a string to pass to die(): ";
chomp($string = <STDIN>);
            
die($string); # Outputs to STDERR, not STDOUT

print "This will not be printed.";
</code>

    * If its string argument does **not** end in a newline, die() also prints the Perl program name and the line number of the die() statement; that is why the //chomp()// above removes the newline.

==== selection statements, familiar ====

  * Familiar (C/C++/Java syntax)

<code>
if ( $number != 0 ) {
   $result = 100 / $number;
}

if ( $password eq $guess ) {
   print "Pass, friend.\n";
} else {
   die "Go away, imposter!";
}
</code>

==== selection statements, elsif ====

<code>
if ( $password eq $guess ) {
   print "Pass, friend.\n";
} elsif ( $password eq "Meh" ) {
   print "Meh!\n";
} else {
   die "Go away, imposter!";
}
</code>

==== selection statements, unless ====

<code>
# Assuming $done holds a true/false value
# (avoid naming it $a: $a and $b are special variables used by sort)
if ( not $done ) {
   print "\$done is not true\n";
}

# can also be expressed...

unless ( $done ) {
   print "\$done is not true\n";
}
</code>

  * Choose the syntax that best fits your own thought patterns
    * "There's more than one way to do it."

==== reverse selection statements ====

  * Normal

<code>
if ($number == 0) {
   die "Can't divide by 0";
}
</code>

  * Equivalent

<code>
die "Can't divide by 0" if $number == 0;
</code>

==== repetition structures ====

  * while, until, for, foreach, do..while, do..until
    * while, for, do..while are C/C++/Java standard

<code>
my $countdown = 3;
until ( $countdown <= 0 ) {
   print "Counting down: $countdown\n";
   $countdown--;
}

# "for each number in the list 1 through 10"
foreach $number ( 1 .. 10 ) {
   print "The number is: $number\n";
}
</code>
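The remaining loop forms listed above are sketched below: a C-style //for// and the two //do// forms. Unlike //while// and //until//, the //do// forms test their condition after the body, so the body always runs at least once:

<code perl>
use strict;
use warnings;

# C-style for loop: initialize; test; update
for ( my $i = 1; $i <= 3; $i++ ) {
   print "for: $i\n";
}

# do..until runs the body first, then tests the condition
my $tries = 0;
do {
   $tries++;
   print "do..until: try $tries\n";
} until ( $tries >= 3 );

# do..while also runs at least once, even though
# the condition ( $n < 5 ) is false from the start
my $n = 10;
do {
   print "do..while: n is $n\n";
   $n++;
} while ( $n < 5 );
</code>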

==== while (<STDIN>) ====

  * //while ( $var = <STDIN> )// reads from standard input until end of file (control-d)
    * Reads a line at a time into $var

<code>
while ( $var = <STDIN> ) {
   print $var;
	  # Print each line of standard input.
}
</code>

    * Can shorten to the following because it's such a common operation
      * $_ below is the //default Perl variable//

<code>
while ( <STDIN> ) {
   print $_;
}
</code>

==== lists ====

  * Create Perl lists with ()

<code>
print( "Hello, ", "world", "\n" );
   # 3 strings in a list being passed to print function

print( 123, 456, 789 );

foreach $number ( 1 .. 10 ) {
   # ( 1 .. 10 ) creates the list of numbers from 1 to 10
   print "$number\n";
}
</code>

  * Create Perl lists with //qw// (quote words)

<code>
qw/hello world good bye/
   # creates a 4 word list
</code>

==== accessing list values ====

  * Use square brackets

<code>
print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] );
   # output: mustard (count from zero)

my $month = 3;

print qw(
   Jan Feb Mar
   Apr May Jun
   Jul Aug Sep
   Oct Nov Dec
)[ $month ]
   # output: Apr
</code>

==== accessing list "slices" ====

  * Get more than one list value at a time

<code>
my $mone;
my $mtwo;
( $mone, $mtwo ) = ( 1, 3 );


my $m1;
my $m2;
my $m3;

( $m1, $m2, $m3 ) = qw(
                      Jan Feb Mar
                      Apr May Jun
                      Jul Aug Sep
                      Oct Nov Dec
                      )[ 2..4 ];

print $m1." ".$m2." ".$m3;
</code>

==== arrays ====

  * Arrays are just //named// lists.
  * Whole arrays are called //@//arrayName (start with '@')
    * Individual array elements (//scalars//) are accessed as //$arrayName[ subscript ]//

<code>
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
print @days, "\n";
   # Output: MonTueWedThuFriSatSun
print "@days\n";
   # Output: Mon Tue Wed Thu Fri Sat Sun
   # By enclosing in "", have "stringified" @days
print $days[ 6 ], "\n"; 
# @days[ 6 ] can also be used, but see the warning at https://stackoverflow.com/a/53732305
</code>

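==== array size and $#arrayName ====

  * //$#arrayName// is the subscript (index) of the last element of //@arrayName//
    * so //@arrayName// holds //$#arrayName + 1// elements; //scalar(@arrayName)// returns that count directly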

  * Looping through an array

<code>
my $i = 0;
while ( $i <= $#arrayName ) {
   # Do something with $arrayName[ $i ] here
   $i++;
}
</code>

    * or

<code>
for ( my $i = 0; $i <= $#arrayName; $i++ ) {
   # Do something with $arrayName[ $i ] here
}
</code>
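A small sketch showing how //$#days// (last subscript) relates to the element count of //@days//, using the weekday array from the earlier example:

<code perl>
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri Sat Sun);

my $last_index = $#days;          # subscript of the last element: 6
my $count      = scalar(@days);   # number of elements: 7
print "last index: $last_index, element count: $count\n";

# An array used in a numeric comparison also gives its size
if ( @days > 5 ) {
   print "More than five days in the list\n";
}

print "Last day: ", $days[ $#days ], "\n";   # prints Last day: Sun
</code>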
==== arrays with foreach ====

<code>
foreach my $i ( @arrayName ) {
   # Does not use $#.
   # Do something with $i here.
}
</code>

==== accessing array slices ====

<code>
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my @longweekend = @days[ 4..6 ];
   # Note use of @ instead of $ before days.
print "@longweekend\n";
   # Output: Fri Sat Sun
</code>
==== array functions ====

  * reverse()

<code>
my @count = ( 1..5 );

foreach $each ( reverse( @count ) ) {
  print "$each...\n";
  sleep 1;
}
</code>

  * sort()

<code>
my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted = sort @unsorted;
</code>
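By default, //sort// compares values as strings, which can surprise you when sorting numbers. This sketch shows numeric sorting with a comparison block; inside the block, //$a// and //$b// are special variables that sort sets to the two values being compared, and //<=>// is Perl's numeric comparison operator:

<code perl>
use strict;
use warnings;

my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted   = sort @unsorted;               # default: string order
print "@sorted\n";                           # Clapton Cohen Costello Rush ZZTop

my @numbers  = ( 10, 9, 100, 1 );
my @as_text  = sort @numbers;                # compared as strings
my @as_nums  = sort { $a <=> $b } @numbers;  # numeric, ascending
my @reversed = sort { $b <=> $a } @numbers;  # numeric, descending
print "@as_text\n";                          # 1 10 100 9
print "@as_nums\n";                          # 1 9 10 100
print "@reversed\n";                         # 100 10 9 1
</code>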

  * //end// or //top// functions: push(), pop()

<code>
my $hand;
my @pile = ( "letter", "newspaper", "bill", "notepad" );
print "You pick up something off the top of the pile.\n";
$hand = pop @pile;
   # "notepad" is removed from end (top) of @pile
print "You now have a $hand in your hand,\n",
      "and the pile contains:\n@pile\n";

print "You now put something on your pile.\n";
push @pile, "statement";
   # "statement" is added to the end (top) of @pile
print "Now the pile contains:\n@pile\n";
</code>

  * //beginning// or //bottom// functions: shift(), unshift()

<code>
my @array = (); # nothing in array
unshift @array, "first";
print "Array is now: @array\n";
unshift @array, "second", "third";
print "Array is now: @array\n";
shift @array;
print "Array is now: @array\n";
   # unshift adds elements to the front; shift removes the first element
</code>
==== hashes ====

  * //hashes// or //associative arrays//
    * can be thought of as //unordered arrays//, using //keys// instead of array subscripts

  * Whole hashes are called //%//hashname (start with '%').

<code>
%where = (
   Gary => "Dallas",
   Lucy => "Austin",
   Ian  => "Houston",
   Samantha => "Seattle"
);
</code>

    * To the left of the //=>// are the hash //keys//, to the right of //=>// are the associated hash values.
      * hash keys must be unique

==== hash element access ====

  * Individual hash elements are accessed with $hashname{key}

<code>
my $who = "Ian";

my %where = (
   Gary => "Piscataway",
   Lucy => "Hackensack",
   Ian  => "Mahwah",
   Samantha => "Hoboken"
);

print "Gary lives in ", $where{Gary}, "\n";
print "$who lives in $where{$who}\n";
</code>

==== hash element adding/deleting ====

<code>
my %where = (
   Gary => "Piscataway",
   Lucy => "Hackensack",
   Ian  => "Mahwah",
   Samantha => "Hoboken"
);

$where{Eva} = "Howell";
   # We added an Eva => "Howell" key/value pair to the
   # %where hash

delete $where{Gary};
   # We deleted the Gary => "Piscataway" key/value pair
</code>

==== hash functions, iteration ====

  * Can't use //while//, //for// or //foreach// to directly iterate through hashes
    * Perl provides //keys()//, //values()// and //each()//

<code>
# use 'keys' function to iterate through hash keys
foreach $who ( keys %where ) {
   print "$who lives in $where{$who}\n";
}


# use 'values' function to iterate through hash values
foreach $town ( values %where ) {
   print "someone lives in $town\n";
}


# use 'each' function to iterate through hash key/value
# pairs
my ($name, $town);
   # an assignable list of variables
while ( ($name, $town) = each %where ) {
   print "$name lives in $town\n";
}
</code>
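Because hashes are unordered, //keys()// may return the keys in any order. A common approach, sketched here with the //%where// data from above, is to sort the keys before looping; //keys()// in scalar context gives the number of pairs:

<code perl>
use strict;
use warnings;

my %where = (
   Gary     => "Piscataway",
   Lucy     => "Hackensack",
   Ian      => "Mahwah",
   Samantha => "Hoboken",
);

# Sort the keys so the output order is predictable
foreach my $who ( sort keys %where ) {
   print "$who lives in $where{$who}\n";
}

# keys() in scalar context returns the number of key/value pairs
my $count = keys %where;
print "$count people\n";          # prints 4 people
</code>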

==== hash functions, key existence ====

  * Use //exists()// hash function to check if a key exists in a hash

  print "Gary exists in the hash!\n" if exists $where{Gary};
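A sketch of //exists()// in a full script. The hypothetical //Eva// entry is given an undefined value to show that //exists()// checks for the key, while //defined()// checks the value:

<code perl>
use strict;
use warnings;

# A small hash in which one key deliberately has an undefined value
my %where = ( Gary => "Piscataway", Eva => undef );

print "Gary exists in the hash!\n" if exists $where{Gary};

my $eva_exists  = exists  $where{Eva};   # true: the key is present
my $eva_defined = defined $where{Eva};   # false: its value is undef
print "Eva is in the hash, but her town is unknown\n"
   if $eva_exists and not $eva_defined;

# Testing a missing key with exists() does not add it to the hash
my $zoe_exists = exists $where{Zoe};
print "No Zoe in the hash\n" unless $zoe_exists;
print scalar( keys %where ), " keys\n";   # prints 2 keys
</code>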

==== subroutines ====

  * subroutines are functions
  * define subroutine:

<code>
sub example_subroutine {
   # subroutine body goes here
}
</code>

  * define and call subroutine

<code>
greet();

sub greet {
   print "Hello, World!\n";
}
</code>
==== subroutines, arguments ====

  * arguments are passed through another special Perl var, //@_//
    * notice that //@_// is a special //array//

<code>
greet( "Jim", "Bob", "Russ" );
   # The number of arguments isn't fixed; @_ simply holds
   # whatever list of values the caller passes

sub greet {
   foreach my $arg ( @_ ) {
	  print "Hello $arg!\n";
   }

   print "You're first, $_[ 0 ].\n";
   print "You're second, $_[ 1 ].\n";
   print "You're last, $_[ 2 ].\n";
}
</code>
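Indexing //$_[ 0 ]//, //$_[ 1 ]// works, but a very common idiom is to copy //@_// into named //my// variables at the top of the subroutine. A sketch with a hypothetical //describe()// subroutine whose second argument is optional:

<code perl>
use strict;
use warnings;

# describe() is a hypothetical example subroutine
sub describe {
   my ( $name, $town ) = @_;        # copy arguments into named variables
   $town = "an unknown town" unless defined $town;   # 2nd argument optional
   return "$name lives in $town";
}

my $one = describe( "Gary", "Piscataway" );
my $two = describe( "Eva" );        # only one argument: $town is undef
print "$one\n$two\n";
</code>

Copying into lexicals also avoids accidentally modifying the caller's variables, since elements of //@_// are aliases to the arguments passed.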

==== subroutines, returns ====

  * Use //return//
    * Can return a list of values.

<code>
my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
print "($len1, $len2, $len3)\n";

sub greet {
   foreach my $arg ( @_ ) {
	  print "Hello $arg!\n";
   }

   return (length($_[0]), length($_[1]), length($_[2]));
	  # return a list of 3 ints
}
</code>

----
----
===== Perl and Regular Expressions =====

  * For anything beyond trivial text processing with Perl and other languages, you need a basic understanding of //regular expressions// (regex).
  * See [[cs370:cs_370_-_regular_expressions | CS 370 intro to regular expressions]] for a start.
    * Also, see links:
      * http://linuxreviews.org/beginner/tao_of_regular_expressions
      * **https://regexone.com** (recommended)
      * **https://perldoc.perl.org/perlrequick** (recommended)
      * http://en.wikipedia.org/wiki/Regular_expressions
      * [[http://docs.google.com/viewer?url=http://blob.perl.org/books/beginning-perl/3145_Chap05.pdf|Regex chapter in "Beginning Perl"]]
      * [[https://perldoc.perl.org/perlre|Official Perl regular expressions documentation]]
      * Interactive regex testers:
        * http://www.regexr.com/
        * http://regexpal.com/


===== Perl and Regular Expressions (match, search/replace) =====

  * Regular expression matching with the //=~// operator

<code>
$var =~ m/regular expression/
   # boolean (true if match, false if no match)
   # can omit the "m": $var =~ /regular expression/
   
$var !~ m/regular expression/
   # true if not a match, false if a match
</code>
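A runnable sketch of the match operators above. Each match result is stored in a variable to show that it really is a true or false value; the //^// anchor and the ///i// (case-insensitive) modifier are standard Perl regex features:

<code perl>
use strict;
use warnings;

my $name = "Joseph";

# =~ with m// is true when the pattern matches somewhere in $name
my $starts_with_jo = ( $name =~ m/^Jo/ );     # ^ anchors at the start
# !~ is true when the pattern does NOT match ('m' omitted here)
my $has_no_z       = ( $name !~ /z/ );
# the /i modifier makes the match case-insensitive
my $is_joseph      = ( "JOSEPH" =~ /^joseph$/i );

print "starts with Jo\n"    if $starts_with_jo;
print "contains no z\n"     if $has_no_z;
print "JOSEPH matches /i\n" if $is_joseph;
</code>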

  * Substitution using regular expressions

<code>
$var =~ s/search regex/replacement string/

my $name = "Joseph";
$name =~ s/[sph]//;  # delete first instance of s, p or h
print "$name\n";     # output: Joeph
$name =~ s/[sph]//g; # g -> delete every instance of s, p or h
print "$name\n";     # output: Joe
</code>

===== Perl and Regular Expressions (split) =====

  * //split()// function splits strings using a regular expression as delimiter 
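    * Called with no arguments, //split// splits the default variable //$_// on whitespace after skipping any leading whitespace, which is equivalent to doing

<code>
split ' ', $_;
   # split the default variable on runs of whitespace;
   # unlike split /\s+/, $_ this produces no empty first
   # field when $_ begins with whitespace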
</code>

  * The //split// regexp delimiter is usually something simple

<code>
my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;
</code>
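A sketch that continues the //$passwd// example by picking out a few of the seven colon-separated fields (user name, password placeholder, UID, GID, comment, home directory, shell) with an array slice, then shows //split// with no arguments:

<code perl>
use strict;
use warnings;

my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;

# Fields 0, 2 and 5 are the user name, UID and home directory
my ( $user, $uid, $home ) = @fields[ 0, 2, 5 ];
print "$user has UID $uid and home $home\n";
print scalar(@fields), " fields\n";       # prints 7 fields

# With no arguments, split works on $_ and ignores leading whitespace
$_ = "   one  two three";
my @words = split;
print join( "|", @words ), "\n";          # prints one|two|three
</code>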

===== Perl and Regular Expressions (join) =====

  * //join()// function joins elements of a list using a specified delimiter string

<code>
my $last = "Jones";
my $first = "Bob";
my $name = join ", ", ($last, $first);
</code>

===== Multiple Input Files with <> =====

  * We have used the //<STDIN>// statement and //< filename// redirection to process files such as //cs498roster//

  * Suppose we want to be able to process files with a command like

  $ perl -w perlex5.pl cs498roster cs598roster

  * One way to accomplish this is with the so-called //diamond// operator, //<>//:

<code>
while (<>) {
	print "text read: $_";
}
</code>

  * //<>// checks whether the program (the Perl script) was invoked with command line arguments (one or more file names)
    * If so, it reads the file(s) one at a time, one line at a time.
    * If not, it reads from standard input, just like //<STDIN>//.

===== Perl Command Line Arguments w/ @ARGV =====

  * The special Perl array //@ARGV// contains the command line arguments that a Perl script was invoked with.
    * //<>// gets file names from //@ARGV// by default

<code>
foreach my $arg ( @ARGV ) {
	print "$arg\n";
}

# or

for ( my $i = 0; $i <= $#ARGV; $i++ ) {
	print "$ARGV[ $i ]\n";
}

# or

print "$_\n" foreach @ARGV;   # Perl-speak: statement modifier, $_ is the loop variable
</code>

===== Perl Command Line Arguments w/ shift @ARGV =====

  * Command line arguments aren't always file names.
    * Here, we're trying to get a help message from ''perlex6.pl'' by calling it with the ''%%--%%help'' command line argument
      * ''%%--%%help'' becomes the first element of ''@ARGV'', ''$ARGV[ 0 ]''

  $ perl -w perlex6.pl --help

    * Sometimes, command line arguments modify how a program will work.
      * Here, we're trying to make ''perlex6.pl'' sort rosters by last name:

  $ perl -w perlex6.pl --last cs498roster cs598roster

  * To remove an argument from ''@ARGV'', ''shift'' it.

<code>
my $arg0 = shift @ARGV;
if ( $arg0 =~ /last/ ) ...
   # Process all non-file command line args before we
   # get to //while (<>)//

                while (<>) { ...
</code>

===== Perl References =====

  * Perl references hold the locations of other pieces of data
    * The backslash "\" is used to create references.
    * Here, //$hash_r// becomes a reference to the memory location of //%hash//:

<code>
my %hash = ( 
	apple => "crab",
	pear => "asian"
	);
my $hash_r = \%hash;
</code>

  * Dereferencing
    * Put the reference inside {} and prefix the sigil of the referenced data type, e.g., //%{$hash_r}// for a hash.

<code>
my %hash2 = %{$hash_r};
	# Set new %hash2 contents from the hash that is
	# referenced by $hash_r
</code>
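A sketch of other ways to reach data through a reference. The arrow form //$hash_r->{key}// is the most common; note that, unlike the copy in //%hash2// above, changes made through the reference affect //%hash// itself. Array references work the same way with //@{}// and //->[]//:

<code perl>
use strict;
use warnings;

my %hash   = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;                  # reference to %hash

my $v1 = ${$hash_r}{apple};           # block dereference
my $v2 = $$hash_r{apple};             # same, with the braces omitted
my $v3 = $hash_r->{apple};            # arrow notation (most common)
print "$v1 $v2 $v3\n";                # prints crab crab crab

# Changing data through the reference changes %hash itself
$hash_r->{plum} = "damson";
print "plum is $hash{plum}\n";        # prints plum is damson

# An array reference, dereferenced with @{} and ->[]
my @fruit   = sort keys %hash;        # apple pear plum
my $fruit_r = \@fruit;
print "first: $fruit_r->[0]; count: ", scalar( @{$fruit_r} ), "\n";
</code>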

===== Perl Objects =====

  * "...an object can be anything %%--%% it really depends on what your application is. ... If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object."

  * In Perl, what we see as an object is simply a reference...
    * In fact, you can convert any ordinary reference //into// an object simply by using the (Perl) //bless()// function.
    * Typically, however, objects are represented as references to a //hash//
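A minimal sketch of that idea, using a hypothetical //Counter// class defined in the same file: the constructor blesses a hash reference into the class, methods receive the object as their first argument, and //ref()// reports the class name of the blessed reference. This is an illustration, not a pattern taken from a specific module:

<code perl>
use strict;
use warnings;

package Counter;                 # hypothetical class for illustration

# Constructor: bless a hash reference into the Counter class
sub new {
   my ( $class, %args ) = @_;
   my $self = { count => $args{start} // 0 };
   return bless $self, $class;
}

# Methods receive the object itself as their first argument
sub increment {
   my $self = shift;
   $self->{count}++;
   return $self->{count};
}

package main;

my $counter = Counter->new( start => 5 );
$counter->increment;
print "count is now $counter->{count}\n";   # prints count is now 6
print ref($counter), "\n";                  # prints Counter
</code>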

===== Perl Objects and Modules Example =====

  * A Perl class is a //package//, usually stored in a //module// file (''.pm''), so Perl classes are often referred to as //modules//
    * Here, we'll use the //Net::FTP// module (example only; rockhopper does not run an anonymous ftp server):

<code>
use strict;
use Net::FTP;
   # This requires that FTP.pm be stored somewhere on the
   # local system that Perl searches through for modules.

my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
   or die "Couldn't connect: $@\n";
	  # new() is a method of class Net::FTP; it's the
	  # constructor. $ftp is our FTP session object.

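$ftp->login("anonymous")
   or die "Couldn't log in: ", $ftp->message;
   # Net::FTP->login() method
$ftp->cwd("/")
   or die "Couldn't change directory: ", $ftp->message;
$ftp->get("index.php")
   or die "Couldn't get file: ", $ftp->message;
$ftp->quit();
   # Send QUIT and close the connection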
</code>

===== Perl Modules and @INC =====

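  * //@INC// contains the directories that the Perl interpreter searches for modules such as FTP.pm. The sample output below is from an older Perl 5.8.8 system; since Perl 5.26, the current directory (''.'') is no longer in //@INC// by default.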

<code>
$ perl -V
...
@INC:
	/etc/perl
	/usr/local/lib/perl/5.8.8
	/usr/local/share/perl/5.8.8
	/usr/lib/perl5
	/usr/share/perl5
	/usr/lib/perl/5.8
	/usr/share/perl/5.8
	/usr/local/lib/site_perl
	.
</code>
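A script can also inspect //@INC// directly. This sketch prints each search directory and uses the related //%INC// hash, which maps each loaded module's file name to the path it was loaded from; //File::Spec// is loaded only because it is a core module that should be present everywhere:

<code perl>
use strict;
use warnings;
use File::Spec;   # a core module, loaded here only to show %INC

# Print each directory Perl searches for modules, one per line
foreach my $dir ( @INC ) {
   print "$dir\n";
}

# %INC maps each loaded module's file to where it was found
print "File::Spec was loaded from $INC{'File/Spec.pm'}\n";
</code>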

===== Perl Modules and CPAN =====

  * A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
  * The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
    * Searchable at https://metacpan.org (the older http://search.cpan.org site was retired in 2018)
  * The CPAN module naming hierarchy places modules in categories such as ''Sort::Fields'', ''Sort::Versions'' or subcategories such as ''LWP::Protocol::http''
    * On disk, the modules would look like ''.../Sort/Fields.pm'', ''.../Sort/Versions.pm''
    * For the ''LWP::Protocol::http'' module, the full path to the module might be ''/usr/share/perl5/LWP/Protocol/http.pm'' on a Linux system.
  * How to know if a Perl module is installed?
    * You can run a Perl "one-liner" program using the "-e" option to check if a module is installed, e.g.,

<code>
# If the following runs with no errors, it means the LWP::Simple module is installed.
perl -e "use LWP::Simple"
</code>
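The same check can be done from inside a script, for example to fall back gracefully when an optional module is missing. This sketch wraps //require// in //eval//; the helper //module_installed()// and the module name //No::Such::Module// are hypothetical, chosen only for illustration:

<code perl>
use strict;
use warnings;

# Returns 1 if the named module can be loaded, 0 otherwise
sub module_installed {
   my ($module) = @_;
   ( my $file = "$module.pm" ) =~ s{::}{/}g;   # LWP::Simple -> LWP/Simple.pm
   return eval { require $file; 1 } ? 1 : 0;
}

for my $module ( qw( File::Spec LWP::Simple No::Such::Module ) ) {
   printf "%-18s %s\n", $module,
      module_installed($module) ? "installed" : "not installed";
}
</code>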

===== Perl Web Automation =====

  * Perl modules are available for Web automation
    * Well-known modules are the LWP ("Library for WWW in Perl") group of modules and WWW::Mechanize.
      * LWP - https://www.perl.com/pub/2002/08/20/perlandlwp.html/
      * WWW::Mechanize
        * https://www.perlmonks.org/?node_id=1037506
        * http://search.cpan.org/dist/WWW-Mechanize/lib/WWW/Mechanize/Examples.pod
      * Old article that describes Perl web automation: http://rockhopper.monmouth.edu/~jchung/docs/4915-1108-turoff.pdf

  * Retry Perl Exercise 7 using a Web automation module such as //LWP::Simple or WWW::Mechanize//.

===== Perl and System Commands =====

  * For portability, use Perl equivalents of system commands, such as Perl's //chdir()// function instead of //cd//.
    * NOTE: This is applicable to all programming languages that can call system commands.

  * But if using system commands is a necessity, use //system()//

  system( '/usr/games/fortune' );

  * Command substitution is done using backticks (''`command`'')
    * Sometimes we want to capture the output of a system command to use in a Perl script.

  my $sysdate = `date`;
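A sketch that puts these pieces together, assuming a UNIX/Linux system where the //echo// and //date// commands exist. //system()// returns the exit status (0 means success), backticks return the command's output (including the trailing newline), and //chdir()// changes the Perl program's own working directory, which a //system('cd ...')// call could not do since it only affects a child shell:

<code perl>
use strict;
use warnings;
use Cwd qw(getcwd);   # core module: current working directory
use File::Spec;       # core module: portable path helpers

# system() runs a command and returns its exit status (0 means success)
my $status = system( 'echo', 'Hello from echo' );
print "echo exited with status ", $status >> 8, "\n";

# Backticks capture the command's standard output as a string
my $sysdate = `date`;
chomp($sysdate);                  # drop the trailing newline
print "The date is: $sysdate\n";

# Portable alternative to running 'cd': Perl's own chdir()
my $tmp = File::Spec->tmpdir;
chdir($tmp) or die "Cannot chdir to $tmp: $!";
print "Now in ", getcwd(), "\n";
</code>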