Introduction to Perl
The "Practical Extraction and Report Language"
The "Pathologically Eclectic Rubbish Lister"
Perl
- Creator: Larry Wall
 - Introduced: 1987
 - Open source
 - Comes standard on many UNIX/Linux systems
 - macOS users are encouraged to use perlbrew to install Perl.
 - Windows installer obtainable from http://strawberryperl.com and http://activestate.com
- Strawberry Perl is recommended.
 
 - Originally developed for text record manipulation
 - Used for system administration, web development, network programming, system exploit testing
 
Perl Uses
- CGI (web application) programming
- Written in Perl (or was written in Perl originally): Slash, Bugzilla, TWiki, Movable Type
- Wikipedia originally used a Wiki engine (CGI program) written in Perl called UseModWiki.
 - Perl CGI-type programs typically communicate with database backends
 
 - bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com and imdb.com have all used Perl extensively at one point.
 
 - Large-scale text processing for report generation
- Has found past use in finance and bioinformatics fields for its ability to handle large data sets
 
 
Perl Online Resources
- Perl.org online library
 - Picking Up Perl (online book)
 
Perl Execution
- Programs typically given .pl extension
 - Executed with perl -w prog.pl
 - Or use shell script type line at top of Perl script
- #!/path/to/perl -w
 - and make the program executable with chmod +x prog.pl
 
 - Using the -w switch is encouraged for debugging.
- -w issues warnings that would otherwise not be issued.
 
 
Perl Syntax (highlights)
- A Perl motto: “There is more than one way to do it.”
- Perl designed with this idea in mind
 
 
- print "Hello, world!\n"; # semicolon is mandatory like C/C++/Java/others
- Comments begin with #, as in shell scripts, Tcl and older languages.
 
 
- print " 0.25" * 4, "\n"; # output: 1
- " 0.25" is a string, but it is converted to a float and multiplied by 4, then printed followed by a line break ("\n")
 - automatic conversion of scalars for “contextual polymorphism”
 - leading space is ignored
 
 
- Math operators are the familiar set from C/C++/Java
 
- String concatenation (.) and repetition (x) operators
- print "Ba" . "na" x 4, "\n"; # output: Banananana
 
 
More Perl Syntax
variables
- Use $ all the time in front of variable names.
 
$name = "fred"; # variables are dynamically typed
print "My name is ", $name, "\n";
variable scope
- Variables are global if not declared local with my $variable.
- Local Perl variables are also called lexical variables.
 
 
$name = "fred"; # This is global $name.
          
# Insert a block
{
   my $name = "joe"; 
      # This is local $name because of 'my'.
   print "Block local \$name is $name\n";
      # Using double quotes allows $name interpolation as
      # part of the print statement. The backslashed $ 
      # (\$name) suppresses variable interpolation.
}
print "Global \$name is ", $name, "\n";
- Some advocate using the my keyword for all variables, including global vars.
standard input
- Perl standard input with <STDIN>
 
print "Please enter something interesting: \n";
$comment = <STDIN>;
          
print "You entered: $comment\n";
- Standard input can be from the keyboard or redirected from a file or from another program.
- Example of redirecting stdin from a file:
 
 
$ perl -w stdin_comment.pl < comment.txt
- Examples of redirecting stdin from a program:
 
$ echo "The ripest fruit falls first." | perl -w stdin_comment.pl
# If you have the fortune command installed ...
$ fortune | perl -w stdin_comment.pl
- <STDIN> reads up to and including the newline character
- The newline could be the user hitting <Enter> after inputting something or the newline at the end of a line in a text file.
 
chomp() and chop()
- The Perl <STDIN> behavior of including newlines often needs to be accounted for.
 
print "Enter a five letter word guess, preferably \"Yoink\": ";
$userguess = <STDIN>;
chomp($userguess);
   # Removes trailing newline character only.
   # Test this without the chomp statement.
            
$secretword = "Yoink";
            
print "The result of the comparison: ", $userguess eq $secretword, "\n";
   # String comparison with 'eq' operator;
   # returns empty string if strings not equal;
   # returns 1 if strings equal.
- chop() function removes last character of string, whether it's a newline or not.
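
The difference between chomp() and chop() can be sketched as follows. The strings are made-up sample data standing in for lines read from <STDIN>; they are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample strings standing in for lines read from <STDIN>.
my $with_newline    = "Yoink\n";
my $without_newline = "Yoink";

chomp($with_newline);       # removes the trailing newline
chomp($without_newline);    # no trailing newline, so nothing is removed

my $chopped = "Yoink";
chop($chopped);             # removes the last character, here the "k"

print "chomp with newline:    '$with_newline'\n";
print "chomp without newline: '$without_newline'\n";
print "chop:                  '$chopped'\n";
```

Because chomp() only ever removes a trailing newline, it is usually the safer choice for cleaning up input.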
 
string functions
- length(): returns length of a string
 
print "Enter a string: ";
my $inpString = <STDIN>;
chomp($inpString);
   # Must do this, else length will return length + 1.
print "$inpString is ", length($inpString), " chars long.\n";
- lc(): converts all characters in a string to lowercase.
 
my $string = "Hello, World!";
my $lowercase = lc($string);
print "$lowercase\n";
- uc(): converts all characters in a string to uppercase.
 
my $string = "Hello, World!";
my $uppercase = uc($string);
print "$uppercase\n";
- index(): returns the position of the first occurrence of a substring within a string.
 
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = index($string, $substring);
print "$position\n"; # Output: 6
- rindex(): returns the position of the last occurrence of a substring within a string.
 
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = rindex($string, $substring);
print "$position\n"; # Output: 17
- substr(): extracts a substring from a string.
 
my $string = "Hello, World!";
my $substring = substr($string, 7, 5);
   # Starting at position 7, extract 5 characters
print "$substring\n"; # Output: World
The die() controlled exit function
- Commonly seen is the Perl function, die().
 
print "Enter a string to pass to die(): ";
chomp($string = <STDIN>);
            
die($string); # Outputs to STDERR, not STDOUT
print "This will not be printed.";
- die() appends the Perl program name and the line number of the die() statement to its output only if the string argument does not end in a newline; this is why the chomp() statement above is used.
 
selection statements, familiar
- Familiar C/C++/Java syntax
 
if ( $number != 0 ) {
   $result = 100 / $number;
}
if ( $password eq $guess ) {
   print "Pass, friend.\n";
} else {
   die "Go away, imposter!";
}
selection statements, elsif
if ( $password eq $guess ) {
   print "Pass, friend.\n";
} elsif ( $password eq "Meh" ) {
   print "Meh!\n";
} else {
   die "Go away, imposter!";
}
selection statements, unless
# Assuming $a is boolean
if ( not $a ) {
   print "\$a is not true\n";
}
# can also be expressed...
unless ( $a ) {
   print "\$a is not true\n";
}
- Choose the syntax that best fits your own thought patterns
- “There's more than one way to do it.”
 
 
reverse selection statements
- Normal
 
if ($number == 0) {
   die "Can't divide by 0";
}
- Equivalent
 
die "Can't divide by 0" if $number == 0;
repetition structures
- while, until, for, foreach, do..while, do..until
- while, for, do..while are C/C++/Java standard
 
 
until ( $countdown <= 0 ) {
   print "Counting down: $countdown\n";
   $countdown--;
}
# "for each number in the list 1 through 10"
foreach $number ( 1 .. 10 ) {
   print "The number is: $number\n";
}
while (<STDIN>)
- while ( $var = <STDIN> ) reads from standard input until end of file (control-d)
- Reads a line at a time into $var
 
 
while ( $var = <STDIN> ) {
   print $var;
	  # Print each line of standard input.
}
- Can shorten to the following because it's such a common operation
- $_ below is the default Perl variable
 
 
while ( <STDIN> ) {
   print $_;
}
lists
- Create Perl lists with ()
 
print( "Hello, ", "world", "\n" );
   # 3 strings in a list being passed to print function
print( 123, 456, 789 );
foreach $number ( 1 .. 10 ) {
   # ( 1 .. 10 ) creates the list of numbers from 1 to 10
   print "$number\n";
}
- Create Perl lists with qw (quote words)
 
qw/hello world good bye/ # creates a 4 word list
accessing list values
- Use square brackets
 
print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] );
   # output: mustard (count from zero)
my $month = 3;
print qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )[ $month ];
   # output: Apr
accessing list "slices"
- Get more than one list value at a time
 
my $mone;
my $mtwo;
( $mone, $mtwo ) = ( 1, 3 );
my $m1;
my $m2;
my $m3;
( $m1, $m2, $m3 ) = qw(
                      Jan Feb Mar
                      Apr May Jun
                      Jul Aug Sep
                      Oct Nov Dec
                      )[ 2..4 ];
print $m1." ".$m2." ".$m3;
arrays
- Arrays are just named lists.
 - Whole arrays are called @arrayName (start with '@')
- Individual array elements (scalars) are accessed as $arrayName[ subscript ]
 
 
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
print @days, "\n";
   # Output: MonTueWedThuFriSatSun
print "@days\n";
   # Output: Mon Tue Wed Thu Fri Sat Sun
   # By enclosing in "", have "stringified" @days
print $days[ 6 ], "\n";
   # @days[ 6 ] can also be used, but see the warning at https://stackoverflow.com/a/53732305
array size in $#
- $#arrayName is special Perl syntax
- contains the last subscript (or index) of @arrayName, which is one less than the array size
 
 
- Looping through an array
 
my $i = 0;
while ( $i <= $#arrayName ) {
   # Do something with $arrayName[ $i ] here
   $i++;
}
- or
 
for ( my $i = 0; $i <= $#arrayName; $i++ ) {
   # Do something with $arrayName[ $i ] here
}
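
The relationship between the last index and the array size can be sketched as below. The @colors array is a hypothetical example, and scalar() is used here as one common way to get the element count.

```perl
use strict;
use warnings;

# @colors is a hypothetical example array.
my @colors = qw(red green blue);

my $last_index = $#colors;          # last subscript: 2
my $size       = scalar(@colors);   # number of elements: 3

print "Last index: $last_index\n";
print "Size:       $size\n";
print "Last item:  $colors[ $#colors ]\n";
```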
arrays with foreach
foreach my $i ( @arrayName ) {
   # Does not use $#.
   # Do something with $i here.
}
accessing array slices
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my @longweekend = @days[ 4..6 ];
   # Note use of @ instead of $ before days.
print "@longweekend\n";
   # Output: Fri Sat Sun
array functions
- reverse()
 
my @count = ( 1..5 );
foreach $each ( reverse( @count ) ) {
  print "$each...\n";
  sleep 1;
}
- sort()
 
my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted = sort @unsorted;
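
By default sort() compares its elements as strings, which can surprise when the list holds numbers. The sketch below contrasts the default with a numeric comparison block using the <=> operator; the @numbers list is made-up sample data.

```perl
use strict;
use warnings;

# @numbers is hypothetical sample data.
my @numbers = ( 10, 9, 100, 1 );

my @as_strings = sort @numbers;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;   # numeric comparison

print "String sort:  @as_strings\n";   # 1 10 100 9
print "Numeric sort: @as_numbers\n";   # 1 9 10 100
```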
- end or top functions: push(), pop()
 
my $hand;
my @pile = ( "letter", "newspaper", "bill", "notepad" );
print "You pick up something off the top of the pile.\n";
$hand = pop @pile;
   # "notepad" is removed from end (top) of @pile
print "You now have a $hand in your hand,\nand the pile contains:\n@pile\n";
print "You now put something on your pile.\n";
push @pile, "statement";
   # "statement" is added to the end (top) of @pile
print "Now the pile contains:\n@pile\n";
- beginning or bottom functions: shift(), unshift()
 
my @array = (); # nothing in array
unshift @array, "first";
print "Array is now: @array\n";
unshift @array, "second", "third";
print "Array is now: @array\n";
shift @array;
print "Array is now: @array\n";
   # unshift adds elements, shift deletes elements
hashes
- hashes or associative arrays
- can be thought of as unordered arrays, using keys instead of array subscripts
 
 
- Whole hashes are called %hashname (start with '%').
 
%where = ( Gary => "Dallas", Lucy => "Austin", Ian => "Houston", Samantha => "Seattle" );
- To the left of the => are the hash keys, to the right of the => are the associated hash values.
- hash keys must be unique
 
 
hash element access
- Individual hash elements are accessed with $hashname{key}
 
my $who = "Ian";
my %where = (
   Gary => "Piscataway",
   Lucy => "Hackensack",
   Ian  => "Mahwah",
   Samantha => "Hoboken"
);
print "Gary lives in ", $where{Gary}, "\n";
print "$who lives in $where{$who}\n";
hash element adding/deleting
my %where = (
   Gary => "Piscataway",
   Lucy => "Hackensack",
   Ian  => "Mahwah",
   Samantha => "Hoboken"
);
$where{Eva} = "Howell";
   # We added a Eva => "Howell" key/value pair to the
   # %where hash
delete $where{Gary};
   # We deleted the Gary => "Piscataway" key/value pair
hash functions, iteration
- Can't use while, for or foreach to directly iterate through hashes
- Perl provides keys(), values() and each()
 
 
# use 'keys' function to iterate through hash keys
foreach $who ( keys %where ) {
   print "$who lives in $where{$who}\n";
}
# use 'values' function to iterate through hash values
foreach $town ( values %where ) {
   print "someone lives in $town\n";
}
# use 'each' function to iterate through hash key/value
# pairs
my ($name, $town);
   # an assignable list of variables
while ( ($name, $town) = each %where ) {
   print "$name lives in $town\n";
}
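
Because hashes are unordered, keys() returns the keys in no particular order. A minimal sketch of getting predictable output, assuming alphabetical order is what is wanted, is to sort the keys first:

```perl
use strict;
use warnings;

my %where = (
   Gary     => "Piscataway",
   Lucy     => "Hackensack",
   Ian      => "Mahwah",
   Samantha => "Hoboken",
);

# sort puts the keys in alphabetical order before the loop visits them.
my @lines;
foreach my $who ( sort keys %where ) {
   push @lines, "$who lives in $where{$who}";
}
print "$_\n" foreach @lines;
```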
hash functions, key existence
- Use exists() hash function to check if a key exists in a hash
 
print "Gary exists in the hash!\n" if exists $where{Gary};
subroutines
- subroutines are functions
 - define subroutine:
 
sub example_subroutine {
  ...
  # subroutine body
  ...
}
- define and call subroutine
 
greet();
sub greet {
   print "Hello, World!\n";
}
subroutines, arguments
- arguments are passed through another special Perl var, @_
- notice that @_ is a special array
 
 
greet( "Jim", "Bob", "Russ" );
   # There isn't a set number of function arguments or a
   # function "prototype" to speak of
sub greet {
   foreach my $arg ( @_ ) {
	  print "Hello $arg!\n";
   }
   print "You're first, $_[ 0 ].\n";
   print "You're second, $_[ 1 ].\n";
   print "You're last, $_[ 2 ].\n";
}
subroutines, returns
- Use return
- Can return a list of values.
 
 
my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
print "($len1, $len2, $len3)\n";
sub greet {
   foreach my $arg ( @_ ) {
	  print "Hello $arg!\n";
   }
   return (length($_[0]), length($_[1]), length($_[2]));
	  # return a list of 3 ints
}
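
A common idiom is to copy @_ into named lexical variables at the top of a subroutine instead of using $_[0], $_[1] and so on. The sketch below illustrates this; full_name() is a hypothetical subroutine made up for the example.

```perl
use strict;
use warnings;

# full_name() is a hypothetical subroutine used only for illustration.
sub full_name {
   my ( $first, $last ) = @_;   # copy the arguments into named lexical variables
   return "$last, $first";
}

my $name = full_name( "Bob", "Jones" );
print "$name\n";
```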
Perl and Regular Expressions
- For anything beyond trivial text processing with Perl and other languages, you need a basic understanding of regular expressions (regex).
 - See CS 370 intro to regular expressions for a start.
- Also, see links:
- https://regexone.com
 - https://perldoc.perl.org/perlrequick
 - Interactive regex testers:
 
 
 
Perl and Regular Expressions (match, search/replace)
- Regular expression matching with the =~ operator
 
$var =~ m/regular expression/
   # boolean (true if match, false if no match)
   # can omit the "m": $var =~ /regular expression/
$var !~ m/regular expression/
   # true if not a match, false if a match
- Substitution using regular expressions
 
$var =~ s/search re/replacement string/
my $name = "Joseph";
$name =~ s/[sph]//;
   # delete first instance of s, p or h
print "$name\n";
   # Output: Joeph
$name =~ s/[sph]//g;
   # g -> delete every instance of s, p or h
print "$name\n";
   # Output: Joe
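
The match and substitution operators can be tried together in a short script. The sketch below uses a made-up sample sentence in $line; the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# $line is hypothetical sample text.
my $line = "The ripest fruit falls first.";

my $mentions_fruit = ( $line =~ /fruit/ )  ? "yes" : "no";   # match succeeds
my $has_no_digits  = ( $line !~ /[0-9]/ ) ? "yes" : "no";   # no digit present, so !~ is true

my $edited = $line;
$edited =~ s/first/last/;    # substitution changes $edited in place

print "Mentions fruit: $mentions_fruit\n";
print "Has no digits:  $has_no_digits\n";
print "$edited\n";
```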
Perl and Regular Expressions (split)
- split() function splits strings using a regular expression as delimiter
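- by default split uses the delimiter ' ' (a single space in quotes), which splits on runs of whitespace like /\s+/ but also skips leading whitespace, so calling split by itself is equivalent to doing
 
 
split ' ', $_; # split the default variable on runs of whitespace,
   # ignoring any leading whitespace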
- The split regexp delimiter is usually something simple
 
my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;
Perl and Regular Expressions (join)
- join() function joins elements of a list using a specified delimiter string
 
my $last = "Jones";
my $first = "Bob";
my $name = join ", ", ($last, $first);
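
split() and join() are often used as a pair. As a rough sketch, the passwd-style line from the split example can be taken apart, inspected with an array slice, and put back together:

```perl
use strict;
use warnings;

my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";

my @fields = split /:/, $passwd;
my ( $user, $shell ) = @fields[ 0, 6 ];   # slice out the first and last fields

my $rebuilt = join ":", @fields;          # join reverses the split

print "User $user uses $shell\n";
print "Round trip OK\n" if $rebuilt eq $passwd;
```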
Multiple Input Files with <>
- We have used the <STDIN> statement and < filename redirection to process files such as cs498roster
 
- Suppose we want to be able to process files with a command like
 
$ perl -w perlex5.pl cs498roster cs598roster
- One way to accomplish this is with the <> (so-called diamond)
 
while (<>) {
	print "text read: $_";
}
- <> checks to see if the program (the Perl script) was invoked with command line arguments (file name or multiple file names)
- If so, it reads the file(s) one at a time, one line at a time.
- If not, it reads from standard input.
 
 
Perl Command Line Arguments w/ @ARGV
- The special Perl array @ARGV contains the command line arguments that a Perl script was invoked with.
- <> gets file names from @ARGV by default
 
 
foreach my $arg ( @ARGV ) {
	print $arg;
}
or
for ( my $i = 0; $i <= $#ARGV; $i++ ) {
	print $ARGV[ $i ];
}
or
print "$_" foreach @ARGV; # Perl-speak
Perl Command Line Arguments w/ shift @ARGV
- Command line arguments aren't always file names.
- Here, we're trying to get a help message from perlex6.pl by calling it with the --help command line argument.
- --help becomes the first element of @ARGV, $ARGV[ 0 ].
 
 
$ perl -w perlex6.pl --help
- Sometimes, command line arguments modify how a program will work.
- Here, we're trying to make perlex6.pl sort rosters by last name:
 
$ perl -w perlex6.pl --last cs498roster cs598roster
- To remove an argument from @ARGV, shift it.
 
my $arg0 = shift @ARGV;
if ( $arg0 =~ /last/ ) { ... }
   # Process all non-file command line args before we
   # get to while (<>)
while (<>) { ... }
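
One way the option handling could be fleshed out is sketched below. parse_args() is a hypothetical helper, not part of perlex6.pl; it shifts leading arguments that start with "--" off the list and returns a flag plus the remaining file names. It is called with a sample list here so the sketch runs on its own.

```perl
use strict;
use warnings;

# parse_args() is a hypothetical helper: it shifts leading options (arguments
# starting with "--") off the list and returns a flag plus the remaining names.
sub parse_args {
   my @args = @_;
   my $sort_by_last = 0;
   while ( @args and $args[0] =~ /^--/ ) {
      my $opt = shift @args;
      $sort_by_last = 1 if $opt eq "--last";
   }
   return ( $sort_by_last, @args );
}

# In a real script this would be parse_args(@ARGV).
my ( $by_last, @files ) = parse_args( "--last", "cs498roster", "cs598roster" );
print "Sort by last name: $by_last\n";
print "Files: @files\n";
```

In a script that goes on to use while (<>), the leftover file names would be assigned back to @ARGV so that <> only sees files.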
Perl References
- Perl references hold the locations of other pieces of data
- The backslash “\” is used to create references.
 - Here, $hash_r becomes a reference to the memory location of %hash:
 
 
my %hash = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;
- Dereferencing
- Use {} around reference name.
 
 
my %hash2 = %{$hash_r};
	# Set new %hash2 contents from the hash that is
	# referenced by $hash_r
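
Besides copying the whole hash, individual elements can be reached through the reference with the arrow operator. The sketch below also shows that changes made through the reference affect the original hash, while a dereferenced copy is independent; the "plum" and "fuji" values are made up for illustration.

```perl
use strict;
use warnings;

my %hash   = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;

my $kind = $hash_r->{apple};      # arrow syntax reads one element via the reference
$hash_r->{plum} = "damson";       # changes made via the reference change %hash itself

my %hash2 = %{$hash_r};           # copy of the whole hash
$hash2{apple} = "fuji";           # the copy is independent of %hash

print "apple: $kind\n";
print "plum in original hash: $hash{plum}\n";
print "apple in original hash is still $hash{apple}\n";
```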
Perl Objects
- “…an object can be anything -- it really depends on what your application is. … If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object.”
 
- In Perl, what we see as an object is simply a reference…
- In fact, you can convert any ordinary reference into an object simply by using the (Perl) bless() function.
 - Typically, however, objects are represented as references to a hash
 
 
Perl Objects and Modules Example
- Perl Classes are often called Modules
- Here, we'll use the Net::FTP module (example only; rockhopper does not run an anonymous ftp server):
 
 
use strict;
use Net::FTP;
   # This requires that FTP.pm be stored somewhere on the
   # local system that Perl searches through for modules.
my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
   or die "Couldn't connect: $@\n";
	  # new() is a method of class Net::FTP; it's the
	  # constructor. $ftp is our FTP session object.
$ftp->login("anonymous");
   # Net::FTP->login() method
$ftp->cwd("/");
$ftp->get("index.php");
$ftp->close();
Perl Modules and @INC
- @INC contains the system file system paths that the Perl interpreter looks through for modules such as FTP.pm.
 
$ perl -V
...
@INC:
	/etc/perl
	/usr/local/lib/perl/5.8.8
	/usr/local/share/perl/5.8.8
	/usr/lib/perl5
	/usr/share/perl5
	/usr/lib/perl/5.8
	/usr/share/perl/5.8
	/usr/local/lib/site_perl
	.
Perl Modules and CPAN
- A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
 - The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
- Searchable at http://search.cpan.org (now redirects to https://metacpan.org)
 
 - The CPAN module naming hierarchy places modules in categories such as Sort::Fields, Sort::Versions or subcategories such as LWP::Protocol::http
- On disk, the modules would look like …/Sort/Fields.pm, …/Sort/Versions.pm
- For the LWP::Protocol::http module, the full path to the module might be /usr/share/perl5/LWP/Protocol/http.pm on a Linux system.
 - How to know if a Perl module is installed?
- You can run a Perl “one-liner” program using the “-e” option to check if a module is installed, e.g.,
 
 
# If the following runs with no errors, it means the LWP::Simple module is installed.
$ perl -e "use LWP::Simple"
Perl Web Automation
- Perl modules are available for Web automation
- Well-known modules are the LWP (“Library for WWW in Perl”) group of modules and WWW::Mechanize.
- WWW::Mechanize
 - Old article that describes Perl web automation: http://rockhopper.monmouth.edu/~jchung/docs/4915-1108-turoff.pdf
 
 
 
- Retry Perl Exercise 7 using a Web automation module such as LWP::Simple or WWW::Mechanize.
 
Perl and System Commands
- For portability, use Perl equivalents to system commands, such as Perl's chdir() function instead of cd.
- NOTE: This is applicable to all programming languages that can call system commands.
 
 
- But if using system commands is a necessity, use system()
 
system( '/usr/games/fortune' );
- Command substitution is done using backticks ``
- Sometimes we want to capture the output of system commands to use in a Perl script.
 
 
my $sysdate = `date`;
cs498gpl/introduction_to_perl.txt · Last modified:  by jchung
                
                