Introduction to Perl
The "Practical Extraction and Report Language"
The "Pathologically Eclectic Rubbish Lister"
Perl
- Creator: Larry Wall
- Introduced: 1987
- Open source
- Comes standard on many UNIX/Linux systems
- macOS users are encouraged to install Perl with perlbrew, a tool for managing Perl installations.
- Windows installer obtainable from http://strawberryperl.com and http://activestate.com
- Strawberry Perl is recommended.
- Originally developed for text record manipulation
- Used for system administration, web development, network programming, system exploit testing
Perl Uses
- CGI (web application) programming
- Written in Perl (or was written in Perl originally): Slash, Bugzilla, TWiki, Movable Type
- Wikipedia originally used a Wiki engine (CGI program) written in Perl called UseModWiki.
- Perl CGI-type programs typically communicate with database backends
- bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com, imdb.com have used Perl extensively, at one point.
- Large-scale text processing for report generation
- Has found past use in finance and bioinformatics fields for its ability to handle large data sets
Perl Online Resources
- Perl.org online library
- Picking Up Perl (online book)
Perl Execution
- Programs typically given .pl extension
- Executed with perl -w prog.pl
- Or put a shebang line, as in a shell script, at the top of the Perl script
- #!/path/to/perl -w
- and make the program executable with chmod +x prog.pl
- Using “-w” is encouraged for debugging.
- -w issues warnings that would otherwise not be issued.
Perl Syntax (highlights)
- A Perl motto: “There is more than one way to do it.”
- Perl designed with this idea in mind
- print "Hello, world!\n"; # semicolon is mandatory like C/C++/Java/others
- Comments begin with #, as in shell scripts, Tcl and older languages.
- print " 0.25" * 4, "\n"; # output: 1
- " 0.25" is a string, but it is converted to a number and multiplied by 4; print then outputs the result followed by a line break ("\n")
- automatic conversion of scalars for “contextual polymorphism”
- leading space is ignored
- Math operators are the familiar set from C/C++/Java
- String concatenation (.) and repetition (x) operators
- print "Ba" . "na" x 4, "\n"; # output: Banananana
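- A small sketch of the conversions described above. The variable names are illustrative and not part of the course material. The same scalar acts as a number or as a string depending on the operator applied to it:

```perl
use strict;
use warnings;

# Numeric operators treat their operands as numbers.
my $product = " 0.25" * 4;    # leading space is skipped; value is 1
my $sum     = "10" + "5";     # numeric addition; value is 15

# String operators treat their operands as strings.
my $joined   = "10" . "5";        # concatenation; value is "105"
my $repeated = "Ba" . "na" x 4;   # x binds tighter than .; value is "Banananana"

print "$product $sum $joined $repeated\n";
```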
More Perl Syntax
variables
- Always put $ in front of scalar variable names, both when assigning to them and when reading them.
$name = "fred"; # variables are dynamically typed
print "My name is ", $name, "\n";
variable scope
- Variables are global unless declared local with my $variable.
- Local Perl variables are also called lexical variables.
$name = "fred"; # This is global $name.
# Insert a block
{
    my $name = "joe"; # This is local $name because of 'my'.
    print "Block local \$name is $name\n";
    # Using double quotes allows $name interpolation as
    # part of the print statement. The backslashed $
    # (\$name) suppresses variable interpolation.
}
print "Global \$name is ", $name, "\n";
- Some advocate using the my keyword for all variables, including global vars.
standard input
- Perl standard input with <STDIN>
print "Please enter something interesting: \n";
$comment = <STDIN>;
print "You entered: $comment\n";
- Standard input can be from the keyboard or redirected from a file or from another program.
- Example of redirecting stdin from a file:
$ perl -w stdin_comment.pl < comment.txt
- Examples of redirecting stdin from a program:
$ echo "The ripest fruit falls first." | perl -w stdin_comment.pl
# If you have the fortune command installed ...
$ fortune | perl -w stdin_comment.pl
- <STDIN> reads up to and including the newline character
- The newline could be the user hitting <Enter> after inputting something or the newline at the end of a line in a text file.
chomp() and chop()
- The Perl <STDIN> behavior of including newlines often needs to be accounted for.
print "Enter a five letter word guess, preferably \"Yoink\": ";
$userguess = <STDIN>;
chomp($userguess); # Removes trailing newline character only.
                   # Test this without the chomp statement.
$secretword = "Yoink";
print "The result of the comparison: ", $userguess eq $secretword, "\n";
# String comparison with 'eq' operator;
# returns empty string if strings not equal;
# returns 1 if strings equal.
- chop() function removes last character of string, whether it's a newline or not.
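- A minimal sketch contrasting chomp() and chop(). The string "Yoink\n" stands in for a line read from <STDIN>; the variable names are illustrative:

```perl
use strict;
use warnings;

my $with_newline = "Yoink\n";      # what <STDIN> would return for the input Yoink

my $chomped = $with_newline;
chomp($chomped);                   # removes the trailing newline only

my $chomped_twice = $chomped;
chomp($chomped_twice);             # no newline left, so nothing changes

my $chopped = $chomped;
chop($chopped);                    # removes the last character, here the "k"

print "chomp: '$chomped', chomp again: '$chomped_twice', chop: '$chopped'\n";
```

- chomp() is the safer default for input lines because calling it on a string without a trailing newline leaves the string unchanged.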
string functions
- length(): returns length of a string
print "Enter a string: ";
my $inpString = <STDIN>;
chomp($inpString); # Must do this, else length will return length + 1.
print "$inpString is ", length($inpString), " chars long.\n";
- lc(): converts all characters in a string to lowercase.
my $string = "Hello, World!";
my $lowercase = lc($string);
print "$lowercase\n";
- uc(): converts all characters in a string to uppercase.
my $string = "Hello, World!";
my $uppercase = uc($string);
print "$uppercase\n";
- index(): returns the position of the first occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = index($string, $substring);
print "$position\n"; # Output: 6
- rindex(): returns the position of the last occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = rindex($string, $substring);
print "$position\n"; # Output: 17
- substr(): extracts a substring from a string.
my $string = "Hello, World!";
my $substring = substr($string, 7, 5); # Starting at position 7, extract 5 characters
print "$substring\n"; # Output: World
The die() controlled exit function
- The Perl function die() is commonly seen.
print "Enter a string to pass to die(): ";
chomp($string = <STDIN>);
die($string); # Outputs to STDERR, not STDOUT
print "This will not be printed.";
- die() appends the Perl program name and the line number of the die() statement if the string argument does not end in a newline, which is why the example above calls chomp().
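- A hedged sketch of the newline rule. It uses eval { ... }, which is not covered above, only to trap die() so that both messages can be inspected; the trapped message is stored in $@:

```perl
use strict;
use warnings;

# eval { ... } traps the die so the script keeps running; the message lands in $@.
eval { die "Something went wrong" };
my $without_newline = $@;   # die appended " at <script> line <n>.\n"

eval { die "Something went wrong\n" };
my $with_newline = $@;      # message is left exactly as written

print "Without newline: $without_newline";
print "With newline:    $with_newline";
```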
selection statements, familiar
- Familiar (C/C++/Java syntax)
if ( $number != 0 ) {
    $result = 100 / $number;
}
if ( $password eq $guess ) {
    print "Pass, friend.\n";
} else {
    die "Go away, imposter!";
}
selection statements, elsif
if ( $password eq $guess ) {
    print "Pass, friend.\n";
} elsif ( $password eq "Meh" ) {
    print "Meh!\n";
} else {
    die "Go away, imposter!";
}
selection statements, unless
# Assuming $a is boolean
if ( not $a ) {
    print "\$a is not true\n";
}
# can also be expressed...
unless ( $a ) {
    print "\$a is not true\n";
}
- Choose the syntax that best fits your own thought patterns
- “There's more than one way to do it.”
reverse selection statements
- Normal
if ($number == 0) { die "Can't divide by 0"; }
- Equivalent
die "Can't divide by 0" if $number == 0;
repetition structures
- while, until, for, foreach, do..while, do..until
- while, for, do..while are C/C++/Java standard
until ( $countdown <= 0 ) {
    print "Counting down: $countdown\n";
    $countdown--;
}
# "for each number in the list 1 through 10"
foreach $number ( 1 .. 10 ) {
    print "The number is: $number\n";
}
while (<STDIN>)
- while ( $var = <STDIN> ) reads from standard input until end of file (control-d)
- Reads a line at a time into $var
while ( $var = <STDIN> ) {
    print $var; # Print each line of standard input.
}
- Can shorten to the following because it's such a common operation
- $_ below is the default Perl variable
while ( <STDIN> ) {
    print $_;
}
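- A minimal sketch of the same line-at-a-time loop doing some work per line. To keep it runnable without keyboard input, it reads from an in-memory filehandle ($in, an assumption of this sketch) instead of <STDIN>; the loop body would be identical with <STDIN>:

```perl
use strict;
use warnings;

# Stand-in for standard input: a filehandle opened on a string.
my $text = "first line\nsecond line\nthird line\n";
open( my $in, '<', \$text ) or die "Cannot open in-memory file: $!";

my $line_count = 0;
my $longest    = "";
while ( my $line = <$in> ) {    # the loop ends at end of file
    chomp($line);
    $line_count++;
    $longest = $line if length($line) > length($longest);
}
close($in);

print "Read $line_count lines; longest was '$longest'\n";
```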
lists
- Create Perl lists with ()
print( "Hello, ", "world", "\n" ); # 3 strings in a list being passed to print function
print( 123, 456, 789 );
foreach $number ( 1 .. 10 ) { # ( 1 .. 10 ) creates the list of numbers from 1 to 10
    print "$number\n";
}
- Create Perl lists with qw (quote words)
qw/hello world good bye/ # creates a 4 word list
accessing list values
- Use square brackets
print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] ); # output: mustard (count from zero)
my $month = 3;
print qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )[ $month ]; # output: Apr
accessing list "slices"
- Get more than one list value at a time
my $mone;
my $mtwo;
( $mone, $mtwo ) = ( 1, 3 );
my $m1;
my $m2;
my $m3;
( $m1, $m2, $m3 ) = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )[ 2..4 ];
print $m1." ".$m2." ".$m3;
arrays
- Arrays are just named lists.
- Whole arrays are called @arrayName (start with '@')
- Individual array elements (scalars) are accessed as $arrayName[ subscript ]
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
print @days, "\n"; # Output: MonTueWedThuFriSatSun
print "@days\n"; # Output: Mon Tue Wed Thu Fri Sat Sun
                 # By enclosing in "", have "stringified" @days
print $days[ 6 ], "\n"; # @days[ 6 ] can also be used, but see the warning at https://stackoverflow.com/a/53732305
array size in $#
- $#arrayName is special Perl syntax
- it contains the last subscript (or index) of @arrayName, which is one less than the number of elements
- Looping through an array
my $i = 0;
while ( $i <= $#arrayName ) {
    # Do something with $arrayName[ $i ] here
    $i++;
}
- or
for ( my $i = 0; $i <= $#arrayName; $i++ ) {
    # Do something with $arrayName[ $i ] here
}
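- A short sketch of how the last index relates to the element count. scalar(), negative subscripts and the variable names are additions of this sketch, not part of the notes above:

```perl
use strict;
use warnings;

my @days = qw(Mon Tue Wed Thu Fri Sat Sun);

my $last_index = $#days;          # 6: index of the last element
my $count      = scalar(@days);   # 7: number of elements
my $last_day   = $days[$#days];   # "Sun"; $days[-1] would give the same element

my @empty = ();
my $empty_last_index = $#empty;   # -1 for an empty array

print "last index $last_index, count $count, last day $last_day\n";
```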
arrays with foreach
foreach my $i ( @arrayName ) { # Does not use $#.
    # Do something with $i here.
}
accessing array slices
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my @longweekend = @days[ 4..6 ]; # Note use of @ instead of $ before days.
print "@longweekend\n"; # Output: Fri Sat Sun
array functions
- reverse()
my @count = ( 1..5 );
foreach $each ( reverse( @count ) ) {
    print "$each...\n";
    sleep 1;
}
- sort()
my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted = sort @unsorted;
- end or top functions: push(), pop()
my $hand;
my @pile = ( "letter", "newspaper", "bill", "notepad" );
print "You pick up something off the top of the pile.\n";
$hand = pop @pile; # "notepad" is removed from end (top) of @pile
print "You now have a $hand in your hand,\n",
      "and the pile contains:\n@pile\n";
print "You now put something on your pile.\n";
push @pile, "statement"; # "statement" is added to the end (top) of @pile
print "Now the pile contains:\n@pile\n";
- beginning or bottom functions: shift(), unshift()
my @array = (); # nothing in array
unshift @array, "first";
print "Array is now: @array\n";
unshift @array, "second", "third";
print "Array is now: @array\n";
shift @array;
print "Array is now: @array\n";
# unshift adds elements to the beginning, shift removes the first element
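- A minimal sketch of two points about the array functions above. The numeric comparison block { $a <=> $b } is an addition of this sketch; sort compares elements as strings unless given such a block:

```perl
use strict;
use warnings;

# sort compares as strings by default, so "10" sorts before "9".
my @numbers        = ( 10, 9, 100, 1 );
my @string_sorted  = sort @numbers;                 # 1 10 100 9
my @numeric_sorted = sort { $a <=> $b } @numbers;   # 1 9 10 100

# push/pop work on the end of an array; shift/unshift work on the beginning.
my @queue = ();
push @queue, "first", "second";    # (first second)
unshift @queue, "zeroth";          # (zeroth first second)
my $from_front = shift @queue;     # "zeroth"
my $from_end   = pop @queue;       # "second"

print "@string_sorted | @numeric_sorted | @queue\n";
```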
hashes
- hashes or associative arrays
- can be thought of as unordered arrays, using keys instead of array subscripts
- Whole hashes are called %hashname (start with '%').
%where = ( Gary => "Dallas", Lucy => "Austin", Ian => "Houston", Samantha => "Seattle" );
- To the left of the => are the hash keys, to the right of => are the associated hash values.
- hash keys must be unique
hash element access
- Individual hash elements are accessed with $hashname{key}
my $who = "Ian";
my %where = ( Gary => "Piscataway", Lucy => "Hackensack", Ian => "Mahwah", Samantha => "Hoboken" );
print "Gary lives in ", $where{Gary}, "\n";
print "$who lives in $where{$who}\n";
hash element adding/deleting
my %where = ( Gary => "Piscataway", Lucy => "Hackensack", Ian => "Mahwah", Samantha => "Hoboken" );
$where{Eva} = "Howell"; # We added an Eva => "Howell" key/value pair to the
                        # %where hash
delete $where{Gary};    # We deleted the Gary => "Piscataway" key/value pair
hash functions, iteration
- Can't use while, for or foreach to directly iterate through hashes
- Perl provides keys(), values() and each()
# use 'keys' function to iterate through hash keys
foreach $who ( keys %where ) {
    print "$who lives in $where{$who}\n";
}
# use 'values' function to iterate through hash values
foreach $town ( values %where ) {
    print "someone lives in $town\n";
}
# use 'each' function to iterate through hash key/value pairs
my ($name, $town); # an assignable list of variables
while ( ($name, $town) = each %where ) {
    print "$name lives in $town\n";
}
hash functions, key existence
- Use exists() hash function to check if a key exists in a hash
print "Gary exists in the hash!\n" if exists $where{Gary};
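- A small sketch that puts hash access, keys() and exists() together by counting words. The word list and variable names are made up for illustration:

```perl
use strict;
use warnings;

my @words = qw(perl awk sed perl awk perl);

my %count;
foreach my $word (@words) {
    $count{$word}++;    # a missing key starts out undefined, which ++ treats as 0
}

# keys returns the keys in no particular order, so sort them for stable output.
foreach my $word ( sort keys %count ) {
    print "$word appears $count{$word} time(s)\n";
}

my $has_perl = exists $count{perl} ? "yes" : "no";
my $has_tcl  = exists $count{tcl}  ? "yes" : "no";
print "perl seen: $has_perl, tcl seen: $has_tcl\n";
```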
subroutines
- subroutines are functions
- define subroutine:
sub example_subroutine {
    # ... subroutine body ...
}
- define and call subroutine
greet();
sub greet {
    print "Hello, World!\n";
}
subroutines, arguments
- arguments are passed through another special Perl var, @_
- notice that @_ is a special array
greet( "Jim", "Bob", "Russ" );
# There isn't a set number of function arguments or a
# function "prototype" to speak of
sub greet {
    foreach my $arg ( @_ ) {
        print "Hello $arg!\n";
    }
    print "You're first, $_[ 0 ].\n";
    print "You're second, $_[ 1 ].\n";
    print "You're last, $_[ 2 ].\n";
}
subroutines, returns
- Use return
- Can return a list of values.
my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
print "($len1, $len2, $len3)\n";
sub greet {
    foreach my $arg ( @_ ) {
        print "Hello $arg!\n";
    }
    return (length($_[0]), length($_[1]), length($_[2])); # return a list of 3 ints
}
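- A hedged sketch of a common pattern: copy the arguments out of @_ into named variables first, then return a list. The subroutine shortest_and_longest is hypothetical and exists only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the shortest and longest of the names passed in.
sub shortest_and_longest {
    my @names = @_;    # copy the arguments out of @_
    my ( $shortest, $longest ) = ( $names[0], $names[0] );
    foreach my $name (@names) {
        $shortest = $name if length($name) < length($shortest);
        $longest  = $name if length($name) > length($longest);
    }
    return ( $shortest, $longest );    # a two-element list
}

my ( $short, $long ) = shortest_and_longest( "Jim", "Robert", "Al" );
print "shortest: $short, longest: $long\n";
```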
Perl and Regular Expressions
- For anything beyond trivial text processing with Perl and other languages, you need a basic understanding of regular expressions (regex).
- See CS 370 intro to regular expressions for a start.
- Also, see links:
- https://regexone.com
- https://perldoc.perl.org/perlrequick
- Interactive regex testers:
Perl and Regular Expressions (match, search/replace)
- Regular expression matching with the =~ operator
$var =~ m/regular expression/ # boolean (true if match, false if no match)
# can omit the "m":
$var =~ /regular expression/
$var !~ m/regular expression/ # true if not a match, false if a match
- Substitution using regular expressions
$var =~ s/search re/replacement string/
my $name = "Joseph";
$name =~ s/[sph]//; # delete first instance of s, p or h
print "$name\n"; # Output: Joeph
$name =~ s/[sph]//g; # g -> delete every instance of s, p or h
print "$name\n"; # Output: Joe
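- A minimal sketch of matching and substitution side by side. Because s/// changes its variable in place, the sketch works on copies; the variable names are illustrative:

```perl
use strict;
use warnings;

my $name = "Joseph";

# =~ is true on a match, !~ is true when there is no match.
my $starts_with_j = ( $name =~ /^J/ ) ? "yes" : "no";   # "yes"
my $lacks_z       = ( $name !~ /z/ )  ? "yes" : "no";   # "yes"

# s/// changes the variable in place, so work on copies to keep $name intact.
my $first_removed = $name;
$first_removed =~ s/[sph]//;     # "Joeph": only the first s, p or h is removed

my $all_removed = $name;
$all_removed =~ s/[sph]//g;      # "Joe": every s, p or h is removed

print "$starts_with_j $lacks_z $first_removed $all_removed\n";
```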
Perl and Regular Expressions (split)
- split() function splits strings using a regular expression as delimiter
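- called with no arguments, split works on $_ and uses the special delimiter ' ' (a single space), which splits on runs of whitespace like /\s+/ but also skips leading whitespace, so calling split by itself is equivalent to doing
split ' ', $_; # split the default variable on one or more whitespace
               # characters, ignoring leading whitespace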
- The split regexp delimiter is usually something simple
my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;
Perl and Regular Expressions (join)
- join() function joins elements of a list using a specified delimiter string
my $last = "Jones";
my $first = "Bob";
my $name = join ", ", ($last, $first);
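- A short sketch that splits the passwd-style line used above, picks out two fields with a slice, and joins them again. The " -> " delimiter and the variable names are illustrative choices:

```perl
use strict;
use warnings;

my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;

my $field_count = scalar(@fields);         # 7
my ( $user, $shell ) = @fields[ 0, 6 ];    # "jchung", "/bin/bash"

# join is the inverse: glue selected fields back together with a delimiter.
my $summary = join " -> ", $user, $shell;

# split ' ' skips leading whitespace; split /\s+/ keeps a leading empty field.
my @space_style = split ' ',   "  two words";   # ("two", "words")
my @regex_style = split /\s+/, "  two words";   # ("", "two", "words")

print "$summary ($field_count fields)\n";
```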
Multiple Input Files with <>
- We have used the <STDIN> statement and < filename redirection to process files such as cs498roster
- Suppose we want to be able to process files with a command like
$ perl -w perlex5.pl cs498roster cs598roster
- One way to accomplish this is with the <> (so-called diamond)
while (<>) {
    print "text read: $_";
}
- <> checks to see if the program (the Perl script) was invoked with command line arguments (file name or multiple file names)
- If so, it reads the file(s) one at a time, one line at a time.
- If not, it reads from standard input, just like <STDIN>.
Perl Command Line Arguments w/ @ARGV
- The special Perl array @ARGV contains the command line arguments that a Perl script was invoked with.
- <> gets file names from @ARGV by default
foreach my $arg ( @ARGV ) {
    print $arg;
}
or
for ( my $i = 0; $i <= $#ARGV; $i++ ) {
    print $ARGV[ $i ];
}
or
print "$_" foreach @ARGV; # Perl-speak
Perl Command Line Arguments w/ shift @ARGV
- Command line arguments aren't always file names.
- Here, we're trying to get a help message from perlex6.pl by calling it with the --help command line argument.
- --help becomes the first element of @ARGV, $ARGV[ 0 ].
$ perl -w perlex6.pl --help
- Sometimes, command line arguments modify how a program will work.
- Here, we're trying to make perlex6.pl sort rosters by last name:
$ perl -w perlex6.pl --last cs498roster cs598roster
- To remove an argument from @ARGV, shift it.
my $arg0 = shift @ARGV;
if ( $arg0 =~ /last/ ) ...
# Process all non-file command line args before we
# get to while (<>)
while (<>) { ...
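- A runnable sketch of the same idea, under stated assumptions: the array @args stands in for @ARGV so the example works without real command line arguments, and only arguments starting with -- are treated as options, so file names are never shifted away by mistake:

```perl
use strict;
use warnings;

# Stand-in for @ARGV so the sketch runs without real command line arguments.
my @args = ( "--last", "cs498roster", "cs598roster" );

my $sort_by_last = 0;

# Only shift while the first argument looks like an option.
while ( @args && $args[0] =~ /^--/ ) {
    my $option = shift @args;
    if ( $option eq "--last" ) {
        $sort_by_last = 1;
    }
    else {
        die "Unknown option: $option\n";
    }
}

# Whatever remains would be the file names that while (<>) reads.
print "sort by last name: $sort_by_last; files: @args\n";
```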
Perl References
- Perl references hold the locations of other pieces of data
- The backslash “\” is used to create references.
- Here, $hash_r becomes a reference to the memory location of %hash:
my %hash = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;
- Dereferencing
- Put the sigil of the referenced data type in front of the reference, with {} around the reference name.
my %hash2 = %{$hash_r}; # Set new %hash2 contents from the hash that is
                        # referenced by $hash_r
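- A minimal sketch of reading and writing through a hash reference. The arrow form $hash_r->{key} is not shown above; it is an equivalent and commonly used way to reach a single element. The plum and fuji entries are made up for illustration:

```perl
use strict;
use warnings;

my %hash   = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;

my $via_braces = ${$hash_r}{apple};    # "crab"
my $via_arrow  = $hash_r->{pear};      # "asian"; same access, arrow notation

# A reference points at the original data, so changes through it show up in %hash.
$hash_r->{plum} = "damson";
my $key_count = scalar( keys %hash );  # 3

# Copying with %{...} makes an independent hash.
my %copy = %{$hash_r};
$copy{apple} = "fuji";                 # %hash still has apple => "crab"

print "$via_braces $via_arrow $key_count $hash{apple}\n";
```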
Perl Objects
- “…an object can be anything -- it really depends on what your application is. … If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object.”
- In Perl, what we see as an object is simply a reference…
- In fact, you can convert any ordinary reference into an object simply by using the (Perl) bless() function.
- Typically, however, objects are represented as references to a hash
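- A hedged sketch of the points above: a hash reference becomes an object once bless() ties it to a package. The Counter class, its start argument and its increment method are hypothetical and exist only for this illustration:

```perl
use strict;
use warnings;

package Counter;    # hypothetical class, for illustration only

sub new {
    my ( $class, %args ) = @_;
    my $self = { count => $args{start} || 0 };   # the object's data lives in a hash
    return bless $self, $class;                  # bless turns the hash reference into an object
}

sub increment {
    my ($self) = @_;
    $self->{count}++;
    return $self->{count};
}

package main;

my $counter = Counter->new( start => 5 );
$counter->increment();
my $value = $counter->increment();    # 7
my $class = ref($counter);            # "Counter"

print "$class is at $value\n";
```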
Perl Objects and Modules Example
- Perl classes are usually packaged as modules, so classes are often simply called modules
- Here, we'll use the Net::FTP module (example only; rockhopper does not run an anonymous ftp server):
use strict;
use Net::FTP; # This requires that FTP.pm be stored somewhere on the
              # local system that Perl searches through for modules.
my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
    or die "Couldn't connect: $@\n";
# new() is a method of class Net::FTP; it's the
# constructor. $ftp is our FTP session object.
$ftp->login("anonymous"); # Net::FTP->login() method
$ftp->cwd("/");
$ftp->get("index.php");
$ftp->close();
Perl Modules and @INC
- @INC contains the file system paths that the Perl interpreter searches for modules such as FTP.pm.
$ perl -V
...
@INC:
/etc/perl
/usr/local/lib/perl/5.8.8
/usr/local/share/perl/5.8.8
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.
Perl Modules and CPAN
- A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
- The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
- Searchable at https://metacpan.org (the older http://search.cpan.org now redirects there)
- The CPAN module naming hierarchy places modules in categories such as Sort::Fields, Sort::Versions or subcategories such as LWP::Protocol::http
- On disk, the modules would look like …/Sort/Fields.pm, …/Sort/Versions.pm
- For the LWP::Protocol::http module, the full path to the module might be /usr/share/perl5/LWP/Protocol/http.pm on a Linux system.
- How to know if a Perl module is installed?
- You can run a Perl “one-liner” program using the “-e” option to check if a module is installed, e.g.,
# If the following runs with no errors, it means the LWP::Simple module is installed.
perl -e "use LWP::Simple"
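- The same check can be sketched inside a script. The helper module_installed is hypothetical; it turns a module name into its on-disk path and uses eval { ... } to trap the error that require raises when the file is not found in @INC. No::Such::Module::XYZ is a made-up name assumed not to be installed:

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, 0 otherwise.
sub module_installed {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # LWP::Simple -> LWP/Simple.pm
    return eval { require $file; 1 } ? 1 : 0;
}

my $has_core_module = module_installed("List::Util");             # ships with Perl
my $has_fake_module = module_installed("No::Such::Module::XYZ");  # assumed not to exist

print "List::Util: $has_core_module, No::Such::Module::XYZ: $has_fake_module\n";
```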
Perl Web Automation
- Perl modules are available for Web automation
- Well-known modules are the LWP (“Library for WWW in Perl”) group of modules and WWW::Mechanize.
- WWW::Mechanize
- Old article that describes Perl web automation: http://rockhopper.monmouth.edu/~jchung/docs/4915-1108-turoff.pdf
- Retry Perl Exercise 7 using a Web automation module such as LWP::Simple or WWW::Mechanize.
Perl and System Commands
- For portability, use Perl equivalents to system commands, such as Perl's chdir() function instead of cd.
- NOTE: This is applicable to all programming languages that can call system commands.
- But if using system commands is a necessity, use system()
system( '/usr/games/fortune' );
- Command substitution is done using backticks ``
- Sometimes we want to capture the output of system commands to use in a Perl script.
my $sysdate = `date`;
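- A hedged sketch of the three approaches above, assuming a Unix-like system where an echo command is available. It checks the status that system() returns and chomps the captured output, since backticks keep the trailing newline:

```perl
use strict;
use warnings;

# Backticks return the command's output, including its trailing newline.
my $output = `echo hello`;
chomp($output);                      # "hello"

# system() returns the command's exit status; 0 means success.
my $status = system( "echo", "done" );
die "echo failed with status $status\n" if $status != 0;

# Prefer the Perl built-in where one exists, for example chdir() instead of cd.
chdir("/") or die "Cannot chdir to /: $!\n";

print "captured: $output\n";
```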
cs498gpl/introduction_to_perl.txt · Last modified: 2025/02/14 07:07 by jchung