Introduction to Perl
The "Practical Extraction and Report Language"
The "Pathologically Eclectic Rubbish Lister"
Perl
- Creator: Larry Wall
- Introduced: 1987
- Open source
- Comes standard on many UNIX/Linux systems
- macOS users are encouraged to install Perl with the perlbrew tool.
- Windows installer obtainable from http://strawberryperl.com and http://activestate.com
- Strawberry Perl is recommended.
- Originally developed for text record manipulation
- Used for system administration, web development, network programming, system exploit testing
Perl Uses
- CGI (web application) programming
- Written in Perl (or was written in Perl originally): Slash, Bugzilla, TWiki, Movable Type
- Wikipedia originally used a Wiki engine (CGI program) written in Perl called UseModWiki.
- Perl CGI-type programs typically communicate with database backends
- bbc.co.uk, amazon.com, livejournal.com, ticketmaster.com, imdb.com have used Perl extensively, at one point.
- Large-scale text processing for report generation
- Has found past use in finance and bioinformatics fields for its ability to handle large data sets
Perl Online Resources
- Perl.org online library
- Picking Up Perl (online book)
Perl Execution
- Programs typically given .pl extension
- Executed with perl -w prog.pl
- Or use shell script type line at top of Perl script
- #!/path/to/perl -w
- and make the program executable with chmod +x prog.pl
- Using the “-w” is encouraged for debugging.
- -w issues warnings that would otherwise not be issued.
Perl Syntax (highlights)
- A Perl motto: “There is more than one way to do it.”
- Perl designed with this idea in mind
- print "Hello, world!\n"; # semicolon is mandatory like C/C++/Java/others
- Comments begin with #, as in shell scripts, Tcl and older languages.
- print " 0.25" * 4, "\n"; # output: 1
- " 0.25" is a string, but it is converted to a float and multiplied by 4; the result is printed, followed by a line break ("\n")
- automatic conversion of scalars for “contextual polymorphism”
- leading space is ignored
- Math operators are the familiar set from C/C++/Java
- String concatenation (.) and repetition (x) operators
- print "Ba" . "na" x 4, "\n"; # output: Banananana
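The conversion and operator rules above can be tried out with a short sketch. The ruler() helper is hypothetical (it is not part of the course material) and only exists to show that x binds more tightly than the . operator.

```perl
use strict;
use warnings;

# Hypothetical helper: draws a ruler such as "+----+----+".
sub ruler {
    my ($cells, $width) = @_;
    my $cell = ("-" x $width) . "+";   # one cell, e.g. "----+"
    return "+" . $cell x $cells;       # x binds tighter than .
}

print " 0.25" * 4, "\n";    # numeric context: prints 1
print "3" . "4", "\n";      # string context: prints 34
print "3" + "4", "\n";      # numeric context: prints 7
print ruler(2, 4), "\n";    # prints +----+----+
```

The same scalar is treated as a number or a string depending on the operator applied to it.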
More Perl Syntax
variables
- Use $ all the time in front of variable names.
$name = "fred";
# variables are dynamically typed
print "My name is ", $name, "\n";
variable scope
- Variables are global if not declared local with my $variable.
- Local Perl variables are also called lexical.
$name = "fred"; # This is global $name.
# Insert a block
{
my $name = "joe";
# This is local $name because of 'my'.
print "Block local \$name is $name\n";
# Using double quotes allows $name interpolation as
# part of the print statement. The backslashed $
# (\$name) suppresses variable interpolation.
}
print "Global \$name is ", $name, "\n";
- Some advocate using the my keyword for all variables, including global vars.
standard input
- Perl standard input with <STDIN>
print "Please enter something interesting: \n";
$comment = <STDIN>;
print "You entered: $comment\n";
- Standard input can be from the keyboard or redirected from a file or from another program.
- Example of redirecting stdin from a file:
$ perl -w stdin_comment.pl < comment.txt
- Examples of redirecting stdin from a program:
$ echo "The ripest fruit falls first." | perl -w stdin_comment.pl
# If you have the fortune command installed ...
$ fortune | perl -w stdin_comment.pl
- <STDIN> reads up to and including the newline character
- The newline could be the user hitting <Enter> after inputting something or the newline at the end of a line in a text file.
chomp() and chop()
- The Perl <STDIN> behavior of including newlines often needs to be accounted for.
print "Enter a five letter word guess, preferably \"Yoink\": ";
$userguess = <STDIN>;
chomp($userguess);
# Removes trailing newline character only.
# Test this without the chomp statement.
$secretword = "Yoink";
print "The result of the comparison: ", $userguess eq $secretword, "\n";
# String comparison with 'eq' operator;
# returns empty string if strings not equal;
# returns 1 if strings equal.
- chop() function removes last character of string, whether it's a newline or not.
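A minimal sketch of the difference between chomp() and chop(). The two helper names are hypothetical; they work on a copy of the argument so the four cases can be compared side by side.

```perl
use strict;
use warnings;

# Hypothetical helpers that work on a copy, so the caller's string is untouched.
sub with_chomp { my ($s) = @_; chomp($s); return $s; }
sub with_chop  { my ($s) = @_; chop($s);  return $s; }

print with_chomp("Yoink\n"), "\n";   # prints Yoink
print with_chomp("Yoink"),   "\n";   # prints Yoink (nothing to remove)
print with_chop("Yoink\n"),  "\n";   # prints Yoink
print with_chop("Yoink"),    "\n";   # prints Yoin (last letter lost)
```

For cleaning up input lines, chomp() is usually the safer choice because it leaves a string without a trailing newline unchanged.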
string functions
- length(): returns length of a string
print "Enter a string: ";
my $inpString = <STDIN>;
chomp($inpString);
# Must do this, else length will return length + 1.
print "$inpString is ", length($inpString), " chars long.\n";
- lc(): converts all characters in a string to lowercase.
my $string = "Hello, World!";
my $lowercase = lc($string);
print "$lowercase\n";
- uc(): converts all characters in a string to uppercase.
my $string = "Hello, World!";
my $uppercase = uc($string);
print "$uppercase\n";
- index(): returns the position of the first occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = index($string, $substring);
print "$position\n";
# Output: 6
- rindex(): returns the position of the last occurrence of a substring within a string.
my $string = "Larry Wall Larry Wall";
my $substring = "Wall";
my $position = rindex($string, $substring);
print "$position\n";
# Output: 17
- substr(): extracts a substring from a string.
my $string = "Hello, World!";
my $substring = substr($string, 7, 5);
# Starting at position 7, extract 5 characters
print "$substring\n";
# Output: World
The die() controlled exit function
- Commonly seen is the Perl function, die().
print "Enter a string to pass to die(): ";
chomp($string = <STDIN>);
die($string); # Outputs to STDERR, not STDOUT
print "This will not be printed.";
- die() appends the Perl program name and the line number of the die() statement to its message if the string argument does not end in a newline; the chomp() statement above removes the newline so that this information is shown.
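The effect of the trailing newline can be observed without ending the program by running die() inside an eval block, which traps the message in the special variable $@. This is a sketch only: eval is not covered in these notes, and message_from_die() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: runs die() inside eval and returns the message
# that would otherwise have gone to STDERR.
sub message_from_die {
    my ($text) = @_;
    eval { die $text };
    return $@;
}

print message_from_die("Stopping\n");   # newline given: message is used as-is
print message_from_die("Stopping");     # no newline: " at <script> line <n>." is appended
```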
selection statements, familiar
- Familiar (C/C++/Java syntax)
if ( $number != 0 ) {
$result = 100 / $number;
}
if ( $password eq $guess ) {
print "Pass, friend.\n";
} else {
die "Go away, imposter!";
}
selection statements, elsif
if ( $password eq $guess ) {
print "Pass, friend.\n";
} elsif ( $password eq "Meh" ) {
print "Meh!\n";
} else {
die "Go away, imposter!";
}
selection statements, unless
# Assuming $a is boolean
if ( not $a ) {
print "\$a is not true\n";
}
# can also be expressed...
unless ( $a ) {
print "\$a is not true\n";
}
- Choose the syntax that best fits your own thought patterns
- “There's more than one way to do it.”
reverse selection statements
- Normal
if ($number == 0) {
die "Can't divide by 0";
}
- Equivalent
die "Can't divide by 0" if $number == 0;
repetition structures
- while, until, for, foreach, do..while, do..until
- while, for, do..while are C/C++/Java standard
until ( $countdown <= 0 ) {
print "Counting down: $countdown\n";
$countdown--;
}
# "for each number in the list 1 through 10"
foreach $number ( 1 .. 10 ) {
print "The number is: $number\n";
}
while (<STDIN>)
- while ( $var = <STDIN> ) reads from standard input until end of file (control-d)
- Reads a line at a time into $var
while ( $var = <STDIN> ) {
print $var;
# Print each line of standard input.
}
- Can shorten to the following because it's such a common operation
- $_ below is the default Perl variable
while ( <STDIN> ) {
print $_;
}
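The $_ idiom works the same way with any filehandle. The sketch below reads from an in-memory filehandle instead of STDIN so that it can run without keyboard input; count_nonblank_lines() is a hypothetical helper, and opening a filehandle on a string is an assumption of this example, not something the notes cover.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the non-blank lines in a string by reading it
# through an in-memory filehandle (a stand-in for STDIN in this sketch).
sub count_nonblank_lines {
    my ($text) = @_;
    open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";
    my $count = 0;
    local $_;                      # keep the caller's $_ intact
    while ( <$fh> ) {              # each line lands in $_
        chomp;                     # same as chomp($_)
        next if $_ eq "";          # skip empty lines
        $count++;
    }
    close($fh);
    return $count;
}

print count_nonblank_lines("one\n\ntwo\nthree\n"), "\n";   # prints 3
```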
lists
- Create Perl lists with ()
print( "Hello, ", "world", "\n" );
# 3 strings in a list being passed to print function
print( 123, 456, 789 );
foreach $number ( 1 .. 10 ) {
# ( 1 .. 10 ) creates the list of numbers from 1 to 10
}
- Create Perl lists with qw (quote words)
qw/hello world good bye/ # creates a 4 word list
accessing list values
- Use square brackets
print( ( 'salt', 'vinegar', 'mustard', 'pepper' )[ 2 ] );
# output: mustard (count from zero)
my $month = 3;
print qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec )[ $month ];
# output: Apr
accessing list "slices"
- Get more than one list value at a time
my $mone;
my $mtwo;
( $mone, $mtwo ) = ( 1, 3 );
my $m1;
my $m2;
my $m3;
( $m1, $m2, $m3 ) = qw(
Jan Feb Mar
Apr May Jun
Jul Aug Sep
Oct Nov Dec
)[ 2..4 ];
print $m1." ".$m2." ".$m3;
arrays
- Arrays are just named lists.
- Whole arrays are called @arrayName (start with '@')
- Individual array elements (scalars) are accessed as $arrayName[ subscript ]
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
print @days, "\n";
# Output: MonTueWedThuFriSatSun
print "@days\n";
# Output: Mon Tue Wed Thu Fri Sat Sun
# By enclosing in "", have "stringified" @days
print $days[ 6 ], "\n";
# @days[ 6 ] can also be used, but see the warning at https://stackoverflow.com/a/53732305
array size in $#
- $#arrayName is special Perl syntax
- contains the last subscript (or index) of @arrayName, which is one less than the number of elements
- Looping through an array
my $i = 0;
while ( $i <= $#arrayName ) {
# Do something with $arrayName[ $i ] here
$i++;
}
- or
for ( my $i = 0; $i <= $#arrayName; $i++ ) {
# Do something with $arrayName[ $i ] here
}
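Because $#arrayName is the last index rather than the element count, the two differ by one. A minimal sketch, using a hypothetical helper name, that returns both values:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last index and the element count.
sub last_index_and_count {
    my @items = @_;
    return ( $#items, scalar(@items) );   # scalar(@items) is the number of elements
}

my ( $last, $count ) = last_index_and_count( qw(Mon Tue Wed) );
print "last index: $last, count: $count\n";   # prints last index: 2, count: 3
```

For an empty array the last index is -1, which is why the loops above never execute their body in that case.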
arrays with foreach
foreach my $i ( @arrayName ) {
# Does not use $#.
# Do something with $i here.
}
accessing array slices
my @days = qw(Mon Tue Wed Thu Fri Sat Sun);
my @longweekend = @days[ 4..6 ];
# Note use of @ instead of $ before days.
print "@longweekend\n";
# Output: Fri Sat Sun
array functions
- reverse()
my @count = ( 1..5 );
foreach $each ( reverse( @count ) ) {
print "$each...\n";
sleep 1;
}
- sort()
my @unsorted = qw( Cohen Clapton Costello Rush ZZTop );
my @sorted = sort @unsorted;
- end or top functions: push(), pop()
my $hand;
my @pile = ( "letter", "newspaper", "bill", "notepad" );
print "You pick up something off the top of the pile.\n";
$hand = pop @pile;
# "notepad" is removed from end (top) of @pile
print "You now have a $hand in your hand,\nand the pile contains:\n@pile\n";
print "You now put something on your pile.\n";
push @pile, "statement";
# "statement" is added to the end (top) of @pile
print "Now the pile contains:\n@pile\n";
- beginning or bottom functions: shift(), unshift()
my @array = ();
# nothing in array
unshift @array, "first";
print "Array is now: @array\n";
unshift @array, "second", "third";
print "Array is now: @array\n";
shift @array;
print "Array is now: @array\n";
# unshift adds elements, shift deletes elements
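One point worth a sketch: sort() compares elements as strings by default, so a list of numbers needs an explicit comparison block using the <=> operator. The helper name numeric_sort() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: sorts its arguments as numbers, smallest first.
sub numeric_sort {
    return sort { $a <=> $b } @_;   # $a and $b are set by sort for each comparison
}

my @numbers = ( 10, 9, 100, 1 );
print join( " ", sort @numbers ), "\n";            # string order: prints 1 10 100 9
print join( " ", numeric_sort(@numbers) ), "\n";   # numeric order: prints 1 9 10 100
```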
hashes
- hashes or associative arrays
- can be thought of as unordered arrays, using keys instead of array subscripts
- Whole hashes are called %hashname (start with '%').
%where = ( Gary => "Dallas", Lucy => "Austin", Ian => "Houston", Samantha => "Seattle" );
- To the left of the => are the hash keys, to the right of => are the associated hash values.
- hash keys must be unique
hash element access
- Individual hash elements are accessed with $hashname{key}
my $who = "Ian";
my %where = (
Gary => "Piscataway",
Lucy => "Hackensack",
Ian => "Mahwah",
Samantha => "Hoboken"
);
print "Gary lives in ", $where{Gary}, "\n";
print "$who lives in $where{$who}\n";
hash element adding/deleting
my %where = (
Gary => "Piscataway",
Lucy => "Hackensack",
Ian => "Mahwah",
Samantha => "Hoboken"
);
$where{Eva} = "Howell";
# We added an Eva => "Howell" key/value pair to the
# %where hash
delete $where{Gary};
# We deleted the Gary => "Piscataway" key/value pair
hash functions, iteration
- Can't use while, for or foreach to directly iterate through hashes
- Perl provides keys(), values() and each()
# use 'keys' function to iterate through hash keys
foreach $who ( keys %where ) {
print "$who lives in $where{$who}\n";
}
# use 'values' function to iterate through hash values
foreach $town ( values %where ) {
print "someone lives in $town\n";
}
# use 'each' function to iterate through hash key/value
# pairs
my ($name, $town);
# an assignable list of variables
while ( ($name, $town) = each %where ) {
print "$name lives in $town\n";
}
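Since hashes are unordered, keys() may return the keys in a different order from one run to the next. A common way to get repeatable output is to sort the keys first. The sketch below assumes a hypothetical helper, residents_report(), that returns one line per key.

```perl
use strict;
use warnings;

# Hypothetical helper: builds one report line per key, in sorted key order.
sub residents_report {
    my (%where) = @_;
    my @lines;
    foreach my $who ( sort keys %where ) {
        push @lines, "$who lives in $where{$who}";
    }
    return @lines;
}

my %where = ( Gary => "Piscataway", Lucy => "Hackensack", Ian => "Mahwah" );
print "$_\n" foreach residents_report(%where);
# prints:
# Gary lives in Piscataway
# Ian lives in Mahwah
# Lucy lives in Hackensack
```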
hash functions, key existence
- Use exists() hash function to check if a key exists in a hash
print "Gary exists in the hash!\n" if exists $where{Gary};
subroutines
- subroutines are functions
- define subroutine:
sub example_subroutine {
...
# subroutine body
...
}
- define and call subroutine
greet();
sub greet {
print "Hello, World!\n";
}
subroutines, arguments
- arguments are passed through another special Perl var, @_
- notice that @_ is a special array
greet( "Jim", "Bob", "Russ" );
# There isn't a set number of function arguments or a
# function "prototype" to speak of
sub greet {
foreach my $arg ( @_ ) {
print "Hello $arg!\n";
}
print "You're first, $_[ 0 ].\n";
print "You're second, $_[ 1 ].\n";
print "You're last, $_[ 2 ].\n";
}
subroutines, returns
- Use return
- Can return a list of values.
my ($len1, $len2, $len3) = greet( "Jim", "Bob", "Russ" );
print "($len1, $len2, $len3)\n";
sub greet {
foreach my $arg ( @_ ) {
print "Hello $arg!\n";
}
return (length($_[0]), length($_[1]), length($_[2]));
# return a list of 3 ints
}
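Indexing into @_ directly gets hard to read once a subroutine has more than a couple of arguments. A common idiom is to copy @_ into named lexical variables on the first line of the subroutine. A minimal sketch with two hypothetical subroutines:

```perl
use strict;
use warnings;

# Hypothetical example: copy the arguments into named variables.
sub full_name {
    my ( $first, $last ) = @_;
    return "$first $last";
}

# Hypothetical example: accept any number of arguments.
sub average {
    my @numbers = @_;
    return unless @numbers;           # no arguments: return nothing
    my $sum = 0;
    $sum += $_ foreach @numbers;
    return $sum / @numbers;           # @numbers in numeric context is its size
}

print full_name( "Larry", "Wall" ), "\n";   # prints Larry Wall
print average( 2, 4, 6 ), "\n";             # prints 4
```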
Perl and Regular Expressions
- For anything beyond trivial text processing with Perl and other languages, you need a basic understanding of regular expressions (regex).
- See CS 370 intro to regular expressions for a start.
- Also, see links:
- https://regexone.com
- https://perldoc.perl.org/perlrequick
- Interactive regex testers:
Perl and Regular Expressions (match, search/replace)
- Regular expression matching with the =~ operator
$var =~ m/regular expression/
# boolean (true if match, false if no match)
# can omit the "m":
$var =~ /regular expression/
$var !~ m/regular expression/
# true if not a match, false if a match
- Substitution using regular expressions
# General form: $var =~ s/search re/replacement text/
my $name = "Joseph";
$name =~ s/[sph]//;
# delete first instance of s, p or h
print "$name\n";
# Output: Joeph
$name =~ s/[sph]//g;
# g -> delete every instance of s, p or h
print "$name\n";
# Output: Joe
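Beyond a true/false answer, a match can also extract pieces of the string. Parentheses in the regular expression capture text into $1, $2 and so on. This goes slightly beyond the notes above, and parse_score() and its "name: number" input format are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls a name and a number out of "name: number".
sub parse_score {
    my ($line) = @_;
    if ( $line =~ /^(\w+):\s*(\d+)$/ ) {
        return ( $1, $2 );    # text captured by the 1st and 2nd parentheses
    }
    return;                   # no match: return an empty list
}

my ( $name, $score ) = parse_score("Bob: 42");
print "$name scored $score\n";   # prints Bob scored 42
```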
Perl and Regular Expressions (split)
- split() function splits strings using a regular expression as delimiter
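- With no arguments, split works on $_ and splits on whitespace; this behaves like /\s+/ except that leading whitespace is skipped, so calling split by itself is equivalent to doing
split ' ', $_;
# split the default variable on runs of one or more whitespace
# characters, ignoring leading whitespace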
- The split regexp delimiter is usually something simple
my $passwd = "jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash";
my @fields = split /:/, $passwd;
Perl and Regular Expressions (join)
- join() function joins elements of a list using a specified delimiter string
my $last = "Jones";
my $first = "Bob";
my $name = join ", ", ($last, $first);
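split() and join() are often used together: split a record into fields, pick the fields of interest, and join them back into a new string. A minimal sketch using the passwd-style record from above; the helper name login_and_shell() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the first and last fields of a
# colon-separated passwd-style record.
sub login_and_shell {
    my ($passwd) = @_;
    my @fields = split /:/, $passwd;
    return join " uses ", $fields[0], $fields[$#fields];
}

print login_and_shell("jchung:x:1032:51:J. Chung, CS:/home/jchung:/bin/bash"), "\n";
# prints jchung uses /bin/bash
```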
Multiple Input Files with <>
- We have used the <STDIN> statement and < filename redirection to process files such as cs498roster
- Suppose we want to be able to process files with a command like
$ perl -w perlex5.pl cs498roster cs598roster
- One way to accomplish this is with the <> (so-called diamond)
while (<>) {
print "text read: $_";
}
- <> checks to see if the program (the Perl script) was invoked with command line arguments (file name or multiple file names)
- If so, it reads the file(s) one at a time, one line at a time.
- If not, it reads from standard input, just like <STDIN>.
Perl Command Line Arguments w/ @ARGV
- The special Perl array @ARGV contains the command line arguments that a Perl script was invoked with.
- <> gets file names from @ARGV by default
foreach my $arg ( @ARGV ) {
print $arg;
}
or
for ( my $i = 0; $i <= $#ARGV; $i++ ) {
print $ARGV[ $i ];
}
or
print "$_" foreach @ARGV; # Perl-speak
Perl Command Line Arguments w/ shift @ARGV
- Command line arguments aren't always file names.
- Here, we're trying to get a help message from perlex6.pl by calling it with the --help command line argument:
$ perl -w perlex6.pl --help
- --help becomes the first element of @ARGV, $ARGV[ 0 ].
- Sometimes, command line arguments modify how a program will work.
- Here, we're trying to make perlex6.pl sort rosters by last name:
$ perl -w perlex6.pl --last cs498roster cs598roster
- To remove an argument from @ARGV, shift it.
my $arg0 = shift @ARGV;
if ( $arg0 =~ /last/ ) ...
# Process all non-file command line args before we
# get to while (<>)
while (<>) { ...
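The pattern above can be sketched as a small subroutine that separates an option from the file names. This is an illustration only, not the actual perlex6.pl: parse_args() is a hypothetical name, the option is compared with eq rather than matched with a regex, and the arguments are passed in as a list (a real script would pass @ARGV).

```perl
use strict;
use warnings;

# Hypothetical helper: returns a flag followed by the remaining file names.
sub parse_args {
    my @args = @_;
    my $sort_by_last = 0;
    if ( @args and $args[0] eq "--last" ) {
        shift @args;               # remove the option so only file names remain
        $sort_by_last = 1;
    }
    return ( $sort_by_last, @args );
}

my ( $by_last, @files ) = parse_args( "--last", "cs498roster", "cs598roster" );
print "sort by last name: $by_last\n";   # prints sort by last name: 1
print "files: @files\n";                 # prints files: cs498roster cs598roster
```

In a script that goes on to use while (<>), assigning the remaining file names back to @ARGV leaves only files for <> to read.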
Perl References
- Perl references hold the locations of other pieces of data
- The backslash “\” is used to create references.
- Here, $hash_r becomes a reference to the memory location of %hash:
my %hash = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;
- Dereferencing
- Use {} around reference name.
my %hash2 = %{$hash_r};
# Set new %hash2 contents from the hash that is
# referenced by $hash_r
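Individual elements can also be reached through a reference with the arrow operator, which is the form used by the object examples later on. A minimal sketch; add_variety() and the "plum" entry are hypothetical and only show that changes made through the reference affect the original hash.

```perl
use strict;
use warnings;

# Hypothetical helper: adds a key/value pair through a hash reference.
sub add_variety {
    my ( $hash_r, $fruit, $variety ) = @_;
    $hash_r->{$fruit} = $variety;    # same as ${$hash_r}{$fruit} = $variety
    return;
}

my %hash = ( apple => "crab", pear => "asian" );
my $hash_r = \%hash;
add_variety( $hash_r, "plum", "damson" );
print "$hash{plum}\n";                   # prints damson
print scalar( keys %{$hash_r} ), "\n";   # prints 3
```

Passing a reference avoids copying the whole hash into the subroutine's @_.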
Perl Objects
- “…an object can be anything -- it really depends on what your application is. … If you're communicating with a remote computer via FTP, you could make each connection to the remote server an object.”
- In Perl, what we see as an object is simply a reference…
- In fact, you can convert any ordinary reference into an object simply by using the (Perl) bless() function.
- Typically, however, objects are represented as references to a hash
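The idea of an object as a blessed hash reference can be sketched with a very small class. Counter and its methods are hypothetical names chosen for this illustration; they do not come from the course material or from any module.

```perl
use strict;
use warnings;

package Counter;    # hypothetical class name

# Constructor: builds a hash reference and blesses it into the class.
sub new {
    my ( $class, %args ) = @_;
    my $self = { count => exists $args{start} ? $args{start} : 0 };
    return bless $self, $class;
}

# Method: the object arrives as the first argument.
sub increment {
    my ($self) = @_;
    $self->{count}++;
    return $self;    # returning the object allows chained calls
}

sub count {
    my ($self) = @_;
    return $self->{count};
}

package main;

my $counter = Counter->new( start => 5 );
$counter->increment;
print $counter->count, "\n";    # prints 6
print ref($counter), "\n";      # prints Counter
```

After bless(), method calls such as $counter->increment are looked up in the Counter package.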
Perl Objects and Modules Example
- Perl Classes are often called Modules
- Here, we'll use the Net::FTP module (example only; rockhopper does not run an anonymous ftp server):
use strict;
use Net::FTP;
# This requires that FTP.pm be stored somewhere on the
# local system that Perl searches through for modules.
my $ftp = Net::FTP->new("rockhopper.monmouth.edu")
or die "Couldn't connect: $@\n";
# new() is a method of class Net::FTP; it's the
# constructor. $ftp is our FTP session object.
$ftp->login("anonymous");
# Net::FTP->login() method
$ftp->cwd("/");
$ftp->get("index.php");
$ftp->close();
Perl Modules and @INC
- @INC contains the file system paths that the Perl interpreter searches for modules such as FTP.pm.
$ perl -V
...
@INC:
/etc/perl
/usr/local/lib/perl/5.8.8
/usr/local/share/perl/5.8.8
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.8
/usr/share/perl/5.8
/usr/local/lib/site_perl
.
Perl Modules and CPAN
- A module is a collection of subroutines (methods) and variables (attributes) that all work together to perform some set of tasks
- The Comprehensive Perl Archive Network (http://www.cpan.org) was put together to organize and share the large collection of prewritten Perl modules.
- Searchable at https://metacpan.org (the older http://search.cpan.org now redirects there)
- The CPAN module naming hierarchy places modules in categories such as Sort::Fields, Sort::Versions or subcategories such as LWP::Protocol::http
- On disk, the modules would look like …/Sort/Fields.pm, …/Sort/Versions.pm
- For the LWP::Protocol::http module, the full path to the module might be /usr/share/perl5/LWP/Protocol/http.pm on a Linux system.
- How to know if a Perl module is installed?
- You can run a Perl “one-liner” program using the “-e” option to check if a module is installed, e.g.,
# If the following runs with no errors, it means the LWP::Simple module is installed.
perl -e "use LWP::Simple"
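The same check can be made from inside a script. The sketch below converts a module name to its relative file path, following the naming hierarchy described above, and tries to load it inside eval so that a missing module does not end the program. module_available() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Sort::Fields -> Sort/Fields.pm
    return eval { require $file; 1 } ? 1 : 0;    # require searches the @INC paths
}

# List::Util ships with Perl itself, so it should normally be found.
print "List::Util: ",  module_available("List::Util"),  "\n";
print "LWP::Simple: ", module_available("LWP::Simple"), "\n";
```

A script can use such a check to fall back to another approach, or to print installation advice, when an optional module is missing.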
Perl Web Automation
- Perl modules are available for Web automation
- Well-known modules are the LWP (“Library for WWW in Perl”) group of modules and WWW::Mechanize.
- WWW::Mechanize
- Old article that describes Perl web automation: http://rockhopper.monmouth.edu/~jchung/docs/4915-1108-turoff.pdf
- Retry Perl Exercise 7 using a Web automation module such as LWP::Simple or WWW::Mechanize.
Perl and System Commands
- For portability, use Perl equivalents to system commands, such as Perl's chdir() function instead of cd.
- NOTE: This is applicable to all programming languages that can call system commands.
- But if using system commands is a necessity, use system()
system( '/usr/games/fortune' );
- Command substitution is done using backticks ``
- Sometimes we want to capture the output of system commands to use in a Perl script.
my $sysdate = `date`;
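Output captured with backticks keeps its trailing newline, and the command's exit status is left in the special variable $?. A minimal sketch, assuming a Unix-like system where the echo command is available; run_capture() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a command, returns its output and exit code.
sub run_capture {
    my ($command) = @_;
    my $output = `$command`;              # capture standard output
    my $exit_code = $? >> 8;              # exit status is in the high byte of $?
    $output = "" unless defined $output;  # backticks give undef if the command could not run
    chomp($output);                       # drop the trailing newline
    return ( $output, $exit_code );
}

my ( $text, $code ) = run_capture("echo hello");
print "output: '$text', exit code: $code\n";
```

Checking the exit code before using the output helps catch commands that are missing or that failed.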
cs498gpl/introduction_to_perl.txt · Last modified: by jchung
