< file
or are read through standard input.
(Do in class) Run the following command to begin setting up a subdirectory structure for the text processing utility examples:
for topic in grep sed sort uniq awk tr; do echo mkdir -p ~/cs370/examples/text/$topic; done # | bash
Remove the trailing comment marker # to pipe (|) the mkdir commands to bash, which then executes them.
Note: The above directories could have been created without a for loop, using shell brace expansion:
mkdir -p ~/cs370/examples/text/{grep,sed,sort,uniq,awk,tr}
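The echo-first preview used with the for loop carries over to brace expansion. A minimal sketch, assuming bash (brace expansion is not part of POSIX sh) and a throwaway directory from mktemp instead of ~/cs370; the variable name base is only for illustration:

```shell
# Assumption: run under bash; "base" is a scratch directory, not ~/cs370.
base=$(mktemp -d)

# Preview: echo prints the mkdir command after the shell has expanded the braces.
echo mkdir -p "$base"/examples/text/{grep,sed,sort,uniq,awk,tr}

# Run it for real, then list the six topic directories that were created.
mkdir -p "$base"/examples/text/{grep,sed,sort,uniq,awk,tr}
ls "$base/examples/text"
```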
grep [options] PATTERN [FILE...]
grep [options] [-e PATTERN | -f FILE] [FILE...]
$ cat grepfile    # see grepfile contents
Well you know it's your bedtime,
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
$ grep the grepfile    # look for pattern "the" in grepfile
So turn off the light,
Say all your prayers and then,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
$ cat grepfile | grep the    # pipe grepfile to grep
So turn off the light,
Say all your prayers and then,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
# look for whole word "the" in grepfile and number lines found
$ grep -wn the grepfile
2:So turn off the light,
5:Beautiful mermaids will swim through the sea,
# look for lines without "the", number lines
$ grep -wnv the grepfile
1:Well you know it's your bedtime,
3:Say all your prayers and then,
4:Oh you sleepy young heads dream of wonderful things,
6:And you will be swimming there too.
# Read search patterns from a file, and search for the patterns in another file.
# See the grep "-f" option.
# Pattern file contents
$ cat ids
s1306205
s1321300
# Contents of the file to search
$ cat list_of_ids
s1064730
s1185725
s1294895
s1306205
s1321300
s1333911
s1359142
$ grep -f ids list_of_ids
s1306205
s1321300
$ grep .nd grepfile
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
And you will be swimming there too.
$ grep ^.nd grepfile
And you will be swimming there too.
$ grep sw.*ng grepfile
And you will be swimming there too.
$ grep [A-D] grepfile
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
$ grep "\." grepfile
And you will be swimming there too.
$ grep a. grepfile
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
$ grep a.$ grepfile
Beautiful mermaids will swim through the sea,
$ grep [a-m]nd grepfile
Say all your prayers and then,
$ grep [^a-m]nd grepfile
Oh you sleepy young heads dream of wonderful things,
And you will be swimming there too.
$ egrep s.+w grepfile
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
$ egrep "off|will" grepfile
So turn off the light,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
$ egrep im*ing grepfile
And you will be swimming there too.
$ egrep im?ing grepfile
? Why no matches ?
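One way to investigate the last example is to try each quantifier against a single line. A minimal sketch, assuming grep -E (the current spelling of egrep) and a hypothetical helper count_matches that is not part of the course files:

```shell
line='And you will be swimming there too.'

# count_matches REGEX: print how many lines of $line match the extended regex.
# grep exits non-zero when the count is 0, so "|| true" ignores that status.
count_matches() {
    printf '%s\n' "$line" | grep -Ec "$1" || true
}

count_matches 'im*ing'     # m* allows any number of m's, so "imming" matches
count_matches 'im?ing'     # m? allows at most one m; "imming" has two
count_matches 'im{2}ing'   # m{2} requires exactly two m's
```

Quoting the pattern, as done here, also keeps the shell from treating characters such as * ? [ ] as filename wildcards before grep sees them.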
# The -C 1 option below means grep will show 1 line above
# and 1 line below the matching lines:
$ grep -C 1 sleepy grepfile
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
# -A 2 option means show up to 2 lines AFTER the matching lines:
$ grep -A 2 sleepy grepfile
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
# -B 2 option means show up to 2 lines BEFORE the matching lines:
$ grep -B 2 sleepy grepfile
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
sed [ -e command ] [ -f scriptfile ] { fileName }
# The sed input file:
$ cat fiction
The lone monarch butterfly flew flutteringly through
the cemetery, dancing on and glancing against headstone
after headstone before alighting atop Willie Mitchell's
already lowered casket, causing gasps of awe to fly
from the open mouths of five or six lingering mourners,
until a big shovelful of dirt landed on it and it died.
$ sed 's/^/ /' fiction > fiction.indented
# contents of 'fiction' indented by one space:
$ cat fiction.indented
 The lone monarch butterfly flew flutteringly through
 the cemetery, dancing on and glancing against headstone
 after headstone before alighting atop Willie Mitchell's
 already lowered casket, causing gasps of awe to fly
 from the open mouths of five or six lingering mourners,
 until a big shovelful of dirt landed on it and it died.
$ sed 's/^ *//' fiction.indented    # removes leading spaces
# To insert the indentations directly into 'fiction' means
# doing an "in-place" edit of 'fiction', using sed's '-i' option:
$ sed -i 's/^/ /' fiction
$ sed '/a/d' fiction    # remove all lines containing char 'a'.
from the open mouths of five or six lingering mourners,
$ sed '/\<a\>/d' fiction    # remove lines containing the word 'a'.
The lone monarch butterfly flew flutteringly through
the cemetery, dancing on and glancing against headstone
after headstone before alighting atop Willie Mitchell's
already lowered casket, causing gasps of awe to fly
from the open mouths of five or six lingering mourners,
# Sed accepts sed scripts with the '-f' option;
# sed5 is a sed script containing sed commands;
# It will insert 2 lines at line 1:
$ cat sed5
1i\
Copyright 2002 Joe Chung\
All rights reserved\

$ sed -f sed5 fiction
Copyright 2002 Joe Chung
All rights reserved

The lone monarch butterfly flew flutteringly through
the cemetery, dancing on and glancing against headstone
after headstone before alighting atop Willie Mitchell's
already lowered casket, causing gasps of awe to fly
from the open mouths of five or six lingering mourners,
until a big shovelful of dirt landed on it and it died.
Append text after a line that contains pattern with
sed '/pattern/a line of text here' filename
Insert text before a line that contains pattern with
sed '/pattern/i line of text here' filename
Examples of appending and inserting a line of text:
$ cat test
foo
bar
option
baz
$ sed '/option/a append text here' test
foo
bar
option
append text here
baz
$ sed '/option/i insert text here' test
foo
bar
insert text here
option
baz
# Another sed script, containing a sed change text directive:
$ cat sed6
1,3c\
Lines 1-3 are censored.\

$ sed -f sed6 fiction
Lines 1-3 are censored.

already lowered casket, causing gasps of awe to fly
from the open mouths of five or six lingering mourners,
until a big shovelful of dirt landed on it and it died.
# Another sed script, containing one change text directive per line:
$ cat sed7
1c\
Line 1 is censored.
2c\
Line 2 is obfuscated.
3c\
Line 3 is kaput.
$ sed -f sed7 fiction
Line 1 is censored.
Line 2 is obfuscated.
Line 3 is kaput.
already lowered casket, causing gasps of awe to fly
from the open mouths of five or six lingering mourners,
until a big shovelful of dirt landed on it and it died.
# We want to insert a file called 'fin' using sed:
$ cat fin
The End
# Direct sed to insert 'fin' at end of 'fiction'
$ sed '$r fin' fiction
The lone monarch butterfly flew flutteringly through
the cemetery, dancing on and glancing against headstone
after headstone before alighting atop Willie Mitchell's
already lowered casket, causing gasps of awe to fly
from the open mouths of five or six lingering mourners,
until a big shovelful of dirt landed on it and it died.
The End
# Use sed's '-e' option to perform multiple sed operations
# per line:
$ sed -e 's/^/<< /' -e 's/$/ >>/' fiction
<< The lone monarch butterfly flew flutteringly through >>
<< the cemetery, dancing on and glancing against headstone >>
<< after headstone before alighting atop Willie Mitchell's >>
<< already lowered casket, causing gasps of awe to fly >>
<< from the open mouths of five or six lingering mourners, >>
<< until a big shovelful of dirt landed on it and it died. >>
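The two -e expressions can also be collapsed into a single substitution. A minimal sketch, assuming the same wrap-each-line goal; & in the replacement stands for the whole matched text, and the function names wrap_two and wrap_one are made up for this comparison:

```shell
# Both functions read stdin and wrap every line in "<< " and " >>".
wrap_two() { sed -e 's/^/<< /' -e 's/$/ >>/'; }   # two edits per line
wrap_one() { sed 's/.*/<< & >>/'; }               # one edit; & is the matched line

printf 'first line\nsecond line\n' | wrap_two
printf 'first line\nsecond line\n' | wrap_one
```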
sort [OPTION]... [FILE]...
# Sort input file:
$ cat sortfile
jan Start chapter 3 10th
Jan Start chapter 1 30th
Jan Start chapter 5 23rd
Jan End chapter 3 23rd
Mar Start chapter 7 27
may End chapter 7 17th
Apr End Chapter 5 1
Feb End chapter 5 14
$ sort sortfile
Apr End Chapter 5 1
Feb End chapter 5 14
Jan End chapter 3 23rd
Jan Start chapter 1 30th
jan Start chapter 3 10th
Jan Start chapter 5 23rd
Mar Start chapter 7 27
may End chapter 7 17th
# Force reverse or descending sort:
$ sort -r sortfile
may End chapter 7 17th
Mar Start chapter 7 27
Jan Start chapter 5 23rd
jan Start chapter 3 10th
Jan Start chapter 1 30th
Jan End chapter 3 23rd
Feb End chapter 5 14
Apr End Chapter 5 1
# Sort on the 1st field only (the key starts and ends at field 1);
# long form: sort --key=1,1 sortfile
# (the obsolete form "sort +0 -1 sortfile" may be rejected by current versions of sort):
$ sort -k 1,1 sortfile
Apr End Chapter 5 1
Feb End chapter 5 14
jan Start chapter 3 10th
Jan End chapter 3 23rd
Jan Start chapter 1 30th
Jan Start chapter 5 23rd
Mar Start chapter 7 27
may End chapter 7 17th
# Sort by month name in 1st field
# (obsolete form: sort +0 -1 -M sortfile):
$ sort -k 1,1 -M sortfile
Jan End chapter 3 23rd
Jan Start chapter 1 30th
jan Start chapter 3 10th
Jan Start chapter 5 23rd
Feb End chapter 5 14
Mar Start chapter 7 27
Apr End Chapter 5 1
may End chapter 7 17th
# Sort by the 5th (last) field numerically;
# long form: sort --key=5 -n sortfile
# (obsolete form: sort +4 -5 -n sortfile):
$ sort -k 5 -n sortfile
Apr End Chapter 5 1
jan Start chapter 3 10th
Feb End chapter 5 14
may End chapter 7 17th
Jan End chapter 3 23rd
Jan Start chapter 5 23rd
Mar Start chapter 7 27
Jan Start chapter 1 30th
uniq [OPTION]... [INPUT [OUTPUT]]
sort
# Input file for uniq:
$ cat animals
cat snake
monkey snake
dolphin elephant
dolphin elephant
goat elephant
pig pig
pig pig
monkey pig
# Default mode collapses each run of adjacent duplicate lines into one line:
$ uniq animals
cat snake
monkey snake
dolphin elephant
goat elephant
pig pig
monkey pig
# Count how many times each line occurs in a row:
$ uniq -c animals
1 cat snake
1 monkey snake
2 dolphin elephant
1 goat elephant
2 pig pig
1 monkey pig
# Ignore the first field of each line when looking for duplicates
# (-f 1 is the current spelling of the obsolete -1 option):
$ uniq -f 1 animals
cat snake
dolphin elephant
pig pig
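Because uniq compares only adjacent lines, it is usually fed sorted input. A minimal sketch with made-up input (not the animals file) in which the duplicate lines are not next to each other:

```shell
# Hypothetical input: "pig" appears twice, but not on adjacent lines.
input=$(printf 'pig\ncat\npig\n')

# Without sorting, uniq sees no adjacent duplicates and keeps all 3 lines.
printf '%s\n' "$input" | uniq | wc -l

# Sorting first brings the duplicates together, so uniq can collapse them.
printf '%s\n' "$input" | sort | uniq

# Common idiom: count each distinct line, most frequent first.
printf '%s\n' "$input" | sort | uniq -c | sort -rn
```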
$ awk -F "." '{ print "mkdir " $2 }'
$ awk -F "." -f makedirs
where makedirs contains
{ print "mkdir " $2 }
Synopsis: awk [ condition ] [ { action } ]
condition can be:
- special token BEGIN or END
- expression using logical or relational operators and/or regular expression
action is performed on every line of input that matches the condition and can be one or more C-like programming statements:
- if (conditional) statement [ else statement ]
- while (conditional) statement
- for (expression; conditional; expression ) statement
- break/continue
- variable = expression
- print [ list of expressions ] [ > expression ]
- printf format [ , list of expressions ] [ > expression ]
- next (skips the remaining patterns on the current line of input)
- exit (stops reading input; any END action still runs before awk quits)
- { list of statements }
# Say we have this input file:
$ cat float
Wish I was floating in blue across the sky,
My imagination is strong,
And I often visit the days
When everything seemed so clear.
Now I wonder what I'm doing here at all...
$ awk '{print NF, $0}' float
9 Wish I was floating in blue across the sky,
4 My imagination is strong,
6 And I often visit the days
5 When everything seemed so clear.
9 Now I wonder what I'm doing here at all...
# Awk fields are delimited using white space by default.
# Say that the file awk2 contains these awk statements:
$ cat awk2
BEGIN { print "Start of file" }
{ print $1 $3 $NF }
END { print "End of file" }
$ awk -f awk2 float
Start of file
Wishwassky,
Myisstrong,
Andoftendays
Whenseemedclear.
Nowwonderall...
End of file
# Equivalently, on the command line:
$ awk 'BEGIN { print "Start of file" } { print $1 $3 $NF } END { print "End of file" }' float
$ awk 'NR > 1 && NR < 4 { print NR, $1, $3, $NF }' float
2 My is strong,
3 And often days
$ awk '/t.+e/ { print $0 }' float
Wish I was floating in blue across the sky,
And I often visit the days
When everything seemed so clear.
Now I wonder what I'm doing here at all...
$ awk '/strong/,/clear/ { print $0 }' float
My imagination is strong,
And I often visit the days
When everything seemed so clear.
# See contents of /etc/passwd (delimited file using : as the delimiter)
$ cat /etc/passwd
# Extract fields of /etc/passwd using awk:
$ awk -F ":" '{ print $1, $3, $NF }' /etc/passwd    # 1st, 3rd and last fields
awk to extract fields in delimited lines of text.
tr -cds string1 string2
# Input file:
$ cat go.cart
go cart
racing
# Translating case: probably the most common use of tr
$ tr a-z A-Z < go.cart
GO CART
RACING
# Replace character ranges
$ tr a-c D-E < go.cart
go EDrt
rDEing
# Replace every non-"a" with "X"
$ tr -c a X < go.cart
XXXXaXXXXXaXXXXX
# Replace non-"a-z" with (new line)
# Could substitute '\n' for '\012'
$ tr -c a-z '\012' < go.cart
go
cart
racing
# Just delete characters
$ tr -d a-c < go.cart
go rt
ring
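The synopsis lists -s, which the examples above do not exercise. A minimal sketch of squeezing repeated characters, using made-up input rather than go.cart:

```shell
# -s squeezes each run of a listed character down to a single occurrence.
printf 'go    cart   racing\n' | tr -s ' '

# With two strings, tr translates first and then squeezes the result:
# every run of non-letters becomes one newline, giving one word per line.
printf 'go cart, racing!\n' | tr -cs 'a-zA-Z' '\n'
```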
~/bin
directory.
Create a script nospace.sh to look for filenames with spaces in them in the current directory and to rename those files, converting the spaces to _ (underscore).
To test nospace.sh, in a separate nospace directory, use touch to create a bunch of files that have spaces in the file names:
mkdir nospace
cd nospace
touch "report one" "report two" "report three" "reports four and five"
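One possible shape for the renaming loop, offered as a sketch rather than the course solution. It is written as a function (the name nospace is illustrative) so it can be tried in a scratch directory, and it assumes that an existing target name should be left alone rather than overwritten:

```shell
# nospace: rename every file in the current directory whose name contains a
# space, replacing each space with an underscore.
nospace() {
    for f in *' '*; do
        [ -e "$f" ] || continue                  # the glob matched nothing
        new=$(printf '%s' "$f" | tr ' ' '_')     # spaces -> underscores
        if [ -e "$new" ]; then
            echo "skipping '$f': '$new' already exists" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
}
```

Quoting "$f" and "$new" everywhere is what keeps names with spaces intact as a single argument to mv.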
Download the following file using wget:
http://rockhopper.monmouth.edu/~jchung/cs370/modem.out
Write a pipeline to extract only the PPP IP address “72.68.102.102” from this file. Incorporate wget in the pipeline.
Complete the pipeline using sed, and later, awk.
Solution using sed:
# wget: Quiet (-q) wget output while sending fetched modem.out to stdout (-O -)
# grep: Match 1 line of modem.out containing "PPP"
# sed: Delete all information before the IP address
wget -q -O - http://rockhopper.monmouth.edu/~jchung/cs370/modem.out | grep PPP | sed 's/.*PPP *//'    # or sed 's/.*PPP\s*//'
Solution using awk:
# wget: Quiet (-q) wget output while sending fetched modem.out to stdout (-O -)
# grep: Match 1 line of modem.out containing "PPP"
# awk: Extract IP address, which is the 5th field ($5) in the line,
#      IP Network Address PPP 72.68.102.102
wget -q -O - http://rockhopper.monmouth.edu/~jchung/cs370/modem.out | grep PPP | awk '{print $5}'
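awk can also do the line selection itself, which removes the separate grep stage. A minimal sketch, assuming (as the comment above says) that the matching line carries the IP address in its 5th field; extract_ppp_ip is a hypothetical name, and the sample input is made up apart from that one line:

```shell
# extract_ppp_ip: read modem output on stdin, print the 5th field of the
# first line containing "PPP", then stop reading.
extract_ppp_ip() {
    awk '/PPP/ { print $5; exit }'
}

printf '%s\n' 'some other line' 'IP Network Address PPP 72.68.102.102' | extract_ppp_ip
```

In the pipeline it would take the place of both the grep and awk stages, reading the output of wget -q -O - directly.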
Write a script randlines.sh to randomize the order of lines in standard input. Here's a start:
#!/bin/bash
#
# randlines.sh: Randomize lines in standard input
#
# Uses $RANDOM shell variable (found at the Advanced BASH
# Shell Scripting Guide).
while read myline    # Read one line of stdin at a time.
do
    echo $RANDOM $myline
done
Using either the head or tail command, create a variant of randlines.sh called randline.sh that outputs just one line at random from standard input.
Note: We are just re-implementing the functionality of the shuf command which randomizes lines of files and stdin.
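One possible way to finish the exercise is the decorate, sort, undecorate pattern: prefix each line with a random number, sort on that number, then cut the number off again. This is a sketch, not the course solution; it assumes bash for $RANDOM and uses a tab between the number and the line so cut can remove the prefix without touching spaces in the text:

```shell
# randlines: print the lines of stdin in random order.
randlines() {
    while IFS= read -r myline; do
        printf '%s\t%s\n' "$RANDOM" "$myline"   # prefix: random number + tab
    done |
    sort -n |       # order by the random prefix
    cut -f 2-       # drop the prefix, keep the original line
}

# randline: print just one randomly chosen line of stdin.
randline() {
    randlines | head -n 1
}

printf 'one\ntwo\nthree\n' | randlines
```

$RANDOM only ranges from 0 to 32767, so on long inputs some lines share a number and fall back to ordinary sort order; shuf does not have that limitation.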
Create a script called wordfreq.sh to print the number of occurrences of all words in a file or standard input. Output must be sorted descending by number of occurrences.
Sample output if input is https://www.gutenberg.org/cache/epub/11231/pg11231.txt:
738 the
519 i
508 to
472 of
434 and
387 a
305 in
210 his
204 that
193 was
191 my
189 he
169 you
162 not
150 with
146 it
141 me
139 him
121 bartleby
...
We want wordfreq.sh to be able to handle both STDIN and files given as arguments. So, it should be able to do something like
fortune | wordfreq.sh # process STDIN with wordfreq.sh
and also
wordfreq.sh input.txt # wordfreq.sh an input file
(and also)
wordfreq.sh input*.txt # wordfreq.sh multiple input files together
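A sketch of one pipeline that meets these requirements, written as a function named wordfreq for easy testing. It assumes a "word" is a run of the letters a-z after lower-casing, so "it's" counts as "it" and "s"; a real solution may want a different definition:

```shell
wordfreq() {
    cat -- "$@" |                  # files given as arguments, or stdin if none
    tr 'A-Z' 'a-z' |               # fold everything to lower case
    tr -cs 'a-z' '\n' |            # every run of non-letters becomes one newline
    grep -v '^$' |                 # drop the empty line a leading non-letter leaves
    sort | uniq -c |               # count identical words
    sort -rn |                     # most frequent first
    awk '{ print $1, $2 }'         # trim the padding that uniq -c adds
}

printf 'The cat saw the dog. The end.\n' | wordfreq
```

Using cat -- "$@" is what lets one pipeline serve all three calling styles: with no arguments cat reads standard input, and with several it concatenates them.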
Study the cut text processing command. Apply cut to a file containing this list of names:
Wehman, John
Wehner, Monk
Weid, Kahn
Weigner, Ray
Weimann, Joseph
Weimmer, Nottingham
Weinberg, John
Weiner, Stephanie
Weiner, Joseph
Weinert, Molly
Weingarten, Joyce
Weinraub, John
Use cut to extract the first letters of the first names, convert to lower case, and write the letters to a file called firstinit.
j
m
k
r
j
... and so on
Use cut again to extract the first 7 letters of the last names, convert to lower case, and write to a file called lastname.
wehman
wehner
weid
weigner
weimann
... and so on
Study the paste text processing utility. Use paste to paste firstinit and lastname together, eliminating any spaces.
jwehman
mwehner
kweid
rweigner
jweimann
... and so on
and redirect the result to a file called userids.
Write a script makeuserids.sh to perform the above tasks on an input file.
Link to makeuserids.sh code | makeuserids-ps.sh (alternative version that uses process substitution)
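The steps above can be sketched as a single function. This is an illustration and not the linked makeuserids.sh; it assumes the input uses the "Last, First" layout shown in the list of names and writes firstinit, lastname and userids into the current directory:

```shell
# makeuserids FILE: build userids from a "Last, First" name list.
makeuserids() {
    names=$1
    # First initial: field 2 is " John"; strip the space, keep character 1.
    cut -d ',' -f 2 "$names" | tr -d ' ' | cut -c 1 | tr 'A-Z' 'a-z' > firstinit
    # Last name: field 1, truncated to at most 7 characters.
    cut -d ',' -f 1 "$names" | cut -c 1-7 | tr 'A-Z' 'a-z' > lastname
    # '\0' tells paste to join the two columns with an empty delimiter.
    paste -d '\0' firstinit lastname > userids
}
```

For the first name in the list, "Wehman, John", this yields the userid jwehman.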
Write a pipeline to turn the following input (saved in a file called 'servers'):
# comment
blah
bigblah {
    blah {
        host MA-FXDWF-14 {
            hardware ethernet 00:13:21:5C:11:16;
            fixed-address 192.168.19.29;
        }
        host MA-FXDWF-15 {
            hardware ethernet 00:13:21:5D:12:17;
            fixed-address 192.168.19.30;
        }
        host MA-FXDWF-16 {
            hardware ethernet 00:13:21:5E:13:18;
            fixed-address 192.168.19.31;
        }
        ...
        ...
        # repeats 4000 times
        ...
        ...
    }
    blah
}
into this (for import into a spreadsheet):
MA-FXDWF-14???00:13:21:5C:11:16???192.168.19.29
MA-FXDWF-15???00:13:21:5D:12:17???192.168.19.30
MA-FXDWF-16???00:13:21:5E:13:18???192.168.19.31
...
...
grep -A 3 "host" servers |      # find lines that contain "host", list 3 lines following each matching line
tr -d '\n' |                    # delete new lines to put everything on one line
sed "s/--/\n/g" |               # insert a new line where "--" occurs ("--" separates the grep matches)
awk '{ print $2, $6, $8 }' |    # print 2nd, 6th and 8th tokens, using default awk delimiter
tr -d ';' |                     # delete semicolons
sed "s/ /???/g"                 # replace single spaces with ???
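The pipeline above relies on grep printing "--" between groups of matches, which it does only when the groups are not adjacent in the file. As a hedged alternative that does not depend on the separators, awk can remember the pieces of each host block and print them once the block is complete. The function name hosts_to_fields is hypothetical, and the sketch assumes every host block lists hardware before fixed-address, as in the sample:

```shell
# hosts_to_fields [FILE...]: print name???mac???ip for each host block.
hosts_to_fields() {
    awk '
        $1 == "host"          { name = $2 }
        $1 == "hardware"      { mac = $3; sub(/;$/, "", mac) }
        $1 == "fixed-address" { ip = $2;  sub(/;$/, "", ip)
                                print name "???" mac "???" ip }
    ' "$@"
}
```

It would be called as hosts_to_fields servers, or placed at the end of a pipeline since awk reads standard input when no file is given.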
Download a class roster.txt. Using sed search and replace operations, convert the raw roster.txt file to a list with the following format:
Lastname-Firstname:StudentID
The list would be even better if Lastname and Firstname were both lower case, like this:
lastname-firstname:StudentID
cat roster.txt |
awk -F ", " '{ print $1"-"$2":"$3 }' |    # Using ", " as delimiter, extract and print last"-"first":"id
sed "s/ [A-Z]\.//" |                      # Search for and delete middle initials (space, uppercase letter, period)
tr A-Z a-z                                # Convert all to lowercase
In the script webadvisor2roster.sh take a roster from webadvisor and transform it into
Last, First [MI], ID
format, writing to the file roster.
In the script randomseating, combine last names from a roster (see 7. above) and a seats file to randomize seating in HH 305.
Link to randomseating-v2 code (preferred)
Sum and display the total points in the quiz 1 file.
expression=$(cat csse370-su24-quiz1.txt | grep "[0-9] point" | sed "s/[^0-9]//g" | tr '\n' '+' | sed "s/+$//")
answer=$(( expression ))
echo $answer

or

echo $(( $(cat csse370-su24-quiz1.txt | grep "[0-9] point" | sed "s/[^0-9]//g" | tr '\n' '+' | sed "s/+$//") ))

or

# use bc, a command line calculator
echo $(cat csse370-su24-quiz1.txt | grep "[0-9] point" | sed "s/[^0-9]//g" | tr '\n' '+' | sed "s/+$//") | bc

#
# pipeline breakdown
#
grep "[0-9] point" |    # find lines that contain "n point(s)"
sed "s/[^0-9]//g" |     # delete all non-digit chars, leaving only a column of numbers
tr '\n' '+' |           # put all on single line, separated by "+"
sed "s/+$//"            # delete last "+" at end
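The same total can also be computed by letting awk do the addition. A minimal sketch, assuming grep supports -o (GNU and BSD grep do) and that a line may contain other digits, such as a question number, which should not be counted; sum_points is a hypothetical name and the quiz file's real layout is not shown here:

```shell
# sum_points [FILE...]: add up every "N point" / "N points" value.
sum_points() {
    cat -- "$@" |                                # files, or stdin if none given
    grep -Eo '[0-9]+ points?' |                  # keep only the "N point(s)" part
    awk '{ sum += $1 } END { print sum + 0 }'    # add the numbers; print 0 if none
}

printf '%s\n' '1. First question (5 points)' '2. Second question (1 point)' | sum_points
```

Extracting only the number in front of "point" avoids gluing a question number onto the score, which deleting every non-digit character would do on lines like these.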
Sort the following string from a scavenger hunt challenge:
22fl6abbz7yaabcdeezez99178
See the fold core text processing utility.
echo 22fl6abbz7yaabcdeezez99178 |
fold -w 1 |    # lines can be only 1 char wide (print string vertically)
sort |
tr -d '\n'     # remove newlines to return to horizontal
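The order sort produces depends on the current locale. A small variation, offered as a sketch, pins the locale to C so digits always come before letters and adds the final newline that tr -d removes; the function name sort_chars is made up:

```shell
# sort_chars: sort the characters of each input string in byte order.
sort_chars() {
    fold -w 1 |          # one character per line
    LC_ALL=C sort |      # byte order: digits, then upper case, then lower case
    tr -d '\n'           # rejoin into one string
    echo                 # finish with a newline
}

echo 22fl6abbz7yaabcdeezez99178 | sort_chars
```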
Write the text2png script that turns standard input into a large wallpaper-type image file.
This will be a fairly long shell script that demonstrates:
Link to text2png.sh code | text2png_getopts.sh (alternate version that uses getopts)