Python Exercises
- Deadline: Friday, Mar 21, 11:59pm
- Use the Python exercises dropbox on ecampus. Don't dropbox any input or output files.
- NOTE: Unless otherwise noted, all work listed here is mandatory and counts toward your assignments grade.
- NOTE 2: You must insert your name, program description, course name, and semester in a comment block at the top of each program or else you will lose points.
Exercise 1 - standard input/output
Create, save and run pyex1.py:

    import sys
    for line in sys.stdin:
        sys.stdout.write(line)
- Create a separate input file and redirect its contents to pyex1.py, as we did with Perl programs that required standard input.
- How does a plain print statement work compared to sys.stdout.write()?
- See if strings of standard input are “chomped” by Python by default. Try the string len() and rstrip() methods.
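A quick way to check the "chomping" question at the interpreter. This is a minimal sketch that assumes a line arriving with its trailing newline, exactly as sys.stdin delivers it:

```python
# A line as sys.stdin delivers it: the trailing newline is kept,
# so Python does NOT "chomp" input lines by default.
line = "hello\n"

print(len(line))           # 6 -- len() counts the newline
print(len(line.rstrip()))  # 5 -- rstrip() removes trailing whitespace, including "\n"
```

Compare this with Perl, where you call chomp explicitly; in Python the usual idiom is line.rstrip() (or line.rstrip("\n") to strip only newlines).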
Exercise 2 - Count word lengths using a dictionary
Write pyex2.py to find the number of occurrences of each word length in english.sorted (UTF-8 encoded). That is, find and print the number of words in english.sorted that are one character long, two characters long, three characters long, and so on. Use a Python dictionary.
Try running pyex2.py with the ISO-8859-encoded version of english.sorted.
See Python standard encodings.
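One possible shape for the counting loop. This sketch uses a small in-memory word list in place of english.sorted; the real program would iterate over open("english.sorted", encoding="utf-8") instead:

```python
# Stand-in word list; in pyex2.py these lines would come from
# open("english.sorted", encoding="utf-8") instead.
words = ["a", "an", "at", "cat", "dog", "horse"]

counts = {}                 # word length -> number of occurrences
for w in words:
    n = len(w.rstrip())     # rstrip() matters when lines come from a file
    counts[n] = counts.get(n, 0) + 1

for n in sorted(counts):
    print(n, counts[n])
```

dict.get(key, 0) avoids a KeyError the first time a length is seen; collections.Counter is a shortcut once you know the dictionary version.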
Exercise 3 - roster to dictionary
Reproduce Perl exercise 4 in Python. Save it as pyex3.py. Use the provided class roster file (roster_raw.txt). Also try a different roster format (roster.txt).
After covering the re module in Python, try this exercise again and save it as pyex3_re.py.
Exercise 4 - regular expression search and replace
In pyex4.py, write and save a Python program that reads an old webadvisor roster file and writes information to a file called roster2.txt in the following format:
last name, first name & middle initial, student id
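A sketch of the rearranging step. The input lines below are hypothetical; the real webadvisor roster layout will differ, so the pattern must be adapted to it:

```python
import re

# Hypothetical roster lines -- an assumption for illustration only;
# adapt the pattern to the actual webadvisor roster format.
sample = [
    "1234567 Smith, John Q.",
    "7654321 Doe, Jane R.",
]

# groups: (student id) (last name) (first name & middle initial)
pattern = re.compile(r"^(\d+)\s+([^,]+),\s+(.+)$")

records = []
for line in sample:
    m = pattern.match(line)
    if m:
        student_id, last, first_mi = m.groups()
        records.append(f"{last}, {first_mi}, {student_id}")

# The real program would write these lines to roster2.txt
# with open("roster2.txt", "w") instead of printing them.
for rec in records:
    print(rec)
```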
Exercise 5 - html parsing with re
In pyex5.py, parse the article titles and URLs from the HTML source of https://www.monmouth.edu/news/archives. Store the titles and URLs in two lists. Use a dictionary newsfeed to store the title:URL key:value pairs. Print the title:URL pairs. Use the requests package to retrieve the HTML.
Later: Try using a list comprehension to build the newsfeed dictionary from the titles and urls lists.
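The overall shape might look like the sketch below. It substitutes a small inline snippet for the live page (the anchor pattern is an assumption about the markup; in pyex5.py the HTML would come from requests.get(...).text):

```python
import re

# Inline snippet standing in for the live page. In pyex5.py the HTML
# would come from requests.get("https://www.monmouth.edu/news/archives").text,
# and the pattern would need to match the real archive markup.
html = (
    '<h2><a href="https://www.monmouth.edu/news/a1">Story One</a></h2>'
    '<h2><a href="https://www.monmouth.edu/news/a2">Story Two</a></h2>'
)

pairs = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)
urls = [u for u, t in pairs]     # list of URLs
titles = [t for u, t in pairs]   # list of titles

# Dictionary of title:URL pairs, built with a comprehension over the two lists.
newsfeed = {t: u for t, u in zip(titles, urls)}
for title, url in newsfeed.items():
    print(title, ":", url)
```

The comprehension at the end is the "Later" part of the exercise; dict(zip(titles, urls)) is an equivalent one-liner.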
Exercise 6 - html parsing with a parser
In pyex6.py, write a second version of exercise 5, this time using the bs4 package (BeautifulSoup) to parse the information you need from the HTML.
Link that “explains” why you should not parse HTML with regex (note: not to be taken seriously):
Links to information on BeautifulSoup for HTML parsing:
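With a real parser, the regex from exercise 5 disappears entirely. A minimal sketch, reusing the same stand-in snippet (in pyex6.py the HTML would again come from requests):

```python
from bs4 import BeautifulSoup

# Same stand-in snippet as in exercise 5; the real HTML would come
# from requests.get("https://www.monmouth.edu/news/archives").text.
html = (
    '<h2><a href="https://www.monmouth.edu/news/a1">Story One</a></h2>'
    '<h2><a href="https://www.monmouth.edu/news/a2">Story Two</a></h2>'
)

soup = BeautifulSoup(html, "html.parser")

# Walk every <a> tag; BeautifulSoup handles the markup, so no regex is needed.
newsfeed = {a.get_text(): a["href"] for a in soup.find_all("a")}
for title, url in newsfeed.items():
    print(title, ":", url)
```

Unlike the regex version, this keeps working when attributes are reordered or quoting changes, which is the point the "do not parse HTML with regex" link is making.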
Exercise 7 - web automation with selenium
In pyex7.py, we will try a simple Python Selenium example. If doing this on your own machine, you will need to follow the installation instructions at
Run the Python-selenium test program to test your Python-Selenium setup.
Finally, in pyex7.py, we will try to reproduce the mechanize_webadvisor.pl Perl program to interact with Webadvisor.
Also dropbox pyex7_headless.py.