Python Exercises
- Deadline: Friday, Mar 21, 11:59pm
- Use the Python exercises dropbox on ecampus. Don't dropbox any input or output files.
- NOTE: Unless otherwise noted, all work listed here is mandatory and counts toward your assignments grade.
- NOTE 2: You must insert your name, program description, course name, and semester in a comment block at the top of each program or else you will lose points.
Exercise 1 - standard input/output
Create, save and run pyex1.py:

    import sys
    for line in sys.stdin:
        sys.stdout.write(line)
- Create a separate input file and redirect its contents to pyex1.py, as we did with Perl programs that required standard input.
- How does a plain print statement work compared to sys.stdout.write()?
- See if strings of standard input are “chomped” by Python by default. Try the string len() and rstrip() methods.
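A quick way to check the "chomping" question at the interpreter. This is a minimal sketch that assumes a line arriving with its trailing newline, exactly as sys.stdin delivers it:

```python
# A line as sys.stdin delivers it: the trailing newline is kept,
# so Python does NOT "chomp" input lines by default.
line = "hello\n"

print(len(line))           # 6 -- len() counts the newline
print(len(line.rstrip()))  # 5 -- rstrip() removes trailing whitespace, including "\n"
```

Compare this with Perl, where you call chomp explicitly; in Python the usual idiom is line.rstrip() (or line.rstrip("\n") to strip only newlines).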
Exercise 2 - Count word lengths using a dictionary
Write pyex2.py to find the number of occurrences of each word length in english.sorted (UTF-8 encoded). That is, find and print the number of words in english.sorted that are one character long, two characters long, three characters long, and so on. Use a Python dictionary.
Try running pyex2.py with the ISO-8859-encoded version of english.sorted.
See Python standard encodings.
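One possible shape for the counting loop. This sketch uses a small in-memory word list in place of english.sorted; the real program would iterate over open("english.sorted", encoding="utf-8") instead:

```python
# Stand-in word list; in pyex2.py these lines would come from
# open("english.sorted", encoding="utf-8") instead.
words = ["a", "an", "at", "cat", "dog", "horse"]

counts = {}                 # word length -> number of occurrences
for w in words:
    n = len(w.rstrip())     # rstrip() matters when lines come from a file
    counts[n] = counts.get(n, 0) + 1

for n in sorted(counts):
    print(n, counts[n])
```

dict.get(key, 0) avoids a KeyError the first time a length is seen; collections.Counter is a shortcut once you know the dictionary version.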
Exercise 3 - roster to dictionary
Reproduce Perl exercise 4 in Python. Save it as pyex3.py. Use the provided class roster file (roster_raw.txt). Also try a different roster format (roster.txt).
After covering the re module in Python, try this exercise again and save it as pyex3_re.py.
Exercise 4 - regular expression search and replace
In pyex4.py, write and save a Python program that reads an old webadvisor roster file and writes information to a file called roster2.txt in the following format:
last name, first name & middle initial, student id
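A sketch of the rearranging step. The input lines below are hypothetical; the real webadvisor roster layout will differ, so the pattern must be adapted to it:

```python
import re

# Hypothetical roster lines -- an assumption for illustration only;
# adapt the pattern to the actual webadvisor roster format.
sample = [
    "1234567 Smith, John Q.",
    "7654321 Doe, Jane R.",
]

# groups: (student id) (last name) (first name & middle initial)
pattern = re.compile(r"^(\d+)\s+([^,]+),\s+(.+)$")

records = []
for line in sample:
    m = pattern.match(line)
    if m:
        student_id, last, first_mi = m.groups()
        records.append(f"{last}, {first_mi}, {student_id}")

# The real program would write these lines to roster2.txt
# with open("roster2.txt", "w") instead of printing them.
for rec in records:
    print(rec)
```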
Exercise 5 - html parsing with re
In pyex5.py, parse the article titles and URLs from the HTML source of https://www.monmouth.edu/news/archives. Store the titles and URLs in two lists. Use a dictionary newsfeed to store the title:URL key:value pairs. Print the title:URL pairs. Use the requests package to retrieve the HTML.
Later: Try using a list comprehension to build the newsfeed dictionary from the titles and urls lists.
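The overall shape might look like the sketch below. It substitutes a small inline snippet for the live page (the anchor pattern is an assumption about the markup; in pyex5.py the HTML would come from requests.get(...).text):

```python
import re

# Inline snippet standing in for the live page. In pyex5.py the HTML
# would come from requests.get("https://www.monmouth.edu/news/archives").text,
# and the pattern would need to match the real archive markup.
html = (
    '<h2><a href="https://www.monmouth.edu/news/a1">Story One</a></h2>'
    '<h2><a href="https://www.monmouth.edu/news/a2">Story Two</a></h2>'
)

pairs = re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)
urls = [u for u, t in pairs]     # list of URLs
titles = [t for u, t in pairs]   # list of titles

# Dictionary of title:URL pairs, built with a comprehension over the two lists.
newsfeed = {t: u for t, u in zip(titles, urls)}
for title, url in newsfeed.items():
    print(title, ":", url)
```

The comprehension at the end is the "Later" part of the exercise; dict(zip(titles, urls)) is an equivalent one-liner.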
Exercise 6 - html parsing with a parser
In pyex6.py, write a second version of exercise 5, this time using the bs4 package (BeautifulSoup) to parse the information you need from the HTML.
Link that “explains” why you should not parse HTML with regex (note: not to be taken seriously):
Links to information on BeautifulSoup for HTML parsing:
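With a real parser, the regex from exercise 5 disappears entirely. A minimal sketch, reusing the same stand-in snippet (in pyex6.py the HTML would again come from requests):

```python
from bs4 import BeautifulSoup

# Same stand-in snippet as in exercise 5; the real HTML would come
# from requests.get("https://www.monmouth.edu/news/archives").text.
html = (
    '<h2><a href="https://www.monmouth.edu/news/a1">Story One</a></h2>'
    '<h2><a href="https://www.monmouth.edu/news/a2">Story Two</a></h2>'
)

soup = BeautifulSoup(html, "html.parser")

# Walk every <a> tag; BeautifulSoup handles the markup, so no regex is needed.
newsfeed = {a.get_text(): a["href"] for a in soup.find_all("a")}
for title, url in newsfeed.items():
    print(title, ":", url)
```

Unlike the regex version, this keeps working when attributes are reordered or quoting changes, which is the point the "do not parse HTML with regex" link is making.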
Exercise 7 - web automation with selenium
In pyex7.py, we will try a simple Python Selenium example. If doing this on your own machine, you will need to follow the installation instructions at
Run the Python-selenium test program to test your Python-Selenium setup.
Finally, in pyex7.py, we will try to reproduce the mechanize_webadvisor.pl Perl program to interact with Webadvisor.
Also dropbox pyex7_headless.py.