Create, save and run pyex1.py:
    import sys
    for line in sys.stdin:
        sys.stdout.write(line)
Run pyex1.py as we did with Perl programs that required standard input.
Write pyex2.py to find the number of occurrences of each word length in english.sorted (UTF-8 encoded). That is, find and print the number of words in english.sorted that are one character long, two characters long, three characters long and so on. Use a Python dictionary.
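One possible shape for the counting step is sketched below. It assumes english.sorted holds one word per line, which the exercise does not state, and it uses a small in-memory sample in place of the real file. The function name count_word_lengths is made up for this illustration.

```python
import io


def count_word_lengths(lines):
    """Return a dict mapping word length -> number of words of that length."""
    counts = {}
    for line in lines:
        word = line.strip()          # drop the trailing newline
        if not word:                 # skip blank lines
            continue
        length = len(word)
        counts[length] = counts.get(length, 0) + 1
    return counts


# Hypothetical stand-in for english.sorted (assumed: one word per line).
sample = io.StringIO("a\nI\nan\nat\ncat\n")
lengths = count_word_lengths(sample)

# Print the lengths in increasing order.
for length in sorted(lengths):
    print(length, lengths[length])
```

With the real file, the same function could be given a file object opened with open("english.sorted", encoding="utf-8").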
Try running pyex2.py with the ISO-8859-encoded version of english.sorted.
See Python standard encodings.
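The sketch below illustrates why the encoding matters. It assumes the file uses ISO-8859-1 (Latin-1); the exercise says only "ISO-8859", so the exact part is a guess. The sample word and temporary file are hypothetical and stand in for english.sorted.

```python
import os
import tempfile

# Hypothetical sample word containing a non-ASCII character (e with acute accent).
word = "caf\u00e9"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_latin1.txt")

    # Write the word as ISO-8859-1 bytes (assumed encoding).
    with open(path, "w", encoding="iso-8859-1") as f:
        f.write(word + "\n")

    # Reading those bytes as UTF-8 fails, because 0xE9 followed by a
    # newline is not a valid UTF-8 sequence.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

    # Reading with the matching codec recovers the original text.
    with open(path, encoding="iso-8859-1") as f:
        decoded = f.read().strip()

print(utf8_ok, len(decoded))
```

Passing the right encoding name to open() keeps len() counting characters rather than failing or miscounting on accented words.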
Reproduce Perl exercise 4 in Python. Save it as pyex3.py. Use the provided class roster file (roster_raw.txt). Also try a different roster format (roster.txt).
After covering the re module in Python, try this exercise again, and save it as pyex3_re.py.
In pyex4.py, write and save a Python program that reads an old WebAdvisor roster file and writes information to a file called roster2.txt in the following format:
last name, first name & middle initial, student id
In pyex5.py, parse the article titles and URLs from the HTML source of https://www.monmouth.edu/news/archives. Store the titles and URLs in 2 lists. Use a dictionary newsfeed to store the title:URL key:value pairs. Print the title:URL pairs. Use the requests package to retrieve the HTML.
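A minimal sketch of the parsing step follows. The HTML fragment and the example.edu URLs are invented for illustration; the real archive page's markup is not shown here and will probably need a different pattern. Retrieval is left out so the sketch runs offline.

```python
import re

# Hypothetical HTML fragment; the real page's structure may differ.
html = """
<ul>
  <li><a href="https://www.example.edu/news/story-one/">Story One</a></li>
  <li><a href="https://www.example.edu/news/story-two/">Story Two</a></li>
</ul>
"""

# Capture the href value and the link text of each simple anchor tag.
link_pattern = re.compile(r'<a\s+href="([^"]+)"[^>]*>([^<]+)</a>')

titles = []
urls = []
for url, title in link_pattern.findall(html):
    titles.append(title.strip())
    urls.append(url)

# Pair each title with its URL.
newsfeed = dict(zip(titles, urls))

for title, url in newsfeed.items():
    print(f"{title}: {url}")
```

For the live page, the html string would come from something like requests.get("https://www.monmouth.edu/news/archives").text.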
Later: Try using a list comprehension to build the newsfeed dictionary from the titles and urls lists.
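Two ways this might look are sketched below, using made-up sample lists in place of the scraped data. The first is a dictionary comprehension; the second uses a list comprehension to build (title, URL) pairs and passes them to dict().

```python
# Hypothetical sample data standing in for the scraped lists.
titles = ["Story One", "Story Two"]
urls = ["https://www.example.edu/one", "https://www.example.edu/two"]

# Dictionary comprehension over the paired items.
newsfeed = {title: url for title, url in zip(titles, urls)}

# List comprehension producing (title, url) tuples, then dict().
pairs = [(titles[i], urls[i]) for i in range(len(titles))]
newsfeed_from_pairs = dict(pairs)

for title, url in newsfeed.items():
    print(f"{title}: {url}")
```

Both forms assume the two lists have the same length and are in matching order.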
In pyex6.py, write a second version of exercise 5 that uses the bs4 package (BeautifulSoup) to parse the information you need from the HTML.
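As a rough sketch of the BeautifulSoup approach, the example below extracts link text and href values from a hypothetical HTML fragment. The fragment and its example.edu URLs are assumptions; on the real page you would likely need to narrow the search to the elements that hold article links.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment; the real page's structure may differ.
html = """
<ul>
  <li><a href="https://www.example.edu/news/story-one/">Story One</a></li>
  <li><a href="https://www.example.edu/news/story-two/"> Story Two </a></li>
  <li><a name="no-link">Not a link</a></li>
</ul>
"""

# Use Python's built-in parser so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

newsfeed = {}
# href=True keeps only anchors that actually carry an href attribute.
for anchor in soup.find_all("a", href=True):
    title = anchor.get_text(strip=True)
    if title:
        newsfeed[title] = anchor["href"]

for title, url in newsfeed.items():
    print(f"{title}: {url}")
```

Compared with a regular expression, the parser tolerates attribute order, extra whitespace, and nested tags inside the link.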
Link that “explains” why you should not parse HTML with regex:
Links to information on BeautifulSoup for HTML parsing:
In pyex7.py, we will try a simple Python Selenium example. If doing this on your own machine, you will need to follow the installation instructions at
Run the Python-selenium test program to test your Python-Selenium setup.
Finally, in pyex7.py, we will try to reproduce the mechanize_webadvisor.pl Perl program to interact with WebAdvisor.
Also submit pyex7_headless.py to the dropbox.