====== Python Exercises ======
* Deadline: Friday, Mar 21, 11:59pm
* Use the Python exercises dropbox on ecampus. __**Don't dropbox any input or output files.**__
* **NOTE:** __Unless otherwise noted__, all work listed here is mandatory and counts toward your assignments grade.
* **NOTE 2:** You must insert your name, program description, course name, and semester in a comment block at the top of each program or else you will lose points.
----
===== Exercise 1 - standard input/output =====
Create, save and run ''pyex1.py'':
  import sys
  for line in sys.stdin:
      sys.stdout.write(line)
* Create a separate input file and redirect its contents to ''pyex1.py'' as we did with Perl programs that required standard input.
* How does a plain //print()// call behave compared to //sys.stdout.write()//?
* See if strings of standard input are "chomped" by Python by default. Try the built-in //len()// function and the string //rstrip()// method.
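A minimal sketch of the comparison asked for above. It uses an in-memory stream (''io.StringIO'') as a stand-in for redirected standard input so it runs without an input file. The sample text and the helper name ''describe_lines'' are assumptions made for illustration only.

```python
import io
import sys


def describe_lines(stream):
    """Return (raw length, length without trailing newline) for each line."""
    results = []
    for line in stream:
        # Iterating over a stream keeps the trailing newline on each line;
        # rstrip("\n") removes it, much like Perl's chomp.
        results.append((len(line), len(line.rstrip("\n"))))
    return results


# Hypothetical sample input standing in for a redirected file.
sample = io.StringIO("alpha\nbeta\n")
lengths = describe_lines(sample)

# sys.stdout.write() emits exactly the string it is given;
# print() appends a newline unless end="" is passed.
sys.stdout.write("written with write\n")
print("written with print")
print(lengths)
```

Replacing ''sample'' with ''sys.stdin'' should give the same behavior on redirected input, since both are iterated line by line.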
----
===== Exercise 2 - Count word lengths using a dictionary =====
Write ''pyex2.py'' to find the number of occurrences of each word length in [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m7f5o40c1a86y3|english.sorted (UTF-8 encoded)]]. That is, find and print the number of words in //english.sorted// that are one character long, two characters long, three characters long and so on. Use a Python dictionary.
Try running ''pyex2.py'' with the [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m6b5e4g0zx337b|ISO-8859-encoded version]] of english.sorted.
See [[https://docs.python.org/3/library/codecs.html#standard-encodings|Python standard encodings]].
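The counting idea can be sketched on a small in-memory word list, assuming one word per line as in //english.sorted//. The sample words and the function name ''count_word_lengths'' are hypothetical. The last few lines illustrate why the two encodings of the file behave differently.

```python
def count_word_lengths(words):
    """Map each word length to the number of words with that length."""
    counts = {}
    for word in words:
        length = len(word.strip())
        if length == 0:
            continue  # skip blank lines
        # dict.get() supplies 0 the first time a length is seen.
        counts[length] = counts.get(length, 0) + 1
    return counts


# Hypothetical sample standing in for the lines of english.sorted.
sample_words = ["a\n", "an\n", "at\n", "caf\u00e9\n"]
counts = count_word_lengths(sample_words)
for length in sorted(counts):
    print(length, counts[length])

# The same text occupies different bytes in different encodings, so a file
# has to be decoded with the encoding it was written in.
utf8_bytes = "caf\u00e9".encode("utf-8")
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
print(len(utf8_bytes), len(latin1_bytes))
```

When reading a real file, the built-in ''open()'' accepts an ''encoding'' argument (for example ''encoding="iso-8859-1"''), which is one way to handle the second version of the file.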
----
===== Exercise 3 - roster to dictionary =====
Reproduce [[perl_exercises#exercise_4_-_roster_to_hash|Perl exercise 4]] in Python. Save it as ''pyex3.py''. Use the provided [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m6na16o3pat3uh | class roster file (roster_raw.txt)]]. Also try [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m6v65najj9234t| a different roster format (roster.txt)]].
After covering the **re** module in Python, try this exercise again, and save it as ''pyex3_re.py''.
----
===== Exercise 4 - regular expression search and replace =====
In ''pyex4.py'', write and save a Python program that reads an [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m7kw0qwx1375ot|old webadvisor roster]] file and writes information to a file called ''roster2.txt'' in the following format:
last name, first name & middle initial, student id
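A rough sketch of search and replace with ''re.sub'' and capture groups. The input layout used here (student id, first name, middle initial, last name) is purely an assumption for illustration; the actual roster file may be laid out differently, so the pattern would need to be adapted.

```python
import re

# Hypothetical input layout: "<student id> <first name> <initial>. <last name>".
# The real roster file may be laid out differently.
PATTERN = re.compile(r"^(\d+)\s+(\w+)\s+(\w)\.\s+(\w+)$")


def reformat_line(line):
    """Rewrite one line as: last name, first name & middle initial, student id."""
    # \1..\4 in the replacement refer to the capture groups in PATTERN.
    # Lines that do not match the pattern are returned unchanged.
    return PATTERN.sub(r"\4, \2 \3., \1", line.strip())


sample_line = "1234567 Jane Q. Doe"
print(reformat_line(sample_line))
```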
----
===== Exercise 5 - html parsing with re =====
In ''pyex5.py'', parse the article titles and URLs from the HTML source of https://www.monmouth.edu/news/archives. Store the titles and URLs in 2 lists. Use a dictionary ''newsfeed'' to store the title:URL key:value pairs. Print the title:URL pairs. Use the ''requests'' package to retrieve the HTML.
//Later: Try using a list comprehension to build the newsfeed dictionary from the titles and urls lists.//
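The two-lists-then-dictionary flow might look roughly like the sketch below. It runs on a hypothetical HTML fragment rather than the live page, so the tag structure and the regular expression are assumptions that would need to be adjusted to the real markup. The dictionary is built with a dictionary comprehension, the dictionary counterpart of a list comprehension.

```python
import re

# Hypothetical HTML fragment; the real page's markup may differ.
sample_html = """
<h3><a href="https://example.edu/news/one">First story</a></h3>
<h3><a href="https://example.edu/news/two">Second story</a></h3>
"""

# With two capture groups, findall returns a list of (url, title) tuples.
matches = re.findall(r'<h3><a href="([^"]+)">([^<]+)</a></h3>', sample_html)
urls = [url for url, _ in matches]
titles = [title for _, title in matches]

# zip() pairs the two lists element by element.
newsfeed = {title: url for title, url in zip(titles, urls)}
for title, url in newsfeed.items():
    print(f"{title}: {url}")
```

With the ''requests'' package, the HTML string would typically come from ''requests.get(url).text'' instead of a literal.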
----
===== Exercise 6 - html parsing with a parser =====
In ''pyex6.py'', write a second version of exercise 5, this time using the ''bs4'' package (BeautifulSoup) to parse the information you need from the HTML.
Link that "explains" why you should not parse HTML with regex:
* https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
* Note: not to be taken seriously
Links to information on BeautifulSoup for HTML parsing:
* https://www.crummy.com/software/BeautifulSoup/bs4/doc/
* https://www.digitalocean.com/community/tutorials/how-to-scrape-web-pages-with-beautiful-soup-and-python-3
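To illustrate what a parser does that a regular expression does not, here is a small sketch using the standard library's ''html.parser'' module. Note that this is a stand-in and not ''bs4'' itself; the class name ''LinkCollector'' and the sample HTML are hypothetical. The parser copes with extra attributes and nested tags that would trip up a simple pattern.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect {link text: href} for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links["".join(self._text).strip()] = self._href
            self._href = None


# Hypothetical fragment with an extra attribute and a nested tag.
sample_html = '<h3><a class="x" href="https://example.edu/news/one">First <em>story</em></a></h3>'
collector = LinkCollector()
collector.feed(sample_html)
print(collector.links)
```

With BeautifulSoup the same idea is usually much shorter: build a ''BeautifulSoup(html, "html.parser")'' object and loop over ''find_all("a")'', reading each tag's text and ''href''.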
----
===== Exercise 7 - web automation with selenium =====
In ''pyex7.py'', we will try a simple [[https://selenium-python.readthedocs.io/index.html|Python Selenium]] example. If doing this on your own machine, you will need to follow the installation instructions at
* https://piazza.com/class/m65min8nryeil/post/13
Run the [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m7z72fw9bsc72f|Python-selenium test program]] to test your Python-Selenium setup.
Finally, in ''pyex7.py'', we will try to reproduce the [[https://piazza.com/class_profile/get_resource/m65min8nryeil/m7z744o3bjp2iz|mechanize_webadvisor.pl]] Perl program to interact with Webadvisor.
Also dropbox ''pyex7_headless.py''.
----