Stack is Back

Ordinarily, browsers parse (i.e., read) HTML, but any program can too. Python, for instance, comes with a simple HTML parser, implemented as a class called HTMLParser. Recall that class in Python is quite like struct in C except that a class can also have functions (aka methods) inside of it. Classes also support "inheritance," whereby one class can inherit methods from another.

Indeed, take a look at Python’s own example HTML parser application, excerpted below, to which we’ve added some comments. (You might need to scroll the code to the right to see the comments in their entirety.)

from html.parser import HTMLParser  # Import HTMLParser class from html.parser module

class MyHTMLParser(HTMLParser):  # Define a class called MyHTMLParser that inherits all of HTMLParser's methods
    def handle_starttag(self, tag, attrs):  # Override HTMLParser's handle_starttag method with this one
        print("Encountered a start tag:", tag)

    def handle_endtag(self, tag):  # Override HTMLParser's handle_endtag method with this one
        print("Encountered an end tag :", tag)

    def handle_data(self, data):
        print("Encountered some data  :", data)

parser = MyHTMLParser()  # Instantiate a MyHTMLParser object
parser.feed('<html><head><title>Test</title></head>'
            '<body><h1>Parse me!</h1></body></html>')  # Parse a string of HTML

Unfortunately, per its documentation, HTMLParser "does not check that end tags match start tags". But with the help of a stack, we can write a program to do just that.

Answer the below in stack.md and parser.py.

Questions

3.1 (2 points)

According to Python’s first look at classes, what, in your own words, is __init__?

3.2 (3 points)

Complete the implementation of peek, pop, and push in parser.py in such a way that the methods behave as prescribed by the comments therein. Take care to replace with one or more new lines of code any line commented as TODO. Assume that neither peek nor pop will be called if a Stack is empty; those methods may (and should) indeed raise exceptions if called if a Stack is empty, though you needn’t raise (or worry about!) exceptions yourself.

3.3 (1 point)

According to Python’s documentation for HTMLParser, when, in one sentence in your own words, is handle_starttag called?

3.4 (1 point)

According to Python’s documentation for HTMLParser, when, in one sentence in your own words, is handle_endtag called?

3.5 (4 points)

Complete the implementation of handle_starttag and handle_endtag in parser.py in such a way that, given a string of HTML, the program prints matched if all end tags therein match start tags and mismatched if not. For instance, given a string like <b><i></i></b>, the program should print matched, but given a string like <b><i></b></i>, the program should print mismatched. Take care to replace with one or more new lines of code any line commented as TODO. Your implementation of handle_endtag may propagate exceptions in cases of mismatched tags, though you needn’t raise (or worry about!) exceptions yourself.

Debrief

  1. Which resources, if any, did you find helpful in answering this problem’s questions?

  2. About how long, in minutes, did you spend on this problem’s questions?