Stack is Back
Ordinarily, browsers parse (i.e., read) HTML, but any program can too. Python, for instance, comes with a simple HTML parser, implemented as a class called HTMLParser
. Recall that class
in Python is quite like struct
in C except that a class can also have functions (aka methods) inside of it. Classes also support "inheritance," whereby one class can inherit methods from another.
Indeed, take a look at Python’s own example HTML parser application, excerpted below, to which we’ve added some comments. (You might need to scroll the code to the right to see the comments in their entirety.)
from html.parser import HTMLParser # Import HTMLParser class from html.parser module
class MyHTMLParser(HTMLParser): # Define a class called MyHTMLParser that inherits all of HTMLParser's methods
def handle_starttag(self, tag, attrs): # Override HTMLParser's handle_starttag method with this one
print("Encountered a start tag:", tag)
def handle_endtag(self, tag): # Override HTMLParser's handle_endtag method with this one
print("Encountered an end tag :", tag)
def handle_data(self, data):
print("Encountered some data :", data)
parser = MyHTMLParser() # Instantiate a MyHTMLParser object
parser.feed('<html><head><title>Test</title></head>'
'<body><h1>Parse me!</h1></body></html>') # Parse a string of HTML
Unfortunately, per its documentation, HTMLParser
"does not check that end tags match start tags". But with the help of a stack, we can write a program to do just that.
Answer the below in stack.md
and parser.py
.
Questions
- 3.1 (2 points)
-
According to Python’s first look at classes, what, in your own words, is
__init__
? - 3.2 (3 points)
-
Complete the implementation of
peek
,pop
, andpush
inparser.py
in such a way that the methods behave as prescribed by the comments therein. Take care to replace with one or more new lines of code any line commented asTODO
. Assume that neitherpeek
norpop
will be called if aStack
is empty; those methods may (and should) indeed raise exceptions if called if aStack
is empty, though you needn’t raise (or worry about!) exceptions yourself. - 3.3 (1 point)
-
According to Python’s documentation for
HTMLParser
, when, in one sentence in your own words, ishandle_starttag
called? - 3.4 (1 point)
-
According to Python’s documentation for
HTMLParser
, when, in one sentence in your own words, ishandle_endtag
called? - 3.5 (4 points)
-
Complete the implementation of
handle_starttag
andhandle_endtag
inparser.py
in such a way that, given a string of HTML, the program printsmatched
if all end tags therein match start tags andmismatched
if not. For instance, given a string like<b><i></i></b>
, the program should printmatched
, but given a string like<b><i></b></i>
, the program should printmismatched
. Take care to replace with one or more new lines of code any line commented asTODO
. Your implementation ofhandle_endtag
may propagate exceptions in cases of mismatched tags, though you needn’t raise (or worry about!) exceptions yourself.
Debrief
-
Which resources, if any, did you find helpful in answering this problem’s questions?
-
About how long, in minutes, did you spend on this problem’s questions?