BRIAN YU: Now that we've figured out how to compare two files based on how many lines they have in common, let's think about how to compare two files based on the number of sentences that they have in common. What might that look like, and what do you have to do?
First, just like in the last function, you'll take in string inputs a and b, each of which is the textual representation of some file. Then, instead of splitting each string into lines, you'll split each string into sentences. Then you'll work out which sentences appear in both a and b. And finally, you'll return a list that contains all of the sentences that appear in both of the original strings.
The challenge here is how to take a string and convert it into a list of all of the sentences that make it up. If, for example, we had a string like "Hello there! How are you?" we would want to split that up into "Hello there!" and "How are you?" knowing that "Hello there!" is one sentence and "How are you?" is another. And here, complicated issues like dealing with different types of punctuation, whether periods, exclamation points, or question marks, might come into play.
Luckily for us, someone has already implemented this functionality, and we can stand on their shoulders to split a string into sentences for our own purposes. For this, we're going to use a Python library called NLTK, or Natural Language Toolkit, which defines a function called sent_tokenize, short for sentence tokenize, that takes a string and splits it up into all of the sentences that make it up. To use it, you can import the function with a line that looks something like this: "from nltk.tokenize import sent_tokenize". That will let you call sent_tokenize to take a string and split it up into its component sentences.
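
To make the splitting step concrete, here is a minimal sketch, not the course's reference code. The helper names naive_split and split_sentences are hypothetical. The naive version only splits on whitespace after ".", "!", or "?", which shows why punctuation is tricky. The second helper calls sent_tokenize when NLTK and its tokenizer data are available (this assumes the data has been fetched beforehand, for example with nltk.download("punkt"); newer NLTK releases may ask for "punkt_tab" instead) and otherwise falls back to the naive split.

```python
import re


def naive_split(text):
    """Hypothetical stand-in: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_sentences(text):
    """Split text into sentences, preferring NLTK's sent_tokenize."""
    try:
        from nltk.tokenize import sent_tokenize
        return sent_tokenize(text)
    except (ImportError, LookupError):
        # Assumption: NLTK or its tokenizer data is missing, so use the
        # naive splitter instead. This fallback is for illustration only.
        return naive_split(text)


print(split_sentences("Hello there! How are you?"))

# The naive approach breaks on abbreviations: "Dr." is treated as a sentence.
print(naive_split("Dr. Yu teaches CS50. It is fun."))
```

The last call shows the naive splitter cutting the text after "Dr.", which is the kind of case a dedicated sentence tokenizer is meant to handle.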
Once you've done that, the last step is to find the sentences that the two strings have in common and return them as a list. As usual, make sure you avoid duplicates: if a sentence appears multiple times, you only want it to appear once in your final list. And of course, make sure that the data type you return at the end of the sentences function is, in fact, a list. After that, you should have a list of matching sentences that you can return back to your program.
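
Putting those steps together, one possible sketch of the sentences function looks like this. It is an illustration under stated assumptions rather than the official solution: the optional tokenize parameter is not part of the lecture and is added here only so the logic can be tried without NLTK installed. Using sets is one way to drop duplicates, and it means the order of the returned list is not guaranteed.

```python
def sentences(a, b, tokenize=None):
    """Return a list of the sentences that appear in both a and b."""
    if tokenize is None:
        # Default to NLTK's sentence tokenizer, as described in the lecture.
        # Assumes NLTK and its tokenizer data are installed.
        from nltk.tokenize import sent_tokenize
        tokenize = sent_tokenize

    # Sets remove duplicates; & keeps only sentences found in both inputs.
    common = set(tokenize(a)) & set(tokenize(b))

    # The caller expects a list, not a set.
    return list(common)
```

With NLTK available, calling sentences(a, b) on the contents of two files should give each shared sentence exactly once, however many times it appears in either file.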