1 00:00:00,000 --> 00:00:00,510 2 00:00:00,510 --> 00:00:02,250 ZAMYLA CHAN: Let's analyze some words. 3 00:00:02,250 --> 00:00:04,950 In this part of the problem, we're going to allow the user 4 00:00:04,950 --> 00:00:07,890 to pass in a word as a command line argument. 5 00:00:07,890 --> 00:00:12,780 And then we will analyze whether that word is generally positive, generally 6 00:00:12,780 --> 00:00:15,090 negative, or just neutral. 7 00:00:15,090 --> 00:00:18,040 So per this example, the word "love" is positive, 8 00:00:18,040 --> 00:00:20,460 so it returns a green smiley face. 9 00:00:20,460 --> 00:00:24,180 The word "hate" is negative, so it returns a red sad face. 10 00:00:24,180 --> 00:00:29,940 And then "Stanford" is, well, neutral, so we'll return a yellow neutral face. 11 00:00:29,940 --> 00:00:31,800 So what do we have to do here? 12 00:00:31,800 --> 00:00:35,160 Well, first things first is that we read all of the distribution code 13 00:00:35,160 --> 00:00:37,770 carefully and make sure that we understand 14 00:00:37,770 --> 00:00:42,480 how that distribution code already lends itself to the workflow of the problem. 15 00:00:42,480 --> 00:00:45,840 Then we'll move into analyzer.py and learn 16 00:00:45,840 --> 00:00:50,040 how to initialize our Analyzer with the positive and negative words. 17 00:00:50,040 --> 00:00:52,530 Then we'll fill in the function, Analyze, 18 00:00:52,530 --> 00:00:55,350 and analyze the word that's passed in. 19 00:00:55,350 --> 00:00:57,420 Let's start with the distribution code. 20 00:00:57,420 --> 00:01:01,160 Starting with positive-words.txt and negative-words.txt, 21 00:01:01,160 --> 00:01:05,430 we'll see that these text files have comments preceded by semicolons before 22 00:01:05,430 --> 00:01:07,200 we get to the actual words. 23 00:01:07,200 --> 00:01:10,620 So all of the positive words and negative words 24 00:01:10,620 --> 00:01:13,320 are in lines, where each line ends with a new line. 
25 00:01:13,320 --> 00:01:17,460 And some of those lines, potentially, are also blank. 26 00:01:17,460 --> 00:01:20,140 Next is the Smile file. 27 00:01:20,140 --> 00:01:23,970 So you'll notice that Smile doesn't have any sort of file extension after it. 28 00:01:23,970 --> 00:01:28,410 But the shebang at the very top says that it is Python. 29 00:01:28,410 --> 00:01:33,690 So in order to read it correctly, enable syntax highlighting for Python files. 30 00:01:33,690 --> 00:01:38,370 OK, so then you'll see that we import the class Analyzer 32 00:01:38,370 --> 00:01:40,680 from the module analyzer, and then the function 33 00:01:40,680 --> 00:01:43,320 colored from the termcolor package. 34 00:01:43,320 --> 00:01:45,460 So these will come in handy later. 35 00:01:45,460 --> 00:01:47,730 Then there's a main function that ensures 36 00:01:47,730 --> 00:01:51,390 that the user passes in the expected number of command line arguments. 37 00:01:51,390 --> 00:01:55,970 Then it analyzes the word, retrieving the score, and prints the result, 38 00:01:55,970 --> 00:01:57,660 colored accordingly. 39 00:01:57,660 --> 00:01:59,850 So Smile is already finished for us. 40 00:01:59,850 --> 00:02:03,270 What we're going to do is go into analyzer.py 41 00:02:03,270 --> 00:02:07,230 and fill in the functionality so that Smile works correctly. 42 00:02:07,230 --> 00:02:10,350 Analyzer has two functions, an Initialization function, 43 00:02:10,350 --> 00:02:14,430 which will load the positive and negative words, and then Analyze, which 44 00:02:14,430 --> 00:02:18,690 when we're done, will assign every word in the text a value, negative 1 45 00:02:18,690 --> 00:02:22,380 for any negative words, zero for a neutral word, 46 00:02:22,380 --> 00:02:25,470 and 1 for a positive word. 
47 00:02:25,470 --> 00:02:27,690 Finally, we're going to want to calculate 48 00:02:27,690 --> 00:02:31,780 the text's total score to determine whether it's positive, negative, 49 00:02:31,780 --> 00:02:32,730 or neutral. 50 00:02:35,800 --> 00:02:39,580 To initialize the Analyzer, load the positive and negative words into some structure that allows us to keep track of and search them. 51 00:02:39,580 --> 00:02:44,170 So let's look into lists, dicts, or sets-- up to you. 52 00:02:44,170 --> 00:02:50,380 Store those positive and negative words into self.positives and self.negatives 53 00:02:50,380 --> 00:02:54,850 respectively, making sure to omit any leading or trailing whitespace. 54 00:02:54,850 --> 00:02:59,080 So look into the strip function to see how you can do this. 55 00:02:59,080 --> 00:03:03,880 Now also remember how when we opened the positive and negative word text files, 56 00:03:03,880 --> 00:03:06,940 there were all these comments at the top that started with semicolons. 57 00:03:06,940 --> 00:03:09,010 We don't want to include any of those. 58 00:03:09,010 --> 00:03:11,680 So look into the startswith function to see 59 00:03:11,680 --> 00:03:17,000 how you can check whether or not to include that line. 60 00:03:17,000 --> 00:03:19,269 So once you've loaded those in, let's 61 00:03:19,269 --> 00:03:22,390 move on to the Analyze function. 62 00:03:22,390 --> 00:03:26,380 We're going to be talking about tokens and tokenizers. 63 00:03:26,380 --> 00:03:30,400 In order to help us analyze these words, we're going to use a tokenizer. 64 00:03:30,400 --> 00:03:33,190 The tokenizer allows us to split up text 65 00:03:33,190 --> 00:03:36,850 into single tokens, where each individual token contains 66 00:03:36,850 --> 00:03:38,600 a single word. 67 00:03:38,600 --> 00:03:42,770 So we can instantiate our tokenizer as follows. 68 00:03:42,770 --> 00:03:44,650 So now let's talk about iteration. 
69 00:03:44,650 --> 00:03:47,770 If I have some text file, then if I want to do something 70 00:03:47,770 --> 00:03:49,720 with every line in that text file, then I'm 71 00:03:49,720 --> 00:03:52,810 going to have a With statement opening that file 72 00:03:52,810 --> 00:03:58,120 and then simply iterating over it with the statement "for line in lines." 73 00:03:58,120 --> 00:04:01,180 And then I can do something to every line within that. 74 00:04:01,180 --> 00:04:05,110 Now that we have all of these tools at our disposal, let's go back to Analyze 75 00:04:05,110 --> 00:04:07,780 and iterate over all of our tokens. 76 00:04:07,780 --> 00:04:11,050 I'll give you a hint to look at the lower function here. 77 00:04:11,050 --> 00:04:16,630 Then we'll want to check to see if that token is a positive or a negative word. 78 00:04:16,630 --> 00:04:20,230 If it's in neither data structure, then that means that it's neutral. 79 00:04:20,230 --> 00:04:24,760 So after assigning each word the appropriate integer value, 80 00:04:24,760 --> 00:04:28,480 return the final score to Smile. 81 00:04:28,480 --> 00:04:31,580 And with that you've finished analyzing the words. 82 00:04:31,580 --> 00:04:32,740 My name is Zamyla. 83 00:04:32,740 --> 00:04:36,190 And this was analyzer.py. 84 00:04:36,190 --> 00:04:38,736
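The steps walked through above can be pulled together in a minimal Python sketch. It is only one possible shape for analyzer.py, not the official solution. The constructor signature (two file paths), the `_load` helper, and the choice of sets are assumptions made for illustration. The regular expression is a simple stand-in for the tokenizer, which the walkthrough shows on screen but the transcript does not name.

```python
import re


class Analyzer:
    """Sketch of a word-list sentiment analyzer."""

    def __init__(self, positives, negatives):
        # Assumption: positives and negatives are paths to the word-list files.
        self.positives = self._load(positives)
        self.negatives = self._load(negatives)

    @staticmethod
    def _load(path):
        """Hypothetical helper: read one word per line into a set."""
        words = set()
        with open(path) as lines:
            for line in lines:
                # Omit leading and trailing whitespace, including the newline.
                word = line.strip()
                # Skip blank lines and comment lines that start with ';'.
                if word and not word.startswith(";"):
                    words.add(word.lower())
        return words

    def analyze(self, text):
        """Return the total score: +1 per positive word, -1 per negative word."""
        # Stand-in tokenizer (an assumption): runs of letters, apostrophes, hyphens.
        tokens = re.findall(r"[A-Za-z'-]+", text)
        score = 0
        for token in tokens:
            word = token.lower()
            if word in self.positives:
                score += 1
            elif word in self.negatives:
                score -= 1
            # A word in neither set is neutral and adds 0.
        return score
```

A set is used here because membership checks with `in` are fast, but a list or dict would work the same way for this purpose. Lowercasing both the stored words and each token keeps the comparison case-insensitive.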