1
00:00:00,000 --> 00:00:00,510

2
00:00:00,510 --> 00:00:02,250
ZAMYLA CHAN: Let's analyze some words.

3
00:00:02,250 --> 00:00:04,950
In this part of the problem,
we're going to allow the user

4
00:00:04,950 --> 00:00:07,890
to pass in a word as a
command line argument.

5
00:00:07,890 --> 00:00:12,780
And then we will analyze whether that
word is generally positive, generally

6
00:00:12,780 --> 00:00:15,090
negative, or just neutral.

7
00:00:15,090 --> 00:00:18,040
So per this example, the
word "love" is positive,

8
00:00:18,040 --> 00:00:20,460
so it returns a green smiley face.

9
00:00:20,460 --> 00:00:24,180
The word "hate" is negative,
so it returns a red sad face.

10
00:00:24,180 --> 00:00:29,940
And then "Stanford" is, well, neutral,
so we'll return a yellow neutral face.

11
00:00:29,940 --> 00:00:31,800
So what do we have to do here?

12
00:00:31,800 --> 00:00:35,160
Well, first things first: we
read all of the distribution code

13
00:00:35,160 --> 00:00:37,770
carefully and make
sure that we understand

14
00:00:37,770 --> 00:00:42,480
how that distribution code already lends
itself to the workflow of the problem.

15
00:00:42,480 --> 00:00:45,840
Then we'll move into
analyzer.py and learn

16
00:00:45,840 --> 00:00:50,040
how to initialize our Analyzer with
the positive and negative words.

17
00:00:50,040 --> 00:00:52,530
Then we'll fill in
the function, analyze,

18
00:00:52,530 --> 00:00:55,350
and analyze the word that's passed in.

19
00:00:55,350 --> 00:00:57,420
Let's start with the distribution code.

20
00:00:57,420 --> 00:01:01,160
Starting with positive-words.txt
and negative-words.txt,

21
00:01:01,160 --> 00:01:05,430
we'll see that these text files have
comments preceded by semicolons before

22
00:01:05,430 --> 00:01:07,200
we get to the actual words.

23
00:01:07,200 --> 00:01:10,620
So all of the positive
words and negative words

24
00:01:10,620 --> 00:01:13,320
are in lines, where each
line ends with a newline.

25
00:01:13,320 --> 00:01:17,460
And some of those lines,
potentially, are also blank.

26
00:01:17,460 --> 00:01:20,140
Next is the Smile file.

27
00:01:20,140 --> 00:01:23,970
So you'll notice that Smile doesn't have
any sort of file extension after it.

28
00:01:23,970 --> 00:01:28,410
But based on the shebang at the
very top, it says that it is Python.

29
00:01:28,410 --> 00:01:33,690
So in order to read it correctly, enable
syntax highlighting for Python files.

30
00:01:33,690 --> 00:01:38,370
OK, so then you'll see that
we import the class Analyzer

32
00:01:38,370 --> 00:01:40,680
from the module analyzer,
and then the function

33
00:01:40,680 --> 00:01:43,320
colored from the termcolor package.

34
00:01:43,320 --> 00:01:45,460
So these will come in handy later.

35
00:01:45,460 --> 00:01:47,730
Then there's a main
function that ensures

36
00:01:47,730 --> 00:01:51,390
that the user passes in the expected
number of command line arguments.

37
00:01:51,390 --> 00:01:55,970
Then it analyzes the word, retrieving
the score, and prints the result,

38
00:01:55,970 --> 00:01:57,660
colored accordingly.
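
The flow just described (check the argument count, score the word, print a colored face) might be sketched as follows. This is not the distribution code itself: `face_for`, the usage message, and the `analyze` callback are hypothetical names, and raw ANSI escape codes stand in for termcolor's colored so the sketch runs without extra packages.

```python
import sys

# Assumption: ANSI escape codes replace termcolor's colored() in this sketch.
COLORS = {"green": "\033[32m", "red": "\033[31m", "yellow": "\033[33m"}
RESET = "\033[0m"


def face_for(score):
    """Map a sentiment score to a (face, color) pair."""
    if score > 0:
        return ":)", "green"
    if score < 0:
        return ":(", "red"
    return ":|", "yellow"


def main(argv, analyze):
    """Validate the argument count, score the word, print a colored face."""
    if len(argv) != 2:
        # Hypothetical usage message; the real script's wording may differ.
        print("Usage: ./smile word", file=sys.stderr)
        return 1
    face, color = face_for(analyze(argv[1]))
    print(COLORS[color] + face + RESET)
    return 0
```

Calling `main(sys.argv, some_scoring_function)` would tie this to the command line; the scoring function is what analyzer.py is meant to supply.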

39
00:01:57,660 --> 00:01:59,850
So Smile is already finished for us.

40
00:01:59,850 --> 00:02:03,270
What we're going to do
is go into analyzer.py

41
00:02:03,270 --> 00:02:07,230
and fill in the functionality
so that Smile works correctly.

42
00:02:07,230 --> 00:02:10,350
Analyzer has two functions,
an initialization function,

43
00:02:10,350 --> 00:02:14,430
which will load the positive and
negative words, and then analyze, which

44
00:02:14,430 --> 00:02:18,690
when we're done, will assign every
word in the text a value, negative 1

45
00:02:18,690 --> 00:02:22,380
for any negative words,
zero for a neutral word,

46
00:02:22,380 --> 00:02:25,470
and 1 for a positive word.

47
00:02:25,470 --> 00:02:27,690
Finally, we're going
to want to calculate

48
00:02:27,690 --> 00:02:31,780
the text's total score to determine
whether it's positive, negative,

49
00:02:31,780 --> 00:02:32,730
or neutral.

50
00:02:35,800 --> 00:02:39,580
To initialize the Analyzer, load the words into some structure
that allows us to keep track of and search them.

51
00:02:39,580 --> 00:02:44,170
So let's look into lists,
dicts, or sets-- up to you.

52
00:02:44,170 --> 00:02:50,380
Store those positive and negative words
into self.positives and self.negatives

53
00:02:50,380 --> 00:02:54,850
respectively, making sure to omit
any leading or trailing whitespace.

54
00:02:54,850 --> 00:02:59,080
So look into the strip function
to see how you can do this.

55
00:02:59,080 --> 00:03:03,880
Now also remember how when we opened the
positive and negative word text files,

56
00:03:03,880 --> 00:03:06,940
there were all these comments at the
top that started with semicolons.

57
00:03:06,940 --> 00:03:09,010
We don't want to include any of those.

58
00:03:09,010 --> 00:03:11,680
So look into the
startswith function to see

59
00:03:11,680 --> 00:03:17,000
how you can check whether
or not to include that line.
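
The loading step described above could look roughly like this. It is a minimal sketch, not the official solution: `load_words` and the sample contents are made up for illustration, and in the actual problem the resulting sets would be stored in self.positives and self.negatives.

```python
import io


def load_words(lines):
    """Return the set of words found in an iterable of lines."""
    words = set()
    for line in lines:
        word = line.strip()  # drop leading and trailing whitespace
        # Skip blank lines and comment lines, which start with a semicolon.
        if word and not word.startswith(";"):
            words.add(word)
    return words


# Hypothetical file contents, shaped like the word lists described above.
sample = io.StringIO("; comment header\n;\n\nlove\n  nice  \n")
positives = load_words(sample)
```

A set is used here because membership checks are fast, but a list or dict would also work.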

60
00:03:17,000 --> 00:03:19,269
So once you've loaded
those in, then let's

61
00:03:19,269 --> 00:03:22,390
analyze and move on to that function.

62
00:03:22,390 --> 00:03:26,380
We're going to be talking
about tokens and tokenizers.

63
00:03:26,380 --> 00:03:30,400
In order to help us analyze these
words, we're going to use a tokenizer.

64
00:03:30,400 --> 00:03:33,190
The tokenizer allows
us to split up words

65
00:03:33,190 --> 00:03:36,850
into single tokens, where
each individual token contains

66
00:03:36,850 --> 00:03:38,600
a single word.

67
00:03:38,600 --> 00:03:42,770
So we can instantiate
our tokenizer as follows.
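
The code shown on screen at this point is not captured in the transcript. As a stand-in, the sketch below defines a hypothetical `SimpleTokenizer` built on a regular expression; the tokenizer used in the course may be a different class with different splitting rules.

```python
import re


class SimpleTokenizer:
    """Hypothetical stand-in tokenizer: splits text into word-like tokens."""

    def tokenize(self, text):
        # A token here is a run of letters, digits, or apostrophes.
        return re.findall(r"[A-Za-z0-9']+", text)


# Instantiate once, then reuse it for every piece of text.
tokenizer = SimpleTokenizer()
tokens = tokenizer.tokenize("I love Stanford")
```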

68
00:03:42,770 --> 00:03:44,650
So now let's talk about iteration.

69
00:03:44,650 --> 00:03:47,770
If I have some text file,
then if I want to do something

70
00:03:47,770 --> 00:03:49,720
with every line in that
text file, then I'm

71
00:03:49,720 --> 00:03:52,810
going to have a with
statement opening that file

72
00:03:52,810 --> 00:03:58,120
and then simply iterating over it by having
the statement "for line in lines."

73
00:03:58,120 --> 00:04:01,180
And then I can do something
to every line within that.
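
As a small illustration of that pattern, the sketch below writes a throwaway file and then loops over it line by line. The file contents and the `seen` list are invented for the example.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

seen = []
with open(path) as lines:
    for line in lines:
        # Each line still carries its trailing newline at this point.
        seen.append(line.rstrip("\n"))

# The with statement has already closed the file; remove it.
os.remove(path)
```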

74
00:04:01,180 --> 00:04:05,110
Now that we have all of these tools at
our disposal, let's go back to analyze

75
00:04:05,110 --> 00:04:07,780
and iterate over all of our tokens.

76
00:04:07,780 --> 00:04:11,050
I'll give you a hint to look
at the lower function here.

77
00:04:11,050 --> 00:04:16,630
Then we'll want to check to see if that
token is a positive or a negative word.

78
00:04:16,630 --> 00:04:20,230
If it's in neither data structure,
then that means that it's neutral.

79
00:04:20,230 --> 00:04:24,760
So after assigning each word the
appropriate integer value,

80
00:04:24,760 --> 00:04:28,480
return the final score to Smile.
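
Putting those steps together, the scoring loop might be sketched like this. It is written as a standalone function with the word sets passed in, which is an assumption made for illustration; in the problem itself this logic lives in the Analyzer's analyze function and uses self.positives and self.negatives.

```python
def analyze(tokens, positives, negatives):
    """Sum +1 for each positive token, -1 for each negative, 0 otherwise."""
    score = 0
    for token in tokens:
        word = token.lower()  # assumes the word lists are lowercase
        if word in positives:
            score += 1
        elif word in negatives:
            score -= 1
        # A token in neither set is neutral and leaves the score unchanged.
    return score


score = analyze(["Love", "hate", "LOVE", "Stanford"], {"love"}, {"hate"})
```

A positive total maps to the green smiley, a negative total to the red sad face, and zero to the yellow neutral face.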

81
00:04:28,480 --> 00:04:31,580
And with that you've
finished analyzing the words.

82
00:04:31,580 --> 00:04:32,740
My name is Zamyla.

83
00:04:32,740 --> 00:04:36,190
And this was analyzer.py.

84
00:04:36,190 --> 00:04:38,736