CS50 Live first looks today at GitLab. GitLab is a popular source code hosting site, much like GitHub.com, that developers can use to store their code centrally, version control it, keep multiple copies of it, and share it with other users. Unfortunately, GitLab ran into a bit of an issue very recently.

The whole incident started when they saw this. GitLab supports a feature known as Snippets, much like GitHub Gist, whereby users can upload small snippets of code to share them with other people. Unfortunately, having some 1.5 million snippets of code created over the course of just a few days? Not normal. In fact, this seemed to be the result of spamming behavior by some adversarial folks online. Moreover, GitLab also noticed that one or more spammers seemed to be using GitLab inappropriately as a content delivery network, or CDN, whereby they were serving up files in ways that they shouldn't.

Now unfortunately, these kinds of attacks had a bit of a ripple effect on their back-end databases. In particular, GitLab posted the following: "We are experiencing issues with our production database and are working to recover." And just minutes later they posted, "We accidentally deleted production data and might have to restore from backup."

Now what exactly happened? Well, it's quite common for databases to be replicated from one to another, so that you have a primary and a secondary, the latter of which is a backup of the former in real time. As part of diagnosing why that replication was slowing down, one of GitLab's system administrators very deliberately executed a command quite like this.
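The video doesn't reproduce the exact command or its target, so the sketch below only shows the general shape of such a command; the data-directory path is a hypothetical stand-in for whatever was actually targeted.

    # Sketch only -- do NOT run this. The real target path isn't shown in this video;
    # /var/lib/db/data is a hypothetical stand-in for a database's data directory.
    sudo rm -rf /var/lib/db/data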
Now what is this command? Well, at the front of it is "sudo," which says execute the following command with administrative, or root, privileges. And the command to be executed? rm -rf, apparently. rm, you might know, removes files or folders from a system. The -r means recursively: delete the given thing and anything inside of it, including any directories. And unfortunately, the f in -rf means forcibly, which means don't even prompt the human to confirm or deny that he or she wants to do this.

Now, the system administrator meant to execute this command deliberately on their secondary database, db2.cluster.gitlab.com, so that they could then resume replication from their primary to their secondary database. Unfortunately, it appears to have been late at night, and this was a stressful situation, and darn it if this command weren't executed on db1.cluster.gitlab.com, the actual primary database.

Now, no big deal, surely we have backups all over the place, so we can just restore from backup, and our customers will be perfectly happy and on their way. Unfortunately, out of five backup or replication techniques deployed, GitLab reported that "none are working reliably or set up in the first place." Indeed, if you'd like to read their whole post-mortem, in which they discuss exactly what went wrong and how, you can check out this URL here.

But the moral of the story, for our purposes, is please, please beware the rm -rf, especially if what you're deleting isn't just some directory of your own but potentially your customers' data as well.
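One way to internalize that lesson: before doing anything destructive, make the machine prove it's the one you think it is. The short script below is a purely hypothetical safety sketch, not anything from GitLab's post-mortem; it borrows the hostnames from this incident and only echoes where the destructive step would go.

    #!/usr/bin/env bash
    # Hypothetical safety sketch (not from GitLab's post-mortem): refuse to do
    # anything destructive unless this machine is the one we intended.
    set -euo pipefail

    expected_host="db2.cluster.gitlab.com"   # the secondary, where deletion was intended
    actual_host="$(hostname -f)"

    if [ "$actual_host" != "$expected_host" ]; then
        echo "Refusing to continue: this is $actual_host, not $expected_host." >&2
        exit 1
    fi

    read -r -p "About to delete data on $actual_host. Type the hostname to confirm: " answer
    if [ "$answer" != "$actual_host" ]; then
        echo "Confirmation failed; nothing was deleted." >&2
        exit 1
    fi

    # The destructive step would go here (path is a hypothetical placeholder):
    # sudo rm -rf /var/lib/db/data
    echo "Confirmed on $actual_host; destructive step would run here."

The point isn't this particular script; it's that a forced pause and an explicit hostname check are cheap insurance against running the right command on the wrong machine, especially late at night under stress.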