1 00:00:00,000 --> 00:00:03,234 >> [MUSIC PLAYING] 2 00:00:03,234 --> 00:00:05,275 3 00:00:05,275 --> 00:00:06,400 ROBERT KRABEK: Hello, guys. 4 00:00:06,400 --> 00:00:09,980 My name is Robert Krabek, and I'll be teaching you guys 5 00:00:09,980 --> 00:00:15,470 how to scrape the web with Nokogiri, which is a Ruby library, 6 00:00:15,470 --> 00:00:17,566 and Kimono, which is a Chrome extension. 7 00:00:17,566 --> 00:00:20,940 8 00:00:20,940 --> 00:00:25,010 >> So first there are a couple of things that 9 00:00:25,010 --> 00:00:28,790 you can do if maybe you've been doing all the psets so far 10 00:00:28,790 --> 00:00:31,170 and your workspace is getting a little full. 11 00:00:31,170 --> 00:00:37,060 You can actually just go and create a new workspace for yourself 12 00:00:37,060 --> 00:00:41,220 just to do a new project in. 13 00:00:41,220 --> 00:00:46,160 So if you do want to keep working in the CS50 IDE template 14 00:00:46,160 --> 00:00:49,080 that you have now, feel free, and you can just 15 00:00:49,080 --> 00:00:54,700 install Nokogiri with CFLAGS equals-- gem install nokogiri. 16 00:00:54,700 --> 00:00:56,930 But otherwise I'll show you how to set a new one up. 17 00:00:56,930 --> 00:01:01,210 And then this is basically taking the training wheels off even more. 18 00:01:01,210 --> 00:01:07,120 And you're coding as if you were just coding in Sublime or something. 19 00:01:07,120 --> 00:01:12,365 So if we move over. 20 00:01:12,365 --> 00:01:14,930 21 00:01:14,930 --> 00:01:18,690 >> So say this is your current CS50 IDE. 22 00:01:18,690 --> 00:01:21,490 You can just go to Cloud9 here. 23 00:01:21,490 --> 00:01:22,725 You can go to your dashboard. 24 00:01:22,725 --> 00:01:26,720 25 00:01:26,720 --> 00:01:29,950 It should bring up the workspaces tab. 26 00:01:29,950 --> 00:01:32,980 And then you can just click here, Create New Workspace. 27 00:01:32,980 --> 00:01:37,600 Name your new workspace, maybe test, or scraping.
28 00:01:37,600 --> 00:01:42,700 And then click this Custom tab here, instead of the CS50 templates tab. 29 00:01:42,700 --> 00:01:45,155 And then you can just go and create a new workspace. 30 00:01:45,155 --> 00:01:48,280 >> I've already created a workspace here. 31 00:01:48,280 --> 00:01:50,640 So we'll work with this one. 32 00:01:50,640 --> 00:01:55,380 And if you created a new workspace like that with the Custom tab, 33 00:01:55,380 --> 00:02:04,560 you can just type gem install nokogiri, which is not going here. 34 00:02:04,560 --> 00:02:06,230 OK, it's a little laggy. 35 00:02:06,230 --> 00:02:08,979 But you can type gem install nokogiri. 36 00:02:08,979 --> 00:02:15,970 And that should be everything there is to the installation. 37 00:02:15,970 --> 00:02:20,590 >> As I said before, if you are still working in your CS50 IDE template, 38 00:02:20,590 --> 00:02:30,270 you just need to type CFLAGS equals gem install nokogiri. 39 00:02:30,270 --> 00:02:33,130 And I've already installed it here so I'm not going to do that. 40 00:02:33,130 --> 00:02:38,500 But for those following along, feel free to do that. 41 00:02:38,500 --> 00:02:46,000 >> So once you've got your Nokogiri workspace or library installed, 42 00:02:46,000 --> 00:02:49,500 I'm going to give you a little crash course in Ruby syntax 43 00:02:49,500 --> 00:02:53,380 because Nokogiri is a Ruby library. 44 00:02:53,380 --> 00:03:03,710 So you'll need to know some basic Ruby syntax to work with Nokogiri. 45 00:03:03,710 --> 00:03:08,750 So some basic differences from what you're used to, 46 00:03:08,750 --> 00:03:13,370 maybe if you've been working so far in just C and PHP: 47 00:03:13,370 --> 00:03:16,010 you declare variables with no type. 48 00:03:16,010 --> 00:03:19,720 You don't use semicolons, which is kind of a relief. 49 00:03:19,720 --> 00:03:25,480 There are no parentheses around for or while loops, for example.
50 00:03:25,480 --> 00:03:29,460 You just have the block of code, and then end at the end of it. 51 00:03:29,460 --> 00:03:32,380 There's no plus plus or minus minus, so just 52 00:03:32,380 --> 00:03:36,180 know that for when you're doing for loops, 53 00:03:36,180 --> 00:03:38,620 it's just plus equals and minus equals. 54 00:03:38,620 --> 00:03:43,310 And instead of hash include, you'll use require and then 55 00:03:43,310 --> 00:03:47,755 whatever library you're trying to load into your program. 56 00:03:47,755 --> 00:03:51,610 57 00:03:51,610 --> 00:03:53,430 >> Ruby is not a compiled language. 58 00:03:53,430 --> 00:03:55,550 So that's another relief. 59 00:03:55,550 --> 00:03:59,350 It's more similar to PHP in that it's an interpreted language. 60 00:03:59,350 --> 00:04:03,570 You can run any Ruby script that you write with ruby followed 61 00:04:03,570 --> 00:04:07,380 by the name of your script or program. 62 00:04:07,380 --> 00:04:13,000 To signify that it's a Ruby program, you just end it with .rb instead of .c. 63 00:04:13,000 --> 00:04:17,440 And there are variable size arrays in Ruby, 64 00:04:17,440 --> 00:04:23,200 which is super convenient when you're scraping and maybe want to append 65 00:04:23,200 --> 00:04:26,090 data that you've scraped into an array. 66 00:04:26,090 --> 00:04:31,960 You don't have to malloc a new array and copy the old array into the new array. 67 00:04:31,960 --> 00:04:36,150 You can just append with the double arrow sign. 68 00:04:36,150 --> 00:04:39,820 And there are no chars, there are just single-letter strings. 69 00:04:39,820 --> 00:04:44,760 So that should be a little easier. 70 00:04:44,760 --> 00:04:50,130 >> So we'll just give you some examples of some basic Ruby syntax. 71 00:04:50,130 --> 00:04:57,100 So here you can see that instead of slash slash, to comment in Ruby, 72 00:04:57,100 --> 00:04:58,740 you just use the pound sign.
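The point about variable size arrays and the double arrow sign can be sketched as follows. This is a minimal illustration rather than the code shown on screen; `add_item` and `scraped` are names made up for the example.

```ruby
# Comments start with a pound sign rather than //.
# add_item and scraped are hypothetical names used only for this sketch.

# Arrays grow on demand, so there is no malloc-and-copy step as in C.
def add_item(list, item)
  list << item   # << appends in place and returns the same array
end

scraped = []                      # no type declaration, no semicolon
add_item(scraped, "first title")
add_item(scraped, "second title")
scraped << "third title"          # the operator can also be used directly

puts scraped.length
puts scraped
```

Saved as a file ending in `.rb`, this would be run with `ruby` followed by the file name, as described above.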
73 00:04:58,740 --> 00:05:04,990 And for variable declaration, you just type variable equals 74 00:05:04,990 --> 00:05:07,971 whatever you want the variable to be. 75 00:05:07,971 --> 00:05:09,220 They can be strings. 76 00:05:09,220 --> 00:05:14,120 You can have an array, which you fill with numbers and values. 77 00:05:14,120 --> 00:05:17,240 puts and print are the same. 78 00:05:17,240 --> 00:05:20,110 For our purposes, the only difference is really 79 00:05:20,110 --> 00:05:25,500 that puts, which stands for put string, just puts a new line 80 00:05:25,500 --> 00:05:27,440 character after whatever it's printing. 81 00:05:27,440 --> 00:05:30,980 >> So if we give a little demonstration here, 82 00:05:30,980 --> 00:05:41,800 we can run this with-- open a new terminal. 83 00:05:41,800 --> 00:05:46,020 You can see all these files that are in my terminal. 84 00:05:46,020 --> 00:05:50,960 And if I just run ruby, ruby intro.rb, it 85 00:05:50,960 --> 00:05:53,530 puts out five, Hello, Mather, Quincy, Currier, 86 00:05:53,530 --> 00:05:54,410 Adams. 87 00:05:54,410 --> 00:05:59,295 So that's all there is to declaring arrays. 88 00:05:59,295 --> 00:06:01,670 AUDIENCE: Robert, can you make your font a little bigger? 89 00:06:01,670 --> 00:06:02,461 ROBERT KRABEK: Yes. 90 00:06:02,461 --> 00:06:05,370 91 00:06:05,370 --> 00:06:12,280 And I can't zoom because you can't zoom on the terminal fonts apparently. 92 00:06:12,280 --> 00:06:18,790 93 00:06:18,790 --> 00:06:24,630 >> So that's how you print variables to your terminal. 94 00:06:24,630 --> 00:06:28,820 You can also use variables inside a string. 95 00:06:28,820 --> 00:06:33,720 So recently in PHP, you may have learned 96 00:06:33,720 --> 00:06:37,340 that there's string interpolation.
97 00:06:37,340 --> 00:06:43,830 So if you look here, if I declare three variables, name, library, 98 00:06:43,830 --> 00:06:49,700 and language, and I puts, I write a string, hello my name is. 99 00:06:49,700 --> 00:06:54,190 And then instead of the PHP version of string interpolation, 100 00:06:54,190 --> 00:06:58,960 which looks a little more like this, you have the pound sign, and then 101 00:06:58,960 --> 00:07:01,220 a curly brace, and then the name of the variable. 102 00:07:01,220 --> 00:07:07,350 And that's how you'd print, say, whatever the variable name is. 103 00:07:07,350 --> 00:07:10,140 >> And then you can also concatenate strings. 104 00:07:10,140 --> 00:07:12,890 Ruby makes it super easy with the plus sign. 105 00:07:12,890 --> 00:07:16,110 You just have one string on the left plus a variable 106 00:07:16,110 --> 00:07:18,860 or another string plus a string. 107 00:07:18,860 --> 00:07:23,500 So if I print this out, it should just say Hello, my name is Robert. 108 00:07:23,500 --> 00:07:27,340 I'll be teaching you nokogiri in Ruby. 109 00:07:27,340 --> 00:07:35,370 >> And let's just verify that that is indeed the case-- ruby intro. 110 00:07:35,370 --> 00:07:36,480 Hello, my name is Robert. 111 00:07:36,480 --> 00:07:40,160 I'll be teaching you nokogiri in Ruby. 112 00:07:40,160 --> 00:07:45,600 >> Moving on, if-else statements, they're a little different 113 00:07:45,600 --> 00:07:49,800 from what you might be used to if you've been working in C. 114 00:07:49,800 --> 00:07:53,200 You don't need parentheses. 115 00:07:53,200 --> 00:07:55,220 You don't need curly braces. 116 00:07:55,220 --> 00:08:00,170 And instead of else if, it's elsif concatenated. 117 00:08:00,170 --> 00:08:07,260 So in here, as I've declared x up here, as we can see, x is still 5. 118 00:08:07,260 --> 00:08:11,100 So if x is less than 3, it will puts small.
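As a rough sketch of the string interpolation and concatenation just described: the two methods below build the same sentence both ways. The method names are invented for the example, and the exact punctuation of the on-screen string is an assumption.

```ruby
# greeting_interpolated and greeting_concatenated are hypothetical names.

def greeting_interpolated(name, library, language)
  # #{...} is only expanded inside double-quoted strings
  "Hello, my name is #{name}. I'll be teaching you #{library} in #{language}."
end

def greeting_concatenated(name, library, language)
  # + joins strings; every operand has to be a string already
  "Hello, my name is " + name + ". I'll be teaching you " + library +
    " in " + language + "."
end

name = "Robert"
library = "nokogiri"
language = "Ruby"

puts greeting_interpolated(name, library, language)
puts greeting_concatenated(name, library, language)
```

Both calls print the same line; interpolation tends to be easier to read once more than one or two variables are involved.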
119 00:08:11,100 --> 00:08:14,030 If it's less than 7, medium, otherwise large. 120 00:08:14,030 --> 00:08:17,340 So 5 is a medium number. 121 00:08:17,340 --> 00:08:22,270 And I end this block of code with end. 122 00:08:22,270 --> 00:08:24,920 >> Here's my for loop. 123 00:08:24,920 --> 00:08:28,240 And this syntax is also a little different. 124 00:08:28,240 --> 00:08:33,500 0 to 5 basically just declares an array of 0 to 5. 125 00:08:33,500 --> 00:08:36,120 So there are five slots in the array. 126 00:08:36,120 --> 00:08:40,500 And then for each slot in that array, I'll be incrementing i. 127 00:08:40,500 --> 00:08:46,080 So this should print 0 to 5, or 0 to 4. 128 00:08:46,080 --> 00:08:49,630 And this should print medium. 129 00:08:49,630 --> 00:08:51,370 >> And I'll blaze through. 130 00:08:51,370 --> 00:08:54,466 You guys will have access to this code later. 131 00:08:54,466 --> 00:08:55,965 So you guys can run this yourselves. 132 00:08:55,965 --> 00:09:02,090 133 00:09:02,090 --> 00:09:06,620 >> So this is your basic while loop. 134 00:09:06,620 --> 00:09:12,230 This will just be printing j, incrementing by 1 until we hit 5. 135 00:09:12,230 --> 00:09:18,320 >> Super quick Ruby crash course on how to write functions. 136 00:09:18,320 --> 00:09:24,460 Instead of, say, int factorial number, we just have def. 137 00:09:24,460 --> 00:09:28,450 And basically you're defining a function here. 138 00:09:28,450 --> 00:09:30,600 This is going to be the name of the function, 139 00:09:30,600 --> 00:09:34,280 and this is any parameters that you want to pass into the function. 140 00:09:34,280 --> 00:09:36,760 You can have if statements inside of it. 141 00:09:36,760 --> 00:09:38,030 You can return. 142 00:09:38,030 --> 00:09:42,620 In this case, we're defining a recursively 143 00:09:42,620 --> 00:09:45,000 implemented factorial function. 144 00:09:45,000 --> 00:09:48,660 So we just call functions in Ruby like this.
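The control flow walked through above might look roughly like this. It is a sketch, not the file from the seminar: `size_of` is a made-up wrapper around the if/elsif example, and the use of a three-dot range in the for loop is an assumption, since the transcript does not say which range form was on screen.

```ruby
# size_of is a hypothetical name wrapping the small/medium/large example.
def size_of(x)
  if x < 3          # no parentheses, no curly braces
    "small"
  elsif x < 7       # elsif, not "else if"
    "medium"
  else
    "large"
  end               # every block closes with end
end

puts size_of(5)

# 0...5 (three dots) is a Range that stops before 5, so this prints 0 to 4.
# 0..5 (two dots) would include 5.
for i in 0...5
  puts i
end

# Basic while loop; there is no j++ in Ruby, so use += 1.
j = 0
while j < 5
  puts j
  j += 1
end

# Recursively implemented factorial, defined with def.
def factorial(number)
  return 1 if number <= 1
  number * factorial(number - 1)
end

puts factorial(3)
```

Strictly speaking `0...5` is a Range object rather than an array, but for the purposes of a for loop it can be read the way the speaker describes it.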
145 00:09:48,660 --> 00:09:54,700 >> So as I've defined this, I can call factorial, pass in 3, 146 00:09:54,700 --> 00:09:59,700 and then 3 is the number variable that I can use inside the function. 147 00:09:59,700 --> 00:10:08,010 And this to_s is just converting the return value of factorial into a string. 148 00:10:08,010 --> 00:10:10,760 Otherwise this would throw an error saying oh, I 149 00:10:10,760 --> 00:10:13,230 can't print a string because if you remember, 150 00:10:13,230 --> 00:10:18,230 puts is put string, and this factorial returns a number. 151 00:10:18,230 --> 00:10:21,850 So we can convert that to a string like that. 152 00:10:21,850 --> 00:10:27,856 And conversely, you can also convert a string to an integer with to_i. 153 00:10:27,856 --> 00:10:32,650 >> So to make everything super simple, if I just comment this out, save, 154 00:10:32,650 --> 00:10:36,250 and run the factorial function. 155 00:10:36,250 --> 00:10:39,850 We should be able to see that the factorial of 3 is 6. 156 00:10:39,850 --> 00:10:42,790 And that is indeed the case. 157 00:10:42,790 --> 00:10:46,160 >> So that's your crash course in Ruby. 158 00:10:46,160 --> 00:10:53,550 And now that you know Ruby, we can go on to the basic Nokogiri scraping setup. 159 00:10:53,550 --> 00:10:58,190 Basically all you have to do is, in Ruby, require the libraries. 160 00:10:58,190 --> 00:11:04,390 And for our purposes we'll be using the OpenURI library as well as Nokogiri. 161 00:11:04,390 --> 00:11:07,870 And then what you do-- and I'll give you the syntax for this-- 162 00:11:07,870 --> 00:11:16,010 is you open the URL much like you would in a curl request, which stands for C URL. 163 00:11:16,010 --> 00:11:20,330 >> So you take the URL of the website in question. 164 00:11:20,330 --> 00:11:22,030 You store it in a variable.
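Going back to the to_s and to_i conversions for a moment, here is a small sketch of where the conversion is actually needed. It assumes the on-screen line joined a string and the factorial result with the plus sign; in that case the error comes from the concatenation, since `+` on a string only accepts another string. `describe_factorial` is a name invented for the example.

```ruby
def factorial(number)
  number <= 1 ? 1 : number * factorial(number - 1)
end

# describe_factorial is a hypothetical helper for this sketch.
def describe_factorial(n)
  # Without to_s, "..." + 6 raises a TypeError.
  "The factorial of " + n.to_s + " is " + factorial(n).to_s
end

puts describe_factorial(3)

# Going the other way: to_i turns a numeric string into an integer.
puts "42".to_i + 1
```

String interpolation with `#{...}` calls to_s automatically, so it sidesteps the explicit conversion.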
165 00:11:22,030 --> 00:11:27,400 And then you can search through that variable for particular HTML using 166 00:11:27,400 --> 00:11:30,590 the css command. 167 00:11:30,590 --> 00:11:34,360 And then you can output the content wherever you want. 168 00:11:34,360 --> 00:11:35,720 You can store it in a list. 169 00:11:35,720 --> 00:11:42,040 You can output it to a file, or even just print it to the screen. 170 00:11:42,040 --> 00:11:47,290 >> So we'll demonstrate a basic scraper. 171 00:11:47,290 --> 00:11:52,570 So here you can see we have require nokogiri, require open-uri. 172 00:11:52,570 --> 00:11:57,150 To set up your basic scraper, let's call it document or doc, 173 00:11:57,150 --> 00:12:07,780 equals Nokogiri::HTML open, which is a command provided to us by the OpenURI 174 00:12:07,780 --> 00:12:08,920 library. 175 00:12:08,920 --> 00:12:14,000 And we'll be searching, for those of you who might live in the Quad, 176 00:12:14,000 --> 00:12:21,270 for bikes in Boston listed on the Boston Craigslist bikes section of 177 00:12:21,270 --> 00:12:22,020 the site. 178 00:12:22,020 --> 00:12:26,460 >> So if you're unfamiliar with curl, I'll just 179 00:12:26,460 --> 00:12:28,930 show real quick what curl would do. 180 00:12:28,930 --> 00:12:38,350 If I wanted to get all of the URL from the Craigslist site, if I type curl, 181 00:12:38,350 --> 00:12:44,950 it just dumps all of the URL from the Craigslist bikes site 182 00:12:44,950 --> 00:12:46,720 onto my terminal. 183 00:12:46,720 --> 00:12:49,130 That's not especially useful because I don't 184 00:12:49,130 --> 00:12:53,330 want to manually go through and find the thing I'm looking for. 185 00:12:53,330 --> 00:13:01,590 But just so you can see that I'm actually 186 00:13:01,590 --> 00:13:13,966 using the right code, if you look at the URL for Craigslist at bikes-- 187 00:13:13,966 --> 00:13:17,460 for some reason it's not loading.
188 00:13:17,460 --> 00:13:20,340 If you look at this page and you look at the URL, 189 00:13:20,340 --> 00:13:23,970 this should be the same as the curl request that I just sent. 190 00:13:23,970 --> 00:13:27,700 And indeed, that is what's being stored in the doc variable. 191 00:13:27,700 --> 00:13:36,540 >> So when we go back to our code, we can then operate on this doc variable 192 00:13:36,540 --> 00:13:40,660 using css. 193 00:13:40,660 --> 00:13:49,240 So say I wanted to get all of the tags that are span.txt, 194 00:13:49,240 --> 00:13:51,740 and all the a tags within that tag. 195 00:13:51,740 --> 00:13:56,150 And why might we want to do that, I hear you cry? 196 00:13:56,150 --> 00:14:02,920 >> If we Inspect Element, it gives you a breakdown of how the URL is structured. 197 00:14:02,920 --> 00:14:06,200 If I scroll down through here, you can see 198 00:14:06,200 --> 00:14:08,770 what each of these different elements represents. 199 00:14:08,770 --> 00:14:13,410 So maybe I want to get this particular element. 200 00:14:13,410 --> 00:14:16,820 So I'm using the Chrome developer tools, Inspect Element. 201 00:14:16,820 --> 00:14:22,970 I see down here that this is an a tag within a span 202 00:14:22,970 --> 00:14:26,230 tag with a class of txt. 203 00:14:26,230 --> 00:14:29,610 >> So this gets us our first operation, which 204 00:14:29,610 --> 00:14:37,330 is doc.css span, which is the tag that I'm looking for within this whole URL. 205 00:14:37,330 --> 00:14:43,650 And then .txt works much like CSS does when you're just writing CSS 206 00:14:43,650 --> 00:14:49,630 in your HTML files and specifying a class. 207 00:14:49,630 --> 00:14:57,980 So this particular operator will specify a span tag with a class of txt.
208 00:14:57,980 --> 00:15:02,800 And then if I leave a space, this then goes within that tag 209 00:15:02,800 --> 00:15:05,170 and then gets the a tag within that. 210 00:15:05,170 --> 00:15:10,750 >> So if I just puts this to the terminal, I should 211 00:15:10,750 --> 00:15:21,630 be able to see basically everything that's within this span of class txt. 212 00:15:21,630 --> 00:15:22,890 So we'll give that a go. 213 00:15:22,890 --> 00:15:25,870 214 00:15:25,870 --> 00:15:27,756 ruby craigslist-scraper. 215 00:15:27,756 --> 00:15:31,850 216 00:15:31,850 --> 00:15:37,250 And indeed that gives us all these a tags of the various listings that 217 00:15:37,250 --> 00:15:40,400 are on the Craigslist page. 218 00:15:40,400 --> 00:15:45,670 >> So if we go back, we can turn this into something a little more useful. 219 00:15:45,670 --> 00:15:51,050 Maybe we just want the links. 220 00:15:51,050 --> 00:15:58,790 Because within this a tag, I'll also have the hyperlink of the path 221 00:15:58,790 --> 00:16:00,590 that this page goes to. 222 00:16:00,590 --> 00:16:09,100 So if you look at this code here, what you can do is instead of css, 223 00:16:09,100 --> 00:16:12,380 I can go at_css. 224 00:16:12,380 --> 00:16:16,820 And this will just get the first element of all those elements. 225 00:16:16,820 --> 00:16:20,890 So if I were to do that up in the code I just previously showed, 226 00:16:20,890 --> 00:16:23,800 instead of returning all of this, it would just 227 00:16:23,800 --> 00:16:26,850 return the first one of those. 228 00:16:26,850 --> 00:16:31,310 So that's how the at_css operator works. 229 00:16:31,310 --> 00:16:39,460 >> So we want to store the path of the first a tag. 230 00:16:39,460 --> 00:16:47,430 And because it'll give us a-- so we're still going to use css.
231 00:16:47,430 --> 00:16:53,830 But because this is going to give us back an entire array of tags, 232 00:16:53,830 --> 00:16:55,710 we're going to get the first item. 233 00:16:55,710 --> 00:17:01,700 So this is another way that you can get any particular element if you 234 00:17:01,700 --> 00:17:04,810 have an array of elements that is returned, 235 00:17:04,810 --> 00:17:11,930 because you can treat anything that css returns as an array, basically. 236 00:17:11,930 --> 00:17:16,880 And then we're going to get the hypertext reference attribute of this. 237 00:17:16,880 --> 00:17:24,810 >> So if you look, if you look really closely here, 238 00:17:24,810 --> 00:17:28,270 if you just basically look at the URL bar, 239 00:17:28,270 --> 00:17:33,880 this is the path that you're going to be scraping. 240 00:17:33,880 --> 00:17:41,565 So if we just run this again, and make sure we've saved it. 241 00:17:41,565 --> 00:17:47,040 242 00:17:47,040 --> 00:17:48,300 You can check at home. 243 00:17:48,300 --> 00:17:51,430 This does indeed match this link. 244 00:17:51,430 --> 00:17:55,950 >> So why might we want to use this? 245 00:17:55,950 --> 00:17:57,870 If you want to scrape a page and it has 246 00:17:57,870 --> 00:18:00,270 a page of links like Craigslist does, you'd 247 00:18:00,270 --> 00:18:03,210 want to then go into each of those links 248 00:18:03,210 --> 00:18:05,120 and then scrape the content of that, which 249 00:18:05,120 --> 00:18:08,520 is exactly what we're going to do. 250 00:18:08,520 --> 00:18:11,660 >> So once you have the path as a variable, I no longer really 251 00:18:11,660 --> 00:18:13,200 care about printing it out. 252 00:18:13,200 --> 00:18:15,420 I just need to store it as a variable. 253 00:18:15,420 --> 00:18:20,980 And then I can get another page the same way that I got 254 00:18:20,980 --> 00:18:22,260 doc in the first place.
255 00:18:22,260 --> 00:18:25,920 Except for the URL, we're going to use string interpolation 256 00:18:25,920 --> 00:18:29,180 like I was describing in Ruby earlier to append 257 00:18:29,180 --> 00:18:32,010 the path to the end of the root. 258 00:18:32,010 --> 00:18:38,970 >> So what this is going to do is this is going to put in the path 259 00:18:38,970 --> 00:18:42,360 that I scraped originally and then turn that 260 00:18:42,360 --> 00:18:49,580 into a new object, whatever you want to call it-- first_listing, for example. 261 00:18:49,580 --> 00:18:52,900 But I'm going to leave it as object for now, 262 00:18:52,900 --> 00:18:55,420 because that's what I'm using here. 263 00:18:55,420 --> 00:19:02,900 >> So say I wanted to get the description of the first posting on Craigslist. 264 00:19:02,900 --> 00:19:04,740 So I would go down here. 265 00:19:04,740 --> 00:19:10,660 I would click on Inspect Element again, because this is the description. 266 00:19:10,660 --> 00:19:14,350 I would go down here and see if I can find how I might 267 00:19:14,350 --> 00:19:16,530 be able to look up this particular tag. 268 00:19:16,530 --> 00:19:19,530 And in this case, it has an ID, which leads us 269 00:19:19,530 --> 00:19:26,810 to our second way of searching for tags, which is with the pound sign. 270 00:19:26,810 --> 00:19:30,670 >> So for classes, you use the dot operator. 271 00:19:30,670 --> 00:19:38,610 So .txt is specifying a class of txt, whereas the hash specifies an ID. 272 00:19:38,610 --> 00:19:43,720 So in this case, the tag is section, and the ID is postingbody. 273 00:19:43,720 --> 00:19:47,780 >> So this goes and finds the first-- because we're 274 00:19:47,780 --> 00:19:51,200 using at_css-- this goes and finds the first element that 275 00:19:51,200 --> 00:19:57,180 comes up with a tag of section and an ID of postingbody.
276 00:19:57,180 --> 00:20:02,636 And then you can get the text content of the object that's returned with .text. 277 00:20:02,636 --> 00:20:06,230 And then we can store that in description. 278 00:20:06,230 --> 00:20:09,370 >> So now that we have the description variable, 279 00:20:09,370 --> 00:20:14,850 we should be able to do, say, file I/O. So file I/O in Ruby 280 00:20:14,850 --> 00:20:21,310 is similar to file I/O in C, where we open a file. 281 00:20:21,310 --> 00:20:23,260 We write to it. 282 00:20:23,260 --> 00:20:25,060 And then we close that file. 283 00:20:25,060 --> 00:20:29,660 >> So here, we're just naming a file, some arbitrary variable. 284 00:20:29,660 --> 00:20:33,120 We could also just put this here. 285 00:20:33,120 --> 00:20:39,630 We have a variable that we're storing the open file in, with File.open. 286 00:20:39,630 --> 00:20:46,370 And we're writing to this file, so we open it with the w operator. 287 00:20:46,370 --> 00:20:54,280 And then we put a string into the file with the .puts operator. 288 00:20:54,280 --> 00:20:58,310 And then we put the variable that we want to write to the file inside of it. 289 00:20:58,310 --> 00:21:00,200 And then we just close the file. 290 00:21:00,200 --> 00:21:04,000 >> So if we go ahead and run this, this should generate a document 291 00:21:04,000 --> 00:21:10,840 called description.txt that will have this description in it. 292 00:21:10,840 --> 00:21:14,015 So if I run it-- no. 293 00:21:14,015 --> 00:21:17,520 294 00:21:17,520 --> 00:21:23,330 It generated a text file with, hopefully, the same thing. 295 00:21:23,330 --> 00:21:25,850 296 00:21:25,850 --> 00:21:33,290 So, it's possible that a new posting came in while I've been talking. 297 00:21:33,290 --> 00:21:36,580 And indeed it looks like there has been.
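The open, write, close sequence described above could be sketched like this. To avoid leaving files behind, the example writes into a temporary directory; `write_description` and the sample text are made up for the illustration, while description.txt is the file name mentioned in the talk.

```ruby
require 'tmpdir'

# write_description is a hypothetical helper mirroring the three steps.
def write_description(path, description)
  file = File.open(path, "w")   # "w" opens the file for writing
  file.puts(description)        # puts adds a trailing newline
  file.close
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "description.txt")
  write_description(path, "Classic bike, recently tuned.")
  puts File.read(path)
end
```

Ruby also accepts a block form, `File.open(path, "w") { |file| file.puts(description) }`, which closes the file automatically when the block ends.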
298 00:21:36,580 --> 00:21:43,380 So if we go to this classic bike, 1962-1966, that seems to match. 299 00:21:43,380 --> 00:21:45,620 And there you go. 300 00:21:45,620 --> 00:21:51,250 >> So that's the most basic functionality of scraping. 301 00:21:51,250 --> 00:21:57,510 We can also, instead of just writing to this file, 302 00:21:57,510 --> 00:21:59,930 we can add things to an array. 303 00:21:59,930 --> 00:22:03,770 So if I declare three arrays, titles, prices, and descriptions. 304 00:22:03,770 --> 00:22:06,310 305 00:22:06,310 --> 00:22:13,790 And we're operating on the doc object now. 306 00:22:13,790 --> 00:22:16,940 We can go through and get all of the span.txt. 307 00:22:16,940 --> 00:22:21,710 And remember, this returns an array of all the items it finds. 308 00:22:21,710 --> 00:22:27,300 And then in Ruby, you can just use .each to iterate through each item 309 00:22:27,300 --> 00:22:28,410 of the array. 310 00:22:28,410 --> 00:22:31,330 And then for each item, I'm just going to call it 311 00:22:31,330 --> 00:22:34,620 link, because that's basically what it is. 312 00:22:34,620 --> 00:22:46,830 >> So if I put each link.css a.hdrlnk, this is actually going into the link 313 00:22:46,830 --> 00:22:58,280 and searching within that link for another HTML element with the corresponding class. 314 00:22:58,280 --> 00:23:04,990 So if we remember what this was, span.txt, 315 00:23:04,990 --> 00:23:13,160 you can see-- let me just go back real quick-- within span.txt 316 00:23:13,160 --> 00:23:17,490 we have a lot of other classes. 317 00:23:17,490 --> 00:23:27,180 So within span.txt, we're looking for an a tag with class hdrlnk. 318 00:23:27,180 --> 00:23:29,890 So let me just find that for you guys real quick. 319 00:23:29,890 --> 00:23:37,390 320 00:23:37,390 --> 00:23:42,850 >> So you can see here, this is the a tag that's within the span of class txt 321 00:23:42,850 --> 00:23:44,920 that has a class of hdrlnk.
322 00:23:44,920 --> 00:23:47,610 And that's really what we're trying to get. 323 00:23:47,610 --> 00:23:54,680 >> So we're now trying to store all of those links into titles. 324 00:23:54,680 --> 00:23:59,545 And then we're going to print out each of those links. 325 00:23:59,545 --> 00:24:00,360 No, sorry. 326 00:24:00,360 --> 00:24:04,530 We're going to print out the price of each of those. 327 00:24:04,530 --> 00:24:09,350 So let's run this real quick and see what it does. 328 00:24:09,350 --> 00:24:14,680 329 00:24:14,680 --> 00:24:17,720 >> So this just basically went through each of the links 330 00:24:17,720 --> 00:24:27,310 in turn, found the tag in question, and then pulled out the price. 331 00:24:27,310 --> 00:24:33,910 And it did that because after you have everything in titles, 332 00:24:33,910 --> 00:24:37,260 we've just stored the title there. 333 00:24:37,260 --> 00:24:40,180 We've just stored the link inside the titles array. 334 00:24:40,180 --> 00:24:47,720 And in this for loop operation, where instead of going to a.hdrlnk, 335 00:24:47,720 --> 00:24:50,490 we're looking for span.price. 336 00:24:50,490 --> 00:24:56,500 So if I can just really quickly find the price, if you inspect the element, 337 00:24:56,500 --> 00:25:00,610 you'll see that it's a span with a class of price. 338 00:25:00,610 --> 00:25:04,670 And that's basically how we're getting the price there. 339 00:25:04,670 --> 00:25:10,040 >> So that's really the base case of scraping. 340 00:25:10,040 --> 00:25:13,550 That's how to get all the elements on a page 341 00:25:13,550 --> 00:25:16,510 that, say, you already know the URL of. 342 00:25:16,510 --> 00:25:21,050 >> So if we want to get a little more in depth, 343 00:25:21,050 --> 00:25:23,950 we can scrape pages within pages. 344 00:25:23,950 --> 00:25:28,480 And for this example, I'll be outputting to a CSV file.
345 00:25:28,480 --> 00:25:39,510 So I'm requiring csv here because Ruby doesn't have, in and of itself, 346 00:25:39,510 --> 00:25:42,350 the functionality to just output CSV files. 347 00:25:42,350 --> 00:25:45,030 So that's super easy. 348 00:25:45,030 --> 00:25:48,710 Let me just go to the next one. 349 00:25:48,710 --> 00:25:51,640 350 00:25:51,640 --> 00:25:57,170 We covered file I/O. So this is the same as it is in C. 351 00:25:57,170 --> 00:26:00,870 And before we move on to Kimono, I'll show real quick how to 352 00:26:00,870 --> 00:26:02,790 scrape sites within sites. 353 00:26:02,790 --> 00:26:10,040 >> So we already learned how to declare arrays in Ruby. 354 00:26:10,040 --> 00:26:13,280 So I'm just declaring a bunch of arbitrary arrays 355 00:26:13,280 --> 00:26:16,310 that I'll be storing data in. 356 00:26:16,310 --> 00:26:20,680 doc is operating the same way as it was in the previous file. 357 00:26:20,680 --> 00:26:23,580 We're going in, finding each of the span.txt. 358 00:26:23,580 --> 00:26:25,040 We already know that. 359 00:26:25,040 --> 00:26:32,130 That's the container in which each link has all of the data that we want. 360 00:26:32,130 --> 00:26:40,800 >> So here what we're doing is for each link of class span txt, we're going in 361 00:26:40,800 --> 00:26:45,720 and we're finding the a tag, finding the first item of that. 362 00:26:45,720 --> 00:26:49,937 Remember, css returns an array, so you can't access it as is. 363 00:26:49,937 --> 00:26:51,520 We're going to get the first item. 364 00:26:51,520 --> 00:26:56,430 Even if it's an array of one item, you have to use this syntax, 365 00:26:56,430 --> 00:26:58,800 and then pull out the href attribute. 366 00:26:58,800 --> 00:27:01,800 >> So we did that earlier. 367 00:27:01,800 --> 00:27:04,440 So this should look familiar.
368 00:27:04,440 --> 00:27:14,330 And so now we have an array called paths of all of our links 369 00:27:14,330 --> 00:27:16,590 that we're going to want to use. 370 00:27:16,590 --> 00:27:21,350 So once we have this array of all the paths that we want to use, 371 00:27:21,350 --> 00:27:26,840 we can then create an object for each of those pages, when we open that page. 372 00:27:26,840 --> 00:27:31,150 So as we also saw in the syntax before, where 373 00:27:31,150 --> 00:27:37,450 you do string interpolation with the path here, so the syntax is just for path. 374 00:27:37,450 --> 00:27:41,450 And I can name this variable any arbitrary name. 375 00:27:41,450 --> 00:27:43,070 >> This is the important one. 376 00:27:43,070 --> 00:27:46,650 This is the array that you'll be getting each element from. 377 00:27:46,650 --> 00:27:52,400 But when you say for path in paths, this means for each element in paths, 378 00:27:52,400 --> 00:27:55,150 call it path, and use that. 379 00:27:55,150 --> 00:27:59,266 This is basically like when you do a for loop and use int i. 380 00:27:59,266 --> 00:28:04,000 So you can treat path as the variable that's incrementing. 381 00:28:04,000 --> 00:28:07,820 >> And then for each of those, go into each of those links. 382 00:28:07,820 --> 00:28:11,710 Because we're storing that in the page object, so we're creating a new page each time 383 00:28:11,710 --> 00:28:13,330 we access it. 384 00:28:13,330 --> 00:28:20,560 And then within that new page, get span.postingtitletext, span.price, 385 00:28:20,560 --> 00:28:22,240 and then section#postingbody. 386 00:28:22,240 --> 00:28:28,430 We already covered section#postingbody when we looked at descriptions.
387 00:28:28,430 --> 00:28:34,890 >> So if we go look at a Craigslist post, if you're just looking for the title, 388 00:28:34,890 --> 00:28:38,810 you can see it's up here, span postingtitletext. 389 00:28:38,810 --> 00:28:41,390 And that's why it's there. 390 00:28:41,390 --> 00:28:49,120 And then for the price, you can find it in the span of class price. 391 00:28:49,120 --> 00:28:54,480 >> Likewise, we might also want to store the URL. 392 00:28:54,480 --> 00:28:58,580 So we'll just run this again, storing it in an array, 393 00:28:58,580 --> 00:29:01,150 because if you're looking on Craigslist, you're 394 00:29:01,150 --> 00:29:05,290 probably going to want a way, if you see something that interests you, 395 00:29:05,290 --> 00:29:06,620 to go back to that site. 396 00:29:06,620 --> 00:29:10,480 So you just want to store the URL for reference. 397 00:29:10,480 --> 00:29:13,840 398 00:29:13,840 --> 00:29:19,630 >> This is basically another syntax for a loop. 399 00:29:19,630 --> 00:29:26,360 I was just doing paths.each instead of for path in paths, with index. 400 00:29:26,360 --> 00:29:31,280 And this syntax is Ruby for-- path is what we did up here, 401 00:29:31,280 --> 00:29:33,920 declaring a variable for each item. 402 00:29:33,920 --> 00:29:38,540 And index acts like the i in C for loops. 403 00:29:38,540 --> 00:29:41,280 So you can keep track of what the index is. 404 00:29:41,280 --> 00:29:45,200 >> So here's a little thing that's handy 405 00:29:45,200 --> 00:29:46,950 for when you're running scrapes. 406 00:29:46,950 --> 00:29:50,580 If you're scraping hundreds of pages, to make sure it's not hanging, 407 00:29:50,580 --> 00:29:53,320 it will just output, I'm getting this page, 408 00:29:53,320 --> 00:29:55,960 to make sure it's still going.
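As a rough illustration of the indexed loop and the progress message mentioned above: the transcript only says "paths.each ... with index", so using `each_with_index` here is an assumption, and the method name `progress_lines` and the message wording are made up for the example.

```ruby
# Return one progress message per path, pairing each path with its index.
def progress_lines(paths)
  lines = []
  # each_with_index yields the element and its zero-based position,
  # so index plays the role of i in a C for loop.
  paths.each_with_index do |path, index|
    lines << "getting page #{index}: #{path}"
  end
  lines
end

puts progress_lines(["/item/1.html", "/item/2.html", "/item/3.html"])
```

Printing a line like this before each request is a cheap way to see that a long scrape is still moving rather than hanging.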
409 00:29:55,960 --> 00:29:59,250 But for our purposes, because there are a hundred items, 410 00:29:59,250 --> 00:30:08,000 I'm going to get just three of them so that we don't run out of time here. 411 00:30:08,000 --> 00:30:13,040 >> But before we get to that, I'm just going to show you really quickly, 412 00:30:13,040 --> 00:30:16,940 I'll be outputting the title, price, description, and URL 413 00:30:16,940 --> 00:30:19,600 of each of the links that I've scraped. 414 00:30:19,600 --> 00:30:23,720 And then this is the syntax for the CSV library. 415 00:30:23,720 --> 00:30:25,240 Open the CSV. 416 00:30:25,240 --> 00:30:27,070 This is what I'm going to call it. 417 00:30:27,070 --> 00:30:29,430 Open for writing, do. 418 00:30:29,430 --> 00:30:33,830 And then csv will be the file that you're putting everything into. 419 00:30:33,830 --> 00:30:37,800 This is a sanity check for me to know that it's running. 420 00:30:37,800 --> 00:30:41,240 And this is my sanity check to know that it's finished. 421 00:30:41,240 --> 00:30:46,670 So I'm putting the title in a row in the CSV, the price, url, description, 422 00:30:46,670 --> 00:30:49,420 all in a row in the CSV. 423 00:30:49,420 --> 00:30:53,410 >> So if we go and run this now-- and let me just 424 00:30:53,410 --> 00:31:04,710 make sure that I've saved it-- instead of just outputting to the terminal, 425 00:31:04,710 --> 00:31:09,750 we should have a CSV file that's been generated. 426 00:31:09,750 --> 00:31:13,500 So here we can see the CSV file that's been generated. 427 00:31:13,500 --> 00:31:19,330 This is the output of the scrape that I just ran. 428 00:31:19,330 --> 00:31:23,030 As you can see here, getting page 0, 1, 2, 3. 429 00:31:23,030 --> 00:31:27,400 These are the titles, prices, descriptions. 430 00:31:27,400 --> 00:31:31,710 And if we look at this CSV file that we've generated, 431 00:31:31,710 --> 00:31:35,700 you can see it's outputted here.
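The CSV step described above (open a file for writing, then push one row per listing) might look something like this minimal sketch. The column order of title, price, url, description follows the talk; the method name `write_listings`, the file name, and the sample listing are hypothetical, and the file is written to a temporary directory so the example cleans up after itself.

```ruby
require "csv"
require "tmpdir"

# Write one row per listing and return the rows read back from the file.
def write_listings(file_path, listings)
  # "w" opens the file for writing; csv is the handle rows are appended to.
  CSV.open(file_path, "w") do |csv|
    listings.each do |item|
      csv << [item[:title], item[:price], item[:url], item[:description]]
    end
  end
  # Read the file back as an array of rows (each row is an array of strings).
  CSV.read(file_path)
end

# Hypothetical sample data; note the commas inside the title and description.
listings = [
  { title: "Bike, red", price: "$50", url: "https://example.org/1",
    description: "Good condition, barely used" }
]

Dir.mktmpdir do |dir|
  rows = write_listings(File.join(dir, "listings.csv"), listings)
  p rows
end
```

Ruby's CSV library quotes any field that contains a comma, so a title such as "Bike, red" stays in a single column when the file is read back. Joining the fields by hand with "," would not give that protection.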
432 00:31:35,700 --> 00:31:40,350 This isn't Excel, so it's not laid out in rows and columns. 433 00:31:40,350 --> 00:31:45,140 But you can imagine how it would be laid out. 434 00:31:45,140 --> 00:31:47,740 >> CSV stands for comma-separated values. 435 00:31:47,740 --> 00:31:50,090 So you can imagine this would be a row. 436 00:31:50,090 --> 00:31:54,700 And each comma would indicate a different column. 437 00:31:54,700 --> 00:32:00,010 Just a note of caution-- sometimes you're 438 00:32:00,010 --> 00:32:02,260 scraping things with a lot of commas. 439 00:32:02,260 --> 00:32:05,100 So if you're outputting to a CSV file, 440 00:32:05,100 --> 00:32:10,340 it might not output the way you'd think. 441 00:32:10,340 --> 00:32:16,770 >> So that's basically all there is to scraping basic HTML 442 00:32:16,770 --> 00:32:20,110 pages with Nokogiri. 443 00:32:20,110 --> 00:32:26,000 >> So human ingenuity being what it is, people have come 444 00:32:26,000 --> 00:32:33,220 up with a more automated, GUI version, albeit a slightly less robust 445 00:32:33,220 --> 00:32:35,540 version, of scraping various websites. 446 00:32:35,540 --> 00:32:39,060 And for our purposes I'll be showing 447 00:32:39,060 --> 00:32:42,920 a Chrome extension called Kimono. 448 00:32:42,920 --> 00:32:46,690 And all you have to do is navigate to the page that you want to scrape. 449 00:32:46,690 --> 00:32:48,590 You click on the fields of interest. 450 00:32:48,590 --> 00:32:51,510 You calibrate the fields, because it will automatically 451 00:32:51,510 --> 00:32:54,360 detect what it thinks you want to be scraping, 452 00:32:54,360 --> 00:32:56,280 and then just build the API. 453 00:32:56,280 --> 00:33:03,700 >> So if we were to demonstrate on Craigslist, it actually doesn't work. 454 00:33:03,700 --> 00:33:08,290 And this is what I was referring to about it not being as robust. 455 00:33:08,290 --> 00:33:10,320 It has trouble building the API.
456 00:33:10,320 --> 00:33:13,400 But as a demonstration of what it would do, 457 00:33:13,400 --> 00:33:17,460 if you install the Chrome extension, all you do is click on it. 458 00:33:17,460 --> 00:33:21,750 It Kimonofies the page, and then you click on the thing you want to scrape. 459 00:33:21,750 --> 00:33:24,480 >> So if I were to click on that, it would highlight 460 00:33:24,480 --> 00:33:28,130 what it thinks I want to be scraping off that page. 461 00:33:28,130 --> 00:33:33,660 So maybe I'll call this titles. 462 00:33:33,660 --> 00:33:36,430 This is how many items I have selected. 463 00:33:36,430 --> 00:33:43,810 And I can just confirm or reject some of the other suggested titles 464 00:33:43,810 --> 00:33:49,600 to find and add to what will be scraped. 465 00:33:49,600 --> 00:33:52,330 >> So now we can see there are a hundred items selected. 466 00:33:52,330 --> 00:33:58,060 If I want to have another field that I also scrape that's related to this, 467 00:33:58,060 --> 00:34:02,540 say I want to scrape the price as well, then I can do that. 468 00:34:02,540 --> 00:34:06,190 469 00:34:06,190 --> 00:34:11,550 >> So here's proof of how it's a little less robust, because now it's 470 00:34:11,550 --> 00:34:15,050 picking up the city instead of just the price that I want. 471 00:34:15,050 --> 00:34:16,989 And now it's picked up 200 things. 472 00:34:16,989 --> 00:34:19,880 You can go back and delete. 473 00:34:19,880 --> 00:34:21,449 You can try again. 474 00:34:21,449 --> 00:34:24,250 But there's no guarantee. 475 00:34:24,250 --> 00:34:29,909 This is how it behaves sometimes. 476 00:34:29,909 --> 00:34:32,969 As you can see here, now it says 96 here. 477 00:34:32,969 --> 00:34:37,000 It's picked up most of the links that you want to scrape, but not 478 00:34:37,000 --> 00:34:39,280 necessarily all.
479 00:34:39,280 --> 00:34:43,909 >> Another useful tool in Kimono, though, is you can go up to the features 480 00:34:43,909 --> 00:34:47,980 here, hover over, and it will show 481 00:34:47,980 --> 00:34:53,139 a breakdown of the unique path to get to the HTML 482 00:34:53,139 --> 00:34:54,909 tags that you want to scrape. 483 00:34:54,909 --> 00:35:01,450 So for titles, if you look here, if you get div p span span, 484 00:35:01,450 --> 00:35:06,030 you can actually just use this in your Nokogiri code, 485 00:35:06,030 --> 00:35:10,780 where before we had span.txt to get each of the titles. 486 00:35:10,780 --> 00:35:13,270 If I just want the text within the titles, 487 00:35:13,270 --> 00:35:18,950 I can input div space p space span space span space a, 488 00:35:18,950 --> 00:35:21,570 and it would achieve the same effect. 489 00:35:21,570 --> 00:35:26,320 And for those who are interested in using regular expressions, 490 00:35:26,320 --> 00:35:31,670 it also happens to give you a regular-expression-type input string 491 00:35:31,670 --> 00:35:34,900 to get the things you're trying to get. 492 00:35:34,900 --> 00:35:44,130 >> So, there's another cool feature of Kimono where you can paginate, 493 00:35:44,130 --> 00:35:47,780 which is that not only can I scrape the results of this page, 494 00:35:47,780 --> 00:35:50,890 I can click on this little button here, Pagination, 495 00:35:50,890 --> 00:35:55,580 specify the button that would take me to the next page, 496 00:35:55,580 --> 00:35:59,500 and then it will just know that it can iterate to the next page, 497 00:35:59,500 --> 00:36:04,120 and then scrape all of the-- as long as it's the same format, of course-- 498 00:36:04,120 --> 00:36:06,110 scrape all of those links as well.
499 00:36:06,110 --> 00:36:15,230 >> So because Kimono doesn't want to work with Craigslist, what I've done 500 00:36:15,230 --> 00:36:19,790 is I've Kimonofied the Harvard Crimson. 501 00:36:19,790 --> 00:36:29,380 I've pulled out some of the top featured articles, confirmed here. 502 00:36:29,380 --> 00:36:33,090 Say all of these. 503 00:36:33,090 --> 00:36:35,830 I've prepared this API for you ahead of time. 504 00:36:35,830 --> 00:36:38,990 But otherwise what you would do is you would just click Done. 505 00:36:38,990 --> 00:36:40,940 Enter your API details. 506 00:36:40,940 --> 00:36:45,260 Set it to either automated or manual crawl. 507 00:36:45,260 --> 00:36:48,460 So you can update your data every 15 minutes, 508 00:36:48,460 --> 00:36:50,330 every week, every day, whatever you want. 509 00:36:50,330 --> 00:36:51,160 Name your API. 510 00:36:51,160 --> 00:36:52,790 Create the API. 511 00:36:52,790 --> 00:36:58,460 For your benefit, I've created the Crimson front page API already. 512 00:36:58,460 --> 00:37:02,480 >> So you just create an account on Kimono, and it 513 00:37:02,480 --> 00:37:06,240 will store all of your APIs for you. 514 00:37:06,240 --> 00:37:10,330 So that's basically all of your different scrapes. 515 00:37:10,330 --> 00:37:18,250 >> So if we look here, these are the opinion links that I've collected. 516 00:37:18,250 --> 00:37:21,290 These are the featured links that I've collected. 517 00:37:21,290 --> 00:37:24,090 And these are the most read links that I've collected 518 00:37:24,090 --> 00:37:27,120 from this most recent API scrape. 519 00:37:27,120 --> 00:37:30,790 >> So as you can see here, these would be featured, 520 00:37:30,790 --> 00:37:34,130 these would be opinion, which in this example, 521 00:37:34,130 --> 00:37:38,150 I've combined them all into one collection.
522 00:37:38,150 --> 00:37:42,780 But if you just play around with it a little bit, you can split it up 523 00:37:42,780 --> 00:37:45,090 and divide it up however you want, as long 524 00:37:45,090 --> 00:37:47,520 as the formatting is a little different. 525 00:37:47,520 --> 00:37:51,320 >> Just playing around with it, the crawl setup, one of the downsides 526 00:37:51,320 --> 00:37:58,120 is you can only crawl up to 25 pages at a time. 527 00:37:58,120 --> 00:38:00,430 That's one of the limiting factors. 528 00:38:00,430 --> 00:38:03,060 But here, if you set it to manual crawl, this 529 00:38:03,060 --> 00:38:06,100 is how you can tell it to get your data on demand. 530 00:38:06,100 --> 00:38:11,010 And here you can see your crawl history of everything that you've crawled. 531 00:38:11,010 --> 00:38:16,000 And you guys can go back, sign up, play around with all the different ways 532 00:38:16,000 --> 00:38:20,340 in which you can modify and use your data. 533 00:38:20,340 --> 00:38:24,580 >> Kimono can be set up to scrape links within links. 534 00:38:24,580 --> 00:38:29,700 And you'd do that by scraping a list of links, 535 00:38:29,700 --> 00:38:35,390 and then using that API as a jumping-off point for another API 536 00:38:35,390 --> 00:38:36,710 that you build to scrape. 537 00:38:36,710 --> 00:38:42,040 But that's more complicated than what we're going to get into today. 538 00:38:42,040 --> 00:38:44,270 >> So that's Kimono. 539 00:38:44,270 --> 00:38:46,980 We'll talk about the pros and cons of Nokogiri and Kimono. 540 00:38:46,980 --> 00:38:50,380 >> Nokogiri, it's really fast. 541 00:38:50,380 --> 00:38:51,640 It's easy to test. 542 00:38:51,640 --> 00:38:55,910 You can just put anything to the console, easy to configure. 543 00:38:55,910 --> 00:39:00,400 You can decide exactly what you want to scrape and store.
544 00:39:00,400 --> 00:39:02,060 There are no page limits. 545 00:39:02,060 --> 00:39:08,010 I actually used it to scrape about 1,800 South African school websites 546 00:39:08,010 --> 00:39:10,870 for emails for an internship that I did. 547 00:39:10,870 --> 00:39:16,060 >> So that's possible, although best practice would be to split up the script. 548 00:39:16,060 --> 00:39:19,310 Because if it fails, then you don't get anything. 549 00:39:19,310 --> 00:39:22,790 But if you do a hundred, maybe 200 pages at a time, 550 00:39:22,790 --> 00:39:27,840 then you have some chance of at least getting it piecemeal, especially 551 00:39:27,840 --> 00:39:30,280 if you have a bad connection. 552 00:39:30,280 --> 00:39:32,720 >> Unfortunately it can only scrape HTML. 553 00:39:32,720 --> 00:39:35,190 So if you have dynamically loaded pages-- 554 00:39:35,190 --> 00:39:39,480 and I'll show an example, like Kayak, in a second-- 555 00:39:39,480 --> 00:39:42,270 Nokogiri unfortunately can't scrape that. 556 00:39:42,270 --> 00:39:45,700 >> But Kimono is also easy to use. 557 00:39:45,700 --> 00:39:48,330 As you saw, it's basically point and click. 558 00:39:48,330 --> 00:39:50,260 It can scrape JavaScript. 559 00:39:50,260 --> 00:39:53,790 Unfortunately, there's a limit to how many pages you can scrape. 560 00:39:53,790 --> 00:39:55,710 Sometimes it's a little hard to configure. 561 00:39:55,710 --> 00:39:57,240 It gets confused. 562 00:39:57,240 --> 00:40:00,920 But it's definitely something to consider 563 00:40:00,920 --> 00:40:05,930 if you're not trying to have a super robust, maintainable scrape. 564 00:40:05,930 --> 00:40:09,010 If you just want to get everything off a page quickly, 565 00:40:09,010 --> 00:40:10,970 then Kimono is a really good tool to use.
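On the earlier advice to split a large scrape into chunks of a hundred or so pages: one possible way to sketch that in Ruby is with `each_slice`. The method name `scrape_in_batches` and the string standing in for the per-page work are hypothetical; a real script would fetch and parse each page and write each finished batch to disk.

```ruby
# Process paths in fixed-size batches so that finished batches are kept
# even if a later batch fails partway through.
def scrape_in_batches(paths, batch_size)
  saved = []
  # each_slice yields consecutive groups of at most batch_size elements.
  paths.each_slice(batch_size) do |batch|
    # Placeholder for the real per-page work (fetching and parsing).
    results = batch.map { |path| "scraped #{path}" }
    # "Saving" here just collects the batch; a real script might write a file.
    saved << results
  end
  saved
end

p scrape_in_batches(%w[/a /b /c /d /e], 2)
```

The batch size is a trade-off: smaller batches lose less work on a failure but mean more files or save steps to manage.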
566 00:40:10,970 --> 00:40:16,490 And as I mentioned earlier, there's the feature on Kimono 567 00:40:16,490 --> 00:40:19,260 that shows how to get to a unique HTML 568 00:40:19,260 --> 00:40:24,210 element, which is super useful even if you're working in Nokogiri. 569 00:40:24,210 --> 00:40:30,370 >> So if we go to the Kayak site, for example, you can see there is-- 570 00:40:30,370 --> 00:40:31,750 or maybe you can't see. 571 00:40:31,750 --> 00:40:38,910 But if I show the URL for Kayak, this is actually just the source URL. 572 00:40:38,910 --> 00:40:43,800 This is the URL before it's been modified by whatever JavaScript scripts 573 00:40:43,800 --> 00:40:45,350 that they have going on. 574 00:40:45,350 --> 00:40:52,420 And it's going to look different from Inspect Element. 575 00:40:52,420 --> 00:40:55,940 >> So if you go through and you match up the Inspect Element 576 00:40:55,940 --> 00:41:00,340 code with the source code, it's actually going to be different. 577 00:41:00,340 --> 00:41:05,640 And this is basically why Nokogiri can't scrape dynamically loaded sites. 578 00:41:05,640 --> 00:41:08,810 Because Nokogiri is scraping the source URL, 579 00:41:08,810 --> 00:41:16,310 whereas Kimono is actually scraping what you basically 580 00:41:16,310 --> 00:41:18,260 see in Inspect Element. 581 00:41:18,260 --> 00:41:23,880 >> So if I go through and I try and Kimonofy Kayak, 582 00:41:23,880 --> 00:41:26,600 I can actually go through and select the prices. 583 00:41:26,600 --> 00:41:32,360 It's a little difficult, and in this case, it's 584 00:41:32,360 --> 00:41:36,600 actually seeing this price as different from these. 585 00:41:36,600 --> 00:41:41,110 So whereas you could configure-- or if this were not dynamically loaded, 586 00:41:41,110 --> 00:41:43,620 you could configure Nokogiri to get all of these.
587 00:41:43,620 --> 00:41:48,230 >> Because the formatting is a little different for this listing 588 00:41:48,230 --> 00:41:51,280 as compared to the rest of them, and you can see here 589 00:41:51,280 --> 00:41:54,830 it's actually gone and selected all the flight prices. 590 00:41:54,830 --> 00:42:01,200 Maybe I want to select the flight times as well. 591 00:42:01,200 --> 00:42:04,700 And I can go through and kind of configure that. 592 00:42:04,700 --> 00:42:06,950 I don't want that. 593 00:42:06,950 --> 00:42:10,200 I just want the next flight time. 594 00:42:10,200 --> 00:42:17,030 And then after a couple of these go by, it gets the picture. 595 00:42:17,030 --> 00:42:19,080 So Kimono's pretty smart. 596 00:42:19,080 --> 00:42:21,900 It's just not quite as robust. 597 00:42:21,900 --> 00:42:26,710 >> There are some other alternatives that you can use. 598 00:42:26,710 --> 00:42:31,600 And I'll show you them here. 599 00:42:31,600 --> 00:42:35,790 If you're more comfortable in Python rather than Ruby, maybe, 600 00:42:35,790 --> 00:42:39,290 there's a library called Beautiful Soup. 601 00:42:39,290 --> 00:42:40,430 You can use that. 602 00:42:40,430 --> 00:42:42,270 It's similar to Nokogiri. 603 00:42:42,270 --> 00:42:44,620 It has a few more features. 604 00:42:44,620 --> 00:42:52,160 You can get an HTML tag and then move up or move sideways. 605 00:42:52,160 --> 00:42:54,690 >> There's PyQt. 606 00:42:54,690 --> 00:42:57,820 This can actually scrape dynamic sites, because it's kind of-- 607 00:42:57,820 --> 00:43:02,540 it's a WebKit that pretends to be a browser without there actually 608 00:43:02,540 --> 00:43:03,670 being a browser. 609 00:43:03,670 --> 00:43:07,490 So it will wait for all the JavaScript to load first, and then 610 00:43:07,490 --> 00:43:09,560 go in and try and scrape the site. 611 00:43:09,560 --> 00:43:13,560 >> If you want to stick with Ruby, you can go one level up from Nokogiri.
612 00:43:13,560 --> 00:43:17,650 You can use Capybara with the Poltergeist wrapper. 613 00:43:17,650 --> 00:43:22,910 And this can actually basically do the same thing 614 00:43:22,910 --> 00:43:26,610 as PyQt, which is that this is a WebKit. 615 00:43:26,610 --> 00:43:29,610 It waits for the JavaScript to load first. 616 00:43:29,610 --> 00:43:33,340 If you fiddle around with it enough, you can even get it to click on things. 617 00:43:33,340 --> 00:43:42,780 >> So if there's a link that's not a classic href where 618 00:43:42,780 --> 00:43:46,350 the path is easy, and it's some JavaScript thing that detects 619 00:43:46,350 --> 00:43:49,490 a click, you can actually do that. 620 00:43:49,490 --> 00:43:53,430 The most popular library for simulating a user 621 00:43:53,430 --> 00:43:56,390 is in JavaScript, which is PhantomJS. 622 00:43:56,390 --> 00:44:01,010 This can obviously scrape dynamic sites because this is basically 623 00:44:01,010 --> 00:44:04,270 pretending to be Chrome without the user interface. 624 00:44:04,270 --> 00:44:09,970 >> And then, of course, the most robust, but slowest, option 625 00:44:09,970 --> 00:44:13,260 is Selenium browser automation. 626 00:44:13,260 --> 00:44:15,550 And unfortunately, you're not going to be 627 00:44:15,550 --> 00:44:19,770 able to do this within your CS50 IDE. 628 00:44:19,770 --> 00:44:24,140 Because basically what it does is it boots up your Chrome, 629 00:44:24,140 --> 00:44:27,090 Firefox, whatever browser that you want to use, 630 00:44:27,090 --> 00:44:32,570 and it tracks maybe your mouse movements, whatever you type in, 631 00:44:32,570 --> 00:44:35,170 and it just kind of automates this process. 632 00:44:35,170 --> 00:44:42,070 So it was developed as a sort of website automation testing tool.
633 00:44:42,070 --> 00:44:45,910 But a lot of people use Selenium to scrape websites 634 00:44:45,910 --> 00:44:49,990 that they'd otherwise have a lot of difficulty scraping 635 00:44:49,990 --> 00:44:53,700 with some of these other, faster tools. 636 00:44:53,700 --> 00:44:57,530 >> So that's all I've got for web scraping. 637 00:44:57,530 --> 00:44:58,090 Have fun. 638 00:44:58,090 --> 00:45:01,762 639 00:45:01,762 --> 00:45:02,680 >> AUDIENCE: Question. 640 00:45:02,680 --> 00:45:04,016 >> ROBERT KRABEK: Yes. 641 00:45:04,016 --> 00:45:12,840 >> AUDIENCE: Is there a mechanism to hash a website so I could basically 642 00:45:12,840 --> 00:45:14,207 go through it later. 643 00:45:14,207 --> 00:45:15,040 ROBERT KRABEK: Yeah. 644 00:45:15,040 --> 00:45:21,530 So what we did, in our example, for both of them, 645 00:45:21,530 --> 00:45:24,980 we put the whole website into doc. 646 00:45:24,980 --> 00:45:31,260 And so you can actually just take the doc variable and write it to a file. 647 00:45:31,260 --> 00:45:35,490 So if I wanted, I could write it out as an HTML file, 648 00:45:35,490 --> 00:45:39,280 and then instead of using OpenURI and the curl request, 649 00:45:39,280 --> 00:45:43,520 then I could just open the HTML doc and then search through that. 650 00:45:43,520 --> 00:45:47,960 >> AUDIENCE: But can you preserve the sort of online experience 651 00:45:47,960 --> 00:45:48,930 when you're doing it offline. 652 00:45:48,930 --> 00:45:51,013 For example, when you're flying for several hours, 653 00:45:51,013 --> 00:45:54,070 I want to basically archive the whole site. [INAUDIBLE] 654 00:45:54,070 --> 00:45:58,780 >> ROBERT KRABEK: Yeah, that's exactly-- so literally what this is doing 655 00:45:58,780 --> 00:46:03,010 is it's taking everything that would be at this URL.
656 00:46:03,010 --> 00:46:11,280 So if we run curl, it's taking all of this HTML, 657 00:46:11,280 --> 00:46:14,590 and it's storing it inside the doc variable. 658 00:46:14,590 --> 00:46:17,290 So then you can do whatever you want to do with doc. 659 00:46:17,290 --> 00:46:18,575 You can output it to a file. 660 00:46:18,575 --> 00:46:19,950 AUDIENCE: But it's not linked. 661 00:46:19,950 --> 00:46:20,780 It's not dynamic. 662 00:46:20,780 --> 00:46:22,770 It's not recursive, right? 663 00:46:22,770 --> 00:46:24,016 Do you see what I mean? 664 00:46:24,016 --> 00:46:28,359 I'm trying to basically sort of hash the whole site onto my hard drive 665 00:46:28,359 --> 00:46:31,150 so I can basically go through it for several hours without the internet. 666 00:46:31,150 --> 00:46:32,025 >> ROBERT KRABEK: Right. 667 00:46:32,025 --> 00:46:37,140 So if I had-- so where's my file I/O? 668 00:46:37,140 --> 00:46:47,766 So this is file I/O. So say instead of this, I call this craigslist.html. 669 00:46:47,766 --> 00:46:52,620 670 00:46:52,620 --> 00:46:53,940 I would open that up. 671 00:46:53,940 --> 00:46:59,020 I would put doc in it. 672 00:46:59,020 --> 00:47:00,470 I close the file. 673 00:47:00,470 --> 00:47:05,410 And then just because the CS50 IDE is on the cloud, that's whatever. 674 00:47:05,410 --> 00:47:07,710 I can go here. 675 00:47:07,710 --> 00:47:09,320 I can download the file. 676 00:47:09,320 --> 00:47:11,830 And then that will be on my hard drive. 677 00:47:11,830 --> 00:47:13,930 So you can do it that way. 678 00:47:13,930 --> 00:47:18,830 Or if you're at home, not using the CS50 IDE, using Sublime or something, 679 00:47:18,830 --> 00:47:21,900 this is even easier, because all of this is available locally, 680 00:47:21,900 --> 00:47:23,020 not tied to the internet. 681 00:47:23,020 --> 00:47:24,720 >> AUDIENCE: I see.
682 00:47:24,720 --> 00:47:26,580 This is for just one problem. 683 00:47:26,580 --> 00:47:30,410 Can you do it recursively so you can go several layers sort of into the thing? 684 00:47:30,410 --> 00:47:33,801 >> ROBERT KRABEK: I can download folders as well, if that's what you're asking. 685 00:47:33,801 --> 00:47:34,426 AUDIENCE: Yeah. 686 00:47:34,426 --> 00:47:39,890 687 00:47:39,890 --> 00:47:41,440 >> ROBERT KRABEK: Cool. 688 00:47:41,440 --> 00:47:43,182
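The answer in the Q&A above (open craigslist.html, put doc in it, close the file) could be sketched along these lines using only Ruby's standard file I/O. The method name `save_html` and the sample HTML string are hypothetical; in the talk doc is a Nokogiri document, so presumably its string form (for example `doc.to_html`) is what would be written. This saves a single page only and does not follow links, which matches the limitation the audience member raises.

```ruby
require "tmpdir"

# Save an HTML string to disk and read it back, so later parsing can work
# from the local copy instead of making a new request.
def save_html(file_path, html)
  # The block form closes the file automatically when the block ends.
  File.open(file_path, "w") { |file| file.write(html) }
  File.read(file_path)
end

Dir.mktmpdir do |dir|
  # Stand-in for the page contents that the talk stores in doc.
  html = "<html><body><span class=\"txt\">hello</span></body></html>"
  puts save_html(File.join(dir, "craigslist.html"), html)
end
```

A temporary directory is used here only so the example leaves nothing behind; a real script would write to a path you intend to keep.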