# BatchParser

BatchParser is the base class of BatchSpider. It defines how tasks are dispatched and how data is parsed, and it is the interface exposed to users.

In addition to every interface provided by [BaseParser](source_code/BaseParser), it offers the methods below.

## Method details

### 1. Adding tasks: add_task

`add_task` is called every time `start_monitor` runs, and it is called before `init_task`. Use it to add tasks to the database before the batch spider starts.

```python
import feapder


class TestSpider(feapder.BatchSpider):
    def add_task(self):
        pass
```

### 2. Updating tasks

#### Method 1: update one task at a time

```python
def update_task_state(self, task_id, state=1, **kwargs):
    """
    @summary: Update the state of a task in the task table. Your code must call this explicitly once each task is finished.
    Call it as: yield lambda : self.update_task_state(task_id, state)
    ---------
    @param task_id: task id
    @param state: task state
    ---------
    @result:
    """
```

Example

```python
def parse(self, request, response):
    yield item  # return the item; items are written to the database in batches automatically
    yield lambda : self.update_task_state(request.task_id, 1)
```

After `yield item`, call `self.update_task_state` to update the task state.

Why use `yield lambda` here? After `yield item`, the item is not written to the database immediately. It sits in a buffer and is written in a batch. If we called `self.update_task_state` directly, the item might not have been stored yet. If the program then exited unexpectedly, the buffered items would be lost, but the task state would already be updated, so the task would not be redone. The data for that task would be lost for good.

`yield lambda` returns a callback. The callback does not run immediately; the framework guarantees that it runs only after the item has been stored. Writing it this way therefore ensures the task state is updated only after the item is in the database.

#### Method 2: batch update

```python
def update_task_batch(self, task_id, state=1, **kwargs):
    """
    Update tasks in batches. When called from several places, the updated fields must be identical.
    Note: this must be written as yield update_task_batch(...), otherwise nothing is updated.
    @param task_id:
    @param state:
    @param kwargs:
    @return:
    """
```

Example

```python
def parse(self, request, response):
    yield item  # return the item; items are written to the database in batches automatically
    yield self.update_task_batch(request.task_id, 1)  # set the task state to 1
```

After `yield item`, call `self.update_task_batch` to perform a batch update.

Note that a batch update must use `yield`, because `update_task_batch` does not perform the update itself. It only returns an `UpdateItem`. An `UpdateItem` is similar to an `Item` but carries update behavior; the framework processes the `UpdateItem` after the items have been stored, which is what performs the batch update. For details on `UpdateItem`, see [UpdateItem]()

#### Choosing between the two

For a single table, if every update touches the same fields, prefer the batch update because it is more efficient. If the fields differ, update tasks one at a time, because all updates within one batch must use the same fields.

For example, when a request fails, set the task state to -1 and record the failure reason; when it succeeds, set the task state to 1. This can be written as follows:

```python
def parse(self, request, response):
    yield self.update_task_batch(request.task_id, 1)  # set the task state to 1

def failed_request(self, request, response):
    """
    @summary: requests that exceeded the maximum number of retries
    ---------
    @param request:
    ---------
    @result: request / item / callback / None (the return value must be iterable)
    """

    yield request
    yield lambda : self.update_task_state(request.task_id, -1, remark="failure reason")  # set the task state to -1
```

A failed task also updates the `remark` field, whereas a successful task updates only `state`. Because the fields differ, the failure update has to be split out and performed with `update_task_state`.

### 3. Getting the batch time

Example:

    def parse(self, request, response):
        item = SpiderDataItem()  # declare an item
        item.batch_date = self.batch_date
        item.title = title  # assign a value to an item attribute
        yield item  # return the item; items are written to the database in batches automatically

Use `self.batch_date` to get the current batch time, then attach it to the item so that it is stored with the record.

Sample data

| id | title | batch_date |
| --- | --- | --- |
| 1 | 百度一下 | 2021-01-01 |
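
To make the `yield lambda` ordering from section 2 concrete, here is a minimal standalone sketch of a buffer that stores items first and runs deferred callbacks afterwards. It does not use feapder and is not the framework's actual implementation; `ItemBufferSketch`, `task_states`, `events`, and the simplified `parse` and `update_task_state` functions are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, Iterator, List, Union

# A yielded result is either an item (a plain dict here) or a deferred callback.
Result = Union[dict, Callable[[], None]]


class ItemBufferSketch:
    """Toy stand-in for a buffered pipeline (hypothetical, not feapder's class)."""

    def __init__(self) -> None:
        self.pending: List[Result] = []  # results waiting for the next flush
        self.stored: List[dict] = []     # pretend database table
        self.events: List[str] = []      # records the order of operations

    def put(self, result: Result) -> None:
        self.pending.append(result)

    def flush(self) -> None:
        items = [r for r in self.pending if not callable(r)]
        callbacks = [r for r in self.pending if callable(r)]
        self.pending.clear()
        self.stored.extend(items)        # 1. persist all buffered items
        self.events.append("items stored")
        for callback in callbacks:       # 2. only then run the callbacks
            callback()
            self.events.append("callback run")


task_states: Dict[int, int] = {}  # pretend task table: task_id -> state


def update_task_state(task_id: int, state: int = 1) -> None:
    task_states[task_id] = state


def parse(task_id: int) -> Iterator[Result]:
    yield {"task_id": task_id, "title": "example"}  # the item
    yield lambda: update_task_state(task_id, 1)     # deferred state update


buffer = ItemBufferSketch()
for result in parse(7):
    buffer.put(result)

state_before_flush = task_states.get(7)  # None: the callback has not run yet
buffer.flush()
state_after_flush = task_states.get(7)   # 1: updated after the item was stored
print(state_before_flush, state_after_flush, buffer.events)
```

If the process died before `flush`, the task state in this sketch would still be unset, so the task could be picked up again. That is the property the `yield lambda` pattern is meant to preserve.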