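Design and Research on Search Strategy of Focused Crawler Based on Genetic Algorithm
Cite this article: CHEN Yue, CHEN Yun, YANG Yi-xian, HU Di. Design and Research on Search Strategy of Focused Crawler Based on Genetic Algorithm[J]. Journal of Chengdu University of Information Technology, 2011, 26(5): 533-537.
Authors: CHEN Yue  CHEN Yun  YANG Yi-xian  HU Di
Affiliations: 1. Information Security Institute, Chengdu University of Information Technology, Chengdu, Sichuan 610225
2. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100083
Funding: National Natural Science Foundation of China (60821001); Program of Introducing Talents of Discipline to Universities (B08004)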
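Abstract: Web crawlers are an important component of search engines. To address the shortcomings of current focused-crawler search strategies, a new search strategy is proposed. During the search, individuals whose fitness is above or below the population's average fitness are assigned different crossover and mutation probabilities, which widens the crawler's crawling range and introduces new individuals. The genetic operators are also improved to raise the focused crawler's search efficiency. Experiments show that a focused crawler based on the adaptive genetic algorithm alleviates, to some extent, the "premature convergence" problem of the traditional genetic algorithm, and that it crawls more topic-relevant pages and more pages with high relevance.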

Keywords: search engine  search strategy  focused crawler  genetic algorithm  self-adaptation

Design and Research on Search Strategy of Focused Crawler Based on Genetic Algorithm
CHEN Yue,CHEN Yun,YANG Yi-xian,HU Di.Design and Research on Search Strategy of Focused Crawler Based on Genetic Algorithm[J].Journal of Chengdu University of Information Technology,2011,26(5):533-537.
Authors:CHEN Yue  CHEN Yun  YANG Yi-xian  HU Di
Institution:1. Information Security Institute, Chengdu University of Information Technology, Chengdu 610225, China; 2. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100083, China
Abstract:Web crawlers are an important component of search engines. To overcome the deficiencies of existing focused crawler search strategies, a new search strategy is proposed. During the search, different crossover and mutation probabilities are applied to individuals whose fitness is above or below the population average, which enlarges the crawling range and introduces new individuals. Search efficiency is further improved by refining the genetic operators. Experimental results indicate that a focused crawler based on the self-adaptive genetic algorithm mitigates, to some extent, the "premature" convergence problem of the general genetic algorithm, and retrieves more topic-relevant web pages as well as more pages with high relevance.
Keywords:search engine  search strategy  focused crawler  genetic algorithm  self-adaptivity
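
The adaptive step described in the abstract, giving individuals different crossover and mutation probabilities depending on whether their fitness lies above or below the population average, can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so the linear scaling used here is an assumption borrowed from common adaptive genetic algorithm formulations, and the names `adaptive_rates` and `population_rates`, the probability bounds, and the sample relevance scores are all hypothetical.

```python
def adaptive_rates(fitness, f_avg, f_max,
                   pc_range=(0.6, 0.9), pm_range=(0.01, 0.1)):
    """Return (crossover_prob, mutation_prob) for one individual.

    The bounds in pc_range and pm_range are illustrative assumptions,
    not values taken from the paper.
    """
    pc_low, pc_high = pc_range
    pm_low, pm_high = pm_range
    # Below-average individuals (or a fully converged population, where
    # f_max == f_avg) get the highest rates so the search keeps exploring.
    if fitness < f_avg or f_max <= f_avg:
        return pc_high, pm_high
    # Above-average individuals: shrink both rates linearly as fitness
    # approaches the best in the population, protecting good individuals.
    scale = (f_max - fitness) / (f_max - f_avg)
    return (pc_low + (pc_high - pc_low) * scale,
            pm_low + (pm_high - pm_low) * scale)


def population_rates(fitnesses):
    """Compute adaptive rates for every individual in a population."""
    f_avg = sum(fitnesses) / len(fitnesses)
    f_max = max(fitnesses)
    return [adaptive_rates(f, f_avg, f_max) for f in fitnesses]


if __name__ == "__main__":
    # Hypothetical topic-relevance scores of five candidate URLs.
    scores = [0.15, 0.40, 0.55, 0.80, 0.95]
    for score, (pc, pm) in zip(scores, population_rates(scores)):
        print(f"relevance={score:.2f}  pc={pc:.3f}  pm={pm:.3f}")
```

In this sketch the fitness of an individual stands in for the topic relevance of a page or link. Returning the highest rates when every individual has the same fitness avoids a division by zero and is also one simple way to push a converged population back toward exploration, which is the "premature" convergence case the abstract refers to.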
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.