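Research and Design of Chinese Web Filtering Based on SVM and Cosine Angle Method
Cite this article:HU Di, CHEN Yun, YANG Yi-xian, CHEN Yue. Research and Design of Chinese Web Filtering Based on SVM and Cosine Angle Method[J]. Journal of Chengdu University of Information Technology, 2011, 26(5): 527-532.
Authors:HU Di  CHEN Yun  YANG Yi-xian  CHEN Yue
Affiliations:1. Information Security Institute, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China
2. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100083, China
Funding:National Natural Science Foundation of China (60821001); Programme of Introducing Talents of Discipline to Universities (B08004)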
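Abstract:To filter pornographic web pages, which are particularly harmful to minors, more accurately, a stop-word removal function is added to a Chinese lexical analysis system to perform Chinese word segmentation, and features are extracted with improved term frequency-inverse document frequency and document frequency-mutual information methods, yielding a filtering scheme based on the support vector machine. Building on a given cosine angle formula, a second Chinese web page filtering scheme based on the cosine angle method is also proposed. Experiments show that combining the two schemes, under the condition that a master URL repository exists, further improves the filtering of pornographic and similar web pages.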

Keywords:information processing  web filtering  support vector machine  cosine angle method  feature extraction

Research and Design of Chinese Web Filtering Based on SVM and Cosine Angle Method
HU Di,CHEN Yun,YANG Yi-xian,CHEN Yue.Research and Design of Chinese Web Filtering Based on SVM and Cosine Angle Method[J].Journal of Chengdu University of Information Technology,2011,26(5):527-532.
Authors:HU Di  CHEN Yun  YANG Yi-xian  CHEN Yue
Institution:1. Information Security Institute, Chengdu University of Information Technology, Chengdu 610225, China; 2. Information Security Center, Beijing University of Posts and Telecommunications, Beijing 100083, China
Abstract:To filter pornographic web pages harmful to minors more accurately, a stop-word removal module was added to the Institute of Computing Technology Chinese Lexical Analysis System (ICTCLAS) for Chinese word segmentation. Features were extracted with improved Term Frequency-Inverse Document Frequency and Document Frequency-Mutual Information methods to build a Support Vector Machine based filtering scheme. In addition, a Chinese web page filtering scheme based on a given cosine angle formula was proposed. Experimental results show that combining the two schemes further improves the filtering of pornographic and similar web pages when a URL repository is available.
Keywords:information processing  web filtering  support vector machine  cosine angle method  feature extraction
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
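
As a rough illustration of the cosine angle idea named in the abstract, the sketch below weights tokenized documents with the textbook TF-IDF formula and compares two weight vectors by the cosine of the angle between them. It is only a minimal sketch under stated assumptions: the abstract does not give the paper's improved TF-IDF weighting or its exact cosine angle formula, so the standard definitions are used instead, and the function names `tf_idf_vectors` and `cosine` are hypothetical. Input is assumed to be already segmented into tokens with stop words removed.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Uses textbook TF-IDF
    (tf = count / doc length, idf = ln(N / df)); this is an assumption,
    not the improved weighting described in the paper.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)
```

A filter built along these lines might compare a page's vector against a reference vector for the blocked category and flag the page when the cosine exceeds a chosen threshold; the threshold and the reference vector would have to be tuned on labeled data.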