Data selection using support vector regression
Authors: Michael B Richman, Lance M Leslie, Theodore B Trafalis, Hicham Mansouri
Institution: 1. School of Meteorology and Cooperative Institute for Mesoscale Meteorological Studies, University of Oklahoma, Norman, Oklahoma, 73072, USA
2. School of Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma, 73019, USA
3. Power Costs, Inc., 301 David L. Boren Blvd., Suite 2000, Norman, Oklahoma 73072, USA
Abstract: Geophysical data sets are growing at an ever-increasing rate, requiring computationally efficient data selection (thinning) methods to preserve essential information. Satellites, such as WindSat, provide large data sets for assessing the accuracy and computational efficiency of data selection techniques. A new data thinning technique, based on support vector regression (SVR), is developed and tested. To manage large online satellite data streams, observations from WindSat are formed into subsets by Voronoi tessellation and then each subset is thinned by SVR (TSVR). Three experiments are performed. The first confirms the viability of TSVR for a relatively small sample, comparing it to several commonly used data thinning methods (random selection, averaging and Barnes filtering), producing a 10% thinning rate (90% data reduction), low mean absolute errors (MAE) and large correlations with the original data. A second experiment, using a larger data set, shows TSVR retrievals with MAE 1 m s⁻¹ and correlations 0.98. TSVR was an order of magnitude faster than the commonly used thinning methods. A third experiment applies a two-stage pipeline to TSVR, to accommodate online data. The pipeline subsets reconstruct the wind field with the same accuracy as in the second experiment, and the pipeline is an order of magnitude faster than the non-pipeline TSVR. Therefore, pipeline TSVR is two orders of magnitude faster than commonly used thinning methods that ingest the entire data set. This study demonstrates that TSVR pipeline thinning is an accurate and computationally efficient alternative to commonly used data selection techniques.
Keywords: data selection; data thinning; machine learning; support vector regression; Voronoi tessellation; pipeline methods
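The subset-then-thin workflow and the two evaluation measures named in the abstract (MAE and correlation with the original data) can be illustrated with a minimal sketch. This is not the authors' TSVR: the SVR step is replaced here by random selection, one of the baseline methods the abstract mentions, purely so the example stays dependency-free. All function names, the synthetic wind-speed field, the seed locations, and the nearest-neighbour reconstruction are assumptions made for illustration.

```python
import math
import random


def nearest(point, candidates):
    """Index of the candidate closest to `point` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda i: math.dist(point, candidates[i]))


def voronoi_subsets(points, seeds):
    """Group observation indices by nearest seed, i.e. by discrete Voronoi cell."""
    cells = {i: [] for i in range(len(seeds))}
    for idx, p in enumerate(points):
        cells[nearest(p, seeds)].append(idx)
    return cells


def thin_cell(indices, rate, rng):
    """Stand-in for the SVR step: keep a random fraction `rate` of one cell."""
    if not indices:
        return []
    k = max(1, round(rate * len(indices)))
    return rng.sample(indices, k)


def thin(points, seeds, rate=0.10, seed=0):
    """Thin each Voronoi subset independently; return the kept indices."""
    rng = random.Random(seed)
    kept = []
    for indices in voronoi_subsets(points, seeds).values():
        kept.extend(thin_cell(indices, rate, rng))
    return sorted(kept)


def reconstruct(points, kept_points, kept_values):
    """Nearest-neighbour reconstruction of the full field from kept observations."""
    return [kept_values[nearest(p, kept_points)] for p in points]


def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def demo():
    """Thin a synthetic 20x20 wind-speed field to roughly 10% and score it."""
    points = [(float(x), float(y)) for x in range(20) for y in range(20)]
    # Hypothetical smooth field in m/s; not WindSat data.
    values = [8 + 3 * math.sin(x / 3) + 2 * math.cos(y / 4) for x, y in points]
    seeds = [(5.0, 5.0), (5.0, 15.0), (15.0, 5.0), (15.0, 15.0)]
    kept = thin(points, seeds, rate=0.10)
    kept_points = [points[i] for i in kept]
    kept_values = [values[i] for i in kept]
    estimate = reconstruct(points, kept_points, kept_values)
    return len(kept) / len(points), mae(values, estimate), pearson(values, estimate)
```

Calling `demo()` returns the retained fraction, the MAE, and the correlation for the synthetic field. Because each Voronoi subset is thinned independently, the per-cell step could in principle be swapped for an SVR fit or run on subsets as they arrive, which is the property the pipeline experiment in the abstract relies on.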
This article is indexed in CNKI, Wanfang Data, SpringerLink, and other databases.
Source: Advances in Atmospheric Sciences (《大气科学进展》).