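Automatic generation of family music albums based on multimodal fusion
Cite this article: LIU Junfang, SHAO Xi. Automatic generation of family music albums based on multimodal fusion[J]. Journal of Nanjing Institute of Meteorology, 2017, 9(6): 661-668.
Authors: LIU Junfang, SHAO Xi
Affiliation: College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003 (both authors)
Funding: National Natural Science Foundation of China (61401227); Beijing Natural Science Foundation (4152053)
Abstract: With the growth of big data and social networks, electronic photo albums and online services have become basic applications of computers and the Internet. In recent years in particular, the popularity of social networks has driven explosive growth in the number of electronic albums, and improving the album user experience has become especially important. An album with a particular theme generally carries some emotional information. This paper therefore studies the automatic generation of family music albums based on multimodal fusion, so that users can enjoy music accompanied by album photos whose emotion matches that of the music. To capture the emotion in music and photos, sentence-level audio features and image features that express that emotion are selected from the music and the images respectively. To address the heterogeneous, cross-modal fusion of image and music features, locality preserving projection (LPP) is used to map both sets of features into a latent feature space with stronger emotion-classification ability, which enables automatic generation of the music album. In the experiments, objective evaluation shows that the LPP method achieves higher precision than the pure CCA method; in subjective evaluation, LPP obtains a satisfaction rate of 72.06%, close to that of manual recommendation (78.09%) and clearly higher than those of random recommendation and the CCA method.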

Keywords: music album  emotion model  sentence-level  multimodal fusion  latent space
Received: 2017/8/28

Automatic generation of family music album based on multi-modal fusion
LIU Junfang and SHAO Xi. Automatic generation of family music album based on multi-modal fusion[J]. Journal of Nanjing Institute of Meteorology, 2017, 9(6): 661-668.
Authors: LIU Junfang and SHAO Xi
Institution: College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003 (both authors)
Abstract: With the development of big data and social networks, electronic albums and online services have become basic uses of computers and the Internet. In recent years especially, the number of electronic albums has exploded with the popularity of social networks, so improving the user experience of albums has become particularly important. A photo album on a given topic usually carries some emotional information. This paper studies the automatic generation of family music albums based on multi-modal fusion, so that users can enjoy music while browsing album photos whose emotion matches it. To capture the emotions in music and images, representative sentence-level features are selected for both the music and the images, and LPP (Locality Preserving Projection) is employed to model the relevance between music and images that share the same emotion. The image features and the music features are mapped into a latent space with stronger emotion-classification ability, which makes automatic generation of the music album possible. In the experiments, the objective evaluation shows that the LPP method achieves higher precision than the pure CCA (Canonical Correlation Analysis) method. In the subjective evaluation, the proposed LPP method reaches a satisfaction level of 72.06%, which is close to that of manual recommendation (78.09%) and higher than those of random recommendation and the pure CCA approach.
Keywords: music album  emotion model  sentence-level  multi-modal fusion  latent space
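
The abstract names LPP as the step that maps image and music features into a shared latent space but gives no formulation. The following is a minimal sketch of textbook LPP in NumPy, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function name `lpp_fit`, the k-nearest-neighbour graph with heat-kernel weights, the regularisation term, the random stand-in features, and above all the choice to concatenate paired image and music feature vectors before projecting.

```python
import numpy as np


def lpp_fit(X, n_components=2, n_neighbors=5, t=None, reg=1e-6):
    """Textbook LPP: returns (projection matrix P, feature mean).

    X has one sample per row. Graph construction and `reg` are
    illustrative assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    n, d = Xc.shape

    # Pairwise squared Euclidean distances; the diagonal is set to inf
    # so that a sample is never selected as its own neighbour.
    sq = np.sum(Xc ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Xc @ Xc.T, 0.0)
    np.fill_diagonal(dist2, np.inf)
    if t is None:
        t = np.median(dist2[np.isfinite(dist2)])  # heat-kernel width

    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    idx = np.argsort(dist2, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = idx.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist2[rows, cols] / t)
    W = np.maximum(W, W.T)

    D = np.diag(W.sum(axis=1))   # degree matrix
    L = D - W                    # graph Laplacian

    A = Xc.T @ L @ Xc
    B = Xc.T @ D @ Xc + reg * np.eye(d)  # regularised to stay positive definite

    # Solve A a = lambda * B a by Cholesky whitening of B, then keep the
    # eigenvectors with the smallest eigenvalues (LPP minimises a^T A a).
    C = np.linalg.cholesky(B)
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.T
    M = (M + M.T) / 2.0          # remove round-off asymmetry
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    P = C_inv.T @ vecs[:, :n_components]
    return P, mean


# Hypothetical stand-in data: 40 paired samples with 8 image features
# and 6 music features each (random numbers, not real features).
rng = np.random.default_rng(0)
image_feats = rng.normal(size=(40, 8))
music_feats = rng.normal(size=(40, 6))
X = np.hstack([image_feats, music_feats])

P, mean = lpp_fit(X, n_components=3)
Z = (X - mean) @ P  # latent coordinates, one row per image-music pair
```

In this sketch the neighbourhood graph is built from feature distances alone. A variant closer to the emotion-matching goal described in the abstract might connect samples that share an emotion label instead, but the abstract does not say which construction the paper uses.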