Product image sentence annotation based on kernel features
Zhang Hongbin (张红斌); Ji Donghong (姬东鸿); Ren Yafeng (任亚峰); Yin Lan (尹兰)
Abstract:
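Annotating images with single words introduces ambiguity or noise, so sentences are used to annotate product images in order to characterize products accurately. Existing sentence annotation methods for product images suffer from insufficient feature learning. To address this problem, a kernel feature model is used to extract three kernel features from each image (shape, color, and gradient), and these are fused within a multiple kernel learning model to generate a new feature. Product images are classified based on the new feature, visually similar training images are retrieved, and key text is extracted from their titles to annotate the product image. Finally, annotation performance is evaluated from two perspectives: information retrieval and machine translation. Experiments show that the new feature achieves the best product image classification performance; that image classification narrows the scope of image retrieval and helps improve retrieval performance; that the annotation model outperforms the baselines on both MAP (Mean Average Precision) and P-R (Precision-Recall) measures; and that the annotated sentences are semantically relevant to the image content, with better coherence and fluency.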
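The MAP measure named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' evaluation code: the function names are hypothetical, and the sketch assumes each ranked result list contains every relevant item for its query, so average precision is normalized by the number of relevant items found in the list.

```python
from typing import Sequence


def average_precision(relevance: Sequence[bool]) -> float:
    """Average precision for one ranked result list.

    relevance[i] is True when the item at rank i + 1 is relevant.
    Assumption: all relevant items for the query appear in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-query average precision values."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For a query whose ranked results are relevant, irrelevant, relevant, the average precision is (1/1 + 2/3) / 2; averaging such values over all query images gives the MAP.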
Keywords: kernel feature; multiple kernel learning; product image; sentence annotation; natural language generation
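Foundation: Key Program of the National Natural Science Foundation of China (No. 61133012); Humanities and Social Sciences Research Project of the Ministry of Education (No. 16YJAZH029); Science and Technology Research Project of the Jiangxi Provincial Department of Science and Technology (Nos. 20142BBG70011, 20121BBG70050); Humanities and Social Sciences Fund for Universities of Jiangxi Province (Nos. XW1502, TQ1503); Jiangxi Provincial Social Science Planning Project (No. 16TQ02); Special Fund for Visiting Scholars under the Development Program for Young and Middle-aged Teachers of Regular Undergraduate Universities of Jiangxi Province; East China Jiaotong University Fund Project (No. 11RJ01)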
DOI: 10.14188/j.1671-8844.2017-01-021