
Alibaba Develops Chinese-Language Dataset Youku-mPLUG, with Corpus Data Sourced from Youku

Published: 2023-07-11

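PingWest, June 8: According to a paper posted on arXiv, Alibaba's DAMO Academy has recently released Youku-mPLUG, a video-language dataset intended to advance vision-language pre-training and multimodal large language models in the Chinese-language community.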

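All of the dataset's content comes from Youku, and strict standards were applied to its safety, diversity, and content quality. According to DAMO Academy, Youku-mPLUG contains 10 million video-text pairs spanning 45 categories, filtered from 400 million raw videos, and is intended mainly for large-scale pre-training.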

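DAMO Academy said that Youku-mPLUG can help researchers and developers pursue deeper multimodal research and build better applications in the future.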

