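Abstract: Classic robust text watermarks modify the content or format of a text, which lowers its fidelity and usability. This paper proposes a Word2Vec-based zero-watermark algorithm for Chinese text that generates and detects the watermark without modifying the text. First, the text is segmented into words, word frequencies are counted, and feature words are extracted; Word2Vec then produces the corresponding feature word vectors. Next, SVD (singular value decomposition) reduces their dimension, and AES (Advanced Encryption Standard) encryption yields the final zero watermark. During detection, copyright ownership is determined by comparing the singular values and singular vectors produced by the SVD. Theoretical analysis and experimental results show that the proposed algorithm requires no modification of the original text and withstands insertion and deletion, sentence-pattern conversion, and synonym substitution attacks to a certain degree. It is therefore reasonably robust and offers an effective means of protecting text copyright.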
Keywords: Word2Vec; SVD; zero-watermark; Chinese text; word vector
CLC number: TP309.2
Document code: A
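The feature-word extraction step described in the abstract (segmentation, frequency counting, selection of the most frequent words) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes the text has already been split into a token list by a Chinese word segmenter (not shown), and the function name `extract_feature_words`, the stopword filtering, and the cut-off `k` are hypothetical choices. The selected words would then be looked up in a trained Word2Vec model, also not shown, to obtain the feature word vectors.

```python
from collections import Counter


def extract_feature_words(tokens, k, stopwords=frozenset()):
    """Return the k most frequent tokens as (word, count) pairs.

    `tokens` is assumed to be the output of a Chinese word segmenter
    (not shown). Stopwords and whitespace-only tokens are skipped; equal
    counts keep first-occurrence order, as Counter.most_common guarantees.
    """
    # Count only tokens that are non-blank and not in the stopword set.
    counts = Counter(t for t in tokens if t.strip() and t not in stopwords)
    # Highest counts first, truncated to the k feature words.
    return counts.most_common(k)
```

For instance, with tokens `["零水印", "算法", "的", "文本", "零水印", "文本", "版权", "零水印", "的"]`, stopwords `{"的"}` and `k = 2`, the function returns `[("零水印", 3), ("文本", 2)]`.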
A Zero-Watermark Algorithm for Chinese Text Based on Word2Vec
DAI Xiajing, XU Yicheng, WANG Xinya, TONG Deyu
(School of Information Engineering, Nanjing University of Finance and Economics, Nanjing 210023, China)
2415700426@qq.com; YichengXu421@163.com; 1520369099@qq.com; tdyforweb@163.com
Abstract: Classic robust text watermarking modifies the content or format of a text, which reduces its fidelity and usability. This paper proposes a Word2Vec-based zero-watermark algorithm for Chinese text in which neither watermark generation nor detection modifies the original text. First, the text is segmented into words, word frequencies are counted, and feature words are extracted; Word2Vec is then used to generate the corresponding feature word vectors. Next, SVD (Singular Value Decomposition) is applied to reduce their dimension, and the final zero watermark is generated by AES (Advanced Encryption Standard) encryption. During watermark detection, copyright ownership is determined by comparing the singular values and singular vectors obtained from the SVD. Theoretical analysis and experimental results show that the proposed algorithm requires no modification of the original text and can resist insertion and deletion, sentence-pattern conversion, and synonym substitution attacks to a certain extent. The algorithm is therefore reasonably robust and provides an effective approach to protecting text copyright.
Keywords: Word2Vec; SVD; zero-watermark; Chinese text; word vector
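The SVD-based feature and the detection comparison can be sketched with the Python standard library alone. This sketch rests on assumptions the abstract does not state: the feature word vectors are stacked as rows of a matrix, only the leading singular value and right singular vector are kept (approximated by power iteration on A^T A rather than a full SVD), and a match is declared by hypothetical thresholds on the |cosine| between singular vectors and the relative singular-value gap. The AES encryption that turns this feature into the registered zero watermark, and the matching decryption at detection time, are omitted because the standard library has no AES implementation. The names `leading_singular_triplet`, `compare_features`, and `is_match` are illustrative.

```python
import math
import random


def leading_singular_triplet(matrix, iterations=200, seed=0):
    """Approximate the largest singular value and its right singular vector
    of `matrix` (a non-zero list of equal-length rows, e.g. one feature word
    vector per row) by power iteration on A^T A."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(n_cols)]  # positive random start
    for _ in range(iterations):
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]  # u = A v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]                                # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("A^T A v vanished; is the matrix all zeros?")
        v = [x / norm for x in w]  # keep v at unit length
    av = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = math.sqrt(sum(x * x for x in av))  # sigma_1 = ||A v_1||
    return sigma, v


def compare_features(feature_a, feature_b):
    """Return (|cosine| between singular vectors, relative singular-value gap).
    abs() is used because an SVD fixes singular vectors only up to sign."""
    (s1, v1), (s2, v2) = feature_a, feature_b
    cos = abs(sum(a * b for a, b in zip(v1, v2)))  # both vectors have unit norm
    gap = abs(s1 - s2) / max(s1, s2)
    return cos, gap


def is_match(feature_a, feature_b, min_cos=0.95, max_gap=0.05):
    """Hypothetical decision rule; the default thresholds are illustrative."""
    cos, gap = compare_features(feature_a, feature_b)
    return cos >= min_cos and gap <= max_gap
```

For example, comparing an original feature matrix `[[3.0, 0.0], [0.0, 1.0]]` with a slightly perturbed one, `[[3.0, 0.1], [0.0, 1.0]]`, gives a |cosine| above 0.99 and a singular-value gap below 0.05, so `is_match` returns `True`; real thresholds would have to be calibrated against attacked texts.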