A Chinese Word Similarity Model with Pronunciation, Radical and Semantic Embedding
AUTHORS
Ruiming Xiao, Harbin University of Science and Technology, Harbin, China; China United Network Communications Group Co., Ltd., Huizhou, China
Leilei Kong*, Foshan University, Foshan, China
Zhongyuan Han, Foshan University, Foshan, China
Xu Sun, Heilongjiang Institute of Technology, Harbin, China
ABSTRACT
To address the problem that existing research on Chinese word similarity calculation does not make full use of three major factors of Chinese characters, namely pronunciation, radicals, and semantics, this paper proposes a Chinese word similarity model with pronunciation, radical, and semantic embedding. The model uses distributed representations to learn the pronunciation, radical, and semantic embeddings of Chinese characters and words, and then interactively measures the similarity of each factor between two words. Finally, it fuses these per-factor similarities with ridge regression to obtain the overall word similarity. Experimental results on the Word-Sim297 corpus show the effectiveness of the proposed model.
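The fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the factor embeddings are random toy vectors, the human similarity ratings are invented, and `cosine` and `ridge_fit` are hypothetical helper names. It shows how per-factor cosine similarities (pronunciation, radical, semantic) can be combined into one score with closed-form ridge regression.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X'X + alpha*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
# Toy setup: 5 word pairs, each word represented by 3 factor embeddings
# (pronunciation, radical, semantic), each of dimension 8.
pairs = [(rng.normal(size=(3, 8)), rng.normal(size=(3, 8))) for _ in range(5)]

# One feature row per pair: [pron_sim, radical_sim, semantic_sim].
X = np.array([[cosine(a[k], b[k]) for k in range(3)] for a, b in pairs])
y = np.array([0.9, 0.2, 0.5, 0.7, 0.1])  # invented human similarity ratings

w = ridge_fit(X, y, alpha=0.1)  # learned fusion weights for the three factors
score = X[0] @ w                # fused similarity score for the first pair
```

With small `alpha`, the learned weights approach the ordinary least-squares solution; larger values shrink the weights toward zero, which stabilizes the fit when the three factor similarities are correlated.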
KEYWORDS
Chinese Word Similarity, Pronunciation, Radical, Semantic, Embedding