Publications

KKPRRY

Abstract:

This paper reports the results of a study on automatic keyword extraction in German. We employed two general types of methods: (A) unsupervised methods based on information theory, namely (i) a bigram model, (ii) a probabilistic parser model, and (iii) a novel model that takes into account the topics within the discourse of the target words when calculating their information content; and (B) a supervised method employing a recurrent neural network (RNN). As baselines, we used TextRank and the TF-IDF ranking function. The topic model (A)(iii) clearly outperformed all other models, including the TextRank and TF-IDF baselines. In contrast, the RNN performed poorly. We take these results as first evidence (i) that information content can be employed for keyword extraction and thus clearly corresponds to the semantics of natural language, and (ii) that, as a cognitive principle, the information content of words is determined by extra-sentential contexts, i.e., by the discourse in which the words occur.
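To illustrate what ranking words by information content means, here is a minimal sketch in the spirit of the bigram model (A)(i). It scores each word type by its average surprisal, -log2 P(w_i | w_{i-1}), and returns the highest-scoring types as keyword candidates. This is not the authors' implementation: the add-alpha smoothing, the per-type averaging, the stopword filter, and the function names `bigram_information_content` and `top_keywords` are all hypothetical assumptions, and the parser and topic models are not reproduced.

```python
import math
from collections import Counter


def bigram_information_content(tokens, alpha=1.0):
    """Average surprisal -log2 P(w_i | w_{i-1}) per word type.

    Hypothetical sketch: add-alpha smoothing and per-type averaging are
    assumptions, not the method described in the paper.
    """
    vocab_size = len(set(tokens))
    # Counts of each word in the history position of a bigram.
    history_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    totals = {}
    occurrences = Counter()
    for prev, word in zip(tokens, tokens[1:]):
        # Smoothed conditional probability; sums to 1 over the vocabulary.
        p = (bigram_counts[(prev, word)] + alpha) / (
            history_counts[prev] + alpha * vocab_size
        )
        totals[word] = totals.get(word, 0.0) - math.log2(p)
        occurrences[word] += 1
    # The first token has no history, so it only counts where it recurs later.
    return {w: totals[w] / occurrences[w] for w in totals}


def top_keywords(tokens, k=3, stopwords=frozenset()):
    """Return the k word types with the highest average information content."""
    scores = bigram_information_content(tokens)
    candidates = [(w, s) for w, s in scores.items() if w not in stopwords]
    # Sort by descending score; break ties alphabetically for determinism.
    candidates.sort(key=lambda ws: (-ws[1], ws[0]))
    return [w for w, _ in candidates[:k]]
```

For example, `top_keywords("the cat sat the cat ran".split(), k=2)` ranks `ran` and `sat` above `cat`, because `cat` always follows `the` and is therefore predictable (1 bit), while `sat` and `ran` each carry about 1.58 bits. A realistic keyword extractor would estimate probabilities on a large reference corpus rather than on the target document alone.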

Type: Inbook

Author: Max Kölbl, Yuki Kyogoku, J. Nathanael Philipp, Michael Richter, Clemens Rietdorf, Tariq Yousef
Title: The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German
Pages: 139-161
Publisher: Springer
Year: 2021
Volume: 939
Series: Studies in Computational Intelligence
@INBOOK{KKPRRY,
AUTHOR = {Max Kölbl and Yuki Kyogoku and J. Nathanael Philipp and Michael Richter and Clemens Rietdorf and Tariq Yousef},
TITLE = {The Semantic Level of Shannon Information: Are Highly Informative Words Good Keywords? A Study on German},
PAGES = {139--161},
PUBLISHER = {Springer},
YEAR = {2021},
VOLUME = {939},
SERIES = {Studies in Computational Intelligence}
}