Classification of texts
On the basis of the available corpora, unknown text can be classified quite accurately. A text (or a sentence) can be assigned to a language, a sublanguage, or a subject area. The word distributions in the corpora serve as the knowledge base. With the "Chinese Whispers" algorithm by Dipl.-Ing. Christian Biemann, classification can also be done without a knowledge base. The concrete implementation used is jLanI (Java Language Identifier), which was developed in and beyond the context of the bachelor's thesis by Sven Teresniak. The classifier works automatically and statistically, is unsupervised and non-learning, and gets along without negative examples.
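The idea of using corpus word distributions as a knowledge base can be illustrated with a minimal sketch. This is not jLanI's actual implementation or API. The function names, the toy corpora, and the floor probability for unknown words are all assumptions made for illustration, and Python is used here only for brevity. Each class (here, a language) is represented solely by the relative word frequencies of its own corpus, so no negative examples are needed.

```python
import math
from collections import Counter


def word_distribution(corpus_text):
    """Relative word frequencies of one reference corpus (hypothetical helper)."""
    words = corpus_text.lower().split()
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def classify(sentence, distributions, floor=1e-6):
    """Score a sentence against every class and return (best_label, scores).

    The score is the sum of log relative frequencies of the sentence's words.
    Words missing from a corpus get the assumed floor probability.
    """
    words = sentence.lower().split()
    scores = {}
    for label, dist in distributions.items():
        scores[label] = sum(math.log(dist.get(word, floor)) for word in words)
    best = max(scores, key=scores.get)
    return best, scores


# Toy corpora standing in for the real reference corpora (illustrative only).
CORPORA = {
    "en": "the cat sat on the mat and the dog ate the food",
    "de": "die katze sitzt auf der matte und der hund frisst das futter",
}
DISTRIBUTIONS = {label: word_distribution(text) for label, text in CORPORA.items()}

if __name__ == "__main__":
    label, scores = classify("der hund und die katze", DISTRIBUTIONS)
    print(label, scores)
```

The same scheme would apply to sublanguages or subject areas by supplying one corpus per class. Realistic corpora are far larger than the toy strings above, and a practical classifier would also need a rule for rejecting text that matches no class well.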
Contact: Sven Teresniak