Published: 2009-02-17. Posted on SciPeople: 2012-10-29 16:27:19. Journal: International Journal of Computer and Systems Sciences


On the Problem of Wiki-Texts Indexing
Krizhanovsky A., Smirnov A. / Andrew Krizhanovsky
International Journal of Computer and Systems Sciences, 2009, 48(4). P.616-624.
Abstract: A new type of document, the "wiki page", is spreading across the Internet. This shows not only in the growing number of pages of this type but also in the popularity of wiki projects (in particular, Wikipedia); the problem of parsing wiki texts is therefore becoming more and more topical. A new method for indexing Wikipedia texts in three languages (Russian, English, and German) is proposed and implemented. The architecture of the indexing system, including the software components GATE and Lemmatizer, is considered. The rules for converting wiki texts into natural-language texts are described. Index bases for the Russian Wikipedia and the Simple English Wikipedia are constructed. The validity of Zipf's laws is tested for the Russian Wikipedia and the Simple English Wikipedia.
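
As a rough illustration of what a Zipf's law check on an indexed corpus can look like, the sketch below counts word frequencies, ranks them, and fits the slope of log(frequency) against log(rank); a slope near -1 is what the classic rank-frequency form of the law predicts. This is not the authors' implementation: the function names are hypothetical, and a plain regex tokenizer stands in for the GATE and Lemmatizer pipeline mentioned in the abstract.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return word frequencies in descending order (rank 1 first).

    Assumption: words are split with a simple regex, with no lemmatization.
    """
    words = re.findall(r"\w+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(frequencies) < 2:
        raise ValueError("need at least two ranked frequencies")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

On a real corpus one would pass the output of `rank_frequency` to `zipf_slope` and compare the fitted slope with -1; how the fit should be restricted (for example, to the most frequent words) is a design choice this sketch leaves open.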