PAWUK: Polish Automatic Web corpus of UKrainian language
PAWUK (Polish Automatic Web corpus of UKrainian language) is a linguistic corpus of Ukrainian texts collected from the Internet, specifically from selected web pages and social network accounts. It is updated daily. The texts are automatically annotated with morphosyntactic tags, syntactic dependencies, and named entities using Stanza with a custom-built model for Ukrainian, which produces both Universal Dependencies tags and VESUM morphological tags.