Dutch Corpus | Download Upd

While primarily a web-based tool, it provides access to the nlTenTen corpus, a massive web-crawled dataset of 5.9 billion words. Categorized Dutch Datasets for Specialized Use

Maintained by the Royal Library in The Hague, Delpher allows you to download millions of historical newspaper articles, books, and magazines as .txt or .pdf files. download dutch corpus

Several institutional and open-source platforms provide direct downloads or access to Dutch language data: While primarily a web-based tool, it provides access

A European infrastructure that hosts various specialized datasets. For example, the IFA Spoken Language Corpus provides over 1.8GB of speech files and transcriptions. For example, the IFA Spoken Language Corpus provides over 1

This is the primary authority for Dutch linguistic resources. You can access the SoNaR Corpus , a 500-million-token reference corpus of contemporary written Dutch (1954–2011), which is available for non-commercial download upon creating an IvdNT Account .

A modern hub for machine learning datasets. Notable entries include the GPT-NL Public Corpus , which contains 36 billion Dutch tokens—making it one of the largest permissively licensed resources available.

Depending on your project, you might need a specific type of corpus: Can I download the Corpus Contemporary Dutch ... - CLARIN