
open   documented  

The Gigaword Corpus

Authors:  Sigrún Helgadóttir, Eiríkur Rögnvaldsson, Jörgen Pind, Starkaður Barkarson, Tomaž Erjavec, Maciej Ogrodniczuk, Petya Osenova, Nikola Ljubešić, Kiril Simov, Andrej Pančur, Michał Rudolf, Matyáš Kopp, Steinþór Steingrímsson, Çağrı Çöltekin, Jesse de Does, Katrien Depuydt, Tommaso Agnoloni, Giulia Venturi, María Calzada Pérez, Luciana D. de Macedo, Costanza Navarretta, Giancarlo Luxardo, Matthew Coole, Paul Rayson, Vaidas Morkevičius, Tomas Krilavičius, Roberts Darǵis, Orsolya Ring, Ruben van Heusden, Maarten Marx, Darja Fišer
Updated:  2022-04-30
Source:  https://malheildir.arnastofnun.is/?mode=rmh2022#?stats_reduce=word&isCaseInsensitive&searchBy=word&cqp=%5B%5D&lang=en&display=about
Keywords:  Icelandic, Corpus, Monolingual

subs2vec

Python 3.7 resources to evaluate bigram and trigram frequencies in corpora.

Authors:  Jeroen van Paridon, Bill Thompson
Updated:  2019-04-30
Source:  https://github.com/jvparidon/subs2vec
Keywords:  language, bigram, trigram, lexical norms, psycholinguistics, Afrikaans, Arabic, Bulgarian, Bengali, Breton, Bosnian, Catalan, Czech, Danish, German, Greek, English, Esperanto, Spanish, Estonian, Basque, Farsi, Finnish, French, Galician, Hebrew, Hindi, Croatian, Hungarian, Armenian, Indonesian, Icelandic, Italian, Georgian, Kazakh, Korean, Lithuanian, Latvian, Macedonian, Malayalam, Malay, Dutch, Norwegian, Polish, Portuguese, Romanian, Russian, Sinhala, Slovak, Slovenian, Albanian, Serbian, Swedish, Tamil, Telugu, Tagalog, Turkish, Ukrainian, Urdu, Vietnamese

open  

SLABank

SLABank is a component of TalkBank dedicated to providing corpora for the study of second language acquisition.

Authors:  Brian MacWhinney
Updated:  2018-05-04
Source:  https://slabank.talkbank.org/
Keywords:  language-acquisition, second-language, Czech, English, French, German, Hungarian, Icelandic, Italian, Mandarin, Spanish
