Myanmar GPT (Generative Pre-trained Transformer) Model


Based on the GPT-2 architecture, I pre-trained a Myanmar GPT model from scratch. The model uses 8 layers, 768 hidden units, 8 attention heads, a context length of 517 tokens, and a vocabulary size of 65,000.
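
For illustration, here is a minimal sketch of how a configuration with these hyperparameters could be expressed with the Hugging Face transformers library. The library, class names, and setup here are assumptions for clarity, not a statement of how the model was actually built:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical configuration mirroring the reported hyperparameters:
# 8 layers, 768 hidden units, 8 heads, 517-token context, 65,000-token vocabulary.
config = GPT2Config(
    vocab_size=65000,
    n_positions=517,
    n_embd=768,
    n_layer=8,
    n_head=8,
)

# Initialize a fresh (untrained) model from the configuration.
model = GPT2LMHeadModel(config)
print(f"Parameter count: {model.num_parameters():,}")
```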

As the generated samples show, text produced with nucleus (top-p) sampling is noticeably more coherent than text produced with plain random sampling.
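
A minimal sketch of nucleus sampling using the transformers generation API; the checkpoint path, tokenizer class, prompt, and top_p value below are placeholders for illustration only:

```python
from transformers import GPT2LMHeadModel, PreTrainedTokenizerFast

# Hypothetical checkpoint and tokenizer location.
tokenizer = PreTrainedTokenizerFast.from_pretrained("path/to/myanmar-gpt")
model = GPT2LMHeadModel.from_pretrained("path/to/myanmar-gpt")

prompt = "မြန်မာ"  # example Myanmar prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Nucleus (top-p) sampling: sample only from the smallest set of tokens
# whose cumulative probability exceeds top_p.
output = model.generate(
    input_ids,
    do_sample=True,
    top_p=0.9,
    max_length=100,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```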

The dataset is curated from Wikipedia and other freely available sources and contains roughly 4 million sentences (~2.5 GB).