Than Lwin Aung

Synthetic Text Image Generation

Generally speaking, most machine learning models need a large amount of data for training. Unfortunately, there are very few large datasets, if any, for underrepresented languages such as Myanmar (မြန်မာ).

Although many machine learning models are available for text spotting and text recognition, it is far from clear where to find a text image dataset large enough to train them.

In addition, manually labelling such a huge dataset is out of the question. Therefore, generating synthetic text images is the only practical option for creating a dataset of that size. I was inspired by the Visual Geometry Group's SynthText dataset.
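The main appeal of synthetic generation is that labels come for free: the generator decides where each piece of text goes, so it already knows the bounding boxes. The following is a minimal sketch of that idea only, not the pipeline used for this project. The names `layout_syllables`, `union_box`, and `make_sample` are hypothetical, the `widths` argument stands in for the glyph widths that a real font renderer would report, and treating the text box as the union of its syllable boxes is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# A box is (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Sample:
    """One synthetic label: the text plus its bounding boxes."""
    text: str
    syllable_boxes: List[Box]
    text_box: Box


def layout_syllables(
    syllables: Sequence[str],
    widths: Optional[Sequence[int]] = None,
    origin: Tuple[int, int] = (10, 10),
    height: int = 32,
    gap: int = 2,
) -> List[Box]:
    """Place syllables left to right and return one box per syllable."""
    if widths is None:
        # Placeholder assumption: a fixed advance per code point.
        # A real generator would measure the rendered glyphs instead.
        widths = [16 * len(s) for s in syllables]
    if len(widths) != len(syllables):
        raise ValueError("need exactly one width per syllable")
    x, y = origin
    boxes: List[Box] = []
    for w in widths:
        boxes.append((x, y, x + w, y + height))
        x += w + gap
    return boxes


def union_box(boxes: Sequence[Box]) -> Box:
    """Smallest box that encloses every box in `boxes`."""
    if not boxes:
        raise ValueError("cannot take the union of zero boxes")
    lefts, tops, rights, bottoms = zip(*boxes)
    return (min(lefts), min(tops), max(rights), max(bottoms))


def make_sample(syllables: Sequence[str], **layout_args) -> Sample:
    """Build the label record for one line of synthetic text."""
    boxes = layout_syllables(syllables, **layout_args)
    return Sample("".join(syllables), boxes, union_box(boxes))
```

For example, `make_sample(["မြန်", "မာ"], widths=[40, 30])` yields two syllable boxes and one enclosing text box. In a real generator the same coordinates would be used both to draw the text onto a background image and to write the annotation file, which is why no manual labelling is needed.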

Synthetic Myanmar Text Image with Bounding Box

Syllable Bounding Boxes

Text Bounding Boxes

Synthetic Text Image Generation

Synthetic Myanmar Text Image with Bounding Box

Synthetic English Text Image with Bounding Box