Syntactic and Semantic Analysis of Language


The analysis of a language involves three primary aspects: words, structures, and meanings. Syntax defines the relationship between words and structures, while semantics defines the relationship between syntax and meanings. Spelling and grammar checking therefore involves the same three aspects: checking the correctness of words, the correctness of structural relationships, and finally the correctness of meanings.

"Colorless green ideas sleep furiously." Noam Chomsky used this sentence to point out that a sentence can be syntactically correct but semantically wrong.

Fortunately, English words have clear boundaries: they are separated by spaces. Each individual word can therefore be checked against a dictionary, so spell checking is fairly straightforward in most cases. In Myanmar (မြန်မာ), this is not the case: words are not separated by spaces and have no clear word boundaries, which makes spell checking more challenging.

"It is an apple." Words are separated by spaces. "ဒါသည်ပန်းသီးဖြစ်ပါသည်။" Words are not necessarily separated by spaces.

In English, sentences are formed as Subject + Verb + Object, so the positions of the subject, verb, and object are important. This is not the case in Myanmar (မြန်မာ): the positions of the phrases are not very important as long as the sentence ends with a verb.

However, in any language, it is common that the closer two words are to each other, the stronger the relationship between them. This is one of the reasons why the attention mechanism works in language models.

It is therefore assumed that the adjacency relationships between words can be analyzed to check how the words are structured. Based on that assumption, grammar is analyzed in terms of adjacency relationships.



Dependency Tree


In any language, a dependency tree can define the adjacency relationship between words.


Original Myanmar Text

မောင်မောင်ကခွေးကိုတုတ်နှင့်ရိုက်သည်။


There are no word-boundaries in the original sentence. However, Word Class Analysis can detect word-boundaries as well as Word Classes.

မောင်မောင်က ခွေးကို တုတ် နှင့် ရိုက်သည်
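
Once word boundaries are marked with spaces, the sentence can be handled like any space-delimited text. The following is a minimal Python sketch of that effect only. It does not implement Word Class Analysis; it simply assumes the segmented form shown above has already been produced, and the variable names are illustrative.

```python
# The unsegmented sentence and the segmented form from the text above.
original = "မောင်မောင်ကခွေးကိုတုတ်နှင့်ရိုက်သည်။"
segmented = "မောင်မောင်က ခွေးကို တုတ် နှင့် ရိုက်သည်"

# Splitting on whitespace only helps after word boundaries have been detected.
original_tokens = original.split()
segmented_tokens = segmented.split()

print(len(original_tokens))   # 1: the whole sentence is one unbroken token
print(len(segmented_tokens))  # 5: one token per detected word
for token in segmented_tokens:
    print(token)
```

Each of the five tokens can then be looked up in a dictionary or passed to later analysis steps, which is not possible with the single unbroken token.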
Word | ရိုက်သည်။ | နှင့် | တုတ် | ခွေးကို | မောင်မောင်က
ရိုက်သည်။ | 0 | 1 | 2 | 1 | 1
နှင့် | 1 | 0 | 1 | INF | INF
တုတ် | 2 | 1 | 0 | INF | INF
ခွေးကို | 1 | INF | INF | 0 | INF
မောင်မောင်က | 1 | INF | INF | INF | 0
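
The values in the matrix above are consistent with one particular reading, which is an inference and not something the text states: the verb ရိုက်သည်။ is the root of the dependency tree, the words နှင့်, ခွေးကို and မောင်မောင်က attach directly to it, တုတ် attaches to နှင့်, and each entry is the number of edges between two words when one is an ancestor of the other (INF otherwise). Under those assumptions, a minimal Python sketch that rebuilds the matrix could look like this. The `heads` mapping and the function names are hypothetical.

```python
import math

INF = math.inf

# Assumed head (parent) of each word; None marks the root verb.
heads = {
    "ရိုက်သည်။": None,
    "နှင့်": "ရိုက်သည်။",
    "တုတ်": "နှင့်",
    "ခွေးကို": "ရိုက်သည်။",
    "မောင်မောင်က": "ရိုက်သည်။",
}
words = list(heads)  # row and column order of the matrix


def edges_up_to(ancestor, word):
    """Count edges from word up to ancestor; INF if ancestor is not above word."""
    steps = 0
    node = word
    while node is not None:
        if node == ancestor:
            return steps
        node = heads[node]
        steps += 1
    return INF


def distance(a, b):
    """Distance along a single ancestor-descendant path, in either direction."""
    return min(edges_up_to(a, b), edges_up_to(b, a))


matrix = [[distance(a, b) for b in words] for a in words]

for word, row in zip(words, matrix):
    print(word, *row)
```

With this reading, words that hang from different branches of the tree (for example ခွေးကို and တုတ်) get INF, which matches the matrix. A different distance definition, such as the full path length through the root, would give finite values there instead.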