Tamil in Morse-code

Can we compose a Tamil Morse code? Yes, we can.

[Figure: International Morse Code. Source: Wikipedia]

  1. Start with a frequency count of Tamil letters from various sources
  2. Build a probability distribution from the frequency counts
  3. Build a Huffman code using the above distribution
  4. Each letter of the Tamil alphabet gets a Morse code: 0 = ‘.’ (புள்ளி, dot), 1 = ‘-’ (கோடு, dash).
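
Steps 2 to 4 above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Open-Tamil implementation: the function name `huffman_morse` and the letter counts in `toy_counts` are made up for the example, and a real run would use counts gathered from actual Tamil text.

```python
import heapq
from itertools import count


def huffman_morse(freq):
    """Build a Huffman code from letter counts, writing bit 0 as '.' and bit 1 as '-'."""
    total = sum(freq.values())
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Each heap entry: (probability, tiebreaker, {letter: partial codeword})
    heap = [(n / total, next(tiebreak), {letter: ""}) for letter, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single letter still needs a non-empty codeword
        return {letter: "." for letter in heap[0][2]}
    while len(heap) > 1:
        p0, _, low = heapq.heappop(heap)   # two least probable subtrees
        p1, _, high = heapq.heappop(heap)
        merged = {letter: "." + code for letter, code in low.items()}
        merged.update({letter: "-" + code for letter, code in high.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical counts, for illustration only.
toy_counts = {"க": 40, "ம்": 25, "த": 15, "ழ்": 12, "மி": 8}
codebook = huffman_morse(toy_counts)
for letter, code in codebook.items():
    print(letter, "->", code)
```

Frequent letters end up with short codewords and rare letters with long ones, and no codeword is a prefix of another.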

The Tamil Morse code table below was generated with the Open-Tamil library. See here for the full code and methodology. The full table follows.

Can you decode what this Morse code means in Tamil? Hint: 2 words, 4 and 5 letters long.

...-. .-.---.-.-..-- ..-.--.---.-.-... --.-.... ...-. --.-...--.--.. --.-...--.-.. --.-...--.--.-... .-.-...

  1. அ -> ..-...-
  2. ஷ -> ---..-
  3. உ -> --.--.-
  4. வ் -> ..-.--.--..
  5. வௌ -> -.-.-...-.-..-..
  6. க -> ...-.
  7. வை -> ..-.--...
  8. வோ -> .-.....-....
  9. வொ -> .------.-...--
  10. ங -> -.-.-...-.-...
  11. வே -> .--------
  12. வெ -> -.-.....
  13. வு -> .---...-
  14. வீ -> ..--..-..-
  15. வூ -> ..-.--.---.-..
  16. பௌ -> .-.---..-..-..
  17. தௌ -> ----.
  18. த் -> --.---.--
  19. தொ -> .--...-
  20. னி -> .------..
  21. தை -> .....-
  22. ப் -> ..--...
  23. ன -> .-.---...
  24. தே -> ...--...--
  25. தூ -> --.---...
  26. தீ -> -..-.-
  27. து -> ...--.--
  28. ற -> ..--.-
  29. வ -> -.-.---
  30. மா -> --.----.
  31. மி -> -.-.--.-
  32. மை -> -.-....--.
  33. மொ -> ..-.--.--.-
  34. மோ -> .-.....-...-..
  35. மௌ -> --..
  36. ம் -> ..-.--.----
  37. மீ -> -----..
  38. மு -> --.---..-
  39. மூ -> -.-.-...--
  40. ழொ -> ...-------
  41. மெ -> --.-.--
  42. மே -> ..-..-
  43. தா -> .......
  44. தி -> .-.-...
  45. ஞே -> .------.-...-.-.-
  46. வி -> -.-.-.-
  47. வா -> ---------.-.-..
  48. றீ -> .---.....-
  49. ய் -> ---------.--.
  50. யௌ -> -.-.-...-..
  51. யோ -> .-.....-.---
  52. யொ -> --.-...----
  53. ஆ -> .------.--
  54. யே -> .---..--.--.
  55. யெ -> ...--...-.
  56. ஊ -> -.-.-...-.--.
  57. யூ -> -.-.-...-.---
  58. யு -> -.-.-...-.-..-.--
  59. எ -> .-.---.--
  60. ஞை -> .--.----
  61. ஞொ -> --.-...--.--.---
  62. ஒ -> --.-...--.--.-.-..
  63. ஞ் -> ..-.-.
  64. ஞீ -> .-.---..-..--
  65. ஞூ -> .---..--.---
  66. ச -> ....-..
  67. ஞெ -> .-------.
  68. ஞ -> ..-.--.---.-.-...
  69. டி -> ....--
  70. டா -> .--.---.
  71. ஷ் -> ..-....
  72. ப -> .---.-
  73. ரா -> -.---
  74. ரி -> ---...--..-.
  75. ம -> -.-..-
  76. க் -> -..-..-
  77. கௌ -> -------.
  78. ல -> .---..-.
  79. கை -> .-.....---
  80. கோ -> ----------
  81. கொ -> .----.
  82. கே -> --------..
  83. கெ -> ..-.--.-.
  84. கு -> .-.--..
  85. கீ -> -..--.
  86. ஔ -> ...--.-.-..-
  87. கூ -> -.-....----
  88. கி -> ---.-.
  89. கா -> ---...--..--
  90. ரூ -> --.-...--.-.-.
  91. ட் -> --.-..-
  92. ரு -> --.-...--.--..
  93. ரெ -> ---...--...-
  94. ரே -> --.-.-.-
  95. டை -> .------.-.-..
  96. டே -> .-.---..-..-.--
  97. ரோ -> ---------.---
  98. ரை -> ..--..-...-
  99. டூ -> -.-.-...-.-..--
  100. டு -> .--.-.
  101. ர் -> --.-....
  102. ஞா -> .-......
  103. ஞி -> --.-...--.--.-.-.-
  104. ரீ -> ..-.--.---...
  105. யி -> .-.---.-.-.-
  106. யா -> .-.....-.--..
  107. டௌ -> .-.....-.-.
  108. டோ -> --.-...--.---
  109. டொ -> ..--..--
  110. ஃ -> -----.-.
  111. இ -> ......----
  112. னா -> --.-...-.
  113. ஏ -> .-..---
  114. று -> .-.....-.--.-
  115. ரொ -> .-.---.-..
  116. ஓ -> ---------.-.--
  117. றூ -> .-.....-...--.
  118. றே -> ---...--.-
  119. டெ -> -.-.-..-
  120. றை -> .------.-....
  121. றோ -> -.-.-...-.-.-
  122. றொ -> .-----.
  123. ற் -> ---.--
  124. ட -> -..-...
  125. ண -> ..-.--.---.-.-..-
  126. ஷி -> ...--.-..
  127. லா -> -.-...-
  128. லி -> ......-.
  129. நை -> .-.-.-
  130. ய -> -.-....---.
  131. நொ -> .-.---..-.-
  132. நோ -> .------.-..--
  133. ள -> .-...-
  134. ரௌ -> --.-...--.--.-.---
  135. நௌ -> ..-.---
  136. ந் -> ---------..
  137. நூ -> ---...-..
  138. நீ -> .-..-.
  139. நு -> -.-....-.
  140. நெ -> .-.---.-.--
  141. நே -> .-.-..--
  142. ங் -> .--....-
  143. நி -> --.-...--.--.-...
  144. ஙு -> .-.---..-...
  145. லெ -> --.-...--..
  146. லே -> ---...---
  147. லு -> ..-.--.---..-
  148. லூ -> --.-...--.--.-..-
  149. லௌ -> ..---
  150. ல் -> -.-.--..
  151. யீ -> --.--..
  152. லை -> .-.---.-.-..--
  153. லொ -> ...----.---
  154. லோ -> .---..--.-.
  155. ளொ -> .-.....-..-
  156. ஸ் -> --.-...--.--.--.
  157. தோ -> ...------.
  158. றி -> .-.-..-.
  159. றா -> --------.-
  160. னை -> ...----..
  161. னோ -> .------.-.--
  162. னொ -> ---------.-.-.-
  163. ன் -> -.--.
  164. னு -> .---......
  165. னீ -> .------.-..-.
  166. னூ -> .------.-.-.--
  167. னே -> ..-.--.---.--
  168. னெ -> -----.--.-
  169. சௌ -> --.-...---.-
  170. ச் -> .-....-
  171. சை -> ..--..-.-
  172. சொ -> ...--.....
  173. சோ -> .---..--..
  174. ஈ -> ..--..-....
  175. செ -> ------..
  176. சே -> ...--....-
  177. சீ -> ---...-.-
  178. சு -> .-..--.
  179. சூ -> ......---.
  180. ளோ -> .-.....-...---
  181. ஐ -> .-.---..-..-.-.
  182. ளை -> .---..---
  183. ணி -> .--.--..
  184. ணா -> --.-...--.-..
  185. ள் -> ---....
  186. ளூ -> --.-...--.-.---
  187. நா -> -.-.-....
  188. ளு -> .------.-...-..
  189. ளீ -> .-.---.-.-..-.
  190. ளே -> .-.---.-.-...
  191. ளெ -> ...--.-.-...
  192. ழா -> ------.-
  193. ழி -> -...
  194. த -> ...--.-.--
  195. ந -> .--.....
  196. யை -> ---------.-..
  197. றெ -> .--..-
  198. ர -> ..-.--.---.-.-.--
  199. பை -> -----.--..
  200. ழ -> ...-----.
  201. பொ -> --.-.-..
  202. போ -> ..-.--..-
  203. பெ -> .---....-
  204. பே -> .------.-...-.-..
  205. பீ -> ...----.--.
  206. பு -> -..---
  207. பூ -> ......--.
  208. பா -> .-.----
  209. பி -> .-.--.-
  210. தெ -> ...----.-.
  211. ழ் -> -----.---
  212. ழீ -> --.-...--.--.-.--.
  213. ழோ -> --.-...--.-.--.
  214. ழை -> .-.....--.
  215. ழெ -> ..-.--.---.-.-.-.
  216. ழே -> -.-.-...-.-..-.-.
  217. ழூ -> .-.....-...-.-.
  218. டீ -> .-.....-...-.--.
  219. ழு -> --.-----
  220. ணோ -> .------.-.-.-.
  221. ணொ -> .-.....-...-.---
  222. ணை -> --.---.-.
  223. ளி -> .--.--.-
  224. ளா -> ...--.-.-.-
  225. ண் -> ....-.-
  226. ணூ -> .------.-...-.--
  227. ணு -> .-.---..--
  228. ணீ -> ---...--....
  229. ணே -> ..-.--.---.-.--
  230. ணெ -> --.-...---..
  231. சா -> ...--..-
  232. சி -> ...---.

Caveats and Closing Comments

Of course, about 15 of the 247 Tamil letters have not received a codeword in this codebook. Further, with the inclusion of Grantha letters there are 323 letters in Tamil, for some of which we don’t have codewords.

Further, the unigram frequency distribution of a large text corpus like Project Madurai’s [PM] may be useful for developing a widely representative Morse code table. Once you have this PM unigram data, you know how to get this Tamil Morse codebook regenerated!
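
A unigram count of this kind can be gathered with a short script. The sketch below is a simplified assumption-laden example, not Open-Tamil code: the helper name `split_letters` is hypothetical, and it groups a base character with any following combining marks using only the standard `unicodedata` module. Open-Tamil ships its own letter-splitting utilities, which would be the better choice for real work.

```python
import unicodedata
from collections import Counter


def split_letters(text):
    """Split text into Tamil letters, keeping vowel signs and pulli with their base."""
    letters = []
    previous_was_tamil = False
    for ch in text:
        if not ("\u0b80" <= ch <= "\u0bff"):  # outside the Tamil Unicode block
            previous_was_tamil = False
            continue
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        if is_mark and previous_was_tamil and letters:
            letters[-1] += ch  # attach the combining mark to its base letter
        else:
            letters.append(ch)
        previous_was_tamil = True
    return letters


sample = "தமிழ் மொழி, தமிழ் எழுத்து"  # stand-in for a large corpus
unigram_counts = Counter(split_letters(sample))
for letter, n in unigram_counts.most_common(5):
    print(letter, n)
```

The resulting counts are exactly the input that a Huffman-style codebook builder needs.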

தமிழ்கருவி (Tamil Tool)

In 2007 I built my first Tamil software. Today it is one of my lost programs. How did I let it get lost? Time.

What remains: https://egovindia.wordpress.com/2007/01/09/tamil-transliteration-tool-using-gtk-toolkit-for-gnome-environment/

Tamilpesu.us update – Text Summarizer

Tamilpesu.us was updated on Aug 8th. The update brings in all bug fixes from Open-Tamil development from March 2018 to the present, plus new functionality in the form of a Tamil text summarizer. The summarizer splits the input essay into sentences and words, forms a correlation matrix from them to compute a score for each sentence, and pulls sentences from the text into the final summary based on that score. Give it a try at http://tamilpesu.us/
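
To make the idea concrete, here is a minimal sketch of a correlation-matrix summarizer. It is an assumption about how such a scorer could look, not the code running on Tamilpesu.us: the function names, the cosine similarity measure, and the naive sentence splitting on punctuation are all choices made for this example.

```python
import re
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(text, k=2):
    """Return the k sentences most similar to the rest, in original order."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    vectors = [Counter(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Sentence-by-sentence similarity matrix; the diagonal is zeroed out.
    matrix = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
               for j in range(n)] for i in range(n)]
    scores = [sum(row) for row in matrix]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


essay = ("Tamil is a classical language. Tamil is spoken in many countries. "
         "Cats sleep all day. Tamil language software is growing.")
print(summarize(essay, k=2))
```

A sentence that shares no words with the rest of the essay scores zero and is left out of the summary.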

Text summarizer output for an article from The Hindu Tamil newspaper. The summary is quite relevant in this case, providing an agreeable output.

Unfortunately we are not able to put the Tamil word classifier [using scikit-learn and Python] online, since the Python/Django dependency on the AWS machine is currently incompatible; wait for that in the future, or try it out yourself.

Thanks, as always, to all our contributors for keeping this volunteer work going.

Tamilisch – The Name of the Tamil Language

When I was first learning German, I was told that the name of the Tamil language is Tamilisch. I was not able to make use of the opportunity to learn German.

Here is the name of the Tamil language in many languages, generated automatically by a program!

Language | Word for ‘தமிழ்’ | Code
Afrikaans | tamil | af
Albanian | tamil | sq
Amharic | ታሚልኛ | am
Arabic | التاميل | ar
Armenian | թամիլերեն | hy
Azerbaijani | Tamil | az
Basque | tamil | eu
Belarusian | тамільская | be
Bengali | তামিল | bn
Bosnian | Tamil | bs
Bulgarian | тамилски | bg
Catalan | tamil | ca
Cebuano | Tamil nga | ceb
Chichewa | Tamil | ny
Chinese (Simplified) | 泰米尔人 | zh
Chinese (Traditional) | 泰米爾人 | zh-TW
Corsican | Tamil | co
Croatian | tamilski | hr
Czech | tamil | cs
Danish | Tamil | da
Dutch | Tamil | nl
English | Tamil | en
Esperanto | tamila | eo
Estonian | tamil | et
Filipino | Tamil | tl
Finnish | tamil | fi
French | tamoul | fr
Frisian | tamil | fy
Galician | tamil | gl
Georgian | Tamil | ka
German | Tamilisch | de
Greek | Ταμίλ | el
Gujarati | તમિલ | gu
Haitian Creole | Tamil | ht
Hausa | Tamil | ha
Hawaiian | Tamil | haw
Hebrew | טמילית | iw
Hindi | तामिल | hi
Hmong | Tamil | hmn
Hungarian | tamil | hu
Icelandic | tamil | is
Igbo | Tamil | ig
Indonesian | Tamil | id
Irish | tamil | ga
Italian | Tamil | it
Japanese | タミル語 | ja
Javanese | Tamil | jw
Kannada | ತಮಿಳು | kn
Kazakh | Тамил | kk
Khmer | ភាសាតាមីល | km
Korean | 타밀 | ko
Kurdish (Kurmanji) | Tamil | ku
Kyrgyz | Tamil | ky
Lao | ທະມິນ | lo
Latin | Tamil | la
Latvian | Tamilu | lv
Lithuanian | tamilų | lt
Luxembourgish | Tamil | lb
Macedonian | Тамилските | mk
Malagasy | Tamil | mg
Malay | Tamil | ms
Malayalam | തമിഴ് | ml
Maltese | tamil | mt
Maori | Tamil | mi
Marathi | तामिळ | mr
Mongolian | Тамил | mn
Myanmar (Burmese) | တမီး | my
Nepali | तामिल | ne
Norwegian | Tamil | no
Pashto | تامیل | ps
Persian | تامیل | fa
Polish | Tamil | pl
Portuguese | tâmil | pt
Punjabi | ਤਾਮਿਲ | pa
Romanian | tamilă | ro
Russian | тамильский | ru
Samoan | Tamil | sm
Scots Gaelic | Tamil | gd
Serbian | тамилски | sr
Sesotho | Tamil | st
Shona | Tamil | sn
Sindhi | تامل | sd
Sinhala | දෙමළ | si
Slovak | tamil | sk
Slovenian | tamil | sl
Somali | Tamil | so
Spanish | Tamil | es
Sundanese | Tamil | su
Swahili | Tamil | sw
Swedish | Tamil | sv
Tajik | тамилӣ | tg
Tamil | தமிழ் | ta
Telugu | తమిళ | te
Thai | มิลักขะ | th
Turkish | Tamilce | tr
Ukrainian | тамільська | uk
Urdu | تمل | ur
Uzbek | Tamil | uz
Vietnamese | Tamil | vi
Welsh | tamil | cy
Xhosa | Tamil | xh
Yiddish | טאַמיל | yi
Yoruba | Tamil | yo
Zulu | Tamil | zu

The code for this is here:

Apple’s second Keyboard for Tamil

Traditionally, the Apple keyboard for Tamil has supported the “Tamil Standard” or Tamil99 layout, which is quite useful for most practitioners. [There are some anecdotal frustrations among senior Tamil computing innovators about Apple culturally appropriating ‘Tamil99’ as their ‘standard’ and stripping away any mention of Tamil99, but that is for another day. Shh.] However, one may observe in the Tamil computer user community a prevailing preference for transliteration input [Google has trained us by providing some good tools here, and Apple follows suit].

Beginning with iOS 12, Apple has released a Tamil transliterator. With it, you can type in Tamil one way or another.

The Apple iOS 12 update provides an additional input method (IM): a Tamil keyboard with transliteration input. This July 2018 update provides the facility on Apple iPhone and Mac devices via iOS 12.

So far this method has been useful to me for typing short sentences, but I will have to spend more time to pick it up properly. I hope Apple continues to support this feature and introduces Tamil input to many folks of the newer, younger generation.

Pluses include easy predictive input. Minuses include a somewhat non-intuitive mapping for certain letters. I cannot post a full review here, only for lack of trying. Thanks, Apple.

Language Transformations

Question of Translation

How can you convert a text like “Mi Amor!” to “என் உயிரே!” [from Spanish to Tamil]? Let us assume we have Spanish-English and Tamil-English translators [each bidirectional with English]; then we can convert Spanish to English and then English to Tamil. Likewise, one can translate between any two languages in a clique of languages [so long as the clique is defined such that each language can be translated to at least one other language in the clique].
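
The pivot idea is plain function composition. The sketch below uses toy lookup tables in place of real translators; the table contents and the helper names `make_translator` and `compose` are invented for this illustration.

```python
# Toy phrase tables standing in for real translation systems.
ES_TO_EN = {"mi amor": "my love"}
EN_TO_TA = {"my love": "என் உயிரே"}


def make_translator(table):
    """Wrap a phrase table as a text -> text function."""
    def translate(text):
        return table[text.lower()]
    return translate


def compose(*steps):
    """Chain translators so the output of each feeds the next."""
    def pipeline(text):
        for step in steps:
            text = step(text)
        return text
    return pipeline


# Spanish -> English -> Tamil, with English as the pivot language.
es_to_ta = compose(make_translator(ES_TO_EN), make_translator(EN_TO_TA))
print(es_to_ta("Mi Amor"))
```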

Development – Theory

Language can exist as text (print, message, document) or as speech (audio, conversations). Ideas are represented in any language. Ideas originate in one language and move to another, or sometimes originate in many languages simultaneously. Ideas can cross from one language to another via text or speech.

In mathematical terms, if we write L for the set of languages, L = { L1, L2, .. Ln }, and define each language as a tuple Li = (Ti, Si) of its text and speech forms, then we may define a function that operates on text and converts it to speech as:

TTSi : Ti -> Si

we may define a speech recognition function as,

ASRi : Si -> Ti

we may also define a translation function as,

TXij : Li -> Lj

Essentially, by representing each language as a node in a graph with a text part and a speech part, we may connect the parts to each other via edges (functions) like ASR and TTS, and connect them to nodes of other languages via translator function edges.

In a graph with only two languages [English, Tamil], with edges representing functions like TTS and ASR within the same language and translators between the two languages (one for each direction), we see a graph like the following:

Fig. 1: Language transformation graph. Nodes represent languages and their components. Edges represent functions like TTS and ASR [within the same language] and translators [directional, between languages]. Clearly this is a directed graph in which we can go from a specific language to another language in text or speech or both forms, provided a path exists from the source to the target language. Using such a graph with no orphan nodes, we may have universal translation powers from language A to language B [so long as each language has bidirectional connectivity with at least one neighbor].
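
The two-language graph can be written down directly and searched with breadth-first search. This is a sketch under assumptions: the node encoding as (language, form) pairs, the edge labels such as `TX_en_ta`, and the function name `find_path` are all chosen for this example.

```python
from collections import deque

# Each node is (language, form); each edge is (function label, target node).
EDGES = {
    ("en", "text"): [("TTS_en", ("en", "speech")), ("TX_en_ta", ("ta", "text"))],
    ("en", "speech"): [("ASR_en", ("en", "text"))],
    ("ta", "text"): [("TTS_ta", ("ta", "speech")), ("TX_ta_en", ("en", "text"))],
    ("ta", "speech"): [("ASR_ta", ("ta", "text"))],
}


def find_path(edges, source, target):
    """Breadth-first search; returns the function labels along a fewest-hop path."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, labels = queue.popleft()
        if node == target:
            return labels
        for label, neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, labels + [label]))
    return None  # no path: the target cannot be reached from the source


# English speech in, Tamil speech out.
route = find_path(EDGES, ("en", "speech"), ("ta", "speech"))
print(route)  # ['ASR_en', 'TX_en_ta', 'TTS_ta']
```

The returned labels read as a pipeline: recognize the English speech, translate the text, then synthesize Tamil speech.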

Problems to Ponder

So the curious reader, now having seen the translation problem represented as a graph problem of reaching node B from node A, can use the rich set of path-finding and shortest-distance algorithms to attempt to answer some of these questions:

  1. What is the graph criterion for a language to have no translations?
  2. What is the graph criterion for a language to not be able to have a virtual assistant? [Siri, Cortana, Alexa etc.]
  3. Conversely to 2, what is the minimum criterion [necessary but not sufficient] to have a virtual assistant [that can speak and listen]?
  4. Given two paths of different lengths for translating from language A -> F, which one would you choose, and why? Assume all jumps have a uniform information loss. What if the information loss at each edge is non-uniform; how can you optimize such a problem?
  5. How would you introduce a new language into this graph so that it may be translated to all other languages [unidirectionally]?
  6. How would you introduce a new language into this graph so that it can be bidirectionally translated?
  7. How can you represent the transliteration function in this graph?
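
As a starting point for question 4, one possible formulation (an assumption, not the promised answer) treats the information loss of each edge as an additive weight and runs Dijkstra's algorithm. The language names A to F, the loss values, and the function name `least_loss_path` are hypothetical.

```python
import heapq


def least_loss_path(graph, source, target):
    """Dijkstra's algorithm over additive, non-negative edge losses."""
    queue = [(0.0, source, [source])]
    settled = set()
    while queue:
        loss, node, path = heapq.heappop(queue)
        if node == target:
            return loss, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, edge_loss in graph.get(node, {}).items():
            if neighbor not in settled:
                heapq.heappush(queue, (loss + edge_loss, neighbor, path + [neighbor]))
    return None  # target unreachable


# Two routes from A to F: a short lossy one and a longer, gentler one.
uneven = {"A": {"B": 0.5, "C": 0.1}, "B": {"F": 0.5}, "C": {"D": 0.1}, "D": {"F": 0.1}}
uniform = {"A": {"B": 1, "C": 1}, "B": {"F": 1}, "C": {"D": 1}, "D": {"F": 1}}

print(least_loss_path(uniform, "A", "F")[1])  # ['A', 'B', 'F']
print(least_loss_path(uneven, "A", "F")[1])   # ['A', 'C', 'D', 'F']
```

With uniform loss the fewest hops win; with non-uniform loss the longer route can be the better one.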

Answers will be posted soon! Feel free to leave your comments in section below.

-Muthu