Tamil in Morse-code

Can we compose a Tamil Morse code? Yes, we can.


International Morse Code – Source: Wikipedia

  1. Start with a frequency count of Tamil letters from various sources
  2. Build a probability distribution from the frequency counts
  3. Build a Huffman code using the above distribution
  4. Each letter of the Tamil alphabet gets its Huffman codeword as a Morse code, with 0 = ‘.’ (புள்ளி, “dot”) and 1 = ‘-’ (கோடு, “dash”).
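The four steps above can be sketched in Python with the standard-library `heapq` module. This is a minimal sketch, not the Open-Tamil implementation: the function name `huffman_morse` is hypothetical, and the letter counts are invented toy values, not Tamil corpus statistics.

```python
import heapq
import itertools


def huffman_morse(counts):
    """Build a Huffman code from letter counts, spelled in Morse symbols.

    Hypothetical helper: counts are normalized to probabilities (step 2),
    the two least probable subtrees are merged until one tree remains
    (step 3), and bit 0 is written as '.' and bit 1 as '-' (step 4).
    """
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone letter still needs a non-empty codeword.
        return {letter: "." for letter in counts}
    total = sum(counts.values())
    order = itertools.count()  # tie-breaker so equal probabilities never compare dicts
    # Heap entries: (probability, tie-breaker, {letter: codeword built so far}).
    heap = [(n / total, next(order), {letter: ""}) for letter, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, zero = heapq.heappop(heap)  # least probable subtree gets bit 0 ('.')
        p1, _, one = heapq.heappop(heap)   # next least probable gets bit 1 ('-')
        merged = {letter: "." + code for letter, code in zero.items()}
        merged.update({letter: "-" + code for letter, code in one.items()})
        heapq.heappush(heap, (p0 + p1, next(order), merged))
    return heap[0][2]


# Toy counts (hypothetical): frequent letters should get short codewords.
toy_counts = {"க": 45, "ந": 16, "ம": 13, "த": 12, "ப": 9, "ர": 5}
for letter, code in sorted(huffman_morse(toy_counts).items(), key=lambda kv: len(kv[1])):
    print(letter, code)
```

With these toy counts, க receives a single-symbol codeword while the rarest letters, ப and ர, receive the longest ones.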

This Tamil Morse code table was generated with the Open-Tamil library; see the Open-Tamil project for the full code and methodology. The full table follows.

Can you decode what this Morse code means in Tamil? Hint: two words, 4 and 5 letters long.

...-. .-.---.-.-..-- ..-.--.---.-.-... --.-.... ...-. --.-...--.--.. --.-...--.-.. --.-...--.--.-... .-.-...

  1. அ -> ..-...-
  2. ஷ -> ---..-
  3. உ -> --.--.-
  4. வ் -> ..-.--.--..
  5. வௌ -> -.-.-...-.-..-..
  6. க -> ...-.
  7. வை -> ..-.--...
  8. வோ -> .-.....-....
  9. வொ -> .------.-...--
  10. ங -> -.-.-...-.-...
  11. வே -> .--------
  12. வெ -> -.-.....
  13. வு -> .---...-
  14. வீ -> ..--..-..-
  15. வூ -> ..-.--.---.-..
  16. பௌ -> .-.---..-..-..
  17. தௌ -> ----.
  18. த் -> --.---.--
  19. தொ -> .--...-
  20. னி -> .------..
  21. தை -> .....-
  22. ப் -> ..--...
  23. ன -> .-.---...
  24. தே -> ...--...--
  25. தூ -> --.---...
  26. தீ -> -..-.-
  27. து -> ...--.--
  28. ற -> ..--.-
  29. வ -> -.-.---
  30. மா -> --.----.
  31. மி -> -.-.--.-
  32. மை -> -.-....--.
  33. மொ -> ..-.--.--.-
  34. மோ -> .-.....-...-..
  35. மௌ -> --..
  36. ம் -> ..-.--.----
  37. மீ -> -----..
  38. மு -> --.---..-
  39. மூ -> -.-.-...--
  40. ழொ -> ...-------
  41. மெ -> --.-.--
  42. மே -> ..-..-
  43. தா -> .......
  44. தி -> .-.-...
  45. ஞே -> .------.-...-.-.-
  46. வி -> -.-.-.-
  47. வா -> ---------.-.-..
  48. றீ -> .---.....-
  49. ய் -> ---------.--.
  50. யௌ -> -.-.-...-..
  51. யோ -> .-.....-.---
  52. யொ -> --.-...----
  53. ஆ -> .------.--
  54. யே -> .---..--.--.
  55. யெ -> ...--...-.
  56. ஊ -> -.-.-...-.--.
  57. யூ -> -.-.-...-.---
  58. யு -> -.-.-...-.-..-.--
  59. எ -> .-.---.--
  60. ஞை -> .--.----
  61. ஞொ -> --.-...--.--.---
  62. ஒ -> --.-...--.--.-.-..
  63. ஞ் -> ..-.-.
  64. ஞீ -> .-.---..-..--
  65. ஞூ -> .---..--.---
  66. ச -> ....-..
  67. ஞெ -> .-------.
  68. ஞ -> ..-.--.---.-.-...
  69. டி -> ....--
  70. டா -> .--.---.
  71. ஷ் -> ..-....
  72. ப -> .---.-
  73. ரா -> -.---
  74. ரி -> ---...--..-.
  75. ம -> -.-..-
  76. க் -> -..-..-
  77. கௌ -> -------.
  78. ல -> .---..-.
  79. கை -> .-.....---
  80. கோ -> ----------
  81. கொ -> .----.
  82. கே -> --------..
  83. கெ -> ..-.--.-.
  84. கு -> .-.--..
  85. கீ -> -..--.
  86. ஔ -> ...--.-.-..-
  87. கூ -> -.-....----
  88. கி -> ---.-.
  89. கா -> ---...--..--
  90. ரூ -> --.-...--.-.-.
  91. ட் -> --.-..-
  92. ரு -> --.-...--.--..
  93. ரெ -> ---...--...-
  94. ரே -> --.-.-.-
  95. டை -> .------.-.-..
  96. டே -> .-.---..-..-.--
  97. ரோ -> ---------.---
  98. ரை -> ..--..-...-
  99. டூ -> -.-.-...-.-..--
  100. டு -> .--.-.
  101. ர் -> --.-....
  102. ஞா -> .-......
  103. ஞி -> --.-...--.--.-.-.-
  104. ரீ -> ..-.--.---...
  105. யி -> .-.---.-.-.-
  106. யா -> .-.....-.--..
  107. டௌ -> .-.....-.-.
  108. டோ -> --.-...--.---
  109. டொ -> ..--..--
  110. ஃ -> -----.-.
  111. இ -> ......----
  112. னா -> --.-...-.
  113. ஏ -> .-..---
  114. று -> .-.....-.--.-
  115. ரொ -> .-.---.-..
  116. ஓ -> ---------.-.--
  117. றூ -> .-.....-...--.
  118. றே -> ---...--.-
  119. டெ -> -.-.-..-
  120. றை -> .------.-....
  121. றோ -> -.-.-...-.-.-
  122. றொ -> .-----.
  123. ற் -> ---.--
  124. ட -> -..-...
  125. ண -> ..-.--.---.-.-..-
  126. ஷி -> ...--.-..
  127. லா -> -.-...-
  128. லி -> ......-.
  129. நை -> .-.-.-
  130. ய -> -.-....---.
  131. நொ -> .-.---..-.-
  132. நோ -> .------.-..--
  133. ள -> .-...-
  134. ரௌ -> --.-...--.--.-.---
  135. நௌ -> ..-.---
  136. ந் -> ---------..
  137. நூ -> ---...-..
  138. நீ -> .-..-.
  139. நு -> -.-....-.
  140. நெ -> .-.---.-.--
  141. நே -> .-.-..--
  142. ங் -> .--....-
  143. நி -> --.-...--.--.-...
  144. ஙு -> .-.---..-...
  145. லெ -> --.-...--..
  146. லே -> ---...---
  147. லு -> ..-.--.---..-
  148. லூ -> --.-...--.--.-..-
  149. லௌ -> ..---
  150. ல் -> -.-.--..
  151. யீ -> --.--..
  152. லை -> .-.---.-.-..--
  153. லொ -> ...----.---
  154. லோ -> .---..--.-.
  155. ளொ -> .-.....-..-
  156. ஸ் -> --.-...--.--.--.
  157. தோ -> ...------.
  158. றி -> .-.-..-.
  159. றா -> --------.-
  160. னை -> ...----..
  161. னோ -> .------.-.--
  162. னொ -> ---------.-.-.-
  163. ன் -> -.--.
  164. னு -> .---......
  165. னீ -> .------.-..-.
  166. னூ -> .------.-.-.--
  167. னே -> ..-.--.---.--
  168. னெ -> -----.--.-
  169. சௌ -> --.-...---.-
  170. ச் -> .-....-
  171. சை -> ..--..-.-
  172. சொ -> ...--.....
  173. சோ -> .---..--..
  174. ஈ -> ..--..-....
  175. செ -> ------..
  176. சே -> ...--....-
  177. சீ -> ---...-.-
  178. சு -> .-..--.
  179. சூ -> ......---.
  180. ளோ -> .-.....-...---
  181. ஐ -> .-.---..-..-.-.
  182. ளை -> .---..---
  183. ணி -> .--.--..
  184. ணா -> --.-...--.-..
  185. ள் -> ---....
  186. ளூ -> --.-...--.-.---
  187. நா -> -.-.-....
  188. ளு -> .------.-...-..
  189. ளீ -> .-.---.-.-..-.
  190. ளே -> .-.---.-.-...
  191. ளெ -> ...--.-.-...
  192. ழா -> ------.-
  193. ழி -> -...
  194. த -> ...--.-.--
  195. ந -> .--.....
  196. யை -> ---------.-..
  197. றெ -> .--..-
  198. ர -> ..-.--.---.-.-.--
  199. பை -> -----.--..
  200. ழ -> ...-----.
  201. பொ -> --.-.-..
  202. போ -> ..-.--..-
  203. பெ -> .---....-
  204. பே -> .------.-...-.-..
  205. பீ -> ...----.--.
  206. பு -> -..---
  207. பூ -> ......--.
  208. பா -> .-.----
  209. பி -> .-.--.-
  210. தெ -> ...----.-.
  211. ழ் -> -----.---
  212. ழீ -> --.-...--.--.-.--.
  213. ழோ -> --.-...--.-.--.
  214. ழை -> .-.....--.
  215. ழெ -> ..-.--.---.-.-.-.
  216. ழே -> -.-.-...-.-..-.-.
  217. ழூ -> .-.....-...-.-.
  218. டீ -> .-.....-...-.--.
  219. ழு -> --.-----
  220. ணோ -> .------.-.-.-.
  221. ணொ -> .-.....-...-.---
  222. ணை -> --.---.-.
  223. ளி -> .--.--.-
  224. ளா -> ...--.-.-.-
  225. ண் -> ....-.-
  226. ணூ -> .------.-...-.--
  227. ணு -> .-.---..--
  228. ணீ -> ---...--....
  229. ணே -> ..-.--.---.-.--
  230. ணெ -> --.-...---..
  231. சா -> ...--..-
  232. சி -> ...---.
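Because a Huffman code is prefix-free (no codeword is the beginning of another), a Tamil Morse message can even be decoded without gaps between letters. A minimal sketch, using a handful of codewords from the table written with ASCII '.' and '-'; the names `CODEBOOK_EXCERPT`, `encode` and `decode_unspaced` are hypothetical.

```python
# A few entries from the table (hypothetical excerpt, ASCII dots and dashes).
CODEBOOK_EXCERPT = {
    "அ": "..-...-",
    "ம": "-.-..-",
    "ம்": "..-.--.----",
    "மா": "--.----.",
    "மி": "-.-.--.-",
    "த": "...--.-.--",
    "ழ்": "-----.---",
}


def encode(letters, codebook, gap=" "):
    """Encode a list of Tamil letters, separating codewords with `gap`."""
    return gap.join(codebook[letter] for letter in letters)


def decode_unspaced(signal, codebook):
    """Decode dots and dashes with no letter gaps, relying on prefix-freeness."""
    reverse = {code: letter for letter, code in codebook.items()}
    letters, pending = [], ""
    for symbol in signal:
        pending += symbol
        if pending in reverse:  # in a prefix-free code the first match is the only one
            letters.append(reverse[pending])
            pending = ""
    if pending:
        raise ValueError(f"leftover symbols do not form a codeword: {pending!r}")
    return "".join(letters)


message = encode(["த", "மி", "ழ்"], CODEBOOK_EXCERPT, gap="")
print(decode_unspaced(message, CODEBOOK_EXCERPT))  # தமிழ்
```

A greedy left-to-right match is enough here only because the codebook is prefix-free; with an arbitrary code it could split the signal wrongly.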

Caveats and Closing Comments

Of course, perhaps 15 of the 247 Tamil letters did not receive any codeword in this codebook. Further, with the Grantha letters included, Tamil has 323 letters, and we don’t have codewords for some of them.
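The count of 247 comes from 12 vowels, 18 pure consonants, 18 × 12 = 216 consonant-vowel combinations and the aytham ஃ. A minimal sketch that builds this inventory from Unicode code points and lists the letters a codebook is missing; `tamil_letters` and `missing_codewords` are hypothetical helper names, and Grantha letters are deliberately left out.

```python
import unicodedata

# The 12 independent vowels, அ through ஔ.
VOWELS = [chr(cp) for cp in (0x0B85, 0x0B86, 0x0B87, 0x0B88, 0x0B89, 0x0B8A,
                             0x0B8E, 0x0B8F, 0x0B90, 0x0B92, 0x0B93, 0x0B94)]
CONSONANTS = "கஙசஞடணதநபமயரலவழளறன"  # the 18 consonants, one code point each
# Dependent vowel signs for ஆ through ஔ; அ is the bare consonant, so it has none.
VOWEL_SIGNS = ["", "\u0BBE", "\u0BBF", "\u0BC0", "\u0BC1", "\u0BC2",
               "\u0BC6", "\u0BC7", "\u0BC8", "\u0BCA", "\u0BCB", "\u0BCC"]
PULLI = "\u0BCD"   # the pulli, which marks a pure consonant
AYTHAM = "\u0B83"  # ஃ


def tamil_letters():
    """The 247 letters: 12 vowels, 18 pure consonants, 216 combinations, ஃ."""
    letters = list(VOWELS)
    letters += [c + PULLI for c in CONSONANTS]
    letters += [c + sign for c in CONSONANTS for sign in VOWEL_SIGNS]
    letters.append(AYTHAM)
    return letters


def missing_codewords(codebook):
    """Letters from the 247-letter inventory that have no codeword."""
    # NFC folds decomposed spellings (e.g. ெ + ா) into the composed vowel sign.
    have = {unicodedata.normalize("NFC", letter) for letter in codebook}
    return [letter for letter in tamil_letters() if letter not in have]


print(len(tamil_letters()))  # 247
```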

A unigram frequency distribution from a large text corpus such as Project Madurai [PM] may also be useful for developing a more widely representative Morse code table. Once you have the PM unigram data, you know how to regenerate this Tamil Morse codebook!
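Regenerating the codebook starts with step 1, counting letters. A Tamil letter is often more than one Unicode code point (a consonant plus a vowel sign or pulli), so counting raw characters would give the wrong unigrams. A minimal sketch, assuming plain UTF-8 text such as a Project Madurai file; the regular expression and the names `letter_counts` and `to_distribution` are our own, and Open-Tamil's own letter-splitting utilities could be used instead.

```python
import re
from collections import Counter

# One letter = a base character (ஃ, a vowel or a consonant) followed by any
# dependent vowel signs or pulli (U+0BBE..U+0BCD, plus the AU length mark U+0BD7).
# Grantha conjuncts such as க்ஷ come out as two letters, க் and ஷ.
TAMIL_LETTER = re.compile(r"[\u0B83\u0B85-\u0BB9][\u0BBE-\u0BCD\u0BD7]*")


def letter_counts(text):
    """Unigram counts of Tamil letters; non-Tamil characters are ignored."""
    return Counter(TAMIL_LETTER.findall(text))


def to_distribution(counts):
    """Step 2: turn counts into a probability distribution."""
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()} if total else {}


print(letter_counts("அம்மா, தமிழ்!"))
```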

Language Transformations

Question of Translation

How can you convert a text like “Mi amor!” to “என் உயிரே!” (literally “My life!”) [from Spanish to Tamil]? Let’s assume we have Spanish-to-English and Tamil-to-English translators [bidirectional with English]; then we can convert Spanish to English, and then English to Tamil. Likewise, one can translate between any two languages in a group of languages, so long as the translators link the group together, i.e. every language can be reached from every other through a chain of translators.
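Chaining two translators through English is function composition. A toy sketch: the two phrasebook dictionaries are hypothetical stand-ins for real Spanish-English and English-Tamil translators, and `compose` is our own helper.

```python
from typing import Callable

Translator = Callable[[str], str]


def compose(*steps: Translator) -> Translator:
    """Chain translators left to right: the output of one feeds the next."""
    def pipeline(text: str) -> str:
        for step in steps:
            text = step(text)
        return text
    return pipeline


# Hypothetical toy phrasebooks standing in for real translation systems.
SPANISH_TO_ENGLISH = {"mi amor!": "my love!"}
ENGLISH_TO_TAMIL = {"my love!": "என் உயிரே!"}


def es_to_en(text: str) -> str:
    return SPANISH_TO_ENGLISH[text.lower()]


def en_to_ta(text: str) -> str:
    return ENGLISH_TO_TAMIL[text.lower()]


es_to_ta = compose(es_to_en, en_to_ta)  # Spanish -> English -> Tamil
print(es_to_ta("Mi amor!"))  # என் உயிரே!
```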

Development – Theory

Language can exist as text (print, messages, documents) or as speech (audio, conversations). Ideas can be represented in any language. They may originate in one language and move to another, or sometimes originate in many languages simultaneously. Ideas can cross from one language to another via text or speech.

In mathematical terms, let $L = \{L_1, L_2, \ldots, L_n\}$ be the set of languages, and define each language as a tuple $L_i = (T_i, S_i)$ of its text and speech forms. We may then define a text-to-speech function, which converts text into speech, as

$$\mathrm{TTS}_i : T_i \to S_i$$

a speech-recognition function as

$$\mathrm{ASR}_i : S_i \to T_i$$

and a translation function as

$$\mathrm{TX}_{ij} : L_i \to L_j$$
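Chaining these functions is what the Spanish-to-Tamil example does: translating through a pivot language is function composition. A sketch in the notation above, assuming for illustration that $\mathrm{TX}_{ij}$ can act on the text part $T_i$ of a language:

$$\mathrm{TX}_{jk} \circ \mathrm{TX}_{ij} : L_i \to L_k$$

$$\mathrm{TTS}_j \circ \mathrm{TX}_{ij} \circ \mathrm{ASR}_i : S_i \to S_j$$

The first line translates from $L_i$ to $L_k$ through the pivot $L_j$; the second turns speech in $L_i$ into speech in $L_j$ by recognizing it, translating the text and synthesizing the result.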

In essence, we represent each language as a node in a graph with two parts, text and speech. Within a language, the two parts are connected by edges, i.e. functions, such as ASR and TTS; across languages, nodes are connected by translator edges.

For a graph with only two languages [English, Tamil], whose edges represent functions like TTS and ASR within the same language and translators between the two languages (one for each direction), we get a graph like the following:


Fig. 1: Language transformation graph. Nodes represent languages and their components; edges represent functions like TTS and ASR [within the same language] and translators [directional, between languages]. This is a directed graph: we can go from a specific language to another in text or speech or both forms, provided a path exists from the source to the target language. With such a graph and no orphan nodes, we may have universal translation from language A to language B [so long as each language is bidirectionally connected to at least one neighbor and the graph as a whole is connected].
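Whether a path exists, and which path uses the fewest function applications, can be checked with breadth-first search. A minimal sketch on a hypothetical graph in the spirit of Fig. 1, extended with a Spanish text node that has translators to and from English but no TTS or ASR; the node labels and `find_path` are our own.

```python
from collections import deque

# Hypothetical graph in the spirit of Fig. 1. Nodes are (language, form) pairs;
# each edge is a function: TTS (text -> speech), ASR (speech -> text) or a translator.
LANGUAGE_GRAPH = {
    ("Tamil", "text"): [("Tamil", "speech"), ("English", "text")],    # TTS, translator
    ("Tamil", "speech"): [("Tamil", "text")],                         # ASR
    ("English", "text"): [("English", "speech"), ("Tamil", "text"),
                          ("Spanish", "text")],                       # TTS, translators
    ("English", "speech"): [("English", "text")],                     # ASR
    ("Spanish", "text"): [("English", "text")],                       # translator only
}


def find_path(graph, source, target):
    """Breadth-first search: a path with the fewest function applications, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(find_path(LANGUAGE_GRAPH, ("Spanish", "text"), ("Tamil", "speech")))
```

Asking for `("Spanish", "speech")` returns `None`, because no TTS edge leads there in this toy graph.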

Problems to Ponder

Now that the translation problem is framed as a graph problem of reaching node B from node A, the curious reader can use the rich set of path-finding and shortest-path algorithms to attempt to answer some of these questions:

  1. What is the graph criterion for a language to have no translations?
  2. What is the graph criterion for a language to be unable to support a virtual assistant [Siri, Cortana, Alexa, etc.]?
  3. Conversely to 2, what is the minimum criterion [necessary but not sufficient] for a language to have a virtual assistant [one that can speak and listen]?
  4. Given two paths of different lengths for translating from language A to F, which one would you choose and why? Assume every hop has a uniform information loss. If the information loss at each edge is non-uniform, how can you optimize such a problem?
  5. How would you introduce a new language into this graph so that it may be translated to all other languages [unidirectionally]?
  6. How would you introduce a new language into this graph so that it can be bidirectionally translated?
  7. How can you represent the transliteration function in this graph?
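For path questions like these, one standard shortest-path technique is Dijkstra's algorithm. A minimal sketch under an assumption of our own: each edge keeps a fraction `retention` in (0, 1] of the information, losses multiply along a path, and so the edge cost -log(retention) turns "keep the most information" into "find the shortest path". With uniform retention this simply prefers the path with fewer hops. The graph, its numbers and `most_faithful_path` are hypothetical.

```python
import heapq
import math


def most_faithful_path(graph, source, target):
    """Dijkstra on edge costs -log(retention); returns (path, overall retention).

    graph maps a node to a list of (neighbour, retention) pairs with
    0 < retention <= 1. Returns (None, 0.0) when the target is unreachable.
    """
    best = {source: 0.0}
    previous = {source: None}
    heap = [(0.0, source)]
    settled = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1], math.exp(-cost)
        for neighbour, retention in graph.get(node, []):
            new_cost = cost - math.log(retention)
            if new_cost < best.get(neighbour, math.inf):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))
    return None, 0.0


# Hypothetical graph: a short lossy route A-B-F and a longer faithful route A-C-D-F.
GRAPH = {
    "A": [("B", 0.90), ("C", 0.99)],
    "B": [("F", 0.90)],
    "C": [("D", 0.99)],
    "D": [("F", 0.99)],
}
path, kept = most_faithful_path(GRAPH, "A", "F")
print(path, round(kept, 4))  # ['A', 'C', 'D', 'F'] 0.9703
```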

Answers will be posted soon! Feel free to leave your comments in the section below.

-Muthu

India A.I. report – highlights

As I wrote earlier, the chair of the committee that released the Indian Artificial Intelligence report is Prof. Kamakoti of IIT-Chennai. The key points of this report are shown below as images:


Figure 1: Indian Artificial Intelligence report (on persons with disabilities)


Figure 2: Indian Artificial Intelligence report (on Indian languages)

Tournament Model


This year I had the chance to speak at my undergraduate institution, a well-recognized engineering school in Trichy, India, about my professional development and my understanding of science, engineering and innovation over my short career as a software developer and scientist-in-training.

My primary goal was to explain the tournament model, and how we may enjoy our time in educational institutions pursuing the truth regardless of some outcomes, precisely because those outcomes are governed by the tournament model.

Consider the task of picking a winner of two-player games from a group of N players (say 128 or 64, as in a typical tennis tournament, or fewer teams, as in the IPL or the cricket World Cup). The goal is to organize the games in a championship format, with league rounds and knock-out rounds leading to a final that decides the winner. This is the tournament model.
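When N is a power of 2 (64 or 128, say), the knock-out part of this format is a simple bracket: pair up neighbours in the draw, keep the winners, and repeat, which takes log2(N) rounds and N - 1 games. A minimal sketch; `knockout` and the `play` callback (which returns the winner of a game) are hypothetical names.

```python
def knockout(players, play):
    """Single-elimination bracket; returns (champion, number of rounds).

    `play(a, b)` is a caller-supplied function returning the winner of a game.
    """
    field = list(players)
    n = len(field)
    if n == 0 or n & (n - 1):
        raise ValueError("knockout() expects a power-of-two number of players")
    rounds = 0
    while len(field) > 1:
        # Neighbours in the draw meet; winners advance to the next round.
        field = [play(field[i], field[i + 1]) for i in range(0, len(field), 2)]
        rounds += 1
    return field[0], rounds


# Toy example: 64 players where the higher number always wins.
print(knockout(range(1, 65), max))  # (64, 6)
```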

When the number of participating teams or players is not a power of 2, we may set up the model with the following algorithm (in pseudocode):

  1. Put all teams/players in a double-ended queue [deque].
  2. Take the first two teams from the front of the queue and let them play.
  3. Append the winner of this game to the back of the queue; discard the loser (obviously!).
  4. The queue now holds one fewer team/player (N-1 after the first game).
  5. Repeat steps 2-4 until only one team/player remains.
  6. We have a winner!
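The queue-based steps above translate almost line for line into Python with `collections.deque`. A minimal sketch; `deque_tournament` and the `play` callback are hypothetical names, and the toy ratings are invented for illustration.

```python
from collections import deque


def deque_tournament(players, play):
    """Run the queue-based tournament; returns (winner, games played).

    `play(a, b)` is a caller-supplied function returning the winner of a game.
    """
    queue = deque(players)            # step 1
    if not queue:
        raise ValueError("need at least one player")
    games = 0
    while len(queue) > 1:             # step 5: repeat until one remains
        a = queue.popleft()           # step 2: the first two in the queue play
        b = queue.popleft()
        queue.append(play(a, b))      # step 3: winner to the back, loser dropped
        games += 1                    # step 4: one fewer player after each game
    return queue[0], games            # step 6


# Toy ratings (hypothetical): the higher-rated side always wins.
ratings = {"A": 3, "B": 5, "C": 1, "D": 4, "E": 2}


def stronger(a, b):
    return a if ratings[a] >= ratings[b] else b


print(deque_tournament(ratings, stronger))  # ('B', 4)
```

Whatever `play` does, exactly N - 1 games are played, since each game removes one player.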

The key insight of the tournament model is that it can amplify small differences between participants into clear winners, and effects like the Matthew effect can make initial advantages snowball over time [especially in industries like entertainment and social networking].
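A quick calculation shows the amplification. Under a simplifying assumption of our own, that every match is independent and a player wins each one with the same probability p, winning a 64-player knock-out draw means winning 6 matches in a row, with probability p^6. A 10% per-match edge (p = 0.55 instead of 0.5) then becomes roughly a 77% edge in title chances, since 1.1^6 ≈ 1.77. The helper name `title_chance` is hypothetical.

```python
def title_chance(p_match, rounds):
    """Probability of winning `rounds` straight independent matches."""
    return p_match ** rounds


rounds = 6  # a 64-player knock-out draw
baseline = title_chance(0.50, rounds)
favourite = title_chance(0.55, rounds)
# Prints: per-match edge: 1.10x, title edge: 1.77x
print(f"per-match edge: {0.55 / 0.50:.2f}x, title edge: {favourite / baseline:.2f}x")
```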

The tournament model shapes how often India plays Pakistan in cricket and why Nadal vs. Federer is the most likely grand-slam final match-up; the same system decides the success of professional actors and actresses. It helps explain why Kamal Haasan and Rajinikanth are more famous than other talented male actors of their generation (e.g. Sathyaraj, Karthik, Prabhu) [not to mention actresses, a whole other question]. Modern-day movie-star rivalries are also plentiful, to wit Dhanush vs. Simbu in their ascent to fame.

Principles such as the randomness of outcomes and regression toward the mean explain these outcomes in retrospect, but none of these techniques can explain such phenomena in the predictive manner one may seek.

Hence, to students approaching a potential lifetime of work in engineering or science, I recommend aspiring to understand the fundamental pieces: to learn the instruments, notes, chords and scales of their music, not just the piece itself, so that in the future you can compose your own orchestral music. With such an open-ended approach to learning, you can build tools for the future challenges you will face, which will surely differ from the challenges you were taught to solve.

The tournament model also helps you handle failures, be it in products, strategy or other areas of life. Usually, losing at something, whether by not making the grade, placing second or being edged out, comes from being marginally “less” in some way, shape or form compared to the competition.

What is your experience with managing technology projects and their outcomes? Leave your comment below.

-M.A.