Artful Arisuvadi – Tamil Alphabet Nomograms

Some weeks ago I started playing with and made a bunch of alphabet nomogram style pictures with Easel JS. Its interesting to think of possibilities.

Canonical Tamil has 12 + 1 vowels [உயிர்], 18 consonants [மெய்] and 12×18 = 216 [உயிர்மெய்] conjugate letters. Together the can be arranged in a Table of named column [12 for vowels] and named rows [18 for consonants] and cells of row-column at the conjugate letters.

I posted several images on Twitter; first one based on 3 concentric circles arrangement of the letters.

Arisuvadi – (C) 2018 Muthu Annamalai

Another image based on sunflower-spiral:

Arisuvadi – (C) 2018 Muthu Annamalai

The other based on a logarithmic spiral: 

Arisuvadi – (C) 2018 Muthu Annamalai

Another image looks to illustrate vowels and consonants as an interactive widget where you select uyir and mei letters from the outer + inner circles to form the uyirmei conjugate letter in the center.

Arisuvadi – (c) 2018 Muthu Annamalai

Tamilisch – தமிழ் மொழியின் பெயர்

முதல் முரை நான் செருமன் மொழி கற்கும் போது தமிழ் மொழியின் பெயர் Tamilisch என்று சொன்னாங்க. ஜெர்மென் கற்க வாய்ப்பை பயன்படுத்திக்கொள்ளமுடியவில்லை.

ஒரு தானியங்கி ஆட்டொமாடிக்கா பல மொழிகளில் தமிழ் மொழியின் பெயர் இதோ!

Language Word for ‘தமிழ்’ Code
Afrikaans tamil af
Albanian tamil sq
Amharic ታሚልኛ am
Arabic التاميل ar
Armenian թամիլերեն hy
Azerbaijani Tamil az
Basque tamil eu
Belarusian тамільская be
Bengali তামিল bn
Bosnian Tamil bs
Bulgarian тамилски bg
Catalan tamil ca
cebCebuano Tamil nga
Chichewa Tamil ny
Chinese (Simplified) 泰米尔人 zh
Chinese (Traditional) 泰米爾人 zh-TW
Corsican Tamil co
Croatian tamilski hr
Czech tamil cs
Danish Tamil da
Dutch Tamil nl
English Tamil en
Esperanto tamila eo
Estonian tamil et
Filipino Tamil tl
Finnish tamil fi
French tamoul fr
Frisian tamil fy
Galician tamil gl
Georgian Tamil ka
German Tamilisch de
Greek Ταμίλ el
Gujarati તમિલ gu
Haitian Creole Tamil ht
Hausa Tamil ha
Hawaiian Tamil haw
Hebrew טמילית iw
Hindi तामिल hi
Hmong Tamil hmn
Hungarian tamil hu
Icelandic tamil is
Igbo Tamil ig
Indonesian Tamil id
Irish tamil ga
Italian Tamil it
Japanese タミル語 ja
Javanese Tamil jw
Kannada ತಮಿಳು kn
Kazakh Тамил kk
Khmer ភាសាតាមីល km
Korean 타밀 ko
Kurdish (Kurmanji) Tamil ku
Kyrgyz Tamil ky
Lao ທະມິນ lo
Latin Tamil la
Latvian Tamilu lv
Lithuanian tamilų lt
Luxembourgish Tamil lb
Macedonian Тамилските mk
Malagasy Tamil mg
Malay Tamil ms
Malayalam തമിഴ് ml
Maltese tamil mt
Maori Tamil mi
Marathi तामिळ mr
Mongolian Тамил mn
Myanmar (Burmese) တမီး my
Nepali तामिल ne
Norwegian Tamil no
Pashto تامیل ps
Persian تامیل fa
Polish Tamil pl
Portuguese tâmil pt
Punjabi ਤਾਮਿਲ pa
Romanian tamilă ro
Russian тамильский ru
Samoan Tamil sm
Scots Gaelic Tamil gd
Serbian тамилски sr
Sesotho Tamil st
Shona Tamil sn
Sindhi تامل sd
Sinhala දෙමළ si
Slovak tamil sk
Slovenian tamil sl
Somali Tamil so
Spanish Tamil es
Sundanese Tamil su
Swahili Tamil sw
Swedish Tamil sv
Tajik тамилӣ tg
Tamil தமிழ் ta
Telugu తమిళ te
Thai มิลักขะ th
Turkish Tamilce tr
Ukrainian тамільська uk
Urdu تمل ur
Uzbek Tamil uz
Vietnamese Tamil vi
Welsh tamil cy
Xhosa Tamil xh
Yiddish טאַמיל yi
Yoruba Tamil yo
Zulu Tamil zu

இதன் நிரல் இங்க்கே:

Namashkaar!

A.I./ML for Hindi Language Processing

Sometimes its good to look around and learn from what’s happening in other realms of Indian language processing. In my limited experience language efforts in computing for Indian language revolve around the Dravidian languages, Bengali, Marathi or Hindi. சில நேரங்களில் குண்டு சட்டியில் குதிரை ஓட்டுரமாதிரி கணினி மொழியியல் ஆயிடக்கூடாது – தனிபட்டபடியும் சரி – மொழிகளுக்கிடையிலும் சரி.

Some good project efforts in Hindi Language processing (open-source) are reviewed in this blog; [there are  projects like open-tamil API for Hindi, e.g. a get_letters like function, provided by tokenizer project here (with caveat that it is a small function only compared to expansive open-tamil), but we talk about the ML/A.I. focused projects here].

  1. Hindi word embedding called Hindi2vec (along lines of word2vec project). The idea is to associate similar words (e.g. ‘பல்’,’நாக்கு’,’வாய்’) with similar vectors within a neighborhood of each other using concepts of linear-algebra – vector spaces and matrices. So when you search or mistype or want to classify there is a neighborhood of known words closer to the potentially unknown word input from the user; such known neighborhood identification can help decision making and drive various learning, classification or dialogue systems.
  2. Hindi Transliteration Model project and the DeepTrans project– this is a really cool where they developed a reference data set of English to Hindi and trained a model for transliteration from English to Hindi of user input.
    1. We can do this in Tamil with the as we have many transliteration schemes as set out in open-tamil, but the even a same user is not strictly going to follow the scheme strictly, nor do different users follow the same scheme – in all these cases a machine learning A.I. model maybe more robust by virtue of learning the underlying rules. Very interesting project, and fairly simple to implement for Tamil from open-tamil transliterate module and SciKit Learn or other frameworks with high 95% correct prediction rate.
  3. Hindi-English parallel dictionary with 8MB size (probably 500,000 words or so I imagine) here – this can be a good jump starting point for translation projects if such existed for Tamil. e.g. Can we have a parallel dictionary English – Tamil for the simple TVU word list/dictionary ?
  4. Hindi Sentiment Analysis project does a ternary [good, bad, neutral] classification of text. They do this by using a CDAC-model which is super curious to me; maybe CDAC-India (Pune) has a Tamil POS-Tagger too ? Probably they do.
    1. Tamil POS-Taggers widely reported; AU-KBC Chennai has a POS-Tagger, probably the best for Tamil; Dr. Vasu Renganathan has a POS-Tagger, but both these works are not available currently for open-source use, however their techniques are openly shared via their papers in INFITT conferences.
    2. Sorkandu project can also be revived for making an open-source POS-Tagger
  5. Emotion Recognition in Hindi Speech project – this work from IIT KGP students builds a reference audio data set with known emotion labels and build some kind of a machine learning model, and then they get 5x better than random coin-toss/guess for the audio emotion recognition from speech.
    1. We probably don’t have any work on this direction in the open, but interestingly NIST in USA sponsored a Tamil Key Word Search (KWS), reports of which were published by a Singapore team in academic journals. More interestingly the KWS challenge released 2 hrs of speech data with tagged information. In USA, government released data usually qualifies for public-domain – e.g. pictures from NASA etc. so maybe there is a way to get this data. கடவுளுக்கு தான் வெளிச்சம்!

While we know, Google ASR, Youtube online translation of English videos into Tamil closed-captioning, foreign languages to Tamil Translation, Transliteration inputs all use perhaps the most advanced models in Tensorflow on cloud hardware, none of this technology is directly usable for free – maybe for a price via their Google cloud API offerings – and we probably don’t know all the details of how they achieved these magical software applications for Tamil language – anyones guess like mine is using the massive data sets they have from our Tamil news groups, emails, websites, and user input + Tensorflow A.I / ML magic. At least, we have to be grateful for Google-aandavar like some friends commented on freetamilcomputing group. 🙂

Surprisingly, to my knowledge, there are no planned efforts, ongoing or completed open-source projects like these in Tamil. Maybe another avenue for growth, and in this case Hindi projects (at least in open-source domain) seem to have forged ahead!

Shukriya.

-Muthu

 

 

காதல் -> தவம் – பாகம் 2

விடை: சொல் ஏணி (word-ladder games ) என்பன காதல்-இல் இருந்து தவம் வரை மாற்ற உதவும் – இதை காண்க.

  1. அதாவது, ஒரு அகராதியை கொண்டு, முனை-ஓரம் படம் அமைக்கவும்.
  2. இரு சொற்கள் ஓரத்தால் இணைக்கப்பட்டால், அவை ஒன்ருடன் ஒன்று ஒரு எழுத்து மாற்றம் வழி தொடர்புடையது என்று அர்த்தம்.

இதை கொண்டு ஏற்கனவே ‘காதல் -> தவம்‘ எழுதினோம்.

மேலும் இந்த ஆய்வுக்கட்டுரை அழகாக உள்ளளது – (கட்டுரை) ‘Word Morph and Topological Structures: A Graph Generating Algorithm’, Jürgen Klüver, Jörn Schmidt, Christina Klüver, (2016), Complexity, Vol. 21, No. S1. Wiley Publications.

 

Chennai Python 24th, March, 2018

24th March, 2018,  Chennai Python Meet-up

Open-Tamil and Ezhil-Language Projects

“எழில் என்பது முதல் திர மூலமாக கிடைக்கக்கூடிய தமிழ் ஸ்கிரிப்டை அடிப்படையாகக்
கொண்ட நிரலாக்க மொழி ஆகும், இது விண்டோஸ் 32, 64 மற்றும் Ubuntu, Fedora Linux மற்றும் Docker தளங்களில் 2017 ஆம் ஆண்டில் வெளியான http://ezhillang.org. எழில் ஒரு பைத்தான்-அடிப்படையிலான மொழிஇயக்கி. வளர்ச்சி GitHub வழியாக நடைபெறுகிறது.

திறந்த-தமிழ் தமிழ் நெருக்கமாக தொடர்புடைய தமிழ் மொழி செயலாக்க கருவிகள் கொன்டது; நூலகம் ஆரம்பத்தில் எழில் மொழியின் ஒரு கீற்றாக துவங்கியது; ஆனால் விரைவாக வார்த்தை-வடிகட்டுதல், N- கிராம் பகுப்பாய்வு, புணற்சசி இலக்கணம், தமிழ் எழுத்துப்பிழை சொல்திருத்தி உருவாக்கம் முதலியன, பல மொழிகளில் பைத்தான், முக்கியமாக, ஜாவா, ரூபி முதலியவற்றிற்கான தமிழ் தொகுப்புகள் பரிசுரம் செய்யபட்டன். http://tamilpesu.us வலையில், மற்றும் Play Store இல் Kalsee பயன்பாட்டில் எங்கள் வேலைகளை பயன்படுத்தலாம்.”

600_469542627

 

Thanks to kind arrangements of friends in Chennai Python, and open-tamil community I had an opportunity to make a presentation on Open-Tamil and Ezhil-Lang projects, and completion. Talk was well received, and delivered in unique Tamil mixed with English due to comfort of being in Chennai only!

open-tamil on web

Today, you are welcome to play with open-tamil API via web at http://tamilpesu.us

DXrBTyUX0AEm7ET.jpg-large

Generating multiplication tables via Open-Tamil APIs’: http://tamilpesu.us/vaypaadu/

This is collective work of our team underlying the website (written in Django+Python) highlighting various aspects of open-tamil like transliteration, numeral generation, encoding converters, spell checker among other things. At this time I hope to keep the website running through most of this year, and add features as git-repo https://github.com/Ezhil-Language-Foundation/open-tamil gets updated.

Thanks to Mr. Syed Abuthahir, many months ago, in winter of 2017, he has developed an interface for open-tamil on the web and shared with us under GNU Affero GPL terms. Later, we is added as part of main open-tamil as well.

Open-Tamil moves forward; come join us!

-Muthu

Tamil Internet Conference 2018 – Coimbatore, India

Tamil internet Conference 2018 to take place at TNAU, Coimbatore, India later this year. Please see call for papers (March 30th deadline) to share your new and upcoming works in Tamil, linguistics and applied computer technology.

Please see the email from Prof. Kalyanasundaram, chair of Tamil Internet Conference – 2018.

DXgFYf8U0AATWuQ.jpg-large

Email from Prof. Kalyan announcing call for papers for Tamil Internet Conference 2018, at TNAU Coimbatore, India.