India A.I. report – highlights

As I wrote earlier, the committee that released the Indian artificial intelligence report was chaired by Prof. Kamakoti of IIT-Chennai. The key points of the report are shown in the figures below:

India-AI-report-1

Figure 1: Indian artificial intelligence report – on persons with disabilities

India-AI-report-2

Figure 2: Indian artificial intelligence report – on Indian languages

Tournament Model

Muthu@SFO-May-2018.jpg

This year I had the chance to speak at my undergraduate institution – a well-recognized engineering school in Trichy, India – about my professional development and my understanding of science, engineering and innovation over a short career as a software developer and scientist-in-training.

Primarily, my goal was to communicate the tournament model, and how we may enjoy our time in educational institutions pursuing a quest for truth regardless of some of the outcomes, since those outcomes are simply governed by the tournament model.

Consider your task: to pick a winner in 2-player games from a group of N players (say 128 or 64, like a typical tennis tournament, or a smaller number of teams as in IPL or World Cup cricket). The goal is then to organize the games in a championship format, with league rounds and knock-out rounds leading to an eventual final that decides the winner. This is the tournament model.

In an alternate version, where the number of teams/players participating is not a power of 2, we may set up the model with the following algorithm/pseudocode:

  1. Enter all teams/players in a double-ended queue [deque]
  2. Select first-2 teams in queue and let them play;
  3. Take the winner of this game and enqueue to the end of queue; discard the loser (obviously!)
  4. Now we have N-1 teams/players in the queue.
  5. Repeat steps 2-4, till number of players is 1.
  6. We have a winner!
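The six steps above can be sketched in Python roughly as follows. This is only a minimal sketch: the function name `run_tournament`, the `play` callback and the sample ratings are my own illustrative assumptions, and here the higher-rated entrant always wins, which real games certainly do not guarantee.

```python
from collections import deque


def run_tournament(players, play):
    """Return the last entrant standing; play(a, b) must return the winner."""
    if not players:
        raise ValueError("need at least one player")
    queue = deque(players)          # step 1: everyone enters the queue
    while len(queue) > 1:           # step 5: repeat until one entrant remains
        a = queue.popleft()         # step 2: the first two in the queue play
        b = queue.popleft()
        queue.append(play(a, b))    # step 3: winner rejoins at the end, loser is dropped
    return queue[0]                 # step 6: we have a winner


# Hypothetical ratings for five entrants (N is not a power of 2).
ratings = {"A": 1500, "B": 1620, "C": 1580, "D": 1490, "E": 1610}


def higher_rated(a, b):
    # Assumed outcome rule for the sketch: the higher rating always wins.
    return a if ratings[a] >= ratings[b] else b


print(run_tournament(list(ratings), higher_rated))  # prints B
```

Every game removes exactly one entrant, so N entrants always need N-1 games, whatever N is.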

The key insight of the tournament model is that small differences between the participating entities can be amplified by the model when it makes winners, and effects like the Matthew effect can ensure that initial advantages snowball over time [esp. in industries like entertainment, social networking etc.].
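To get a feel for this amplification, here is a small simulation sketch. Everything in it is an assumption made for illustration: the function name `favourite_win_share`, a field of 16 players in which one is 20% "better" than the rest, and the rule that a player wins a game with probability equal to their share of the two skills.

```python
import random
from collections import deque


def favourite_win_share(skills, trials, seed=0):
    """Fraction of simulated tournaments won by player 0 under the queue model."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        queue = deque(range(len(skills)))
        while len(queue) > 1:
            a, b = queue.popleft(), queue.popleft()
            # Assumed outcome rule: a beats b with probability skill_a / (skill_a + skill_b).
            p_a = skills[a] / (skills[a] + skills[b])
            queue.append(a if rng.random() < p_a else b)
        if queue[0] == 0:
            wins += 1
    return wins / trials


skills = [1.2] + [1.0] * 15            # hypothetical field: one slightly stronger player
skill_share = skills[0] / sum(skills)  # the favourite's share of total skill
win_share = favourite_win_share(skills, trials=20000)
print(f"skill share {skill_share:.3f}, win share {win_share:.3f}")
```

Under these assumptions the favourite has to win four games in a row, so the expected win share is (1.2/2.2)^4 ≈ 0.089, compared with a skill share of 1.2/16.2 ≈ 0.074 and 1/16 = 0.0625 for a perfectly even field. The simulated figure should land close to the first number: a modest edge per game compounds over the rounds.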

The tournament model decides the frequency of India vs Pakistan cricket matches and why Nadal vs Federer is the most likely grand-slam final match-up; the same system decides the success of professional actors and actresses. Why are Kamal Haasan and Rajinikanth more famous than other talented male actors of their generation (e.g. Sathyaraj, Karthik, Prabhu, etc.)? [Not to mention actresses – a whole other question.] Modern-day movie star rivalries are also plentiful, to wit Dhanush vs Simbu in their ascent to fame.

Principles such as randomness of outcomes and regression toward the mean explain the outcomes in retrospect, but none of these techniques can explain these phenomena in the predictive manner one may seek.

Hence, to students approaching a potential lifetime of work in engineering or science, I recommend aspiring to understand the fundamental pieces – to learn the instruments, notes, chords and scales of your musical pieces, not just the piece itself – so that in the future you can compose your own orchestral music. With an open-ended approach to learning, you can build tools for the future challenges you may face, which will surely differ from the challenges you were taught to resolve.

The tournament model also helps you handle failures, be it in a product, a strategy or problem areas in life. Usually, losing at something (not making the grade, placing second or being edged out) comes from being marginally “less” in some way, shape or form compared to the competition.

What is your experience with managing technology projects and their outcomes? Leave your comment below.

-M.A.

PySangamam

Mr. B. Vijaykumar, founder of GLUG-Trichy and Zylogic, GNU evangelist and ChennaiPy organizer, is launching a Python conference in Chennai – PySangamam. Pre-registration is open at http://pysangamam.org

Announcement follows:

From: Vijay Kumar <vijaykumar@bravegnu.org>
Date: Sun, May 27, 2018 at 12:52 AM
Subject: [Chennaipy] PySangamam: Tickets is Open
To: Chennai Python User Group Mailing List <chennaipy@python.org>

Hi Everyone,
Early bird tickets for PySangamam, are now open! The early bird ticket is priced at Rs. 900 (Inclusive of GST). You can purchase tickets from http://pysangamam.org/

Do note that we have a contributor ticket priced at Rs. 5000 (Inclusive of GST). As a contributor, you can take pride in making the conference more accessible to students. Your contribution will go towards providing discounted tickets to students. You will also be credited on the conference website.

Regards,
Vijay

I had the pleasure of participating in the Chennai Python meetup in March 2018; the audience is technically strong and very sincere in its attempts to understand and discuss your material. I highly recommend attending this event.

-Muthu

 

India A.I. task force report

The Indian government's Ministry of Commerce has released a report on developing A.I.-related tooling, technologies, marketplaces and workforce – read more here.

The AI Taskforce also has a website: aitf.org.in

At this time I do not have an analysis of the report itself, but it does seem that India is trying (in principle) to catch up with other nation states.

Copper, Toms River, Thoothukudi

Citizens and residents of Toms River, NJ, USA, have a higher incidence of cancer, attributed directly to causative agents in untreated industrial effluents dumped into their groundwater.

The journalist Dan Fagin wrote a book named after the town, laying out the details of the problem and its aftermath – personally for an unfortunate few, and legally through lawsuits to compensate people for their suffering – all stemming from the negligence of the industrial overlords. You can read more about his book here. It won the 2014 Pulitzer Prize for non-fiction.

Close to the town of Karaikudi (Ramanathapuram/Sivagangai District) we know of stories of neighboring villages polluted by a certain public-sector petro-product company, among others. Despite court orders, these operations continue to function, with little respite for the citizens.

I see what is happening in Thoothukudi right now as an effort to raise awareness of a higher-than-random incidence of cancer, most likely attributable to Sterlite's past unsafe practices. People are coming together to literally save their livelihoods; they need assurances, updated practices and concrete buy-in before questionable practices can be allowed to continue.

Our computers (interconnects, microprocessor chips), the ICs in appliances like microwave ovens and toasters, smart phones, telephone lines and cable TV all need copper. We have to find a way to extract it safely, dispose of the effluents through internationally agreed safe processes, and continuously monitor the neighboring areas for contamination of groundwater, air and other environmental hazards. Until then it should be a no-go.

The people of Tamil Nadu deserve to have their voices heard. Tamil people have written on copper – செப்பேடு (copper-plate inscriptions) – and now we don’t want our fates written by copper. Surely this can’t be too much to ask!

-Muthu

 

 

Namashkaar!

A.I./ML for Hindi Language Processing

Sometimes it’s good to look around and learn from what’s happening in other realms of Indian language processing. In my limited experience, language efforts in computing for Indian languages revolve around the Dravidian languages, Bengali, Marathi or Hindi. Computational linguistics should not end up like riding a horse inside a small pot, as the Tamil saying goes, whether for individuals or across languages.

Some good open-source project efforts in Hindi language processing are reviewed in this blog. [There are projects that offer an open-tamil-like API for Hindi, e.g. a get_letters-like function provided by the tokenizer project here (with the caveat that it is only a small function compared to the expansive open-tamil), but we talk about the ML/A.I.-focused projects here.]

  1. A Hindi word embedding called Hindi2vec (along the lines of the word2vec project). The idea is to associate similar words (e.g. ‘பல்’ [tooth], ‘நாக்கு’ [tongue], ‘வாய்’ [mouth]) with similar vectors within a neighborhood of each other, using concepts of linear algebra – vector spaces and matrices. So when you search, mistype or want to classify, there is a neighborhood of known words close to the potentially unknown word input by the user; identifying that known neighborhood can help decision making and drive various learning, classification or dialogue systems.
  2. The Hindi Transliteration Model project and the DeepTrans project – this is really cool work where they developed a reference data set of English to Hindi and trained a model to transliterate user input from English to Hindi.
    1. We can do this in Tamil, as we have many transliteration schemes set out in open-tamil, but even the same user is not going to follow one scheme strictly, nor do different users follow the same scheme – in all these cases a machine learning A.I. model may be more robust by virtue of learning the underlying rules. A very interesting project, and fairly simple to implement for Tamil with the open-tamil transliterate module and SciKit Learn or other frameworks; I would expect a high correct-prediction rate of around 95%.
  3. A Hindi-English parallel dictionary of 8MB size (probably 500,000 words or so, I imagine) here – if such a resource existed for Tamil, it could be a good jump-start for translation projects. E.g. can we have an English–Tamil parallel dictionary for the simple TVU word list/dictionary?
  4. The Hindi Sentiment Analysis project does a ternary [good, bad, neutral] classification of text. They do this using a CDAC model, which makes me super curious; maybe CDAC-India (Pune) has a Tamil POS-Tagger too? Probably they do.
    1. Tamil POS-Taggers are widely reported: AU-KBC Chennai has a POS-Tagger, probably the best for Tamil, and Dr. Vasu Renganathan has one too. Neither work is currently available for open-source use; however, their techniques are openly shared via their papers at INFITT conferences.
    2. The Sorkandu project can also be revived to make an open-source POS-Tagger.
  5. The Emotion Recognition in Hindi Speech project – this work from IIT KGP students builds a reference audio data set with known emotion labels, trains some kind of machine learning model on it, and gets 5x better than a random coin-toss/guess at recognizing emotion from speech.
    1. We probably don’t have any work in this direction in the open, but interestingly NIST in the USA sponsored a Tamil Key Word Search (KWS) challenge, reports of which were published by a Singapore team in academic journals. More interestingly, the KWS challenge released 2 hrs of speech data with tagged information. In the USA, government-released data usually qualifies as public domain (e.g. pictures from NASA), so maybe there is a way to get this data. God only knows!
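To make the "neighborhood of known words" idea from item 1 concrete, here is a toy sketch. It does not use Hindi2vec or word2vec: the three-dimensional vectors, the word list and the helper names `cosine` and `nearest` are all made up for illustration, whereas a real embedding is learned from a large corpus and has hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, vectors, k=3):
    """Return the k known words whose vectors point closest to the query vector."""
    return sorted(vectors, key=lambda w: cosine(query, vectors[w]), reverse=True)[:k]


# Hypothetical hand-made embedding: mouth-related words cluster together,
# transport-related words cluster elsewhere.
toy_vectors = {
    "tooth": (0.90, 0.10, 0.00),
    "tongue": (0.80, 0.20, 0.10),
    "mouth": (0.85, 0.15, 0.05),
    "train": (0.00, 0.90, 0.30),
    "bus": (0.10, 0.80, 0.40),
}

# Vector for an unknown or mistyped input word (also made up).
query = (0.88, 0.12, 0.02)
print(nearest(query, toy_vectors, k=3))
```

With these numbers the three nearest neighbors of the query are the mouth-related words, which is the kind of signal a search, spelling-correction or classification system could then act on.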

While we know that Google ASR, YouTube's online translation of English videos into Tamil closed captions, foreign-language-to-Tamil translation and transliteration inputs all use perhaps the most advanced models in Tensorflow on cloud hardware, none of this technology is directly usable for free (maybe for a price, via their Google cloud API offerings), and we probably don’t know all the details of how they achieved these magical software applications for the Tamil language. Anyone’s guess, like mine, is that they use the massive data sets they have from our Tamil news groups, emails, websites and user input, plus Tensorflow A.I./ML magic. At least we have to be grateful to Google-aandavar, as some friends commented on the freetamilcomputing group. 🙂

Surprisingly, to my knowledge, there are no planned, ongoing or completed open-source projects like these in Tamil. Maybe this is another avenue for growth; in this case the Hindi projects (at least in the open-source domain) seem to have forged ahead!

Shukriya.

-Muthu