A bipartite graph structure for Tamil

Remark: The Tamil alphabet [an abugida, or alphasyllabary] can be written as a complete bipartite graph G(C+V,E). Both the basic 247 letters [known to have a ring representation] and sequences involving வட மொழி letters can be described with two vertex sets, V – vowels [உயிர்] and C – consonants [மெய்], and edges E: C -> V pairing each consonant with each vowel (e.g.: க் + அ -> க ); the edges are the உயிர்மெய் எழுத்துக்கள். This is the complete bipartite graph K_{18,12}. Strictly speaking, we can add the ஆய்த எழுத்து ‘ஃ’ as a disconnected node and call the result K_{18,12} + 1, a disconnected graph. This may be simply extended to cover the வட மொழி எழுத்துக்கள் [Sanskrit letters optionally used in Tamil]. The full alphabet is obtained by counting vertices and edges together: 12 + 18 + 216 + 1 = 247.
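The count in the remark can be sketched in a few lines of Python. This is only an illustration: the names `UYIR`, `MEI_BASE`, `EDGES` and `ISOLATED` are made up here, and the ordering of the consonants is an assumption that does not affect the count.

```python
from itertools import product

# The 12 uyir (vowel) letters; one vertex set of the graph.
UYIR = ["அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]

# The 18 consonant bases (ordering assumed for illustration only).
MEI_BASE = "கசடதபறஞஙணநமனயரலவழள"
PULLI = "\u0bcd"  # the dot that turns a base into a pure mei letter
MEI = [base + PULLI for base in MEI_BASE]

# Every consonant is joined to every vowel: a complete bipartite graph.
# Each edge stands for one uyirmei letter.
EDGES = list(product(MEI, UYIR))

# The aytham is kept as an isolated vertex.
ISOLATED = ["ஃ"]

# Vertices plus edges give the full alphabet.
total_letters = len(UYIR) + len(MEI) + len(EDGES) + len(ISOLATED)
print(len(EDGES), total_letters)
```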

Corollary: Most other alphasyllabaries [abugida scripts] have a similar bipartite graph representation.

Fig. 1: A fully connected Bipartite graph K(5,3). Credit: Wikipedia.

A group structure for Tamil

We can put a group structure on the Tamil alphabet in many ways; most simply, a set of any cardinality N can be given the structure of the residue classes modulo N, or we can take the symmetric group of permutations on N elements. However, one interesting group structure with applications is the abstraction of the 247 Tamil letters written on a torus; in this essay I will attempt to describe it and show that it forms a group.

We consider the 247 Tamil letters formed by 1 ayudha letter and 12 uyir letters (13 vowels), 18 mei letters (18 consonants), and 216 uyirmei or conjugate letters [247 = 13 + 18 + 216]. Consider a mapping of the 13 vowels to Z13 [residue classes modulo 13], and of the 18 mei letters + ayudha letter to Z19 [residue classes modulo 19].

Fig. 1: The Cayley table for Z13 can represent Uyir letters.
Fig. 2: The Cayley table for Z19 can represent Mei letters (with modification)


Further, we may represent each uyirmei letter as an index into a 2D table formed by rows of mei letters and columns of uyir letters. So, for example, the letter ‘கூ = க் + ஊ’ can be written as 6 + 1*13 = 19. Uyir letters are all represented from [0-12], Mei letters are represented as multiples of 13, [13, 26, 39, .. 234] for [க், ச், … ல், வ், ழ், ள்]. Uyirmei letters form everything in between.

The general representation of a letter can be: t = a + 13*b, where a ranges over [0-12] and b ranges over [0-18]. This representation pegs ‘ஃ’ at the origin. In the direct product of Z13 and Z19 the letter is represented as the pair (a,b).
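A minimal sketch of this indexing in Python follows. The vowel order with ‘ஃ’ at 0 and the consonant order starting க், ச் and ending ழ், ள் are read off the text above; the order in the middle of the consonant list and all function names are assumptions made for illustration.

```python
# Column 0 is the aytham, columns 1..12 are the uyir letters.
UYIR = ["ஃ", "அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]
# Rows 1..18 are the consonants (assumed order, see the note above).
MEI_BASE = "கசடதபறஞஙணநமனயரலவழள"
# Vowel signs for columns 1..12; 'அ' is inherent, so its sign is empty.
SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
         "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
PULLI = "\u0bcd"


def index(a, b):
    """Table index t = a + 13*b for column a (0..12) and row b (0..18)."""
    return a + 13 * b


def coords(t):
    """Inverse of index(): recover (a, b) from t."""
    b, a = divmod(t, 13)
    return a, b


def letter(a, b):
    """Return the Tamil letter stored at column a, row b."""
    if b == 0:
        return UYIR[a]                # aytham and the uyir letters
    base = MEI_BASE[b - 1]
    if a == 0:
        return base + PULLI           # pure mei letter, a multiple of 13
    return base + SIGNS[a - 1]        # uyirmei letter


ALPHABET = [letter(*coords(t)) for t in range(247)]
print(ALPHABET[0], ALPHABET[13], ALPHABET[19], ALPHABET[234])
```

With these orderings, index 0 is the aytham, 13 is the first mei letter, and 19 is the first consonant combined with ஊ.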

Letter representation in the product group: Z13 x Z19


Further, since we showed that uyir and mei letters can be embedded into the Z13 and Z19 residue classes, and we know 247 factors neatly into the two primes 13 and 19, we may use the Chinese remainder theorem (which guarantees that, given two co-prime moduli, every pair of residues corresponds to a unique residue modulo their product, so the direct sum [direct product] of the two residue class groups is again a residue class group). In our case we are guaranteed that the direct product Z13 x Z19 is isomorphic to the group Z247. This is the key result in this essay:

The 247 Tamil letters have a representation in the group Z247, which is isomorphic to the direct product of Z13 and Z19, the uyir and mei group representations.
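The isomorphism can be checked by brute force. The sketch below uses the standard Chinese remainder map t -> (t mod 13, t mod 19); the function names are made up for illustration. Note that this map labels the letters differently from the table index t = a + 13*b, which is a bijection but does not by itself preserve addition.

```python
M, N = 13, 19  # co-prime moduli, M * N = 247


def to_pair(t):
    """Chinese remainder map from Z247 to Z13 x Z19."""
    return (t % M, t % N)


def from_pair(a, b):
    """Inverse map; pow(x, -1, m) is the modular inverse (Python 3.8+)."""
    return (a * N * pow(N, -1, M) + b * M * pow(M, -1, N)) % (M * N)


def add_pairs(p, q):
    """Componentwise addition in Z13 x Z19."""
    return ((p[0] + q[0]) % M, (p[1] + q[1]) % N)


# Round-tripping every element shows the map is one-to-one and onto.
is_bijection = all(from_pair(*to_pair(t)) == t for t in range(M * N))

# Addition modulo 247 matches componentwise addition of the pairs.
preserves_addition = all(
    to_pair((s + t) % (M * N)) == add_pairs(to_pair(s), to_pair(t))
    for s in range(M * N)
    for t in range(M * N)
)

print(is_bijection, preserves_addition)
```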

Key result – Group representation for Tamil alphabets

While the Chinese remainder theorem guarantees a ring structure, at this writing I don’t know the second operator that can take the role of the product to make the ring structure possible.


Tamil Entry via Keypad

One problem that does not seem to draw interest from the various actors in the digital Tamil community is Tamil input via the standard 4 x 3 keypad.

A standard 4×3 keypad shown with digits and letters, including Japanese key entry on a vodafone device. Image credits to Wikipedia.

Problem Statement: Given a 4×3 matrix of keys in a phone keypad, how can we input the basic 13 + 18 + 12×18 = 247 letters of the Tamil alphabet using this device?

Alternate: Clearly, 247 letters have an information content of \log_{2}{247} \approx 7.94836723158 bits, or roughly 8 bits. Three presses of the ten digit keys give 10^3 = 1000 codes, more than enough, so we can simply punch in 3 keys to indicate this 8-bit combination and we are done. Provide the user with a table of the 247 letters and their 3-digit key codes and we have solved this problem in one way.
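The arithmetic behind this table-lookup scheme is small enough to sketch. The helper `key_code` and the choice of zero-padded decimal codes are assumptions for illustration; any fixed 3-key code would do.

```python
import math

LETTERS = 247

# Information content of one letter, in bits.
bits = math.log2(LETTERS)

# Smallest number of digit-key presses (keys 0-9) that can name every letter.
presses = 1
while 10 ** presses < LETTERS:
    presses += 1


def key_code(t):
    """Zero-padded decimal code for letter index t (hypothetical scheme)."""
    return f"{t:0{presses}d}"


print(round(bits, 3), presses, key_code(19))
```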

This is not very satisfying, however; we seem to put the user to more work. We would instead like a similar entry method in Tamil to the one in English (where 3 letters are grouped per telephone key). The application on the phone or mainframe can then resolve any ambiguity in the telephone keypad mapping into meaningful words or phrases.

Ideas: We can come up with various proposals; being lazy, and the official jester of the Tamil computing community, I will make a simple combinatorial analysis of this problem without giving a specific solution.

Details: We can consider the factors of 247 = 19 x 13, which give a 19 x 13 matrix holding all the letters of the Tamil alphabet, and we can count the partitions of this matrix onto the smaller keypad matrix. Just as the 26 roman letters of the English alphabet fit easily into the 4 x 3 matrix, at an average of a little less than 3 letters per key, we can adopt a similar convention.

There are many ways to fit this large 19 x 13 matrix into a 4 x 3 matrix. Using simple combinatorial analysis, we count the ways of dividing the 19 rows into 4 groups as {19 \choose 4} (ignoring the assignment of letter groups to keys – 4! ways). Similarly, we group the 13 columns in {13 \choose 3} ways (again ignoring the 3! permutations of the columns themselves). In all we have {19 \choose 4}\times{13 \choose 3} = 3876 \times 286 = 1108536 key grouping combinations.

Clearly we have an alternate possibility of grouping the 19 x 13 matrix as a transposed matrix – i.e. grouping the dimension of 13 elements along the larger keypad dimension of 4, and the 19 elements along the smaller keypad dimension of 3. This alternative gives us {13 \choose 4}\times{19 \choose 3} = 715 \times 969 = 692835 combinations.

Together we have a total of 1,108,536 + 692,835 = 1,801,371. That’s roughly 1.8 million possibilities! You can check the counts yourself with a few lines of code.
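The original listing is not reproduced here; the following is a minimal Python sketch of the same count, assuming Python 3.8+ for `math.comb`. The variable names are illustrative.

```python
from math import comb

# 19 letters along the 4-key dimension, 13 along the 3-key dimension.
rows_on_four = comb(19, 4) * comb(13, 3)

# Transposed layout: 13 letters along 4 keys, 19 along 3 keys.
rows_on_three = comb(13, 4) * comb(19, 3)

# Both layouts taken together.
total_groupings = rows_on_four + rows_on_three

print(rows_on_four, rows_on_three, total_groupings)
```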

The real grand total of possible designs must include the key permutations of the groupings we have already found, adding a factor of 4! \times 3! = 144 to the previous 1.8 million, so the grand total of keypad mapping designs is 259,397,424, or 259 million keypad combinations in all!

Conclusion: How are we going to find a suitable keypad mapping? Well, we may need more heuristics and more cleverness to find good keypad mappings [a few that maximize a utility function definitely exist among these 259 million possibilities].

So that leads us to the next problem: what is the utility of a mapping of Tamil letters onto the keypad? Well – we don’t know, apparently, so it doesn’t exist! This also ties into the philosophical question of what the purpose of all software is, if not to support use.

Tamil is a vadai [that is – a torus]

Lemma 1:

Tamil is a vadai [that is – a torus]. By vadai I mean the ordinary ulundu vadai [figure: left]. A donut. A torus [figure: right].


How can we say this? Take the two directions of the vadai: place the uyir letters around it at ground level, place the mei letters across it, and arrange for the corresponding uyirmei letters to sit at the crossing points of these two lines. Then Tamil, too, is a vadai.

Therefore, any abugida language can be written on a vadai.

Theorem 1: Words can be represented on the vadai.

Words are made of letters. By Lemma 1, letters can be represented on the vadai. Joining the successive letters of a word with arrows gives one way of representing the word on the vadai.

Theorem 2: Under the construction above, palindromes [விகடகவி words] form closed loops.

Palindromes read the same in the forward and backward directions. They must therefore end on exactly the letter they started with. Hence their representation is a closed loop.
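Theorems 1 and 2 can be sketched in Python by turning a word into its list of (a, b) positions on the torus. This assumes the t = a + 13*b layout from earlier in the essay, an assumed consonant order, and words that use only the 247 basic letters; all function names are made up for illustration.

```python
UYIR = ["ஃ", "அ", "ஆ", "இ", "ஈ", "உ", "ஊ", "எ", "ஏ", "ஐ", "ஒ", "ஓ", "ஔ"]
MEI_BASE = "கசடதபறஞஙணநமனயரலவழள"   # assumed consonant order
SIGNS = ["", "\u0bbe", "\u0bbf", "\u0bc0", "\u0bc1", "\u0bc2",
         "\u0bc6", "\u0bc7", "\u0bc8", "\u0bca", "\u0bcb", "\u0bcc"]
PULLI = "\u0bcd"
MARKS = set(SIGNS[1:]) | {PULLI}


def split_letters(word):
    """Group code points into letters: a base plus any following mark."""
    letters = []
    for ch in word:
        if ch in MARKS and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def coords(letter):
    """Position (a, b) of one letter; a is the uyir column, b the mei row."""
    head, tail = letter[0], letter[1:]
    if head in UYIR:
        return (UYIR.index(head), 0)
    b = MEI_BASE.index(head) + 1
    if tail == PULLI:
        return (0, b)
    return (SIGNS.index(tail) + 1, b)


def path(word):
    """Theorem 1: a word is the sequence of positions of its letters."""
    return [coords(letter) for letter in split_letters(word)]


def is_closed_loop(word):
    """Theorem 2: a palindrome's path retraces itself back to its start."""
    p = path(word)
    return p == p[::-1]


print(path("விகடகவி"), is_closed_loop("விகடகவி"))
```

The palindrome test is applied to letters rather than code points, because a letter such as வி is stored as two code points whose order must not be reversed.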

Theorem 3: Two words that don’t intersect on the torus don’t share common letters.

Corollary of Theorem 3: Two words that share letters will intersect.
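The corollary can be sketched as a set intersection: each letter is one point on the torus, so two word paths that share a letter share a point. The helper names and the example words are chosen here for illustration.

```python
# Combining marks: the eleven vowel signs and the pulli.
MARKS = set("\u0bbe\u0bbf\u0bc0\u0bc1\u0bc2\u0bc6\u0bc7\u0bc8"
            "\u0bca\u0bcb\u0bcc\u0bcd")


def split_letters(word):
    """Group code points into letters: a base plus any following mark."""
    letters = []
    for ch in word:
        if ch in MARKS and letters:
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters


def shared_letters(word1, word2):
    """Letters (torus points) visited by both words."""
    return set(split_letters(word1)) & set(split_letters(word2))


def paths_intersect(word1, word2):
    """Two paths meet at a point exactly when the words share a letter."""
    return bool(shared_letters(word1, word2))


print(shared_letters("கடல்", "படம்"), paths_intersect("கவி", "படம்"))
```

This only looks at shared points; whether two arrows may also cross between letters depends on how the arrows are drawn, which the essay leaves open.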