GPUs powering the AI revolution

Ganapathy Raman Kasi*, Muthiah Annamalai+

[This article originally appeared in the 2017 Tamil Internet conference, UT-SC, Toronto, Canada, magazine ]


The current hot trend in the AI revolution is “deep learning” – a fancy name for multi-layered convolutional neural networks. This field of study has heralded a new age in computing, extending human capabilities through automation and intelligent machines [1].

These neural networks aren’t the same as the neuron networks in your brain! We are talking about artificial neural networks which reside in computers and try to mimic the biological neural network, with its synapses (connections) of axons and dendrites and their activation potentials. These thinking machines have their beginnings in post-WW-II research at MIT, in Seymour Papert’s work on “Perceptrons,” and Norbert Wiener’s “Cybernetics”.

But why the sudden interest in these biologically inspired computer models? It is due to GPUs, which have accelerated the complex computations associated with neural networks enough to make them practical at such a large scale. They allow these networks to operate on gigabytes (or even terabytes) of data, and have reduced computation times by an order of magnitude – from months to days, days to hours, or hours to minutes – something not possible in an earlier generation of computing. Before we jump into the details, let us understand why we need deep learning and convolutional neural networks in the first place.

Scientific Innovations

Science and engineering have traditionally advanced by our ability to understand phenomena in the natural world and describe them mathematically, since the times of Leonardo da Vinci, Nicolaus Copernicus, Galileo Galilei, Tycho Brahe, Johannes Kepler and Isaac Newton. However, gaining models through experimentation and piecemeal scientific breakthroughs for each problem at hand is a slow process. Outside of physics and mathematics, the scientific method is largely driven by an empirical approach.

It is in such pursuits – building models of unknown processes where the observational data far exceed our human intelligence to divine an analytical model – that deep learning and GPU-based multi-layered neural networks provide an ad-hoc computable model. System identification for particular classification tasks, image recognition, and speech recognition, up to the modern miracle of self-driving cars, are all enabled by deep learning technology. All this came about through the seminal work of many innovators, culminating in the efficient convolutional neural networks of Prof. Geoff Hinton and colleagues, trained with hardware acceleration via GPUs.

A pioneer in the field of AI from before the AI winter, Prof. Geoff Hinton and co-workers [2] recently showed deep learning models that beat status-quo benchmarks on classification and prediction tasks over speech, text and image datasets – Reuters, TIMIT, MNIST, CIFAR and ImageNet – setting off renewed interest in AI from academia and industry giants such as Google, Microsoft, Baidu and Facebook [3].

What is a GPU ?

GPU stands for Graphics Processing Unit [4]. GPUs were originally designed for graphics rendering in 1990s video games. They have a large number of parallel cores which are very efficient at simple mathematical computations like matrix multiplications – the fundamental building block of machine learning methods such as deep learning. While improvement in CPUs has slowed as Moore’s law has hit a bottleneck, GPU performance has continued to grow unabated, showing tremendous improvements over the generations.
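To make the matrix-multiplication point concrete, here is a small NumPy sketch (ours, not from the article): a dense neural-network layer is one matrix multiply plus an activation, and every output element of that multiply can be computed independently – exactly the parallelism GPUs exploit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 784))   # a batch of 64 flattened 28x28 images
W = rng.standard_normal((784, 128))  # weights of one dense layer
b = np.zeros(128)                    # biases

# One dense layer: a single matrix multiply dominates the cost, and each
# of the 64*128 output elements can be computed independently in parallel.
H = np.maximum(0, X @ W + b)         # ReLU activation
print(H.shape)
```

On a GPU the same multiply is dispatched across thousands of cores; the math is identical, only the scheduling changes.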

Figure 1 (left): Deep learning training task times as a function of various GPU processors from Nvidia. Figure 2 (right): AlexNet training throughput for 20 iterations on various CPU/GPU processing platforms.

Such GPUs, originally invented for shading algorithms, are now applied to training large machine learning models using OpenCL- or CUDA-like frameworks (variants of the C language with constructs for parallel execution via threading) from the vendors.

The pioneering hardware vendors include Nvidia with its GPU series like GeForce and Tesla, and AMD with its Radeon and GPGPU lines; Google has entered this race with its TPU (Tensor Processing Unit), and Intel has some offerings for ML training applications. Nvidia and AMD are the main players in the GPU space, with Nvidia laying special emphasis on parallel computing and deep learning over the years. Nvidia just announced the new Volta-generation V100 GPU, which is about 2.5x faster than the previous-generation Pascal GP100 chip announced less than two years ago [5]. Compared to CPUs, GPUs are more than 50x faster for deep learning. Performance across various GPU families is shown in Figure 1, and on the AlexNet benchmark in Figure 2.

Hardware Innovation

If Harvard-architecture and RISC-architecture CPUs have been the workhorses of the personal computer revolution, the advent of high-framerate video gaming pushed graphics rendering from the CPU alone, to CPU + video card, to CPU + GPU, and on to CPU + GPU + GP-GPU (general-purpose GPU); an overview is shown in Figures 3a and 3b.

Figure 3(a,b): Evolution of GPU performance, from CPU-based rendering to video graphics cards; courtesy PC Magazine [4]. Figure 3(c): NVIDIA Tesla GPU applications in scientific research.


GPUs are suitable for large numerical algorithms where various data have to be moved through a computational pipeline, often in parallel; such SIMD problems, like the genome sequencing shown in Figure 3c, gain the maximum speedup when solved on a GPU. However, there is a fundamental limitation to GPU acceleration due to Amdahl’s law, which caps the achievable parallel speedup at the serial bottlenecks of a given computational task.
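As a quick illustration of Amdahl’s law (our sketch, not from the article): if a fraction s of a task is inherently serial, the speedup on N parallel processors is bounded by 1/(s + (1−s)/N).

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Upper bound on speedup when only the parallel portion scales with cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

# A task that is 5% serial can never run more than 20x faster,
# no matter how many GPU cores are thrown at it.
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.05, n), 2))
```

This is why profiling the serial portion of a training pipeline (data loading, host-device transfers) matters as much as the raw GPU throughput.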

Software Frameworks

To build a deep learning application, one may use a labeled dataset to build a learning model on any of the various frameworks [6] (both open-source and closed) from competing vendors in the industry:

  1. TensorFlow, developed by Google: a Python API over a C++ engine; a low-level API good for researchers, not commercially supported. Notably, Google is in the process of developing the TPU – an advanced version of the GPU for direct use with TensorFlow.

  2. Caffe, developed at UC Berkeley, and its successor Caffe2, used at Facebook among other places: focused on computer vision; one of the earlier frameworks to gain significant adoption; Python API over C++ and CUDA code

  3. Scikit Learn (Python based) general inference and machine-learning framework

  4. Theano, written in Python; the grand-daddy of deep learning frameworks

  5. CNTK developed by Microsoft


Tamil applications for deep learning include providing new, or improving existing, solutions to the problems of,

  1. Tamil Speech Recognition
  2. Tamil Character Recognition [7,8]
  3. Natural Language Processing for Tamil

Hardware acceleration and the availability of big data (labeled datasets) will play a key role in the success of applying deep learning techniques to these problems.


  1. Jensen Huang, “Accelerating AI with GPUs: A New Computing Model,” link

  2. G. E. Hinton et al., “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems (2012).

  3. LeCun, Y., Bengio, Y. and Hinton, G. E., “Deep Learning” Nature, Vol. 521, pp 436-444. (2015), link.

  4. GPU definition at PC Magazine Encyclopedia, PC Magazine, (2017) link.

  5. Tesla GPU Application notes from NVidia, (2017) link.

  6. “Comparing deep learning frameworks,” (2017), link.

  7. Prashanth Vijayaraghavan, Mishra Sra, “Handwritten Tamil Recognition using a Convolutional Neural Network,” NEML Poster (2015) link.

  8. R. Jagadeesh Kannan, S. Subramanian, “An Adaptive Approach of Tamil Character Recognition Using Deep Learning with Big Data-A Survey”, Proceedings of 49th Annual Convention of Computer Society of India (vol. 1) pp 557-567 (2015), link.

Ezhil, Open-Tamil conference articles – 2017

One of the major achievements of the last year has been collecting inputs from our team and writing up two important papers – one a historical review, and the other a collective call to action on the great opportunity that is Tamil open-source software.


We also take time to thank all co-authors who pulled their efforts together at short notice to make these research works happen! Together these two papers represent a value of tens of thousands of Indian rupees, or more, in the making (going by estimates of other Tamil software foundations).

We also thank the conference organizers for a partial travel grant toward making this presentation happen. Thank you!

Conference Articles – 2017

Ezhil, Open-Tamil conference articles – 2017, presented at the Tamil Internet Conference, August 2017, in Toronto, Canada. Both papers were well received, and good academic and development points were debated at the forum.

  1. Ezhil – எழில் மொழி பொது பயன்பாட்டிற்கும், வெளியீடு நோக்கிய சவால்களும் (Ezhil – the Ezhil language for general use, and the challenges toward its release)
    • This paper summarizes the path taken by Ezhil from inception toward delivering a fully installable product on Windows 64/32-bit and Linux (Ubuntu, Fedora) systems, and offers a meditation on how students and teachers may adopt the product, and on future pathways.
    • Presentation slides are here on slideshare.
  2. Open-Tamil / Open Source in Tamil – Tamil Open-Source Landscape – Opportunities and Challenges
    • An important contribution of this paper is to show that collective interest in Tamil open-source outpaces that of other languages with larger speaker populations. This is a key indicator, and a reason to develop better pathways to bring in new developers and train them in developing Tamil software
    • GitHub Tamil language repositories compared with other languages, as a measure of software developers’ interest.

    • Presentation slides are on SlideShare

For questions and queries on these articles, please write to us or leave your comments below.

Ezhil Language Foundation


Classifying Tamil words – part 2


Continuing from the previous post (see part-1), I am sharing my results on classifying a Tamil letter sequence as a valid Tamil-like word or an English-like word using a binary classifier.


You need to install the scikit-learn API by following the directions on its website here.

pip install -U scikit-learn

This will also pull in dependencies like NumPy and other Python libraries that support scikit-learn.

Next ensure your installation is okay by typing,

python -c "import sklearn"

which should run without any output if all your settings are okay.

Training the AI Classifier

To train the classifier, which is based on a multi-layer perceptron (in other words – an AI neural network):

  1. we need to represent our input as a CSV file, with each sample encoded as a row of features.
    • in this case the data are CSV files representing features of the Jaffna, Azhagi and combinational transliterated outputs of the input words
    • See: files ‘english_dictionary_words.azhagi’ and ‘tamilvu_dictionary_words.txt’ in the repo at open-tamil/examples/classifier
  2. each word (represented as features) is also given a training label, usually an integer, forming a label column in the CSV file (across all samples); typical features encoded in the data file are defined in the class Field in the file ‘classifier/’;
    • Typically, the information for each word – such as the number of letters, kuril letters, nedil letters, ayutha letters, vallinams, mellinams, idayinams, first and last letters, and vowels – is stored in a feature record within the CSV.
    • We can generate the various feature records from the data files by running the code in the same repository
  3. next we train the neural network using the scikit-learn API,
    • this is the key code in ‘classifier/’
    • first we load the CSV feature vectors into Python as Numpy arrays, for both class 0 (English words) and class 1 (Tamil words)
    • next we set up scaling of the data sets for both classes
    • we pick a test set and a training set, which is key to getting a good model network and a generalized fit
    • We import various tools from scikit-learn, like the input scaler ‘StandardScaler’, ‘train_test_split’, etc., to keep up with good training conventions
    • Since we are doing classification, both test and training inputs need to be scaled, but not the label data
  4. Next we set up a 3-layer neural network using the ‘lbfgs’ solver. We fit this network with the X_train data and the corresponding Y_train labels
    • nn = MLPClassifier(hidden_layer_sizes=(8,8,7), solver='lbfgs'), Y_train)

      Y_pred = nn.predict(X_test)

      print("accuracy => ", accuracy_score(Y_pred.ravel(), Y_test))

  5. The fitted neural network can generate a score (goodness of fit), and is immediately serialized to disk for future use; we also output diagnostic information like,
    • confusion matrix
    • classification report
  6. Next we use the trained neural network to show the results on a few known inputs.

Fig. 2: The 89%-accuracy trained classifier correctly identifying the word “Hello”; while acceptable in native script form, it is an English-origin word!

  1. The key point for prediction with the ANN is to transform the input into a feature vector before applying it to the classifier input
  2. Once training is complete we see results like those in item 6 above.

Finally, we can automatically tell (via a neural network) whether ‘computer’ is a Tamil- or English-origin word; there is some sensitivity in this decision due to the 10% error rate. I have a screenshot of the predictions for various words (the feature vectors are written to the output as well).


Fig. 3: Neural Network prediction of Tamil words and English (transliterated into Tamil) words

Finally, we note that various artificial neural network topologies and hidden-layer sizes were tried, but we chose to stick with the simplest. At this point the trained neural network seems quite satisfactory, and even ready for practical use.


Scikit-learn provides a powerful framework to train and build classification neural networks.

This work has shown easy classification, with a 10% false-alarm rate (or ~90% classification rate), of various Tamil/English vocabularies, including words outside the training vocabulary. The source code is provided at the open-tamil site, including the various CSV data files.

Good luck exploring neural networks. Getting beyond 90% on this task seems hard; there is a long way to go.

Classifying Tamil words – part 1


One of the problems faced when building a Tamil spell checker, albeit a somewhat marginal one, can be phrased as follows:

Given a sequence of Tamil letters, how do you decide whether the letters form a true Tamil word (even one out of the dictionary) or a transliterated English word?

e.g. Between the words ‘உகந்த’ and ‘கம்புயுடர்’, can you decide which is the true Tamil word and which is transliterated?


This is fairly simple with the help of a neural network; given sufficient “features” and “training data” we can train such a network easily. With the current interest in this area, tools are available to make the task quite easy – any of Pandas, Keras, PyTorch and TensorFlow may suffice.

Generally, the only thing you need to know about Artificial Intelligence (AI) is that machines can be trained to do tasks based on two distinct learning processes:

  1. Regression,
  2. Classification

Read more on Wikipedia – the current “problem” is a classification task.


Naturally, for the task of classifying a word, we may take features like the following:

  1. Word length
  2. Are all characters unique ?
  3. Number of repeated characters ?
  4. Vowels count, Consonant count
    1. In Tamil this information is stored as (Kuril, Nedil, Ayudham) and (Vallinam, Mellinam and Idayinam)
  5. Is word palindrome ?
  6. As a next step, we can add bigram data as features
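A minimal feature extractor along the lines of items 1–5 might look like this (a sketch with hypothetical field names; the real open-tamil code works on Tamil letter sequences obtained via tamil.utf8.get_letters):

```python
def word_features(letters):
    """Build a simple numeric feature record from a word given as a list of letters."""
    n = len(letters)
    return {
        "length": n,                                   # word length
        "all_unique": int(len(set(letters)) == n),     # are all letters unique?
        "repeats": n - len(set(letters)),              # number of repeated letters
        "palindrome": int(letters == letters[::-1]),   # is the word a palindrome?
    }

# Works on any letter sequence; for Tamil, split the word into letters
# with tamil.utf8.get_letters(word) first, since a Tamil letter can span
# multiple Unicode codepoints.
print(word_features(list("level")))
```

Vowel/consonant counts (kuril, nedil, vallinam, mellinam, idayinam) would be added the same way, as extra keys in the record.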

Basically, this task is aided by new code checked into Open-Tamil 0.7 (dev version) called ‘tamil.utf8.classify_letter’.


Data sets

To make the data sets, we can use the Tamil VU dictionary as a list of valid Tamil words (label 1); next we can use a list of English words transliterated into Tamil as the list of invalid Tamil words (label 0).

Using this 1/0-labeled data, we may use part of the combined data for training the neural network with a gradient-descent algorithm, or any other method for building a supervised learning model.

Building Transliterated Data

Using the Python code below and the data file from the open-tamil repository, you can build and run it,

# imports assume the open-tamil package layout (codecs is from the standard library)
import codecs
from transliterate import algorithm, jaffna, azhagi, combinational

def jaffna_transliterate(eng_string):
  tamil_tx = algorithm.Iterative.transliterate(jaffna.Transliteration.table,eng_string)
  return tamil_tx

def azhagi_transliterate(eng_string):
  tamil_tx = algorithm.Iterative.transliterate(azhagi.Transliteration.table,eng_string)
  return tamil_tx

def combinational_transliterate(eng_string):
  tamil_tx = algorithm.Iterative.transliterate(combinational.Transliteration.table,eng_string)
  return tamil_tx

# 3 forms of Tamil transliteration for each English word
jfile ='english_dictionary_words.jaffna','w','utf-8')
cfile ='english_dictionary_words.combinational','w','utf-8')
afile ='english_dictionary_words.azhagi','w','utf-8')
with'english_dictionary_words.txt','r') as engf:
  for idx,w in enumerate(engf.readlines()):
    w = w.strip()
    if len(w) < 1:
      continue
    jfile.write(u"%s\n" % jaffna_transliterate(w))
    cfile.write(u"%s\n" % combinational_transliterate(w))
    afile.write(u"%s\n" % azhagi_transliterate(w))
jfile.close(); cfile.close(); afile.close()
to get the following data files (the left pane shows the ‘Jaffna’ transliteration standard, while the right pane shows the source English word list); the full gist is on GitHub at this link


In the next blog post I will share the details of training the neural network and building this classifier. Stay tuned!


Ezhil – as a bot on Twitter


Last week I decided to create a scheme to run the Ezhil language through Twitter. For a year or two now it has become commonplace for many services, such as Facebook and Skype, to operate via automated "messenger bots". Similarly, last month I heard about Kural Bot, a Facebook bot from Canada; for many days I had wanted to provide an Ezhil interface like this, and now its time has come!


Earlier I had already built a bot that tweets Kurals at the handle @puthuvalluvar. It is currently dormant, but to run it I used the python-twitter package; using the same package I created a new bot under the account @ezhillangbot. Its source code is here.


After installing it as a cron job, anyone can use it; just tweet an Ezhil program from your account like this:

               @ezhillangbot அச்சிடு("வணக்கம் உலகம்!")

The bot reads this and replies with the program's output, addressed to you by name; for example,


Will you try this out and send me feedback? Tell me directly on Twitter, or else leave a comment here!




Tamil language model


Last week I collected the letters (323 letter forms) from open-tamil and estimated the unigram, bigram and trigram frequencies in a Tamil lexicon of about 65,000-odd words. The interesting results are found in this Open-Office Calc spreadsheet.

This was an enjoyable exercise for me, revisiting some of the hard work I have done in Open-Tamil, particularly in the utf8 module, among other contributions to the Open-Tamil library from a wider team.

However, what's in it for you, dear reader? To cut to the chase, here is the meat and potatoes of the results:

  1. Tamil word frequencies sorted by word length for the 65k words show a mean word length (using a weighted average) of 5.404; 5 is a beautiful prime number, and Indian mythology will also have some suitable references.
    • This word frequency distribution comes out like the following (y-axis log scale)

      tamil word frequency as function of word length

      Fig. 1. Tamil word frequency as function of word length

    • Word Length Frequency
      1 102
      2 1799
      3 6434
      4 13200
      5 14489
      6 11636
      7 8119
      8 4626
      9 2224
      10 817
      11 286
      12 104
      13 26
      14 24
      15 8
      18 1
      19 1
  2. Unigram data show a Zipf's-law-like distribution (e.g. from NLP course material); we also see that only 100 of the 323 possible letter forms in Tamil make up the text of the lexicon. One wonders, had a Samuel Morse designed his telegraph code for Tamil, whether he would have chosen '.' to represent 'ம்'? Tamil reading or recitation of Morse code, though, would sound like a jathi-reciting Bharatanatyam dance teacher. Dit daa daa. The 100 most frequent letters in the lexicon are presented here.
    Letter Frequency
    ம் 18164
    ல் 14165
    த் 9540
    க் 8257
    ன் 8133
    தி 6625
    கு 6154
    ப் 5809
    ட் 5690
    டு 5566
    ர் 5503
    ரு 4536
    பு 4292
    கா 4262
    து 4162
    வி 3838
    டி 3798
    ண் 3773
    சி 3720
    ரி 3379
    ங் 3284
    ந் 3254
    ற் 3099
    று 2811
    ச் 2811
    சு 2751
    பா 2705
    கி 2625
    பி 2614
    வா 2569
    மு 2458
    ள் 2432
    லை 2212
    டை 2156
    தா 2154
    கை 2121
    மா 2015
    ய் 1916
    சா 1837
    லி 1744
    வு 1522
    கொ 1497
    நி 1465
    ஞ் 1461
    ரா 1452
    ணி 1450
    ளி 1432
    யா 1421
    நா 1303
    றி 1263
    கோ 1260
    செ 1236
    ழி 1234
    னி 1219
    ழு 1122
    மி 1117
    யி 1095
    பொ 1082
    ரை 1057
    வெ 1036
    மை 990
    றை 976
    பூ 949
    னை 937
    லா 911
    சை 837
    வை 822
    போ 815
    கூ 802
    வே 797
    டா 793
    தை 786
    பெ 765
    ளை 764
    தே 674
    ழ் 618
    லு 613
    நீ 581
    • Fitting Zipf’s law to the unigram data looks quite interesting too:
  3. Bigram data also show promising structure, just as Shannon would have expected of a human language; such sources are known to have redundancy, structure and predictability.
    • The first 2000 bigrams account for more than 50% of all observed bigram occurrences.
    • The lexicon contained only ~13.25% of all possible bigrams in the wild!
    • This sparseness of the bigram data indicates either a limited data set or a highly structured vocabulary in Tamil; I’ll wager the former.
    • The Zipf’s-law fit is not as nice as for the unigrams, but here it goes:
    • The top 100 bigrams, by frequency, are the following:
    • தல் 8670
      த்த 4645
      க்க 2844
      கம் 2824
      த்தி 2160
      ரம் 2023
      க்கு 2019
      தம் 1893
      ட்டு 1805
      ப்பு 1757
      டுத 1641
      ப்ப 1582
      யம் 1532
      த்து 1517
      ம்ப 1484
      னம் 1402
      ம்அ 1397
      ந்த 1305
      ங்க 1286
      டம் 1233
      லம் 1230
      ட்ட 1108
      க்கா 1082
      சம் 985
      ட்டி 976
      ம்பு 958
      கன் 904
      ம்க 904
      ல்க 882
      க்கி 869
      திர 852
      ந்தி 823
      ணம் 819
      ம்ச 799
      ங்கு 797
      ச்சி 789
      ண்ட 767
      ர்த் 757
      கட் 755
      குத 743
      ம்இ 729
      ப்பி 720
      கண் 716
      ரன் 712
      ல்அ 707
      கார 692
      ற்று 689
      ப்பா 688
      ம்ம 681
      வன் 672
      ம்பி 641
      ச்ச 632
      ம்ஆ 624
      தன் 617
      வம் 599
      கர 592
      பம் 587
      கல் 581
      ம்உ 534
      கரு 534
      ல்ப 530
      யன் 519
      றுத 517
      ல்வ 515
      ந்து 511
      த்தா 510
      ச்சு 502
      ம்பா 500
      ஞ்ச 495
      டுத் 492
      பிர 490
      ரிய 488
      டித் 480
      படு 477
      ல்த 475
      ல்கு 467
      ல்உ 467
      னல் 462
      ளம் 459
      ன்அ 456
      ற்ற 450
      ட்டை 443
      திரு 442
      ருத் 435
      ல்இ 431
      ங்கா 415
      ன்ன 414
      தலை 411
      வர் 406
      ம்த 403
      ன்ம 398
      ன்க 394
      க்கொ 392
      ண்டு 391
      ம்வி 388
      ல்வி 384
      மம் 384
      ர்க் 384
      டுக் 381
      ல்ம 379
  4. Moving on to trigrams we find even more sparseness, since the data are so limited – 65k words with a total letter count of only 345,315. Of the possible 323^3 = 33,698,267 ~ 34 million trigrams, only 107,715 are present in the lexicon – about 0.3% – making this the weakest dataset yet.

    • About 10,000 trigrams account for more than 50% of all trigram occurrences in the data set, with the rest occurring sparsely.
    • The most frequently occurring 100 trigrams and their frequency in this lexicon are shown below:
      த்தல் 2992
      டுதல் 1573
      குதல் 674
      தல்க 575
      ட்டுத 557
      க்கம் 513
      த்திர 479
      தல்அ 476
      றுதல் 472
      ர்த்த 446
      ட்டம் 431
      டித்த 431
      த்தம் 382
      கட்டு 380
      தல்ப 374
      தல்த 353
      தல்உ 351
      துதல் 332
      டுத்த 331
      காரன் 325
      திரம் 322
      க்கட் 310
      தல்கு 309
      க்கார 298
      ந்தம் 295
      க்குத 293
      தல்வ 288
      ங்கம் 280
      தல்இ 271
      படுத 271
      ங்குத 256
      த்துத 250
      த்திய 250
      ந்திர 250
      தல்மு 246
      ளுதல் 245
      தல்ம 242
      தனம் 242
      சனம் 234
      ய்தல் 228
      ர்க்க 228
      ப்படு 227
      தல்வி 226
      கம்அ 222
      க்கல் 218
      காரம் 218
      ரித்த 208
      தல்ச 206
      ர்தல் 206
      பத்தி 202
      தம்அ 201
      ருத்த 196
      ள்ளுத 195
      தல்பு 193
      கம்ப 192
      ண்டம் 185
      ரம்அ 183
      த்துவ 180
      ம்பிர 174
      ட்டுக் 174
      வுதல் 171
      தல்கா 170
      ரணம் 170
      ற்றுத 168
      தல்ந 167
      யம்அ 165
      ரியம் 164
      கொள்ளு 164
      தல்சி 164
      லுதல் 162
      கரம் 162
      புதல் 160
      கம்க 159
      தல்ஒ 158
      சுதல் 157
      தல்நி 156
      ர்த்தி 156
      ப்பிர 155
      ணுதல் 154
      காட்டு 153
      தல்கை 150
      தல்பி 149
      போடுத 148
      தல்ஆ 147
      கண்ட 147
      க்கிர 146
      தியம் 146
      தல்எ 145
      சித்த 145
      தல்சு 144
      வைத்த 143
      க்கர 141
      ரம்க 140
      த்தன் 138
      தல்து 138
      காலம் 138
      மரம் 137
      ரம்ப 137
      விடுத 136
      சங்க 135


It is quite possible to build a random word/text generator for Tamil from these statistical data, using Monte Carlo techniques with smoothing to cover the missing ~87% of bigrams and ~99.7% of trigrams. Further, word-level frequency, bigram and trigram data would make the generated text more relevant at the sentence level.
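Such a Monte Carlo generator over bigram counts can be sketched as below; the toy counts are stand-ins for the real letter statistics in the spreadsheet, and add-one smoothing covers the unseen bigrams.

```python
import random

# Toy bigram counts, letter -> {next letter: frequency}; stand-ins for
# the actual lexicon counts tabulated above.
bigrams = {
    "த": {"ல": 8, "ம": 3, "ர": 1},
    "ல": {"ம": 5, "த": 2},
    "ம": {"த": 4, "ல": 1},
}

def sample_next(letter, rng, alpha=1.0):
    """Sample the next letter with add-alpha smoothing over the known alphabet."""
    alphabet = list(bigrams)
    counts = bigrams.get(letter, {})
    weights = [counts.get(c, 0) + alpha for c in alphabet]
    return rng.choices(alphabet, weights=weights, k=1)[0]

def generate_word(length, rng, start="த"):
    """Monte Carlo word generation: a random walk over the bigram table."""
    word = [start]
    for _ in range(length - 1):
        word.append(sample_next(word[-1], rng))
    return "".join(word)

rng = random.Random(42)
print(generate_word(5, rng))
```

Swapping in the full 323-letter counts, and trigram conditioning where data exist, gives progressively more Tamil-like output.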

More later. Adios amigo.

Improving the Ezhil help document viewer

One could say it is an unwritten law of the IT industry that every piece of software offers a "Help" menu.

We had already integrated a help document viewer into ezhuthi, the Ezhil editor application. But it was not to my liking.

There are three things I do not like about this window; Figure 1 will help illustrate:

  1. The book title should not read "0"; this must be removed.
  2. Next, the chapter titles should begin flush with the left edge.


    Figure 1: The ezhuthi help document viewer

  3. Next, the chapter titles should be set in larger letters.

I recorded these as an issue in the GitHub bug list.



With the GTK3+ documentation at hand, the code can be rewritten accordingly. Altogether this took a little over an hour.

Each time after modifying the file ezhil-lang/editor/, I had to run python, click through the "உதவி > புத்தகம்" (Help > Book) menu, and check the appearance.

Once everything was right, it displayed like this:


Figure 2: The "உதவி > புத்தகம்" (Help > Book) view in the corrected program

I committed this to GitHub right away. The Ezhil document viewer improvement is now done! See you again.