Open-Tamil v0.7 release

I’m happy to announce the Open-Tamil 0.7 release today, 23rd March 2018. Open-Tamil is distributed under the MIT license and is available for Python 2.6, 2.7, 3+ and PyPy via the Python Package Index.

You can install the package by running ‘$ pip install --upgrade open-tamil’ in your console.

The following updates were made to the Python package:

  1. A series of command-line tools is installed with this release, into your Python installation (on Windows) or your local/bin directory (on Linux). The command-line tools are:
    1. tamilphonetic – converts English input to Tamil text
    2. tamilwordfilter – keeps only the Tamil text from the input text data
    3. tamilurlfilter – extracts Tamil text from the input website data
    4. tamiltscii2utf8 – converts the encoding of an input file from TSCII to UTF-8
    5. tamilwordgrid – generates a crossword from Tamil input text and writes it to the file output.html
    6. tamilwordcount – like the UNIX wc program, but for Tamil
  2. Transliteration package updates: reverse transliteration functions are added; support for the University of Madras scheme is added.
  3. Tamil package: added a text summarizer tool via the module ‘tamil.utils.SummaryTool’
  4. Solthiruthi package updates: spell checking now runs in reasonable time, and the ability to identify and correct many classes of errors is added.
  5. Bug fixes for issues in get_letters() and tamil.numeral; added the capability to generate the string version of numerals in Tamil [previously only the numeric version was supported]

In addition to the package, a web interface for Open-Tamil was developed in Django to demonstrate some of our capabilities.

We would like to thank all our contributors in general, and in particular those members who contributed new code or bug fixes to this release.

The previous release was v0.67 on Aug 23rd, 2017, and v0.65 was released on Oct 22nd, 2016. Please spread the word, and send us any bugs, feature requests or feedback via our GitHub page.


Muthu for Open-Tamil team.

Chennai, India.

Analyzing programs – the art of debugging

Debugging: how do you find and fix errors on a computer? In Python this is fairly straightforward; the full details are here.

What is Debugging?

Computer programs don’t always work the way we want them to. So at times we need to stop a program in the middle of execution and inspect it. By looking at the variables, functions and statements/source code in the debugger, we understand the problem better than before, and by stepping through the source code we can trace the source of the error and arrive at a solution.

This may sound somewhat complex, but in practice it’s quite repetitive and you will get the hang of it. It’s the software equivalent of detective work; it is surprisingly fun, and you keep getting better at it with practice.

How to Debug Python Code?

To debug Python we use the Python module ‘pdb’ [read the documentation here]; pdb is named after the more famous and powerful gdb, the GNU debugger. The simplest usage is to run the program that throws the error from the command line as follows,

$ python -m pdb <your_script.py>

Once you see the (Pdb) prompt you can do the following:

  1. Set up a breakpoint at a particular function, class or module
  2. Resume running the program, and
  3. Wait for the program to reach the breakpoint or hit an exception
  4. At this point Pdb stops at the breakpoint and lets you inspect variables, functions and the call stack, and step up or down the frames
  5. For exceptions caught by pdb, we can do the same inspection of in-scope variables and source code, but only in post-mortem mode

Finally, you can figure out the cause of the problem and fix it!
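As a rough illustration of the post-mortem step, here is a minimal sketch. The function names average and run_with_post_mortem and the interactive flag are made up for this example; only pdb.post_mortem, sys.exc_info and traceback.print_exc come from the standard library.

```python
import pdb
import sys
import traceback


def average(values):
    """Return the mean of values; raises ZeroDivisionError for an empty list."""
    return sum(values) / len(values)


def run_with_post_mortem(func, *args, interactive=False):
    """Call func(*args); on an exception, optionally open pdb at the failing frame.

    This helper is hypothetical and exists only for this example.
    """
    try:
        return func(*args)
    except Exception:
        # Show the usual traceback first.
        traceback.print_exc()
        if interactive:
            # Opens the (Pdb) prompt in the frame that raised the exception.
            # Useful commands there: `p values`, `where`, `up`, `down`, `q`.
            pdb.post_mortem(sys.exc_info()[2])
        return None
```

Calling run_with_post_mortem(average, [], interactive=True) from a terminal should drop you at the (Pdb) prompt inside average, where `p values` shows the empty list that caused the division by zero.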

Bon voyage. You are starting on a powerful journey to write cool software and fix buggy software!

Good luck, and Godspeed.

Open-Tamil and Ezhil updates (2016)

Today we are releasing updates to two packages maintained by the Ezhil Language Foundation:

  1. open-tamil v0.65
    • The open-tamil package contains minor bug fixes and performs solidly on Python 2 and Python 3.
    • pip install --upgrade open-tamil
  2. ezhil v0.82
    • Fixes some issues with Python 3 installation from the previous release
    • pip install --upgrade ezhil

Both these packages may be downloaded from PyPI (Python Package Index) via the ‘pip’ command.

Thanks very much to the original contributors, bug reporters, and Tamil open-source software (TOSS) enthusiasts.


Lessons from Open-Tamil Library for Indian Language Applications – PyCon India 2015

The open-tamil team is proposing a talk at the upcoming PyCon India 2015, titled “Lessons from Open-Tamil Library for Indian Language Applications.”

The first 20 years of Indian languages on the Internet have been spent debating encoding schemes and editors, leaving little attention for application development. India, with its rich ethno-linguistic history, needs to preserve and grow this heritage in the digital space. We believe this can be done only by writing novel and useful applications specific to each language.

  • As a community-developed effort, and given the proximity of the various Indian languages, we believe Open-Tamil can serve as a prototype open-source toolbox for other Indian languages.

Support our talk by voting for the open-tamil library at PyCon India 2015, here.

Beginning Python

For learning beginner-level Python, the e-book “Byte of Python” written by Swaroop is excellent. You can download it from here.

Although this book has been translated into many languages, it is not yet available in Tamil. Volunteers can contribute to a translation from here.

open-tamil updates (Jan 2015)

I have the pleasure of introducing a few new features in Open-Tamil this week. Among them are quite novel features like Tamil regexp for pattern matching, and Tamil numerals in the American counting system. As you probably know, Open-Tamil supports both Python 2 and Python 3 in the development version 0.32.

  1. tamil.utf8.get_letters bug fixes
    1. The get_letters function had a few subtle bugs and a somewhat fuzzy algorithm. With this update, we can completely split a given UTF-8 string into its constituent letters.
    2. The get_letters_iterable function is also updated with bug fixes and works with a smaller memory footprint by using Python iterators.
    3. We do this in linear time, O(n)
  2. Numeral generation from open-tamil in American style
    1. Previously we introduced numeral generation using the Indian convention of crores and lakhs, up to 1 lakh crore: tamil.numeral.num2tamilstr
    2. In this update we can convert numbers to Tamil words using the million, billion and trillion of the American numeral system: tamil.numeral.num2tamilstr_american. An example of the conversions is shown from our test suite,
      def test_numerals(self):
        var = {0:u"பூஜ்ஜியம்",
        int(1e7):u"பத்து மில்லியன்",
        int(1e9-1):u"தொள்ளாயிரத்து தொன்னூற்றி ஒன்பது மில்லியன் தொள்ளாயிரத்து தொன்னூற்றி ஒன்பது ஆயிரத்தி தொள்ளாயிரத்து தொன்னூற்றி ஒன்பது",
        3060:u"மூன்று ஆயிரத்தி அறுபது",
        21:u"இருபத்தி ஒன்று",
        1051:u"ஓர் ஆயிரத்தி ஐம்பத்தி ஒன்று",
        100000:u"நூறு ஆயிரம்",
        100001:u"நூறு ஆயிரத்தி ஒன்று",
        10011:u"பத்து ஆயிரத்தி பதினொன்று",
        49:u"நாற்பத்தி ஒன்பது",
        55:u"ஐம்பத்தி ஐந்து",
        1000001:u"ஒரு மில்லியன் ஒன்று",
        99:u"தொன்னூற்றி ஒன்பது",
        101:u"நூற்றி ஒன்று",
        1000:u"ஓர் ஆயிரம்",
        111:u"நூற்றி பதினொன்று",
        1000000000000:u"ஒரு டிரில்லியன்",
        1011:u"ஓர் ஆயிரத்தி பதினொன்று"}
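        # Compare the generated Tamil words with the expected string for each number
        for k,actual_v in var.items():
            v = tamil.numeral.num2tamilstr_american(k)
            print('verifying => # %d'%k)
            self.assertEqual(v, actual_v)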
    3. There were a few minor bug fixes
  3. Tamil regular expression processing
    1. A regular expression is a form of finite automaton: a machine with internal states which may be used for pattern matching.
    2. We have introduced a new API in the Python module ‘tamil’ under the namespace ‘regexp’, which expands Tamil letters into fully formed regular expressions and can work in tandem with the Python re module.
    3. Example: the following ‘pattern’ matches elements 1, 2 and 6 of the list variable ‘data’.
      pattern = u"^[க-ள].+[க்-ள்]$"
      data = [u"இந்த",u"தமிழ்",u"ரெகேஸ்புல்",u"\"^[க-ள].+[க்-ள்]$\"",\
              u"இத்தொடரில்", u"எதை", u"பொருந்தும்"]
      expected = [1,2,6] # i.e.தமிழ், ரெகேஸ்புல், and பொருந்தும்
    4. Another simple example experimenting with Tamil #regexp: pattern = u"^ரிச்.*[க்-ழ்]$" matches strings like ரிச்மாண்டின் and ரிச்மண்டில்.
    5. The Tamil Wikipedia article on regular expressions (சுருங்குறித்_தொடர்) has a good explanation.
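To make the letter splitting from item 1 more concrete, here is a minimal sketch in plain Python. It is not the open-tamil implementation: the helper name split_letters is hypothetical, and it assumes that grouping each base character with the Unicode combining marks that follow it is a good enough approximation of a Tamil letter. It makes a single pass over the string, so it runs in linear time.

```python
import unicodedata


def split_letters(text):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper for illustration; not tamil.utf8.get_letters itself.
    """
    letters = []
    for ch in text:
        # Tamil vowel signs and the pulli have a Unicode category starting with "M".
        if letters and unicodedata.category(ch).startswith("M"):
            letters[-1] += ch
        else:
            letters.append(ch)
    return letters
```

Under these assumptions, split_letters("தமிழ்") returns the three letters த, மி and ழ்.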

You can do a lot more with open-tamil; a simple example is demonstrated in item #3 above.
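The expansion idea from item #3 can be sketched with the standard re module alone. This is not the tamil.regexp API: the names CONSONANTS, consonant_range, compile_example_pattern and matching_indices are hypothetical, the consonant order is hard-coded, and the opening [க-ள] is simplified to a check on the first code point of the word.

```python
import re

# Tamil consonants in traditional alphabetical order (hard-coded for this sketch).
CONSONANTS = list("கஙசஞடணதநபமயரலவழளறன")
PULLI = "\u0bcd"  # the dot that marks a pure consonant, as in க்

# The sample data from item #3.
DATA = ["இந்த", "தமிழ்", "ரெகேஸ்புல்", "\"^[க-ள].+[க்-ள்]$\"",
        "இத்தொடரில்", "எதை", "பொருந்தும்"]


def consonant_range(first, last):
    """Return the consonants from first to last in Tamil alphabetical order."""
    return CONSONANTS[CONSONANTS.index(first):CONSONANTS.index(last) + 1]


def compile_example_pattern():
    """Build an `re` pattern playing the role of ^[க-ள].+[க்-ள்]$."""
    start = "|".join(consonant_range("க", "ள"))
    # Each pure consonant is two code points, so the range is spelled out
    # as an alternation instead of a character class.
    end = "|".join(c + PULLI for c in consonant_range("க", "ள"))
    return re.compile("^(?:%s).+(?:%s)$" % (start, end))


def matching_indices(pattern, words):
    """Return the positions of the words that the pattern matches."""
    return [i for i, word in enumerate(words) if pattern.match(word)]
```

The alternation is the point of the sketch: a letter such as க் spans two code points, so it cannot sit at either end of an ordinary character-class range and has to be written out letter by letter.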

Have a nice weekend. Share your comments, and thoughts on open-tamil below.