Bangla in GNU/Linux: making the Penguin speak in Bengali

The need to localize
In a SourceForge article on localization, Prof. Venkatesh Hariharan makes a pertinent comment: “The localisation of Linux to Indian languages can spark off a revolution that reaches down to the grassroots levels of the country.”

The problem faced in harnessing the power of ICT (information and communication technologies) in India is that all such enabling software is in English, a language in which a mere 10-15% of the population is proficient. The dialogs, menus and interfaces are in English, as is the documentation. Although a common language is advantageous to the extent that it establishes a ‘lingua franca’, in countries like India and Bangladesh the situation leads to ‘technological poverty’.

To bridge the digital divide by taking technology to the masses, one of the important aspects of the Free Software and Open Source Software movements has been adapting the GNU/Linux operating system to local cultural nuances. Localization is a national-level collaborative effort to “Indianize” software and make culture an integral part of the computing experience. A locale handles cultural conventions such as the formatting of dates and times, the representation of numbers and the symbols for currency. The glibc package currently provides both the bn_IN and bn_BD locales (for India and Bangladesh respectively).
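
As a rough illustration of what a locale controls, the following Python sketch queries a few of these conventions through the standard `locale` module. It assumes the locales are installed under the names `bn_IN.UTF-8` and `bn_BD.UTF-8` (an assumption; the exact names vary between distributions). `describe_locale` is a hypothetical helper written for this example, not part of glibc, and it returns `None` when a locale is not available.

```python
import locale
import time


def describe_locale(name):
    """Return a few conventions of the named locale, or None if it is unavailable.

    `name` is whatever the system calls the locale, e.g. "bn_IN.UTF-8"
    (assumed name; `locale -a` lists what is actually installed).
    """
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        return None  # locale not generated on this machine
    try:
        conv = locale.localeconv()
        return {
            "decimal_point": conv["decimal_point"],
            "currency_symbol": conv["currency_symbol"],
            "int_curr_symbol": conv["int_curr_symbol"],
            # %x is the locale's preferred date representation
            "sample_date": time.strftime("%x", time.gmtime(0)),
        }
    finally:
        locale.setlocale(locale.LC_ALL, saved)  # restore the previous locale


if __name__ == "__main__":
    for name in ("bn_IN.UTF-8", "bn_BD.UTF-8"):
        print(name, describe_locale(name))
```

Comparing the two results on a machine that has both locales shows how the same language can carry different conventions in India and Bangladesh, for instance in the currency symbol.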

Bangla localization

Bangla (or Bengali) is one of the most important languages in the world in terms of the number of users. The language possesses a well-established phonetic script, and the Unicode range for Bangla characters is U+0980 to U+09FF. However, Indic scripts are notoriously difficult to support because of conjuncts (‘yuktakshars’) and non-standard spellings. One of the major difficulties in localizing GNU/Linux in Bangla has also been the absence of suitable fonts.

Unicode represents Bengali text as a sequence of Bengali characters. Unlike most European scripts, simply rendering these characters one after another is not enough for Bengali (and other Indic scripts); new glyphs have to be formed by combining several characters. Until recently, there was no accepted standard that described this completely. The gap has been addressed by OpenType, an extension to the TrueType font format, along with some rules in Unicode for reordering the characters before combining them. A description of the parts of the specification relevant to Indic scripts is available through Microsoft’s typography site.

Under the Windows operating system, Internet Explorer uses the Uniscribe layout engine to render OpenType features. Recent versions of Windows, e.g. Windows 2000 or XP, ship with the required version, and installing a recent version of Internet Explorer is usually sufficient for older versions. On GNU/Linux, the FreeType library has implemented OpenType Layout features, and the rapidly developing Pango project is trying to deal with other internationalization issues. Pango, the engine that handles all text rendering in the GTK2/GNOME 2 environment, has a module for rendering Bangla, and this combination is already in working condition on recent distributions. However, none of the more popular browsers use this technology yet, so open source browsers like Mozilla still do not support Bangla rendering.
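
To make the point about character sequences concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It lists the stored characters of one conjunct and checks them against the Bengali block. The helper names `in_bengali_block` and `describe` are made up for this illustration; the actual glyph formation is done by the layout engine and the font, not by code like this.

```python
import unicodedata

BENGALI_START, BENGALI_END = 0x0980, 0x09FF  # Unicode Bengali block


def in_bengali_block(ch):
    """True if the single character lies in the Bengali block U+0980..U+09FF."""
    return BENGALI_START <= ord(ch) <= BENGALI_END


def describe(text):
    """List (code point, Unicode name) for each stored character of `text`."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


# The conjunct "kssa" is stored as three characters: KA + VIRAMA (hasanta) + SSA.
# A layout engine with an OpenType Bengali font draws them as a single glyph.
KSSA = "\u0995\u09CD\u09B7"

if __name__ == "__main__":
    for code, name in describe(KSSA):
        print(code, name)
    print(all(in_bengali_block(ch) for ch in KSSA))
```

The stored text always has three code points here; whether the reader sees one joined glyph or three separate shapes depends entirely on the rendering stack described above.
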
A completely independent implementation is found in the Unicode text editor Yudit, available for both Linux and Windows. Even with a layout engine that implements OpenType layout features, an OpenType Bengali font is required before one can view Bengali text. Several efforts have started recently with the aim of creating free Bengali OpenType fonts, and these are available from the Free Bangla Fonts project.

The Free Bangla Fonts Project

The Free Bangla Fonts Project is a volunteer-run project, based on a collaborative model, dedicated to creating Free, high-quality, completely Unicode-compliant OpenType Bengali fonts. The project positions itself as the central resource for getting and developing Free Bangla fonts. Its initial aim is to release a full set of Bangla fonts that supports all the major Bangla yuktakshars (conjuncts). The Akaash set of fonts (with more than 650 glyphs) aims to be such a set. The project also plans to convert other existing Free Bangla fonts that are not Unicode compliant into Unicode-compliant Bengali OpenType fonts.

Currently the team is working on four sets of fonts. Sayamindu Dasgupta is developing the Akaash set, which will have three OpenType fonts: AkaashNormal.ttf, AkaashWide.ttf and AkaashSlanted.ttf. Dr. Anirban Mitra is developing the Ani set, which has two fonts: Ani.ttf and Mitra.ttf. The Mitra font is monospaced, which is useful in certain specialized applications. Dr. Mitra is also developing the Mukti set of fonts (which uses glyphs donated by Cyberscape Multimedia Limited, Mumbai). This set has four fonts: MuktiRegular.ttf, MuktiBold.ttf, MuktiNarrow.ttf and MuktiNarrowBold.ttf. Deepayan Sarkar is developing the Likhan set of fonts. Taneem Ahmed has packaged all the fonts under development in the project into an RPM file for Red Hat 8.0.

Bangla Innovative Open Source (BIOS)

BIOS is the brainchild of a band of programmers and students with the aim of utilizing the open source aspect of GNU/Linux to take computing to the masses. Based on a distributed collaborative model of project management, BIOS aims to create:

1. Bengali Open Office - full featured, open source Bengali office suite
2. Bengali database - Bengali Unicode based implementation of database
3. Bengali Speech application - Open Source speech recognition and text-to-speech application

BIOS believes that the prevalent standard interfaces in English form a barrier to taking computing power to all levels of society. The license under which GNU/Linux is available makes it an ideal delivery medium for grassroots-level social programs as well as a platform for delivering educational content. BIOS therefore aims to create a ‘Bangla’ interface for both the graphical (or X) and text modes. Such an effort will make it economically feasible to install computers at the village level, thus ushering in a knowledge-based economy built on a community-based knowledge-sharing platform.

Bangla Gnome Translation Project - Ankur

Ankur (the name was given to the Gnome Translation Project by Dr K Ghosh) is working towards supporting the Bangla (Bengali) language on the GNU/Linux operating system. A majority of its projects are focused on XFree86.org’s X server; however, some are platform independent and add support for other operating systems. The Ankur project has the following primary goals:

1. Translate popular and major X server applications.
2. Provide Bangla support for major X server applications such as office suites, databases, development tools and desktop environments like GNOME and KDE. The intention is to help develop and maintain open source/free software targeted at Bangla-speaking users.
3. Create awareness among Bangla-speaking computer users.
4. Create content with the aim of educating people about GNU/Linux and the FLOSS movement.

On 02/02/2003, the project team released bspeller-0.4. Ankur is also involved in the Bengali Dictionary Project. Kaushik Ghose outlines the aims of this project as:

1. Bangla dictionary
2. Web page interface to the Bangla dictionary
3. CD version, or an offline version, of the web interface
4. Various converters to turn the Bangla dictionary into, say, ISCII or higher ASCII for display in other fonts
5. Various interface programs: a) a dictionary GUI, b) a command-line version of the GUI (which can act as a spellchecker for other programs)

‘Progga’ states that the project needs volunteers in order to meet the deadline of August 2003 (when Gnome v2.4 is scheduled to be released). To date, approximately 30% of the project has been completed. The translation project is one of the first ‘team-oriented’ projects of Ankur and is based on the Open Source Software development model. The current volunteer strength is around 10, with profiles varying across all levels. Volunteers download files after duly notifying the others on the mailing list. Once a file has been completed, either partially or fully, it is posted for peer review. Files are generally reviewed once and then committed to CVS.

However, in Progga’s opinion, one of the major constraints on the successful completion of such a distributed project is the lack of publicity as well as the low level of motivation among volunteers, especially Bengalis. He states that, more often than not, people who showed initial interest have simply disappeared. In some cases this can be attributed to the cutting-edge technology involved; in others, to volunteers being daunted by the task at hand.

Conclusions

Localization projects must follow the bazaar model of distributed development. While the robustness and stability of this model are well established through various successful implementations, localization, and more specifically the translation project, suffers from the lack of a firmly established command and control structure. Until recently, the project lacked a common ‘word pool’ for words that regularly need to be part of translated strings. The peer review cycle, along with the existing model, needs to be modified and restructured to ensure that the translations are consistent in quality. There is also a need to create a localized set that encapsulates the dialects and semantics of the common populace. However, these difficulties are part of any such project. Given that more than 25% of the translation has been completed within a span of 2 months, it is not too ambitious to expect this group to meet its deadline. Till then, we wish them all the best.

References & Links:

* Bangla Penguin Project: www.banglapenguin.org
* Bangla Gnome Translation Project - Ankur: www.bengalinux.org
* BIOS: banglalinux@yahoo.com
* Deepayan Sarkar’s page on archive of Bengali Documents on the Internet: www.stat.wisc.edu/~deepayan/Bengali/WebPage/bengali.html
* Free Bangla Fonts Project: http://savannah.nongnu.org/projects/freebangfont/
* Kaushik Ghose: kghose@wam.umd.edu
* Progga: abulfazl@juniv.edu
* Sayamindu DasGupta’s homepage: www.peacefulaction.org/sayamindu/
* Indian Linux Users Group - Kolkata Chapter: www.ilug-cal.org
* Prof. Venkatesh Hariharan is with the Indian Institute of Information Technology, Bangalore. He can be reached at venky@indlinux.org

Sankarshan Mukhopadhyay is a Free Software enthusiast and a member of the Indian Linux Users Group - Kolkata Chapter (http://www.ilug-cal.org/). His blog ‘Random Thoughts’ is at http://sankarshan.blogspot.com/. He can be reached at sankarshanm@softhome.net.
