The LTRC site has moved to http://ltrc.iiit.net

1. Goal and Background

The Language Technologies Research Centre (LTRC) was established in October 1999 at IIIT Hyderabad, when the anusaaraka group, known for its work in machine translation, moved into LTRC (from IIT Kanpur/Satyam). The main support for LTRC has come from Satyam Computers Services Ltd., to the tune of Rs. 1 crore (Rs. 10 million, approx. US $250,000) in the first year.

The goal of LTRC is to develop technologies dealing with language. These include technologies pertaining to translation and other NLP areas, speech processing, optical character recognition, etc.

There are two wings in LTRC:

  • Free Software Wing: The software produced by this wing is open (and may also be available to everybody free of cost).
  • Sponsored Software Wing: Software produced specifically for a company or organization.


Satyam is supporting both wings. Further support for the Free Software Wing has also been received from the Ministry of Human Resource Development, Govt. of India, for a major research project.

There is also a Memorandum of Understanding between IIIT Hyderabad and Carnegie Mellon University's Language Technology Institute for sharing software, database resources, and expertise.

2. Ongoing Activities

The major ongoing activities are described area-wise below.

2.1 Machine Translation

The major aim here is to develop aids for overcoming the language barrier. Of particular interest is the development of language accessors, or anusaarakas, which make electronic content on the web or other media accessible to readers across languages. Two areas are being pursued:

  1. Development of English to Indian language accessor(s). These systems allow a person who knows one Indian language to read and understand English documents, possibly after some training. The difficult requirement placed on the technology is that it should work in any subject domain, rather than only in narrow ones. This is a HIGH PRIORITY task which is being pursued with vigour, as it is extremely important from the national viewpoint.

    The theory and technology developed with the anusaaraka for Indian languages are being adapted in building such a system. In addition, existing English parsing technology from CMU and other institutions is being explored.

    Another important aspect of this effort is to involve thousands of people across the country in the development of lexical resources for Indian languages, and between English and Indian languages. Work done in this area is "freely" available under GPL. (Funds are being sought from Govt., public bodies, and industry to support this "free" public interest activity.)
  2. Development of language accessors from one Indian language to another. Five existing "free" anusaarakas, built by the anusaaraka group with funds from MoIT/DOE, Govt. of India, and available under GPL, are being developed further. The emphasis is on productising some of these.

2.2 Web Search Engines For Indian Languages

The aim here is to develop search engines for Indian languages. The development makes use of the basic technology already developed as part of the anusaaraka effort, pertaining to word analyzers, dictionaries, corpus processing, etc. Thesauri, subject indexes, document classification algorithms, etc. still need to be developed.
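
As a rough illustration of how a word analyzer can feed a search engine, the sketch below indexes documents by root form, so that a query for one inflected form also finds the others. It is only a minimal sketch and not the Centre's implementation: the ROOTS table is a hypothetical stand-in for a real word analyzer, its romanized entries are toy data, and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical analyzer table (toy data): surface form -> root form.
# A real word analyzer would compute this from morphological rules.
ROOTS = {"kitaaben": "kitaab", "ladke": "ladka"}


def analyze(word):
    """Return the root form of a word, or the lowercased word if unknown."""
    w = word.lower()
    return ROOTS.get(w, w)


def build_index(docs):
    """Map each root form to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[analyze(word)].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query word, compared by root."""
    roots = [analyze(w) for w in query.split()]
    if not roots:
        return set()
    result = set(index.get(roots[0], set()))
    for root in roots[1:]:
        result &= index.get(root, set())
    return result


docs = {1: "ladke kitaaben", 2: "kitaab", 3: "ladka"}
index = build_index(docs)
print(search(index, "kitaab"))
```

Indexing by root form rather than surface form is one simple way to reuse an analyzer; thesauri or subject indexes could be layered on top by expanding the query roots before lookup.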

2.3 Information Extraction and Summarization for English

The major aim here is to develop technology for extracting information from text, say, from business manuals. The extracted information can be loaded into a database or other relevant software. (This activity benefits from the basic work being done on English analysis as part of machine translation.)

Another area of immediate interest for English is the automatic summarization of documents.

2.4 Knowledge Matching

The attempt here is to match extracted information from documents with given specifications. For example, a given job requirement could be matched with resumes (say, after information is extracted from them).
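
The matching step can be pictured with a small sketch. It assumes that information extraction has already turned each resume into a structured record; the Requirement and Resume fields, the scoring rule (fraction of required skills present), and all names are hypothetical choices made for this illustration, not a description of the Centre's system.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """A job specification (hypothetical fields)."""
    skills: set
    min_years: int = 0


@dataclass
class Resume:
    """Information assumed to be already extracted from a resume."""
    name: str
    skills: set = field(default_factory=set)
    years: int = 0


def match_score(req, resume):
    """Fraction of required skills found; 0.0 if experience is too low."""
    if resume.years < req.min_years:
        return 0.0
    if not req.skills:
        return 1.0
    wanted = {s.lower() for s in req.skills}
    have = {s.lower() for s in resume.skills}
    return len(wanted & have) / len(wanted)


def rank(req, resumes):
    """Return (score, name) pairs with a non-zero score, best match first."""
    scored = [(match_score(req, r), r.name) for r in resumes]
    return sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))


req = Requirement({"Python", "NLP"}, min_years=2)
resumes = [
    Resume("A", {"python", "nlp"}, 3),
    Resume("B", {"python"}, 5),
    Resume("C", {"python", "nlp"}, 1),
]
print(rank(req, resumes))
```

A set-overlap score is only the simplest possible rule; a fuller system would presumably weigh fields differently and tolerate synonyms in the extracted terms.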

2.5 Natural Language Interface For English

This is an experimental project to allow a user to interact with the machine in English for specified applications.

2.6 Display and Use of Indian Scripts Over the Internet

A major achievement of the Centre has been the development of plug-ins which allow text in an Indian language on the internet to be viewed in local fonts, independent of browser, platform, or font. (See under resources for download details.)

2.7 Basic Research: Sanskrit and Computers

Besides the above activities, basic research is being conducted on relating Western theories of language to the traditional Indian theories of language, in particular those pertaining to vyakarana, navya-nyaya, and mimamsa. This is being done in the concrete setting of applications, and not just at the theoretical-philosophical level.

A major project has been granted by the Ministry of Human Resource Development to run a 2-year research programme exploring the relations between traditional Sanskrit scholarship and modern linguistics and information technology. Ten scholars are proposed to be admitted to the programme from 1 August 2000.

2.8 Extension: Sanskrit and Computers

Training and teaching workshops are also organized from time to time, to train Sanskrit scholars in the new developments of NLP/MT, to show how their knowledge can be related to modern developments, and in particular how they can contribute to the development of NLP/MT technology.

For details of a recently held 2-week Workshop on Sanskrit and Computers, see Events.

3. Future Activities

Future activities are planned in the areas of speech generation and recognition, optical character recognition, etc.

4. Relevant Activities


  1. Grammar Based Processing.
    1. Word Analysers.
    2. Sentence Analysers.
  2. Statistical Processing of NLP.
  3. Dictionary Building.
    1. Among Indian Languages.
    2. Between Indian and English Languages.
  4. Thesauri, Lexical Resources, and Knowledge-Base (KB)
    1. For Indian Languages.
    2. Dealing with English.
    3. Application specific.
  5. Inference Methods.
  6. User Interfaces.

5. Contributors of Free Wing of LTRC


The Free Software Wing of LTRC works on problems of social significance and makes the results available under GPL.

Current contributors

(Legend: CS = Computer Science/IT)

Prof Rajeev Sangal (CS) [Director, LTRC]

		Volunteers
	      Ms. Nisha Sangal       (CS)
	      Ms. Surekha Raman      (CS)
	      Ms. D. Vijaya Bharathi (CS)

Language          Linguistics         Computer Science
Ms. Nandini       Ms. Dipti M. S.     Dr. Vineet Chaitanya
Ms. Rama J.       Ms. Mona P.         Ms. Amba Kulkarni
Ms. Sarita S.     Ms. Pranjali K.     Ms. Niva Das
Ms. Prabha

7. Contact Address

Language Technologies Research Centre,
International Institute of Information Technology,
Gachibowli, Hyderabad 500 019

E-mail: rambabu@iiit.net
Phone: +(91) (40) 3001412
Fax: +(91) (40) 3001413