The Kamusi Project

Parser

Thanks to hard work from Martin and Andrew, the beta version of the parser is now live.

So if you type, say, "anachotaka" and ask for a Swahili-English translation, it will recognise this as coming from the verb "-taka".

It checks words against 42 different tenses, which comes to something like 30,000 possible permutations of subject prefixes and tense, object and relative infixes.
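To give a feel for where a number like that comes from, here is a minimal Python sketch of one way to check a word against permutations: generate every combination of markers in front of each stem and look the typed word up in the result. This is not the Kamusi code. The tiny morpheme tables, the `build_index` function, and the generate-then-look-up strategy are all assumptions made for illustration, and the sketch over-generates (it makes no attempt to rule out combinations that Swahili grammar does not allow).

```python
from itertools import product

# Illustrative, deliberately incomplete morpheme tables (assumed, not Kamusi data).
SUBJECT = {"ni": "I", "u": "you", "a": "he/she", "tu": "we", "m": "you (pl.)", "wa": "they"}
TENSE = {"na": "present", "li": "past", "ta": "future", "me": "perfect"}
# The empty string stands for "no relative infix" / "no object infix".
RELATIVE = {"": None, "ye": "who (cl. 1)", "cho": "which (cl. 7)"}
OBJECT = {"": None, "ni": "me", "ku": "you", "m": "him/her", "ki": "it (cl. 7)", "wa": "them"}


def build_index(stems):
    """Map every generated surface form to the list of analyses that produce it."""
    index = {}
    for stem in stems:
        for subj, tense, rel, obj in product(SUBJECT, TENSE, RELATIVE, OBJECT):
            form = subj + tense + rel + obj + stem
            index.setdefault(form, []).append({
                "subject": subj,
                "tense": tense,
                "relative": rel or None,
                "object": obj or None,
                "stem": stem,
            })
    return index


INDEX = build_index(["taka", "piga"])
print(INDEX.get("anachotaka"))
```

Even with these toy tables there are 6 x 4 x 3 x 6 = 432 combinations per stem, so it is easy to see how a full set of subject, tense, object and relative markers runs into the tens of thousands.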

It is a beta version and not perfect - please try it out and post here if you find any serious bugs. If the project can raise more money, the hope is to add the ability to recognise derived verbs such as "pigwa", "pigana" and "pigania", which all come from "piga".
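As a rough idea of what recognising derived verbs could involve, the Python sketch below peels verb extensions off the end of a word until it reaches a known root. It is only an illustration under stated assumptions: the `KNOWN_ROOTS` set, the `EXTENSIONS` list and the `find_root` function are hypothetical, only three extensions are covered, and real Swahili derivation has sound changes and exceptions that this ignores.

```python
# Hypothetical root list and a small subset of Swahili verb extensions.
KNOWN_ROOTS = {"piga", "taka", "soma"}
EXTENSIONS = [
    ("an", "reciprocal"),
    ("w", "passive"),
    ("i", "applicative"),
    ("e", "applicative"),
]


def find_root(verb):
    """Return (root, extension labels, outermost first), or None if no root is found."""
    if verb in KNOWN_ROOTS:
        return verb, []
    if not verb.endswith("a"):
        return None
    base = verb[:-1]  # drop the final vowel before looking for an extension
    for suffix, label in EXTENSIONS:
        if base.endswith(suffix) and len(base) > len(suffix):
            # Strip one extension, restore the final vowel, and try again.
            found = find_root(base[:-len(suffix)] + "a")
            if found:
                root, labels = found
                return root, [label] + labels
    return None


for word in ["pigwa", "pigana", "pigania"]:
    print(word, find_root(word))
```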

Parser - and thanks to Jacob!!

Jacob, you're too modest.

Jacob has performed an extraordinary public service helping with the development of the parser. He did most of the work of converting the verb rules that we identified into coherent computer code, which Andrew was then able to adapt for the Kamusi engine. Without Jacob's persistence and assistance, the Kamusi Parser would have remained a diagram for many more months or years, rather than becoming a working software program.

We thank you, brother!
--Martin

Parser beta version

My goodness!!! This project is really impressive! Where can I get a copy of this beta version so I can see it for myself?

So how does the parser work?

So how does the parser work? I'm familiar with several morphological parsing engines, such as Xerox's Finite State Toolkit. But I presume you coded this in some general-purpose programming language, such as Perl?

Is the source code for the parser open source?

Mike Maxwell
CASL/ U Md

to answer my own question...

OK, I just ran into the answer to my questions, over on the "Swahili Translation Software" forum (http://research.yale.edu/swahili/learn/?q=en/node/197). I'm not a Java programmer, but it's easy enough to see what you're doing. I know your project is strapped for $, but you might want to consider using a more generic morphological parsing engine that would allow you to separate the declarative linguistic knowledge (Swahili morphology) from the procedural code. The result would be, IMHO, much easier to maintain.
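As a rough illustration of that separation, here is a minimal Python sketch. It is not the Kamusi code and not how the Xerox tools work internally; the `GRAMMAR` table, the `analyses` function and the tiny morpheme lists are all hypothetical. The point is only that the Swahili-specific knowledge lives in a data table, while the matching code knows nothing about the language.

```python
# Declarative part: an ordered list of slots, each (name, morphemes, optional).
GRAMMAR = [
    ("subject", {"ni": "I", "a": "he/she", "wa": "they"}, False),
    ("tense", {"na": "present", "li": "past", "ta": "future"}, False),
    ("relative", {"cho": "which (cl. 7)", "ye": "who (cl. 1)"}, True),
    ("object", {"ki": "it (cl. 7)", "wa": "them"}, True),
    ("stem", {"taka": "want", "piga": "hit"}, False),
]


def analyses(word, slots=GRAMMAR):
    """Procedural part: a generic matcher that walks the slots left to right."""
    if not slots:
        # Success only if the whole word has been consumed.
        return [[]] if word == "" else []
    (name, morphemes, optional), rest = slots[0], slots[1:]
    found = []
    for form, gloss in morphemes.items():
        if word.startswith(form):
            for tail in analyses(word[len(form):], rest):
                found.append([(name, form, gloss)] + tail)
    if optional:
        # Also try leaving this slot empty.
        found.extend(analyses(word, rest))
    return found


print(analyses("anachotaka"))
```

With this layout, adding a tense or a noun class means editing the table, not the function.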

The Xerox tools are a good choice (http://csli-publications.stanford.edu/site/1575864347.html). $40 for the book + the program (but a new edition is due to come out any day, so you might want to wait). No, I don't get any $ from that, I'm just a happy user :-!.

Mike Maxwell
CASL/ U Md

