
To achieve the Kamusi Project's ambitions of documenting every word in Africa, we are working with numerous partners around Africa and beyond, on a variety of linguistic and technical fronts. The modular design of the Kamusi Project architecture means that we can work on many components simultaneously, for different languages and different types of terminology. We have partners and plans for several new projects that can get underway rapidly once funding is available. Among these:
- Language for Health. This project will bring together more than two dozen partners in the production of health-related terminology, document translation, and software tools. With an initial focus on the pressing public health needs surrounding maternal health and HIV/AIDS, partners in at least ten countries will develop an open dataset that will adhere to the standards of the World Health Organization and the International Organization for Standardization (ISO). Results will be widely distributed through KamusiTERMS (Kamusi for Technology, Economy, Rights, Medicine, and Science), our initiative for coordinated participatory terminology development throughout Africa in numerous technical domains.
- Taifa Leo Corpus Mining. Kenya's oldest Swahili-language daily newspaper will make available digitized versions of its entire archive, dating back to 1958. Working with the University of Nairobi, we propose to establish a research database containing the full corpus. Using advanced linguistic analysis to mine the corpus, we will be able to harvest for the Swahili dictionary every word that has appeared in the newspaper during more than 50 years, along with usage examples and time trend data.
- SALT Across the Sahara (Songhay and Amazigh Lexicography and Terminology). We intend to join forces with two projects: The Dictionary of the Amazigh Language, which aims to create a comprehensive lexicon for the Tamazight language family (a.k.a. Berber) spoken by North African minorities from Morocco to Egypt, and Songhay.org, which is documenting the Songhay languages of Mali and Niger in the central Sahara. We will make the systems of our three projects interoperable and synchronize the databases. This unified dictionary will follow the path of the great salt caravans that for centuries bridged the Sahara to facilitate the flow of trade and knowledge.

- E-Gikuyu, E-Tswana, E-Kabuverdianu. We are currently developing PALDO, the Pan-African Living Dictionary Online, which will interlink numerous languages through the dictionary building and dissemination systems of the Kamusi Project. Work on any one language can proceed independently, as partners step forward. We currently have strong partners with whom we have developed solid proposals for Gikuyu (one of Kenya's major languages), Setswana (spoken throughout Botswana, and also a major official language in South Africa), and the Kabuverdianu language of Cape Verde. E-Gikuyu will additionally involve academic collaboration with the University of Nairobi and the School of Oriental and African Studies in London on the development of computer-assisted translation tools and their use in trilingual translation among Gikuyu, Swahili, and English, and will serve as a model for work with several more languages of Kenya. E-Tswana will include advanced development of morphological analysis and speech recognition tools for Bantu languages. E-Kabuverdianu will additionally provide a much-needed Portuguese component that will enable future work in Lusophone Africa. PALDO components for many other languages are also waiting in the wings.
- SMS to Extend Reach and Quality. Africa had 508.6 million mobile telephone subscribers as of October 2010, meaning at least half the population has access to text messages. We have a working prototype system to deliver dictionary data via SMS. However, people would not use such a service if they risked getting poor results, so we have also devised a model whereby queries that are not in the Kamusi database will be sent to team members for rapid completion. The database will thus be improved for future queries, and team members will be rewarded with mobile airtime. Using SMS to provide access to mobile subscribers is a neat technical solution to overcome Africa's low rate of internet penetration, but involves unavoidable telecom costs.
- The Universal Compendium of Recurring Concepts (UnCorc). No matter the language, people repeatedly seek certain types of information. This information is often available online through databases, news outlets, or other informative websites – but usually only in one or a limited selection of languages. For example, many people search for current weather information, and most of what they seek can be described by a finite list of terms, such as “high temperature,” “wind speed,” and “occasional rain showers.” If those concepts were stored in a multilingual database, content producers could link their data to the concept, and users would see the concept displayed in their preferred language. Taking localization to a new level, data producers will be able to provide their core data in any language without ever attempting their own translations. UnCorc will use the cooperative systems developed for PALDO and KamusiTERMS to produce a range of vocabulary lists that can be queried by remote websites, for categories ranging from banking to transportation to sports. While the idea for UnCorc grows from a desire to make recurring data available to speakers of African languages, linguistic walls similarly block all sorts of data flows within Europe, Asia, and the Americas. UnCorc will therefore be open to any language for which participants are available to help develop the data.
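As a rough illustration of the UnCorc idea, the sketch below stores each recurring concept once under a language-neutral identifier and displays it in the reader's preferred language. The concept IDs, the `CONCEPTS` table, the `render` and `localize` functions, the sample terms, and the fallback to English when a term is missing are all assumptions made for illustration; they do not describe an existing Kamusi interface.

```python
# Hypothetical concept table: one language-neutral ID per recurring concept,
# with whatever terms contributors have supplied so far.
CONCEPTS = {
    "weather.wind_speed": {
        "en": "wind speed",
        "fr": "vitesse du vent",
        "sw": "kasi ya upepo",
    },
    "weather.high_temperature": {
        "en": "high temperature",
        "fr": "température maximale",
        # no Swahili term contributed yet in this illustrative table
    },
}


def render(concept_id, language, fallback="en"):
    """Return the term for a concept in the requested language.

    Falls back to the `fallback` language when no term exists yet
    (an assumed policy, not one stated in the proposal).
    """
    terms = CONCEPTS[concept_id]
    return terms.get(language, terms[fallback])


def localize(report, language):
    """Relabel a producer's data, keyed by concept ID, for one reader."""
    return {render(cid, language): value for cid, value in report.items()}


# A data producer publishes values keyed by concept ID, never by translated text.
report = {"weather.wind_speed": "20 km/h", "weather.high_temperature": "31 °C"}

print(localize(report, "fr"))
print(localize(report, "sw"))
```

The point of the design is that the producer links data to concepts only once; every language added to the table afterwards becomes available to readers without any change on the producer's side.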

- Community Dictionary Building: Word-a-Day for Your Language. With 2000 languages spoken in Africa, it will be a long time before we can launch formal components for each one. However, we can provide the space for languages to incubate their own dictionaries within PALDO, using the social power of the cloud. Through networks such as the African Languages Group on Facebook, we can reach thousands of speakers of African languages. We propose a system in which volunteers register for their language. Each volunteer will be sent one word a day in a transnational language and will provide the equivalent term and definition in their mother tongue. If a volunteer does not reply within 24 hours, the term will be returned to the general pool.
- Translation Tools Push/Pull. Our partners at translate.org.za have developed sophisticated free and open software for computer-assisted translation (CAT), called Virtaal and Pootle. We propose to integrate these tools tightly with our dictionary and terminology databases. When a term within a translator's document is in our system, the translator will see the dictionary entry in a sidebar. When the term is not in our system, the translator will have the option to provide the information, which will then be processed by project editors. Documents that are translated with an open license will be included in translation memory for future users of the CAT tools, added to a corpus for machine translation development, and mined for sentences that serve as usage examples of particular terms in the online dictionaries.
- Also on the horizon: Video dictionaries of African sign languages, multilingual mathematics primers, automatic machine translation among Bantu languages…
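The Word-a-Day assignment cycle proposed above (one term sent per volunteer, returned to the general pool after 24 hours without a reply) can be sketched as follows. This is a minimal model under stated assumptions: the `WordPool` class, its method names, and the choice to discard a late reply are hypothetical and only illustrate the rule, not an existing implementation.

```python
from datetime import datetime, timedelta

REPLY_WINDOW = timedelta(hours=24)  # reply deadline stated in the proposal


class WordPool:
    """Hypothetical tracker for terms awaiting translation by volunteers."""

    def __init__(self, words):
        self.pool = list(words)  # terms waiting for a volunteer
        self.assigned = {}       # volunteer -> (term, time sent)
        self.answers = {}        # term -> (equivalent, definition)

    def send_daily_word(self, volunteer, now):
        """Assign the next pooled term, or None if the volunteer already
        holds one or the pool is empty."""
        if volunteer in self.assigned or not self.pool:
            return None
        word = self.pool.pop(0)
        self.assigned[volunteer] = (word, now)
        return word

    def reply(self, volunteer, equivalent, definition, now):
        """Record a reply; a late reply sends the term back to the pool
        (assumed policy)."""
        entry = self.assigned.pop(volunteer, None)
        if entry is None:
            return False
        word, sent = entry
        if now - sent > REPLY_WINDOW:
            self.pool.append(word)
            return False
        self.answers[word] = (equivalent, definition)
        return True

    def expire(self, now):
        """Return every term unanswered for more than 24 hours to the pool."""
        for volunteer, (word, sent) in list(self.assigned.items()):
            if now - sent > REPLY_WINDOW:
                del self.assigned[volunteer]
                self.pool.append(word)
```

In use, a scheduler would call `expire` periodically and `send_daily_word` once a day per registered volunteer, so that no term stays locked to an inactive participant.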
