Creating Custom Alphabets
If your language is not already supported in Dasher, you can add support by creating an alphabet file and providing training text. This guide will walk you through the process.
Overview
Dasher uses two main components to support a language:
- Alphabet file (alphabet.xml): defines the characters and their order
- Training text: a sample of natural writing (300KB or more) that teaches Dasher the character probabilities
Step 1: Create the Alphabet File
The alphabet file is an XML file that defines all characters in your language and their display order in Dasher.
Basic Alphabet File Structure
```xml
<?xml version="1.0" encoding="UTF-8"?>
<alphabet name="MyLanguage">
  <!-- Define character groups -->
  <group label="Lowercase">
    <char d="a" />
    <char d="b" />
    <!-- more characters... -->
  </group>
  <group label="Uppercase">
    <char d="A" />
    <char d="B" />
    <!-- more characters... -->
  </group>
  <group label="Numbers">
    <char d="0" />
    <char d="1" />
    <!-- more characters... -->
  </group>
  <group label="Punctuation">
    <char d=" " /> <!-- space -->
    <char d="." />
    <char d="," />
    <!-- more punctuation... -->
  </group>
</alphabet>
```
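If your language has many characters, typing every char element by hand is tedious and error-prone. The sketch below generates a file with the structure shown above using Python's standard xml.etree.ElementTree module. The function names build_alphabet and write_alphabet are illustrative, not part of Dasher; the element and attribute names (alphabet, group, label, char, d) follow this guide's example.

```python
import xml.etree.ElementTree as ET

def build_alphabet(name, groups):
    """Build an alphabet tree with the structure used in this guide.

    groups is a list of (label, characters) pairs in display order.
    characters may be a string (one char element per code point) or a
    list of strings (use a list for multi-code-point characters).
    """
    root = ET.Element("alphabet", name=name)
    for label, chars in groups:
        group = ET.SubElement(root, "group", label=label)
        for ch in chars:
            ET.SubElement(group, "char", d=ch)
    return ET.ElementTree(root)

def write_alphabet(tree, path):
    """Write the tree as indented UTF-8 XML with an XML declaration."""
    ET.indent(tree)  # ET.indent needs Python 3.9 or later
    tree.write(path, encoding="UTF-8", xml_declaration=True)
```

For example, build_alphabet("MyLanguage", [("Lowercase", "abc"), ("Punctuation", " .,")]) followed by write_alphabet(tree, "alphabet.xml") writes the groups in the order given. Generating the file this way also guarantees straight ASCII quotes around attribute values; word processors often replace them with curly quotes, which break the XML.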
Character Attributes
Elements in the alphabet file can have these attributes:
- d: the character itself, as displayed
- t: the text output, if different from the display
- colour: the color of the character's box
- label: the display label of a character group (set on group elements)
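To see how d and t combine in a file you have written, the sketch below lists each character's displayed form next to the text it would output. It assumes, as the attribute list describes, that a character without a t attribute outputs its d value; output_pairs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def output_pairs(path):
    """Return (displayed, output) pairs for every char element, in file order.

    Assumption: a char without a t attribute outputs its d value.
    """
    root = ET.parse(path).getroot()
    return [(char.get("d"), char.get("t", char.get("d"))) for char in root.iter("char")]
```

Returning a list rather than a dictionary keeps the file order and makes accidental duplicate entries visible.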
Combining Characters
Some languages, such as Thai, write characters that combine a base letter with one or more combining marks. Dasher can build these multi-part characters from their Unicode components, so define the base characters and the combining marks as separate entries in the alphabet file.
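To decide which code points to list as base characters and which as combining marks, you can let Python's unicodedata module classify sample text. The sketch below treats every character in the Unicode general categories Mn, Mc and Me as a combining mark; split_base_and_marks is a hypothetical helper, not a Dasher tool.

```python
import unicodedata

def split_base_and_marks(text):
    """Return (base characters, combining marks) found in text, in first-seen order.

    The text is normalized to NFD first, so precomposed letters are split
    into a base letter plus combining marks.
    """
    bases, marks = {}, {}
    for ch in unicodedata.normalize("NFD", text):
        # General categories Mn, Mc and Me are combining marks.
        target = marks if unicodedata.category(ch).startswith("M") else bases
        target[ch] = None  # dict keeps insertion order and drops repeats
    return list(bases), list(marks)
```

For the Thai syllable กี (U+0E01 followed by U+0E35), it returns ก as a base character and the vowel sign SARA II (U+0E35) as a combining mark. Because of the NFD step, a precomposed letter such as é is reported as e plus U+0301 COMBINING ACUTE ACCENT.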
Step 2: Prepare Training Text
Training text helps Dasher learn the probability distribution of characters in your language.
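As a simplified illustration of what this distribution is, the sketch below counts how often each character occurs in a training file and divides by the total. Dasher's own language model also conditions on the characters written just before, so this single-character count is only a quick way to inspect what your text contains, not a reproduction of how Dasher predicts. The function names are hypothetical.

```python
from collections import Counter

def char_probabilities(text):
    """Map each character to its relative frequency in text, most frequent first."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.most_common()}

def file_char_probabilities(path):
    """Read a UTF-8 training file and return its character frequencies."""
    with open(path, encoding="utf-8") as f:
        return char_probabilities(f.read())
```

Printing the first twenty entries of file_char_probabilities("training.txt") is a quick way to see which characters dominate your sample.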
Requirements
- Size: At least 300KB of text (more is better)
- Content: Natural writing in your target language
- Format: Plain text file, UTF-8 encoded
- Quality: Representative of typical usage
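The size and encoding requirements can be checked before you install the file; the content and quality requirements still need a human eye. This sketch assumes 300KB means 300 × 1024 bytes; check_training_file is a hypothetical helper that returns a list of problems, empty when the file passes.

```python
import os

# Assumes "300KB" in the requirements means 300 * 1024 bytes.
MIN_BYTES = 300 * 1024

def check_training_file(path, min_bytes=MIN_BYTES):
    """Return a list of problems with a training file; empty means it passed."""
    problems = []
    size = os.path.getsize(path)
    if size < min_bytes:
        problems.append(f"only {size} bytes; at least {min_bytes} recommended")
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    return problems
```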
Sources for Training Text
Public Domain Books
Project Gutenberg, public domain literature, government documents
News Articles
News websites (check copyright), press releases
Wikipedia
Dump files available for many languages
Corpora
Existing language corpora for linguistics research
Creating Your Own Training Text
For best results, create training text that matches your personal writing style. Collect emails, documents, or other text you’ve written in the target language.
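If your writing is spread across many files, the sketch below joins them into a single UTF-8 training file. It assumes the files are plain .txt files in one folder; build_training_file is a hypothetical helper. Files that are not valid UTF-8 are skipped and reported so you can re-save them as UTF-8. Write the output outside the source folder, otherwise a second run would include the previous output.

```python
from pathlib import Path

def build_training_file(source_dir, output_path, pattern="*.txt"):
    """Concatenate UTF-8 text files from source_dir into one training file.

    Returns the names of files skipped because they are not valid UTF-8.
    """
    parts, skipped = [], []
    for path in sorted(Path(source_dir).glob(pattern)):
        try:
            parts.append(path.read_text(encoding="utf-8"))
        except UnicodeDecodeError:
            skipped.append(path.name)
    Path(output_path).write_text("\n".join(parts), encoding="utf-8")
    return skipped
```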
Step 3: Install the Files
Windows
- Place alphabet.xml in: C:\Program Files\Dasher\alphabets\
- Place training text in: C:\Program Files\Dasher\training\
- Restart Dasher
- Select Options → Alphabet and choose your language
Linux
- Place alphabet.xml in: /usr/share/dasher/alphabets/
- Place training text in: /usr/share/dasher/training/
- Or use ~/.dasher/ for user-specific files
- Restart Dasher
- Select Options → Alphabet and choose your language
macOS
- Right-click Dasher.app and select "Show Package Contents"
- Navigate to Contents/Resources/
- Place files in the alphabets/ and training/ subdirectories
- Restart Dasher
- Select Options → Alphabet and choose your language
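Apart from the base directory, the copy step is the same on every platform. The sketch below copies an alphabet file and a training file into the alphabets/ and training/ subdirectories of a base directory you pass in. install_files and BASE_DIRS are hypothetical names; the Windows and Linux paths come from the steps above, and the macOS entry assumes Dasher.app lives in /Applications. Writing to C:\Program Files or /usr/share normally requires administrator rights.

```python
import shutil
from pathlib import Path

# Base directories from the installation steps. The macOS entry assumes
# Dasher.app is in /Applications; adjust it if yours is elsewhere.
BASE_DIRS = {
    "windows": r"C:\Program Files\Dasher",
    "linux": "/usr/share/dasher",
    "macos": "/Applications/Dasher.app/Contents/Resources",
}

def install_files(alphabet_file, training_file, base_dir):
    """Copy the alphabet into base_dir/alphabets and the training text into base_dir/training."""
    installed = []
    for src, subdir in ((alphabet_file, "alphabets"), (training_file, "training")):
        dest_dir = Path(base_dir) / subdir
        dest_dir.mkdir(parents=True, exist_ok=True)
        # copy2 preserves timestamps and returns the destination file path
        installed.append(Path(shutil.copy2(src, dest_dir)))
    return installed
```

For example, run install_files("alphabet.xml", "training.txt", BASE_DIRS["linux"]) with sufficient permissions, then restart Dasher and select the alphabet as described above.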
Step 4: Test and Refine
Testing Your Alphabet
- Start Dasher and select your new alphabet
- Try writing some sample text
- Check that all characters appear correctly
- Verify character order makes sense for your language
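Look-alike characters (for example a Latin a where a Cyrillic а was intended) are hard to spot on screen. The sketch below lists every character in file order with its group, code point and Unicode name, which also makes the order easier to review. describe_alphabet is a hypothetical helper that reads the structure used in this guide's example.

```python
import unicodedata
import xml.etree.ElementTree as ET

def describe_alphabet(path):
    """Return (group label, character, code point, Unicode name) rows in file order."""
    rows = []
    root = ET.parse(path).getroot()
    for group in root.iter("group"):
        for char in group.findall("char"):  # direct children only
            for ch in char.get("d", ""):  # one row per code point
                rows.append((group.get("label"), ch, f"U+{ord(ch):04X}",
                             unicodedata.name(ch, "<unnamed>")))
    return rows
```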
Troubleshooting
Characters not appearing
Check that your font supports the characters. Install a Unicode font for your language if needed.
Predictions seem wrong
Add more training text, or ensure it's representative of natural writing in your language.
File not loading
Verify the XML is well-formed. Check for encoding issues (should be UTF-8).
Wrong character order
Adjust the order of characters in the alphabet file to match your language's conventions.
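The "File not loading" and "Predictions seem wrong" checks can be partly automated. The sketch below reports whether the alphabet file parses as XML (curly quotes, unclosed tags and invalid UTF-8 bytes all cause parse errors) and whether the training text is valid UTF-8, then lists characters that appear in the training text but not in the alphabet, which can point to a missing character or to unrepresentative text. diagnose is a hypothetical helper; it assumes each d attribute holds a single character and ignores line breaks.

```python
import xml.etree.ElementTree as ET

def diagnose(alphabet_path, training_path):
    """Return a list of problems found in an alphabet file and its training text."""
    try:
        root = ET.parse(alphabet_path).getroot()
    except ET.ParseError as err:
        # Curly quotes, unclosed tags and invalid UTF-8 bytes all end up here.
        return [f"alphabet.xml is not well-formed: {err}"]
    try:
        with open(training_path, encoding="utf-8") as f:
            text = f.read()
    except UnicodeDecodeError as err:
        return [f"training text is not valid UTF-8: {err}"]
    # Assumes each d attribute holds one character, as in this guide's example.
    alphabet = {char.get("d") for char in root.iter("char")}
    unknown = sorted(set(text) - alphabet - {"\n", "\r"})
    return [f"character {ch!r} (U+{ord(ch):04X}) appears in the training text but not in the alphabet"
            for ch in unknown]
```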
Advanced Topics
Context-Dependent Characters
Some languages have characters that change form based on context. Dasher can handle this through special XML attributes and context rules.
Multiple Input Methods
For languages with multiple input methods (like different keyboard layouts), you can create multiple alphabet files with different context attributes.
Sharing Your Alphabet
If you create an alphabet for a language not yet supported, please consider contributing it to the Dasher project!
Resources and References
Unicode Resources
Unicode Consortium - Official Unicode charts and standards
Alphabet Examples
Dasher GitHub - View existing alphabet files in the repository
Font Information
Alan Wood's Unicode Fonts - Information about Unicode fonts for various languages
Training Text Corpora
Project Gutenberg - Free public domain books in many languages
Need Help?
If you need help creating an alphabet or want to contribute one you've made, please contact us on GitHub Discussions.