Creating Custom Alphabets

If your language is not already supported in Dasher, you can add support by creating an alphabet file and providing training text. This guide will walk you through the process.

Overview

Dasher uses two main components to support a language:

  1. Alphabet file (alphabet.xml) - Defines the characters and their order
  2. Training text - A sample of natural writing (300KB or more) that teaches Dasher the character probabilities of your language

Step 1: Create the Alphabet File

The alphabet file is an XML file that defines all characters in your language and their display order in Dasher.

Basic Alphabet File Structure

<?xml version="1.0" encoding="UTF-8"?>
<alphabet name="MyLanguage">
  <!-- Define character groups -->
  <group label="Lowercase">
    <char d="a" />
    <char d="b" />
    <!-- more characters... -->
  </group>

  <group label="Uppercase">
    <char d="A" />
    <char d="B" />
    <!-- more characters... -->
  </group>

  <group label="Numbers">
    <char d="0" />
    <char d="1" />
    <!-- more characters... -->
  </group>

  <group label="Punctuation">
    <char d=" " /> <!-- space -->
    <char d="." />
    <char d="," />
    <!-- more punctuation... -->
  </group>
</alphabet>
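For a large alphabet, typing every entry by hand is error-prone, so it may help to generate the file. The following is a minimal sketch, assuming the element and attribute names used in the example above (alphabet, group, char, name, label, d); the function names build_alphabet and write_alphabet are hypothetical and not part of Dasher. ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_alphabet(name, groups):
    """Build an <alphabet> element from a mapping of group label -> characters.

    The element and attribute names mirror the example in this guide.
    """
    root = ET.Element("alphabet", {"name": name})
    for label, chars in groups.items():
        group = ET.SubElement(root, "group", {"label": label})
        for ch in chars:
            # One <char d="..."/> per character, in the order given
            ET.SubElement(group, "char", {"d": ch})
    return root


def write_alphabet(root, path):
    """Write the element as an indented, UTF-8 encoded XML file."""
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)
```

Calling build_alphabet("MyLanguage", {"Lowercase": "ab", "Punctuation": " .,"}) and passing the result to write_alphabet produces a file with the same shape as the hand-written example. The characters appear in the order they are listed.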

Character Attributes

Characters and character groups accept the following attributes:

  • d - The character itself (display)
  • t - Text output (if different from display)
  • colour - Color for the character box
  • label - Display label for a character group (set on group elements rather than on individual characters)

Combining Characters

Languages with combining characters (like Thai) need special handling. Dasher can generate complicated multi-part characters by combining Unicode components, so define base characters and combining marks separately in the alphabet file.
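To decide which entries are base characters and which are combining marks, you can inspect a text sample with Python's standard unicodedata module. This is a rough sketch rather than anything Dasher provides: the helper name split_base_and_marks is hypothetical, and it treats every code point whose Unicode general category starts with "M" (Mn, Mc, Me) as a combining mark. It does not normalize the text first.

```python
import unicodedata


def split_base_and_marks(text):
    """Return (bases, marks): the distinct base characters and combining
    marks found in text, each sorted by code point."""
    bases, marks = set(), set()
    for ch in text:
        if ch.isspace():
            continue
        # General categories Mn, Mc and Me are the Unicode "Mark" classes
        if unicodedata.category(ch).startswith("M"):
            marks.add(ch)
        else:
            bases.add(ch)
    return sorted(bases), sorted(marks)


# Thai example: KO KAI, SARA I, MAI EK, NGO NGU
bases, marks = split_base_and_marks("\u0e01\u0e34\u0e48\u0e07")
```

Here bases holds the two consonants (U+0E01 and U+0E07) and marks holds the vowel sign and tone mark (U+0E34 and U+0E48), which would then be listed as separate entries in the alphabet file.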

Step 2: Prepare Training Text

Training text helps Dasher learn the probability distribution of characters in your language.

Requirements

  • Size: At least 300KB of text (more is better)
  • Content: Natural writing in your target language
  • Format: Plain text file, UTF-8 encoded
  • Quality: Representative of typical usage
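The size and encoding requirements above can be checked before installing the file. The following is a small sketch using only the Python standard library; check_training_text and MIN_BYTES are hypothetical names, and 300KB is read here as 300 * 1024 bytes, which is an assumption about the guideline rather than a limit Dasher enforces.

```python
from pathlib import Path

# Assumption: the guide's "300KB" is taken to mean 300 * 1024 bytes
MIN_BYTES = 300 * 1024


def check_training_text(path, min_bytes=MIN_BYTES):
    """Return a list of problems found; an empty list means the file passed."""
    data = Path(path).read_bytes()
    problems = []
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    if len(data) < min_bytes:
        problems.append(f"only {len(data)} bytes; expected at least {min_bytes}")
    return problems
```

Content and quality still need a human read-through; this only catches the two mechanical problems.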

Sources for Training Text

  • Public Domain Books - Project Gutenberg, public domain literature, government documents
  • News Articles - News websites (check copyright), press releases
  • Wikipedia - Dump files available for many languages
  • Corpora - Existing language corpora for linguistics research

Creating Your Own Training Text

For best results, create training text that matches your personal writing style. Collect emails, documents, or other text you’ve written in the target language.
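If your own writing is spread over many files, they can be merged into a single plain-text file. The following is a minimal sketch, assuming the sources are already plain text (emails and word-processor documents would need exporting first); build_training_text is a hypothetical helper name.

```python
from pathlib import Path


def build_training_text(sources, out_path):
    """Concatenate text files into one UTF-8 training file.

    Returns the size of the written file in bytes.
    """
    parts = []
    for src in sources:
        # errors="replace" keeps going past stray bytes in files that are
        # not clean UTF-8 instead of aborting the whole run
        text = Path(src).read_text(encoding="utf-8", errors="replace")
        parts.append(text.strip())
    out = Path(out_path)
    # Separate documents with a blank line; skip files that were empty
    out.write_text("\n\n".join(p for p in parts if p) + "\n", encoding="utf-8")
    return out.stat().st_size
```

The returned size gives a quick indication of whether the collection has reached the 300KB guideline.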

Step 3: Install the Files

Windows

  1. Place alphabet.xml in: C:\Program Files\Dasher\alphabets\
  2. Place training text in: C:\Program Files\Dasher\training\
  3. Restart Dasher
  4. Select Options → Alphabet and choose your language

Linux

  1. Place alphabet.xml in: /usr/share/dasher/alphabets/ (or in ~/.dasher/ for user-specific files)
  2. Place training text in: /usr/share/dasher/training/ (or in ~/.dasher/ for user-specific files)
  3. Restart Dasher
  4. Select Options → Alphabet and choose your language

macOS

  1. Right-click Dasher.app and select "Show Package Contents"
  2. Navigate to Contents/Resources/
  3. Place files in alphabets/ and training/ subdirectories
  4. Restart Dasher
  5. Select Options → Alphabet and choose your language

Step 4: Test and Refine

Testing Your Alphabet

  1. Start Dasher and select your new alphabet
  2. Try writing some sample text
  3. Check that all characters appear correctly
  4. Verify character order makes sense for your language
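Alongside the manual checks above, it can be useful to list characters that occur in the training text but are not defined in the alphabet file. The following is a sketch under two assumptions: the alphabet uses the char elements with a d attribute shown in this guide, and line breaks do not need an alphabet entry. The function name missing_characters is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path


def missing_characters(alphabet_path, training_path):
    """Return (character, count) pairs for training-text characters that no
    <char d="..."> entry defines, most frequent first."""
    root = ET.parse(alphabet_path).getroot()
    defined = {c.get("d") for c in root.iter("char")}
    text = Path(training_path).read_text(encoding="utf-8")
    # Assumption: line breaks are ignored rather than reported as missing
    counts = Counter(
        ch for ch in text if ch not in defined and ch not in "\r\n"
    )
    return counts.most_common()
```

A frequent character in the result usually means an entry was forgotten; rare ones may just be noise in the training text that can be cleaned out.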

Troubleshooting

Characters not appearing

Check that your font supports the characters. Install a Unicode font for your language if needed.

Predictions seem wrong

Add more training text, or ensure it's representative of natural writing in your language.

File not loading

Verify the XML is well-formed. Check for encoding issues (should be UTF-8).
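One quick way to check well-formedness is to parse the file with Python's standard XML parser, which reports the line and column of the first error. This sketch checks XML syntax only, not whether Dasher accepts the contents; check_well_formed is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(path):
    """Return None if the file parses as XML, else a message locating the error."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position
        return f"{path}: line {line}, column {column}: {err}"
    return None
```

A common cause of failure is curly quotes (“ ”) pasted from a word processor in place of straight quotes around attribute values; XML accepts only straight quotes there.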

Wrong character order

Adjust the order of characters in the alphabet file to match your language's conventions.

Advanced Topics

Context-Dependent Characters

Some languages have characters that change form based on context. Dasher can handle this through special XML attributes and context rules.

Multiple Input Methods

For languages with multiple input methods (like different keyboard layouts), you can create multiple alphabet files with different context attributes.

Sharing Your Alphabet

If you create an alphabet for a language not yet supported, please consider contributing it to the Dasher project!

Resources and References

Unicode Resources

Unicode Consortium - Official Unicode charts and standards

Alphabet Examples

Dasher GitHub - View existing alphabet files in the repository

Font Information

Alan Wood's Unicode Fonts - Information about Unicode fonts for various languages

Training Text Corpora

Project Gutenberg - Free public domain books in many languages

Need Help?

If you need help creating an alphabet or want to contribute one you've made, please contact us on GitHub Discussions.