IImeActiveDict

 

August 2003

Microsoft Corporation

Summary: This document describes the program dictionary for Microsoft IME 2003, Japanese version. The program dictionary uses the Component Object Model (COM) and contains the IImeActiveDict interface. The IImeActiveDict interface and its sub-class interfaces communicate with the IME dictionary kernel. (10 printed pages)

Contents

Registering the Program Dictionary
Obtaining the IImeActiveDict Interface Pointer
Data Structure
The IUnknown Interface
The IImeActiveDict Interface
Notes on Dictionary Implementation

Registering the Program Dictionary

The program dictionary will have a .DIC extension, just like any other IME dictionary. There are two ways to register a new dictionary:

  1. The user can add a new dictionary from the dictionary property dialog box by clicking the Add button in the system dictionary group.
  2. A new dictionary put into the default system dictionary folder is automatically scanned by Microsoft® IME and added to the registry. This scanning occurs when the property dialog is opened.

Obtaining the IImeActiveDict Interface Pointer

First, Microsoft IME tries to load the current active dictionary as a DLL. If the DLL loads successfully, Microsoft IME tries to get the following predefined function through GetProcAddress(). This process occurs whenever Microsoft IME is initialized.

HRESULT CreateIImeActiveDictInstance(
VOID **ppvObj,
int  nid
);
Parameters Description
VOID **ppvObj (out) The function instantiates the IImeActiveDict interface and stores the interface pointer in *ppvObj.
int nid (in) Reserved. Currently ignored.
Return Value
HRESULT
  • S_OK on success
  • E_FAIL otherwise

Note the following:
  • The function name is case sensitive, and the function must be exported from the program dictionary.
  • One DLL can handle only one dictionary. This means that one dictionary path maps to one dictionary.
  • This process removes the need for the class and interface IDs provided by the COM library.
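
As an illustration of the entry point described above, the following is a minimal sketch rather than a reference implementation. The HRESULT typedef and the S_OK and E_FAIL constants are stand-ins so that the sketch compiles without the Windows SDK, and SampleDict is a hypothetical placeholder for the object that implements IImeActiveDict.

```cpp
#include <cassert>
#include <new>

// Stand-ins so the sketch compiles without the Windows SDK.
// A real dictionary would take these from <windows.h> and msime.h.
typedef long HRESULT;
const HRESULT S_OK   = 0L;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005L);

// Hypothetical placeholder for the object that implements IImeActiveDict.
struct SampleDict {
    unsigned long refCount = 1;   // assumption: the caller receives the first reference
};

// Entry point that Microsoft IME looks up by name.
// extern "C" keeps the name unmangled, which matters because the
// lookup is case sensitive.
extern "C" HRESULT CreateIImeActiveDictInstance(void **ppvObj, int /*nid*/)
{
    if (ppvObj == nullptr)
        return E_FAIL;
    *ppvObj = nullptr;

    SampleDict *dict = new (std::nothrow) SampleDict();
    if (dict == nullptr)
        return E_FAIL;

    *ppvObj = dict;   // nid is reserved and ignored
    return S_OK;
}
```

In a real DLL the function would also have to appear in the export table, for example through a module-definition file, so that GetProcAddress() can find it.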

Data Structure

The following structures are used to obtain dictionary information.

IMESHF2

The IMESHF2 structure describes a program dictionary:

//Shared Header dictionary File
typedef struct {
   WORD   cbShf;               //size of this structure
   WORD   verDic;               //dictionary version
   WCHAR   wszTitle[48];         //dictionary title
   WCHAR   wszDescription[256];   //dictionary description
   WCHAR   wszCopyright[128];   //copyright information
} IMESHF2;
Field Name Description
cbShf The size of this structure, in bytes.
verDic The user-defined version number. The high byte is the major version number and the low byte is the minor version number.
wszTitle The dictionary title.
wszDescription The dictionary description.
wszCopyright Copyright information, which appears in the dictionary property dialog box of Microsoft IME.
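
The byte layout of verDic can be sketched as follows. MakeVerDic, VerDicMajor, and VerDicMinor are hypothetical helper names that do not come from msime.h, and uint16_t stands in for the Windows WORD type.

```cpp
#include <cassert>
#include <cstdint>

// WORD is a 16-bit unsigned integer in the Windows SDK; uint16_t stands in here.
typedef std::uint16_t WORD;

// Hypothetical helper: major version in the high byte, minor version in the low byte.
inline WORD MakeVerDic(unsigned major, unsigned minor)
{
    assert(major <= 0xFF && minor <= 0xFF);   // each part must fit in one byte
    return static_cast<WORD>((major << 8) | minor);
}

// Hypothetical helpers that split verDic back into its two parts.
inline unsigned VerDicMajor(WORD verDic) { return verDic >> 8; }
inline unsigned VerDicMinor(WORD verDic) { return verDic & 0xFF; }
```

For example, version 1.2 is stored as 0x0102.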

IMEDINFO

The IMEDINFO structure contains all of the information required to handle the dictionary:

//Dictionary Info
typedef struct _IMEDINFO
{
   IMESHF2   shf;      //header
   DWORD      ver;      //IImeActiveDict version number
   IMEDDISC   ddisc;      //disclosure permission type
   FILETIME   filestamp;  //file stamp at creation
   IMEDICAT   dicat;      //dictionary category
   IMEIDXTP   idxtp;      //index type
   WCHAR      wszHeading[cwchHeadingLen];   // Heading string which 
                                          // will be shown in candidate
                                          // box.
   BOOL      fSkipInitialSearch;   // If TRUE, this dictionary will be
                                 // skipped during initial search.
} IMEDINFO;
Field Name Description
ver This field must be set to verIImeActiveDict, which indicates the version of the program dictionary interface. Microsoft IME looks at this version to see if the interface is compatible with the current implementation.
ddisc Allows the creator of a program dictionary to decide if the dictionary data may be disclosed to the user (ddiscNone, ddiscAll, or ddiscPartial).
filestamp Distinguishes this dictionary from other dictionaries of the same name, or distinguishes between identical dictionaries with different names. It is the file stamp obtained at the creation of the dictionary using the GetSystemTimeAsFileTime() function. The program dictionary must store this file stamp and never change it.
dicat This field specifies the category of words in the dictionary. Microsoft IME will use this information to switch dictionaries when the conversion mode changes. If the dictionary does not fit into any category, it should specify dicatGeneral, which is the default category.
idxtp Specifies which type of index the dictionary has: idxHiraKanji, idxKanjiHira, or a combination of both.
wszHeading If this is not a null string, it may be shown in the candidate list when the dictionary has data for the corresponding input but that data is not yet shown.
fSkipInitialSearch If TRUE, this dictionary will be skipped during an initial search. This means this dictionary is not used for clause breaking.

IMEPDT

The following data structure is used to obtain words from the dictionary:

#define cwchWordMax         64
typedef DWORD      IMESTMP;         //word stamp
//Program Dictionary Tango
typedef struct _IMEPDT
{
   IMEIDXTP   idxtp;   //index type
   int   cwchInput;   //input string length
   int   cwchOutput;   //output string length
   WCHAR   wszInput[cwchWordMax];   //input string
   WCHAR   wszOutput[cwchWordMax];   //output string
   DWORD   nPos;   //part of speech
   IMESTMP   stmp;   //word time stamp
   union{
       DWORD   dwWordId;
       struct{
           DWORD   WordId : 20;   //word ID
           DWORD   fComment : 1;   //TRUE when this entry has a word comment
           DWORD   dwFree : 11;
       };
   };
   WORD   wMojiKind;   //moji kind
} IMEPDT;
Field Name Description
idxtp Specifies which index to use in looking up words.
cwchInput Specifies the length of wszInput in Unicode characters.
cwchOutput Specifies the length of wszOutput in Unicode characters.
wszInput Specifies the input string in Unicode characters.
wszOutput Specifies the output string in Unicode characters.
nPos Specifies the part of speech of the returned word, as defined in the header file msime.h. Because parts of speech are language dependent, each language has its own msime.h. Microsoft IME for the Japanese language supports the parts of speech listed in Appendix A. Contact an IME program manager if a particular part of speech is not supported.
stmp A time stamp that specifies when the word was generated. This value is dictionary specific and can be 0.
WordId Specifies a dictionary-specific word ID. Each word should have a unique ID. This value will be used in GetComment to specify the word to be searched for.
fComment When TRUE, the word has a comment that can be traversed by GetComment.
wMojiKind mojikindUNICODE applies when the word contains only UNICODE characters; otherwise mojikindSJIS applies.
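
The union in IMEPDT packs the word ID and the comment flag into one DWORD. The sketch below shows the same packing with explicit masks. It assumes that bit fields are allocated from the low-order bit upward, as Microsoft's compiler does. The helper names and constants are hypothetical, and uint32_t stands in for the Windows DWORD type.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t DWORD;   // stand-in for the Windows SDK type

// Assumed layout: WordId in bits 0-19, fComment in bit 20, bits 21-31 free.
const DWORD kWordIdMask  = 0x000FFFFF;
const DWORD kCommentFlag = 0x00100000;

// Hypothetical helper: build the dwWordId value from its two parts.
inline DWORD PackWordId(DWORD wordId, bool hasComment)
{
    assert(wordId <= kWordIdMask);   // only 20 bits are available
    return wordId | (hasComment ? kCommentFlag : 0);
}

// Hypothetical helpers: read the parts back out of dwWordId.
inline DWORD UnpackWordId(DWORD dwWordId) { return dwWordId & kWordIdMask; }
inline bool  HasComment(DWORD dwWordId)   { return (dwWordId & kCommentFlag) != 0; }
```

Because WordId is 20 bits wide, a dictionary can assign at most 1,048,576 distinct word IDs.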

The IUnknown Interface

The IImeActiveDict interface follows the COM specification. A COM interface must be binary compatible and can be implemented in any programming language that supports function pointers. C++ is the recommended language because it is object oriented and supports inheritance.

A COM interface must support the IUnknown interface, which consists of three basic functions: QueryInterface(), AddRef(), and Release().

IUnknown::QueryInterface

Not used.

HRESULT IUnknown::QueryInterface(
REFIID riid,
VOID **ppv
);
Parameters Description
REFIID riid (in) interface id
VOID **ppv (out) interface pointer
Return Values
HRESULT E_NOINTERFACE. The function always returns this value.

IUnknown::AddRef

This function will keep track of the number of clients using this object. The client should call AddRef() whenever it makes a copy of the interface pointer. This way, the object knows how many copies of the object pointer are in use.

ULONG IUnknown::AddRef(VOID);
Return Value
ULONG The return value is the updated reference count.

IUnknown::Release

The client should call this method when it is finished with an interface pointer. The reference count is decremented, and the object should destroy itself when the count reaches zero.

ULONG IUnknown::Release(VOID);
Return Value
ULONG The return value is the updated reference count.
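
The three IUnknown functions described above can be sketched as follows. This is an illustration under stated assumptions, not the required implementation. RefCountedDict is a hypothetical class that would derive from IImeActiveDict in a real dictionary, the riid parameter is reduced to an opaque pointer, the initial count of 1 assumes that the creator hands one reference to the caller, and liveObjects exists only so that destruction can be observed.

```cpp
#include <atomic>
#include <cassert>

// Stand-ins so the sketch compiles without the Windows SDK.
typedef long HRESULT;
typedef unsigned long ULONG;
const HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002L);

// Hypothetical class; a real one would derive from IImeActiveDict.
class RefCountedDict {
public:
    static int liveObjects;          // instrumentation for the sketch only

    RefCountedDict() : refs_(1) { ++liveObjects; }

    HRESULT QueryInterface(const void * /*riid*/, void **ppv)
    {
        if (ppv) *ppv = nullptr;
        return E_NOINTERFACE;        // the function is not used
    }

    ULONG AddRef() { return ++refs_; }

    ULONG Release()
    {
        ULONG remaining = --refs_;
        if (remaining == 0)
            delete this;             // last reference released
        return remaining;
    }

private:
    ~RefCountedDict() { --liveObjects; }   // private: destroy only through Release()
    std::atomic<ULONG> refs_;
};
int RefCountedDict::liveObjects = 0;
```

Making the destructor private is a design choice in this sketch: it prevents a client from deleting the object directly while other references are still outstanding.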

The IImeActiveDict Interface

Microsoft IME will use the interface pointer obtained from CreateIImeActiveDictInstance() to access the following interface functions. These are specific to this interface and must be supported by the program dictionary.

IImeActiveDict::Inquire

This function can be called before opening a program dictionary. Microsoft IME may use the dictionary information returned by this function to decide whether the dictionary should or should not be used.

HRESULT IImeActiveDict::Inquire(
IMEDINFO *pdinfo
);
Parameters Description
IMEDINFO *pdinfo (out) dictionary attributes are placed in IMEDINFO fields
Return Value
HRESULT S_OK on success
E_FAIL otherwise

IImeActiveDict::Open

Microsoft IME will call this function just before accessing this program dictionary to obtain dictionary information. The pdinfo parameter should be returned with all fields filled.

HRESULT IImeActiveDict::Open(
IMEDINFO *pdinfo
);
Parameters Description
IMEDINFO *pdinfo (out) dictionary attributes are placed in IMEDINFO fields.
Return Value
HRESULT S_OK on success
E_FAIL otherwise

IImeActiveDict::Close

Microsoft IME will call this function when an opened program dictionary is no longer needed. All resources associated with the program dictionary may be freed at this time. Because Microsoft IME may still hold the interface pointer, do not destroy the object instance at this time.

HRESULT IImeActiveDict::Close(void);
Return Value
HRESULT S_OK on success
E_FAIL otherwise

IImeActiveDict::SearchWord

This function searches for a word with a given input string. Microsoft IME calls this function repeatedly until all words with the same input string have been returned. The program dictionary may cache the search state to improve performance. To improve the performance of Microsoft IME, the dictionary should also check for longer words that have the input string as a prefix and, if any exist, return IPRG_S_LONGER_WORD.

HRESULT IImeActiveDict::SearchWord(
IMEPDT *ppdt,
BOOL fFirst,
BOOL fWildCard
);
Parameters Description
IMEPDT *ppdt (in/out) The cwchInput, wszInput, and idxtp fields are set before this function is called. The program dictionary fills the cwchOutput, wszOutput, nPos, and stmp fields when a matching word is found in the dictionary.
BOOL fFirst (in) Set to TRUE when a word is being searched for the first time. Set to FALSE when Microsoft IME wants the program dictionary to search for the same input string as in the previous call to this function.
BOOL fWildCard (in) When FALSE, an exact match is performed. When TRUE, all words starting with the given substring are returned.
Return Values
HRESULT S_OK: A word is found in the dictionary.
IPRG_S_LONGER_WORD: The words with the given input string are exhausted, but there are longer words with the input string as the prefix. If a program dictionary does not support this feature, it must return IPRG_S_NO_ENTRY so that Microsoft IME will search for longer words.
IPRG_S_NO_ENTRY: The words with the given input string are exhausted, and no longer words have the input string as the prefix.
E_FAIL: The function failed.
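
The calling pattern described above can be sketched with a toy in-memory dictionary. Everything here is a simplification made for illustration: MiniPdt is a reduced stand-in for IMEPDT, ToyDict and CollectAll are hypothetical names, the numeric values given to IPRG_S_LONGER_WORD and IPRG_S_NO_ENTRY are placeholders (the real codes are defined in msime.h), and returning IPRG_S_NO_ENTRY at the end of a wildcard search is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef long HRESULT;
const HRESULT S_OK = 0;
// Placeholder values: the real codes are defined in msime.h.
const HRESULT IPRG_S_LONGER_WORD = 1;
const HRESULT IPRG_S_NO_ENTRY    = 2;

// Reduced stand-in for IMEPDT that keeps only the two strings.
struct MiniPdt {
    std::wstring input;    // set by the caller
    std::wstring output;   // filled by the dictionary
};

class ToyDict {
public:
    explicit ToyDict(std::vector<std::pair<std::wstring, std::wstring>> words)
        : words_(std::move(words)), next_(0) {}

    HRESULT SearchWord(MiniPdt *ppdt, bool fFirst, bool fWildCard)
    {
        if (fFirst) next_ = 0;                       // restart the enumeration
        const std::wstring &key = ppdt->input;
        for (; next_ < words_.size(); ++next_) {
            const std::wstring &reading = words_[next_].first;
            bool hit = fWildCard ? StartsWith(reading, key) : reading == key;
            if (hit) {
                ppdt->output = words_[next_++].second;   // resume after this entry next time
                return S_OK;
            }
        }
        // Exhausted: report whether longer words share the input as a prefix.
        if (!fWildCard)
            for (const auto &w : words_)
                if (w.first.size() > key.size() && StartsWith(w.first, key))
                    return IPRG_S_LONGER_WORD;
        return IPRG_S_NO_ENTRY;
    }

private:
    static bool StartsWith(const std::wstring &s, const std::wstring &prefix)
    {
        return s.compare(0, prefix.size(), prefix) == 0;
    }
    std::vector<std::pair<std::wstring, std::wstring>> words_;
    std::size_t next_;     // cached search position between calls
};

// Caller side: repeat the call until the dictionary stops returning S_OK.
HRESULT CollectAll(ToyDict &dict, const std::wstring &input,
                   std::vector<std::wstring> *found)
{
    MiniPdt pdt;
    pdt.input = input;
    HRESULT hr = dict.SearchWord(&pdt, true, false);
    while (hr == S_OK) {
        found->push_back(pdt.output);
        hr = dict.SearchWord(&pdt, false, false);
    }
    return hr;   // IPRG_S_LONGER_WORD or IPRG_S_NO_ENTRY in this sketch
}
```

The next_ member plays the role of the cached search state: a call with fFirst set to FALSE continues from the previous position instead of rescanning from the start.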

IImeActiveDict::GetComment

Obtains the comment for a tango (word). To obtain the size of the comment in bytes, the client calls this function with pv set to NULL.

HRESULT IImeActiveDict::GetComment(
DWORD dwWordId,
VOID *pv,
DWORD *pdwSetID,
int *pcb,
IMEUCT *puct,
WORD *pwMojiKind
);
Parameters Description
DWORD dwWordId (in) Word ID of the tango (word) that has the user comment.
VOID *pv (out) Pointer to the buffer that receives the comment. Can be NULL.
DWORD *pdwSetID (out) Group ID of the comment. When several words with the same reading have comments, only the comments that share a group ID are shown at the same time. Set this value to 0 to have the comment always shown.
int *pcb (in/out) On input, if pv is not NULL, *pcb is the size of the buffer. On output, if pv is NULL, *pcb receives the buffer size needed; otherwise it receives the size of the comment placed in *pv.
IMEUCT *puct (out) Returns the type of user comment.
WORD *pwMojiKind (out) mojikindUNICODE if the comment string contains a Unicode-specific character; otherwise mojikindSJIS.
Return Value
HRESULT S_OK on success
IDICT_E_NOT_ENOUGH_BUFFER if the supplied buffer is too small
E_FAIL otherwise
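
The size query followed by the actual fetch can be sketched as below. The sketch makes several assumptions: GetCommentSketch is a hypothetical function reduced to the three parameters that carry the buffer protocol, the numeric value of IDICT_E_NOT_ENOUGH_BUFFER is a placeholder (the real code is defined in msime.h), every word ID maps to the same sample comment, and the reported size includes the terminating null character.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

typedef long HRESULT;
const HRESULT S_OK   = 0;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005L);
// Placeholder value: the real code is defined in msime.h.
const HRESULT IDICT_E_NOT_ENOUGH_BUFFER = static_cast<HRESULT>(0x80040001L);

// Dictionary side, reduced to the parameters that carry the buffer protocol.
HRESULT GetCommentSketch(unsigned long /*dwWordId*/, void *pv, int *pcb)
{
    static const wchar_t kComment[] = L"sample comment";
    const int needed = static_cast<int>(sizeof(kComment));   // bytes, including the terminator

    if (pcb == nullptr)
        return E_FAIL;
    if (pv == nullptr) {            // size query
        *pcb = needed;
        return S_OK;
    }
    if (*pcb < needed)
        return IDICT_E_NOT_ENOUGH_BUFFER;
    std::memcpy(pv, kComment, needed);
    *pcb = needed;                  // bytes actually written
    return S_OK;
}

// Client side: ask for the size, allocate, then fetch.
bool FetchComment(unsigned long wordId, std::wstring *out)
{
    int cb = 0;
    if (GetCommentSketch(wordId, nullptr, &cb) != S_OK || cb <= 0)
        return false;
    std::vector<wchar_t> buffer(cb / sizeof(wchar_t) + 1, L'\0');
    cb = static_cast<int>(buffer.size() * sizeof(wchar_t));
    if (GetCommentSketch(wordId, buffer.data(), &cb) != S_OK)
        return false;
    out->assign(buffer.data());     // stops at the terminator
    return true;
}
```

The client allocates one extra character so that the buffer stays null terminated even if a dictionary reports a size without the terminator.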

IImeActiveDict::OpenProperty

When this function is called, the program dictionary can create a window and/or a dialog box in which extra properties for the dictionary are shown and manipulated. The property window/dialog box should be modal, and it should be created in the same process as the program dictionary.

HRESULT IImeActiveDict::OpenProperty(
HWND hwnd
);
Parameter Description
HWND hwnd (in) This is the window handle of the parent window. The property window/dialog should be a child of this window.
Return Value
HRESULT S_OK on success
E_FAIL otherwise

Notes on Dictionary Implementation

  • The program dictionary must be read only.
  • A program dictionary will be included in the dictionary set used by the IME kernel during morphological analysis. About 50 percent of the processing time is spent accessing the dictionary, so it is important that the program dictionary's word retrieval routines be as efficient as possible.
  • One way of improving performance is to use internal caching, which stores data in memory to avoid disk accesses. It should be noted that IME is attached to each application, and dictionaries are opened when IME is being attached. This indicates that the program dictionary's data space is not shared among applications. A program dictionary would use a great deal of memory if it allocated a caching memory for each process.
  • To avoid this scenario, Win32 provides a mechanism for sharing memory among processes, called a memory-mapped file. Program dictionaries should preferably use memory-mapped files to implement dictionary caching. The APIs involved are CreateFileMapping(), MapViewOfFile(), UnmapViewOfFile(), and CloseHandle(). See the Win32 reference for detailed information on memory-mapped files.