IImeActiveDict
August 2003
Microsoft Corporation
Summary: This document describes the program dictionary for Microsoft IME 2003, Japanese version. The program dictionary uses the Component Object Model (COM) and contains the IImeActiveDict interface. The IImeActiveDict interface and its sub-class interfaces communicate with the IME dictionary kernel. (10 printed pages)
Contents
Registering the Program Dictionary
Obtaining the IImeActiveDict Interface Pointer
Data Structure
The IUnknown Interface
The IImeActiveDict Interface
Notes on Dictionary Implementation
Registering the Program Dictionary
The program dictionary will have a .DIC extension, just like any other IME dictionary. There are two ways to register a new dictionary:
- The user can add a new dictionary from the dictionary property dialog box by clicking the Add button in the system dictionary group.
- A new dictionary put into the default system dictionary folder is automatically scanned by Microsoft® IME and added to the registry. This scanning occurs when the property dialog is opened.
Obtaining the IImeActiveDict Interface Pointer
Microsoft IME first tries to load the current active dictionary as a DLL. If the DLL loads successfully, Microsoft IME then looks up a predefined function through GetProcAddress(). This process occurs whenever Microsoft IME is initialized.
HRESULT CreateIImeActiveDictInstance(
VOID **ppvObj,
int nid
);
Parameters | Description |
---|---|
VOID **ppvObj | (out) The function instantiates the IImeActiveDict interface and stores the interface pointer in ppvObj. |
int nid | (in) Reserved. Currently ignored. |
Return | Value |
---|---|
HRESULT | |
- The function name is case sensitive, and the function must be exported from the program dictionary.
- One DLL can handle only one dictionary. This means one dictionary path maps to one dictionary.
- This process eliminates the use of class and interface IDs provided by the COM library.
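The export described above can be sketched as follows. This is a minimal illustration, not the actual Microsoft implementation: the HRESULT stand-ins replace the Win32 headers so the sketch compiles anywhere, SampleDictObject is a hypothetical placeholder for the object that implements IImeActiveDict, and returning E_FAIL for a null ppvObj is an assumption. The calling convention is not stated in this document and is left at the compiler default.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for Win32 definitions; a real dictionary would include
// <windows.h> and msime.h instead.
typedef std::int32_t HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Hypothetical placeholder for the object that implements IImeActiveDict.
struct SampleDictObject {
    unsigned long refs;
    SampleDictObject() : refs(1) {}  // the caller owns the first reference
};

// extern "C" keeps the C++ compiler from mangling the name, so a lookup by
// the exact, case-sensitive name can succeed. On Windows the function must
// also be exported from the DLL, for example through a .def file.
extern "C" HRESULT CreateIImeActiveDictInstance(void **ppvObj, int nid) {
    (void)nid;                      // reserved, currently ignored
    if (ppvObj == nullptr) {
        return E_FAIL;              // assumption: reject a null out pointer
    }
    *ppvObj = new SampleDictObject();
    return S_OK;
}
```

Because the host finds the function by name in the DLL it has already loaded, no class ID or interface ID lookup is involved.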
Data Structure
The following structures are used to obtain dictionary information.
IMESHF2
The IMESHF2 structure describes a program dictionary. It is defined as follows:
//Shared Header dictionary File
typedef struct {
WORD cbShf; //size of this structure
WORD verDic; //dictionary version
WCHAR wszTitle[48]; //dictionary title
WCHAR wszDescription[256]; //dictionary description
WCHAR wszCopyright[128]; //copyright information
} IMESHF2;
Field Name | Description |
---|---|
verDic | The user-defined version number. The high byte is the major version number and the low byte is the minor version number. |
wszTitle | The dictionary title. |
wszDescription | The dictionary description. |
wszCopyright | Copyright information, which appears in the dictionary property dialog box of Microsoft IME. |
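The byte layout of verDic can be illustrated with a few helpers. The helper names below are hypothetical and are not part of msime.h; WORD is redefined locally only so that the sketch is self-contained.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint16_t WORD;  // stand-in for the Win32 WORD type

// Hypothetical helpers: pack the major version into the high byte and the
// minor version into the low byte of verDic.
inline WORD MakeVerDic(unsigned major, unsigned minor) {
    return static_cast<WORD>(((major & 0xFFu) << 8) | (minor & 0xFFu));
}

inline unsigned VerDicMajor(WORD verDic) { return (verDic >> 8) & 0xFFu; }
inline unsigned VerDicMinor(WORD verDic) { return verDic & 0xFFu; }
```

With these helpers, version 1.2 is stored as 0x0102.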
IMEDINFO
The IMEDINFO structure contains all of the information required to handle the dictionary. It is defined as follows:
//Dictionary Info
typedef struct _IMEDINFO
{
IMESHF2 shf; //header
DWORD ver; //IImeActiveDict version number
IMEDDISC ddisc; //disclosure permission type
FILETIME filestamp; //file stamp at creation
IMEDICAT dicat; //dictionary category
IMEIDXTP idxtp; //index type
WCHAR wszHeading[cwchHeadingLen]; // Heading string which
// will be shown in candidate
// box.
BOOL fSkipInitialSearch; // If TRUE, this dictionary will be
// skipped during initial search.
} IMEDINFO;
Field Name | Description |
---|---|
ver | This field must be set to verIImeActiveDict, which indicates the version of the program dictionary interface. Microsoft IME checks this version to see whether the interface is compatible with the current implementation. |
ddisc | Allows the creator of a program dictionary to decide whether the dictionary data may be disclosed to the user (ddiscNone, ddiscAll, or ddiscPartial). |
filestamp | Distinguishes this dictionary from other dictionaries of the same name, or identifies identical dictionaries that have different names. It is the file stamp obtained with the GetSystemTimeAsFileTime() function when the dictionary is created. The program dictionary must store this file stamp and never change it. |
dicat | Specifies the category of words in the dictionary. Microsoft IME uses this information to switch dictionaries when the conversion mode changes. If the dictionary does not fit into any category, it should specify dicatGeneral, which is the default category. |
idxtp | Specifies which type of index the dictionary has: idxHiraKanji, idxKanjiHira, or a combination of both. |
wszHeading | If this is not a null string, it may be shown in the candidate list when the dictionary has data for the corresponding input but the data is not yet shown. |
fSkipInitialSearch | If TRUE, this dictionary is skipped during the initial search. This means that the dictionary is not used for clause breaking. |
IMEPDT
The following data structure is used to obtain words from the dictionary:
#define cwchWordMax 64
typedef DWORD IMESTMP; //word stamp
//Program Dictionary Tango
typedef struct _IMEPDT
{
IMEIDXTP idxtp; //index type
int cwchInput; //input string length
int cwchOutput; //output string length
WCHAR wszInput[cwchWordMax]; //input string
WCHAR wszOutput[cwchWordMax]; //output string
DWORD nPos; //part of speech
IMESTMP stmp; //word time stamp
union{
DWORD dwWordId;
struct{
DWORD WordId : 20; //word ID
DWORD fComment : 1; //TRUE when this entry has word
//comment.
DWORD dwFree : 11;
};
};
WORD wMojiKind; //moji kind
} IMEPDT;
Field Name | Description |
---|---|
idxtp | Specifies which index to use in looking up words. |
cwchInput | Specifies the length of wszInput in Unicode characters. |
cwchOutput | Specifies the length of wszOutput in Unicode characters. |
wszInput | Specifies the input string in Unicode characters. |
wszOutput | Specifies the output string in Unicode characters. |
nPos | Specifies the part of speech of the returned word, as defined in the header file msime.h. Because part of speech is language dependent, each language has its own msime.h. Microsoft IME for the Japanese language supports the parts of speech listed in Appendix A. Please contact an IME program manager if a particular part of speech is not supported. |
stmp | A time stamp that specifies when the word was generated. This value is dictionary specific and can be 0. |
WordId | Specifies a dictionary-specific word ID. Each word should have a unique ID. This value is passed to GetComment to identify the word whose comment is requested. |
fComment | When TRUE, the word has a comment that can be retrieved with GetComment. |
wMojiKind | mojikindUNICODE applies when the word contains Unicode-specific characters; otherwise mojikindSJIS applies. |
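The relationship between dwWordId and the WordId and fComment bit fields can be sketched with explicit masks. The sketch assumes that the compiler allocates bit fields starting from the least significant bit, as Microsoft compilers do; bit-field layout is implementation defined, so treat this as an illustration only. The constant and helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t DWORD;  // stand-in for the Win32 DWORD type

// Assumed layout (least significant bit first):
//   bits 0..19  WordId
//   bit  20     fComment
//   bits 21..31 dwFree
const DWORD kWordIdMask = 0x000FFFFFu;
const DWORD kCommentBit = 1u << 20;

// Hypothetical helpers that build and read the packed value.
inline DWORD PackWordId(DWORD wordId, bool hasComment) {
    return (wordId & kWordIdMask) | (hasComment ? kCommentBit : 0u);
}

inline DWORD UnpackWordId(DWORD dwWordId) { return dwWordId & kWordIdMask; }
inline bool HasComment(DWORD dwWordId) { return (dwWordId & kCommentBit) != 0; }
```

A 20-bit WordId allows up to 1,048,576 distinct word IDs per dictionary.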
The IUnknown Interface
The IImeActiveDict interface follows the COM specification. A COM interface must be binary compatible and can be implemented in any programming language that supports function pointers. C++ is the recommended language, because it is object oriented and supports inheritance.
A COM interface must support the IUnknown interface, which consists of three basic functions: QueryInterface(), AddRef(), and Release().
IUnknown::QueryInterface
Not used.
HRESULT IUnknown::QueryInterface(
REFIID riid,
VOID **ppv
);
Parameters | Description |
---|---|
REFIID riid | (in) Interface ID. |
VOID **ppv | (out) Interface pointer. |
Return | Values |
---|---|
HRESULT | E_NOINTERFACE. The function always returns this value. |
IUnknown::AddRef
This function keeps track of the number of clients using this object. The client should call AddRef() whenever it makes a copy of the interface pointer. This way, the object knows how many copies of the interface pointer are in use.
ULONG IUnknown::AddRef(VOID);
Return | Value |
---|---|
ULONG | The return value is the updated reference count. |
IUnknown::Release
The client should call this method when it is finished with an interface pointer. The reference count is decremented, and the object should destroy itself when the count reaches zero.
ULONG IUnknown::Release(VOID);
Return | Value |
---|---|
ULONG | The return value is the updated reference count. |
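The reference-counting contract of AddRef() and Release() can be sketched as follows. The class is a hypothetical, single-threaded illustration that does not derive from the real IUnknown; a production implementation would typically update the count with InterlockedIncrement() and InterlockedDecrement(). The destroyed flag exists only so that the destruction can be observed.

```cpp
#include <cassert>

typedef unsigned long ULONG;  // stand-in for the Win32 ULONG type

// Hypothetical reference-counted object (not thread safe).
class RefCountedDict {
public:
    explicit RefCountedDict(bool *destroyedFlag)
        : refs_(1), destroyed_(destroyedFlag) {}  // creator holds one reference

    ULONG AddRef() { return ++refs_; }

    ULONG Release() {
        const ULONG remaining = --refs_;
        if (remaining == 0) {
            delete this;  // the object destroys itself at zero
        }
        return remaining;
    }

private:
    // Private destructor: the object can only be destroyed through Release().
    ~RefCountedDict() {
        if (destroyed_ != nullptr) *destroyed_ = true;
    }

    ULONG refs_;
    bool *destroyed_;
};
```

Making the destructor private is a design choice in this sketch: it prevents a client from deleting the object directly and bypassing the count.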
The IImeActiveDict Interface
Microsoft IME uses the interface pointer obtained from CreateIImeActiveDictInstance() to access the following interface functions. These are specific to this interface and must be supported by the program dictionary.
IImeActiveDict::Inquire
This function can be called before opening a program dictionary. Microsoft IME may use the dictionary information returned by this function to decide whether the dictionary should or should not be used.
HRESULT IImeActiveDict::Inquire(
IMEDINFO *pdinfo
);
Parameters | Description |
---|---|
IMEDINFO *pdinfo | (out) Dictionary attributes are placed in the IMEDINFO fields. |
Return | Value |
---|---|
HRESULT | S_OK on success; E_FAIL otherwise. |
IImeActiveDict::Open
Microsoft IME will call this function just before accessing this program dictionary to obtain dictionary information. The pdinfo parameter should be returned with all fields filled.
HRESULT IImeActiveDict::Open(
IMEDINFO *pdinfo
);
Parameters | Description |
---|---|
IMEDINFO *pdinfo | (out) Dictionary attributes are placed in the IMEDINFO fields. |
Return | Value |
---|---|
HRESULT | S_OK on success; E_FAIL otherwise. |
IImeActiveDict::Close
Microsoft IME calls this function when an opened program dictionary is no longer needed. All resources associated with the program dictionary may be freed at this time. Because Microsoft IME may still hold the interface pointer, do not destroy the object instance at this time.
HRESULT IImeActiveDict::Close(void);
Return | Value |
---|---|
HRESULT | S_OK on success; E_FAIL otherwise. |
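The split between Close() and object destruction can be sketched as follows. This is a simplified, hypothetical model: the real Open() takes an IMEDINFO pointer, the in-memory word list stands in for whatever resources a dictionary loads, and the ability to reopen after Close() is an assumption rather than something this document states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

typedef std::int32_t HRESULT;  // stand-in for the Win32 type
const HRESULT S_OK = 0;

// Hypothetical dictionary object with a simplified Open()/Close() pair.
class SampleDict {
public:
    SampleDict() : open_(false) {}

    HRESULT Open() {
        words_.push_back(L"example");  // stands in for loading dictionary data
        open_ = true;
        return S_OK;
    }

    HRESULT Close() {
        std::vector<std::wstring>().swap(words_);  // release the storage
        open_ = false;
        return S_OK;  // the object itself is not deleted here
    }

    bool IsOpen() const { return open_; }
    std::size_t LoadedWords() const { return words_.size(); }

private:
    std::vector<std::wstring> words_;
    bool open_;
};
```

Only Release() should end the lifetime of the object; Close() returns it to the state it had before Open().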
IImeActiveDict::SearchWord
This function searches for a word with a given input string. Microsoft IME calls it repeatedly until all words with the same input string have been returned. The program dictionary may cache the search state between calls to improve performance. To improve the performance of Microsoft IME, the dictionary should also check for longer words that have the input string as a prefix, and return IPRG_S_LONGER_WORD when the exact matches are exhausted but such longer words exist.
HRESULT IImeActiveDict::SearchWord(
IMEPDT *ppdt,
BOOL fFirst,
BOOL fWildCard
);
Parameters | Description |
---|---|
IMEPDT *ppdt | (in/out) The cwchInput, wszInput, and idxtp fields are set before this function is called. The program dictionary fills the cwchOutput, wszOutput, nPos, and stmp fields when a matching word is found in the dictionary. |
BOOL fFirst | (in) TRUE when the input string is being searched for the first time. FALSE when Microsoft IME wants the program dictionary to continue searching for the same input string as in the previous call to this function. |
BOOL fWildCard | (in) When FALSE, an exact match is performed. When TRUE, all words starting with the given substring are returned. |
Return Value | Description |
---|---|
S_OK | A word is found in the dictionary. |
IPRG_S_LONGER_WORD | The words with the given input string are exhausted, but there are longer words with the input string as the prefix. If a program dictionary does not support this feature, it must return IPRG_S_NO_ENTRY so that Microsoft IME will search for longer words. |
IPRG_S_NO_ENTRY | The words with the given input string are exhausted, and there are no longer words with the input string as the prefix. |
E_FAIL | The function failed. |
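The repeated-call protocol can be sketched with a small in-memory word list. Everything here is a simplification made for illustration: the Search() method is a hypothetical stand-in for SearchWord() that handles exact matches only and uses plain strings instead of IMEPDT, the numeric values given to IPRG_S_LONGER_WORD and IPRG_S_NO_ENTRY are placeholders (the real values come from msime.h), and the linear scan would be replaced by a proper index in a real dictionary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

typedef std::int32_t HRESULT;  // stand-in for the Win32 type
const HRESULT S_OK = 0;
// Placeholder values only; the real codes are defined in msime.h.
const HRESULT IPRG_S_LONGER_WORD = 2;
const HRESULT IPRG_S_NO_ENTRY = 3;

typedef std::pair<std::wstring, std::wstring> Entry;  // (input, output)

class SampleSearch {
public:
    explicit SampleSearch(const std::vector<Entry> &entries)
        : entries_(entries), next_(0) {}

    // Simplified stand-in for SearchWord(): exact match only.
    HRESULT Search(const std::wstring &input, bool fFirst, std::wstring *output) {
        if (fFirst) next_ = 0;  // a new input string restarts the scan
        for (; next_ < entries_.size(); ++next_) {
            if (entries_[next_].first == input) {
                *output = entries_[next_].second;
                ++next_;        // cached state: resume after this hit
                return S_OK;
            }
        }
        // Exact matches are exhausted; report whether longer words exist.
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            const std::wstring &key = entries_[i].first;
            if (key.size() > input.size() &&
                key.compare(0, input.size(), input) == 0) {
                return IPRG_S_LONGER_WORD;
            }
        }
        return IPRG_S_NO_ENTRY;
    }

private:
    std::vector<Entry> entries_;
    std::size_t next_;  // position at which the next call resumes
};
```

A caller would pass fFirst as TRUE on the first call for an input string and FALSE on each following call, stopping at the first result other than S_OK.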
IImeActiveDict::GetComment
Obtains the comment for a word (tango). The client obtains the size of the comment, in bytes, by calling this function with pv set to NULL.
HRESULT IImeActiveDict::GetComment(
DWORD dwWordId,
VOID *pv,
DWORD *pdwSetID,
int *pcb,
IMEUCT *puct,
WORD *pwMojiKind
);
Parameters | Description |
---|---|
DWORD dwWordId | (in) Word ID of the word (tango) that has the user comment. |
VOID *pv | (out) Pointer to the buffer. Can be NULL. |
DWORD *pdwSetID | (out) Group ID of the comment. When several words with the same reading have comments and a group ID is specified, only the comments that have the same group ID are shown at the same time. Specify 0 to have the comment always shown. |
int *pcb | (in) If pv is not NULL, *pcb is the size of the buffer. (out) If pv is NULL, returns the size of the buffer needed; otherwise returns the size of the comment placed in *pv. |
IMEUCT *puct | (out) Returns the type of user comment. |
WORD *pwMojiKind | (out) mojikindUNICODE if the comment string contains a Unicode-specific character; otherwise mojikindSJIS. |
Return | Value |
---|---|
HRESULT | S_OK on success; IDICT_E_NOT_ENOUGH_BUFFER if the buffer is too small; E_FAIL otherwise. |
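The two-call pattern for pv and pcb can be sketched as follows. GetCommentBytes() is a hypothetical, reduced stand-in for GetComment() that models only the buffer handling. The numeric value given to IDICT_E_NOT_ENOUGH_BUFFER is a placeholder, and returning it when the supplied buffer is too small is an assumption based on the name of the code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

typedef std::int32_t HRESULT;  // stand-in for the Win32 type
const HRESULT S_OK = 0;
const HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);
// Placeholder value only; the real code is defined in msime.h.
const HRESULT IDICT_E_NOT_ENOUGH_BUFFER = static_cast<HRESULT>(0x80040001u);

// Hypothetical stand-in for GetComment(): only pv and pcb are modeled.
HRESULT GetCommentBytes(const std::string &comment, void *pv, int *pcb) {
    if (pcb == nullptr) return E_FAIL;
    const int needed = static_cast<int>(comment.size());
    if (pv == nullptr) {
        *pcb = needed;  // size query: report the buffer size required
        return S_OK;
    }
    if (*pcb < needed) {
        return IDICT_E_NOT_ENOUGH_BUFFER;  // assumed meaning of this code
    }
    std::memcpy(pv, comment.data(), comment.size());
    *pcb = needed;      // report the number of bytes written
    return S_OK;
}
```

A caller first passes NULL for pv to learn the size, allocates a buffer of that size, and then calls again with the buffer.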
IImeActiveDict::OpenProperty
When this function is called, the program dictionary can create a window and/or a dialog box in which extra properties for the dictionary are shown and manipulated. The property window/dialog box should be modal, and it should be created in the same process as the program dictionary.
HRESULT IImeActiveDict::OpenProperty(
HWND hwnd
);
Parameter | Description |
---|---|
HWND hwnd | (in) The window handle of the parent window. The property window or dialog box should be a child of this window. |
Return | Value |
---|---|
HRESULT | S_OK on success; E_FAIL otherwise. |
Notes on Dictionary Implementation
- The program dictionary must be read only.
- A program dictionary is included in the dictionary set used by the IME kernel during morphological analysis. Dictionary access accounts for about 50 percent of the processing time, so the program dictionary's word retrieval routines should be as efficient as possible.
- One way of improving performance is to use internal caching, which stores data in memory to avoid disk access. Note, however, that IME is attached to each application, and dictionaries are opened when IME is attached. This means that the program dictionary's data space is not shared among applications. A program dictionary would use a great deal of memory if it allocated a separate cache for each process.
- To avoid this, program dictionaries should implement dictionary caching with memory-mapped files, the Win32 mechanism for sharing memory among processes. The APIs involved are CreateFileMapping(), MapViewOfFile(), UnmapViewOfFile(), and CloseHandle(). See the Win32 reference for detailed information on memory-mapped files.