Memory marshalling in Windows CE
Posted by: Sue Loh
This article explains how memory access and memory passing are implemented in Windows CE 6 as well as previous versions of the OS. My intention is to explain the significant differences in CE6 by contrasting it against earlier OS versions. I structured this explanation to talk mostly about drivers, about how drivers used to work in CE5 and how they will work in CE6. That’s because it’s most urgent for our BSP and driver developers to understand how their code is going to have to change. But these explanations also cover system servers: how the implementations of APIs and services work. Drivers and servers work the same way.
Let’s begin with some quick definitions related to passing a pointer from client to server. Each term will be covered in more detail as we go.
- Pointer parameter: A pointer that’s passed as a parameter to an API.
- Embedded pointer: A pointer that’s passed to an API by storing it inside a buffer.
- Access Checking: Verifying that the caller process has privilege to access a buffer.
- Marshalling: Preparing a pointer that a server can use to access a caller’s buffer.
- Secure-copy: Making a copy of a buffer to protect against asynchronous modification by the caller.
- Synchronous: Accesses during an API call, on the caller’s thread.
A pointer parameter is a pointer that’s passed as a parameter to an API. For example, the pBuffer parameter to the ReadFile() API is a pointer parameter.
ReadFile (hFile, pBuffer, dwBufferSize, ...);
An embedded pointer is a pointer that’s passed to an API by storing it inside a pointer parameter, or nested inside another embedded pointer. For example, while the pMyStruct parameter to the following DeviceIoControl() call is a pointer parameter, the pEmbedded pointer that is stored inside MyStruct is an embedded pointer.
struct MyStruct {
BYTE *pEmbedded;
DWORD dwSize;
};
DeviceIoControl (hFile, dwIoControlCode, pMyStruct, sizeof(MyStruct), ...);
Pointers that are passed by other means, for example by storing them inside shared memory or by using SetEventData() to attach them to an event, end up having all the same properties as embedded pointers and so should be treated as such.
Access checking is verifying that the caller of an API has enough privilege to access a buffer that it passed to the API. (Access checking is not limited to memory, but in this case I’m only defining it with regard to memory.) Access checking is necessary to prevent malicious applications from inducing driver code to perform actions on their behalf. Drivers have a lot of privilege, and can access a lot of system data. Applications cannot. If a malicious application could cause a driver to read or write system memory on its behalf, then that driver would essentially be granting the malicious application access to data it should not have. Proper access checking inside the driver can protect system memory.
In CE5:
- Drivers used MapCallerPtr() to access-check pointer parameters and embedded pointers. The CE5 kernel also redundantly access-checked pointer parameters, but had no way to know the size of the buffers being passed. So it only checked the caller’s access to a single byte of the buffer.
- The access was granted or denied based on the “trust level” of the caller process.
In CE6:
- The API call definitions were changed to also include the sizes of pointer parameters. So the kernel now performs a full access check on pointer parameters. (I will explain this in more detail when I post about how API calls are implemented in CE6.)
- Drivers only need to access check embedded pointers, and they do this using the new API CeOpenCallerBuffer(). This API is also responsible for marshalling the data, as explained below.
- The access is granted or denied based on whether the caller is the kernel or a user-mode process. (It may change to a more granular determination in the future, based on privilege levels.)
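To see why checking a single byte is weaker than checking the whole buffer, here is a minimal sketch. It assumes a simplified CE6-style layout in which user-mode addresses are everything below 0x80000000; the names `CheckFirstByteOnly` and `CheckFullRange` are made up for illustration and are not kernel functions.

```cpp
#include <cstdint>

// Simplified model (assumption): user-mode addresses occupy the lower 2 GB.
constexpr std::uint32_t kUserSpaceEnd = 0x80000000u;  // first kernel address

// CE5-style check: only the first byte of the buffer is examined,
// because the size of the buffer is not known.
inline bool CheckFirstByteOnly(std::uint32_t addr) {
    return addr < kUserSpaceEnd;
}

// CE6-style check: the whole range [addr, addr + size) must lie in user
// space. Written as a subtraction so that addr + size can never overflow.
inline bool CheckFullRange(std::uint32_t addr, std::uint32_t size) {
    if (addr >= kUserSpaceEnd) return false;
    return size <= kUserSpaceEnd - addr;
}
```

A buffer that starts just below the boundary and runs past it passes the first check but fails the second, which is the gap that knowing the buffer size closes.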
Synchronous memory access is done during an API call, on the caller’s thread. If a driver has a thread which accesses the other process’ memory after the API call returns, that’s asynchronous access. But just as significantly, if the driver has a thread which is guaranteed to access the other process’ memory during the course of the API call – before it returns – for the purpose of this discussion, that access is asynchronous too.
Pointer mapping or marshalling is the preparation of a pointer that a driver can use to access a caller’s buffer. Drivers run inside a different process than the application which calls them. The virtual memory space of every process is, by default, protected against access by other processes. A driver must do some work in order to access a buffer inside another process’ memory.
In CE5, all processes shared a common address space. To obtain a pointer to its caller’s memory, a driver would have to “map” the pointer into that process’ address space. “Mapping” was a simple transformation of the pointer value, to make it point at the other process “slot” inside the common address space. The following picture shows device.exe accessing data in-place inside its caller.
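The “simple transformation” can be sketched roughly as follows. This is a simplified model, not the kernel’s code: it assumes 32 MB slots (a detail not stated in this article), and `MapToSlot` is a hypothetical name.

```cpp
#include <cstdint>

// Simplified model of the CE5 layout (assumption): 32 MB slots inside
// one shared address space.
constexpr unsigned kSlotShift = 25;                    // 2^25 bytes = 32 MB
constexpr std::uint32_t kSlotSize = 1u << kSlotShift;

// Hypothetical helper: turn a process-relative address (one that lies in
// the first slot-sized region) into the address of the same byte inside
// the owning process's slot. Addresses already outside that region are
// treated as slot-qualified and returned unchanged.
inline std::uint32_t MapToSlot(std::uint32_t ptr, unsigned slotIndex) {
    if (ptr >= kSlotSize) return ptr;
    return ptr + (static_cast<std::uint32_t>(slotIndex) << kSlotShift);
}
```

No memory is allocated and nothing needs to be freed afterwards, which is exactly what changes in CE6.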
In CE6, each process has its own unique address space. Marshalling memory cannot be as simple as a pointer transformation. Either the memory must be copied from one process to another (duplication) or a new virtual address must be allocated in the driver process and pointed at the same physical memory the caller was using (aliasing). Either way, resources are allocated inside the driver process, and must be freed when the driver is done with them. The following pictures show a marshalled version of the caller’s buffer being created inside the kernel (for kernel-mode drivers) or udevice.exe (for user-mode drivers.)
The CE6 marshalling is also more formalized about declaring whether the buffer is in-only, in/out or out-only. Based on these settings, the marshalling helpers will ensure that copy-in and copy-out happen at the appropriate times. They are also used for access checking, for example a user-mode application cannot pass a shared heap address (which is read-only to applications) as an in/out or out-only parameter.
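As a rough illustration of how the in/out declaration drives copy-in and copy-out when a buffer is marshalled by duplication, consider the sketch below. `Direction` and `DuplicatedView` are hypothetical names invented for this example; they model the idea in portable C++ and are not the CE6 API or its helper classes.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical direction flags standing in for the in/out declaration.
enum class Direction { In, Out, InOut };

// Hypothetical RAII model of marshalling by duplication. The constructor
// copies in when the server may read the buffer; the destructor copies
// out when the server may write it.
class DuplicatedView {
public:
    DuplicatedView(void* callerBuf, std::size_t size, Direction dir)
        : caller_(callerBuf), dir_(dir), copy_(size, 0) {
        if (dir_ != Direction::Out && size != 0)
            std::memcpy(copy_.data(), caller_, size);          // copy-in
    }
    ~DuplicatedView() {
        if (dir_ != Direction::In && !copy_.empty())
            std::memcpy(caller_, copy_.data(), copy_.size());  // copy-out
    }
    DuplicatedView(const DuplicatedView&) = delete;
    DuplicatedView& operator=(const DuplicatedView&) = delete;

    unsigned char* data() { return copy_.data(); }
    std::size_t size() const { return copy_.size(); }

private:
    void* caller_;                     // caller's original buffer
    Direction dir_;                    // declared direction
    std::vector<unsigned char> copy_;  // server-side duplicate
};
```

With an in-only view, changes the server makes to its copy never reach the caller; with an out-only view, the server starts from a zeroed copy and the caller sees the result only when the view is released.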
To explain what drivers must do to marshal memory, it is simpler to examine synchronous and asynchronous accesses separately. First, for synchronous access:
- The kernel automatically maps or marshals pointer parameters.
- The driver must take care of embedded pointers. In CE5, drivers used MapCallerPtr() for this. In CE6, drivers use CeOpenCallerBuffer() to marshal embedded pointers, and CeCloseCallerBuffer() when they are done.
Both MapCallerPtr and CeOpenCallerBuffer have the added benefit that they access-check the buffer as they prepare it for use.
Asynchronous accesses are more complicated. In CE5, additional work had to be done to access the caller’s memory on a different thread. Each process “slot” was protected from access by other processes. Each thread had its own set of “permissions” to access the various process slots. As the caller’s thread jumped into the driver, it carried with it permission to access its owner process slot. So accesses to the caller’s memory would succeed as long as they were done on that thread. Other threads would first have to obtain permission to access the other process slot.
In CE6, like CE5, additional work must be done to access the caller’s memory on a different thread. The reasons are different, and not as easy to explain. The way memory is marshalled differs between kernel mode and user mode, and differs between pointer parameters and embedded pointers. The only way to guarantee that the driver code is going to work properly in all modes is to prepare buffers for asynchronous access before accessing them on another thread.
For asynchronous access, pointer parameters and embedded pointers are handled the same way. Assuming that we start with a buffer that is already mapped or marshalled for synchronous access, the steps a driver must take in order to access it asynchronously are:
- In CE5, a driver must call SetProcPermissions() on its asynchronous thread, in order to access a buffer in a different process.
- In CE6, a driver must call CeAllocAsynchronousBuffer() to prepare an “asynchronous ready” version of the buffer that is already prepared for synchronous use. That call must be made synchronously, before passing the buffer to the asynchronous thread. When the thread is done with the buffer, it calls CeFreeAsynchronousBuffer() to release the resources associated with it.
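The ordering rule (prepare synchronously, use later, then release) can be modelled in portable C++ as below. Everything here is a hypothetical stand-in: `PrepareAsync` plays the role of the synchronous preparation step by making a private copy, `WorkQueue` stands in for the driver’s worker thread, and only read-only access is shown. The real API may alias rather than copy.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical "asynchronous ready" buffer: a private copy that stays
// valid after the API call has returned.
using AsyncBuffer = std::shared_ptr<std::vector<unsigned char>>;

// Called synchronously, on the caller's thread, while the caller's buffer
// is still known to be accessible.
inline AsyncBuffer PrepareAsync(const void* syncBuf, std::size_t size) {
    const unsigned char* p = static_cast<const unsigned char*>(syncBuf);
    return std::make_shared<std::vector<unsigned char>>(p, p + size);
}

// Hypothetical stand-in for a worker thread: jobs run when Drain() is called.
class WorkQueue {
public:
    void Post(std::function<void()> job) { jobs_.push(std::move(job)); }
    void Drain() {
        while (!jobs_.empty()) {
            jobs_.front()();
            jobs_.pop();   // destroying the job releases its buffer
        }
    }
private:
    std::queue<std::function<void()>> jobs_;
};

// API entry point: prepare first, then hand the prepared buffer to the worker.
inline void SubmitChecksum(WorkQueue& q, const void* buf, std::size_t size,
                           unsigned* result) {
    AsyncBuffer async = PrepareAsync(buf, size);   // synchronous step
    q.Post([async, result] {                       // runs after the call returns
        unsigned sum = 0;
        for (unsigned char b : *async) sum += b;
        *result = sum;
    });
}
```

The job never touches the caller’s original pointer, so it still works if the caller overwrites or frees its buffer once the call has returned.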
Also, unfortunately, not all asynchronous cases are supported for user-mode drivers. What a user-mode driver cannot do is asynchronously write back to a pointer parameter. Kernel-mode drivers always work, embedded pointers always work, and read-only pointers (no write-back to the caller) always work fine too. I personally feel more comfortable saying that we simply don’t support asynchronous access in user-mode drivers. If people listen to that, they can never get into trouble. If your driver needs asynchronous access to caller buffers, in CE6 you should run it in kernel mode. (Or if it’s an option, rearchitect your protocol so that caller memory access is never asynchronous, e.g. notify the caller that data is ready and have them call back into your driver to retrieve it.)
Other details for production quality drivers
You may say that the following two topics, secure copy and exception handling, are not part of memory marshalling. But they are required in today’s world for safely receiving memory from other processes, and I believe that any discussion of memory passing is not complete without covering them.
There is a security risk a lot of developers are not aware of: callers can modify the buffers they pass while a driver is still using them. The caller application could have a secondary thread which manipulates the data in a buffer while the primary thread is inside a driver call. Malicious applications could manipulate embedded pointers to get access to memory they shouldn’t, or cause buffer overruns by manipulating buffer sizes, or cause other problems like exceptions and leaks. To protect against this class of attacks, drivers must make a copy of the caller’s data, called a secure copy, so that the caller cannot modify it asynchronously.
For my first example of an attack that can be prevented using secure copies, imagine that the caller passes an embedded pointer to a driver. The driver uses MapCallerPtr (in CE5) or CeOpenCallerBuffer (in CE6) to access check the pointer and map/marshal it for use. If the driver continues to store that pointer into the caller’s buffer, the caller could later manipulate it to point at other memory, and the driver would access the wrong memory. Drivers must make copies of the pointers they receive from callers to prevent asynchronous modification. Similarly, drivers must make copies of buffer size values they get from callers.
So, always copy embedded pointers to a local variable. This is easily accomplished as part of mapping/marshalling since you have to call MapCallerPtr or CeOpenCallerBuffer anyway. Never store the mapped/marshalled pointer back to the caller’s buffer. Never use the pointer in the caller’s buffer after it has been mapped/marshalled. Treat buffer size and length variables with the same caution, so that callers cannot manipulate sizes any more than they can manipulate pointers.
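A minimal sketch of the “copy to a local, validate the local, use the local” rule follows. It is plain C++ with ordinary memory standing in for a caller’s buffer; in a real driver the embedded pointer would also go through MapCallerPtr or CeOpenCallerBuffer. `Request`, `kMaxPayload` and `HandleRequest` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Caller-visible request block, shaped like the MyStruct example. The
// caller can rewrite these fields at any time from another thread.
struct Request {
    const unsigned char* pEmbedded;
    std::uint32_t dwSize;
};

constexpr std::uint32_t kMaxPayload = 16;   // hypothetical driver limit

// Returns the number of bytes copied into out, or 0 if the request is
// rejected.
inline std::size_t HandleRequest(const volatile Request* req,
                                 unsigned char (&out)[kMaxPayload]) {
    // Read each caller-controlled field exactly once, into locals.
    const unsigned char* localPtr = req->pEmbedded;
    const std::uint32_t localSize = req->dwSize;

    // Validate the locals, never the caller's fields.
    if (localPtr == nullptr || localSize > kMaxPayload) return 0;

    // From here on use only the validated locals.
    std::memcpy(out, localPtr, localSize);
    return localSize;
}
```

If the size were re-read from `req` for the `memcpy`, a caller could pass the check with a small value and then swap in a large one before the copy.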
My second example of why secure copy is necessary involves file names. The CreateFile API, which takes a file name, validates that the caller is allowed to access that file. Suppose CreateFile read the file name, checked access, then used the file name to open the file when the access check passed. If the caller passes the name of a file it can access, then asynchronously changes it to a file name the caller is NOT supposed to be able to access, then there is a small window of time in which the caller could trick CreateFile into opening a file it’s not supposed to. Perhaps it would only be able to get access 1% of the tries, but a hacker program could keep trying and trying until the trick worked. It only has to work once in order to compromise system security. The way to protect against this type of attack is that CreateFile must make a copy of the filename, in memory that the caller cannot access, before validating the caller’s access to that file. (By the way, the OS already does a secure-copy of the file name before passing it to a driver’s CreateFile in CE6, this is just a thought experiment.)
You should make a copy of any data that requires validation, to prevent asynchronous modification after the validation is done. Making a secure copy can be as simple as copying a buffer or pointer into a stack variable. Or you could make a temporary heap allocation to copy the caller’s data into. You will notice that CeOpenCallerBuffer has a ForceDuplicate parameter you can use to guarantee that you get a secure copy of an embedded buffer. We’ve also created a CeAllocDuplicateBuffer helper function that you can choose to use. (It is basically a heap alloc, with memcpy as necessary for copy-in or copy-out.) It does not matter how you make the secure copy, as long as you do something to protect the data you take from callers.
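The file-name thought experiment can be written down as a small sketch: copy first, then validate and use the same copy. The access policy (`IsAllowedPath`, which only permits names under `\Public\`) and the function `OpenChecked` are invented for illustration and do not describe how the OS implements CreateFile.

```cpp
#include <cstddef>
#include <string>

// Hypothetical policy: callers may only open files under \Public\.
inline bool IsAllowedPath(const std::string& path) {
    const std::string prefix = "\\Public\\";
    return path.compare(0, prefix.size(), prefix) == 0;
}

// Copies the caller's name first, then validates and "opens" the copy.
// On success, opened receives the name that was actually used.
inline bool OpenChecked(const char* callerName, std::size_t maxLen,
                        std::string& opened) {
    // Secure copy: a bounded copy into memory the caller cannot reach.
    std::string name;
    for (std::size_t i = 0; i < maxLen && callerName[i] != '\0'; ++i)
        name.push_back(callerName[i]);

    if (!IsAllowedPath(name)) return false;   // validate the copy
    opened = name;                            // use the same copy
    return true;
}
```

Because the check and the use both read `name`, a later change to the caller’s string has no effect on which file is opened.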
Similar to secure copy is how drivers must use exception handling to protect their access of caller memory. It is important to note that, even if a caller has access to an address, that address may not refer to valid memory. An application can pass a pointer to a user-mode address that was never allocated. Or it could asynchronously free the buffer. So, drivers should always surround user buffer accesses with try/except blocks, and clean up resources during __except or __finally. For example, make sure to free memory that was allocated during the call, and release any critical sections, before returning to the caller.
In Summary
As you can see, passing memory between processes is a complicated matter. But don’t despair. There are relatively simple rules governing drivers, as covered in the following table.
[Table: rules for drivers by use case; cell contents not preserved.]
… and remember, always use try/except so you can clean up properly if you get exceptions on caller memory!
One other tip: CE6 has some helper C++ classes to simplify your usage of these APIs. In public\common\oak\inc\marshal.hpp you will find:
- MarshalledBuffer_t: wrapper for CeOpenCallerBuffer, CeAllocAsynchronousBuffer, and their cleanup functions. Use for all of your embedded pointers.
- DuplicatedBuffer_t: wrapper for CeAllocDuplicateBuffer and its free. Use for pointer parameters that need a secure copy.
- AsynchronousBuffer_t: wrapper for CeAllocAsynchronousBuffer and its free. Use for pointer parameters you need to access asynchronously.
The C++ version of the table then becomes:
| Use Case | What the driver must do in CE6 |
| --- | --- |
| Parameter – used synchronously | If a secure copy is necessary, use DuplicatedBuffer_t. Otherwise just use the pointer. |
| Parameter – used asynchronously | If a secure copy is necessary, use DuplicatedBuffer_t. Otherwise use AsynchronousBuffer_t. |
| Embedded Pointer | Use MarshalledBuffer_t. |
… and always use try/except!
Juggs Ravalia did a Channel 9 interview on this topic – if you don’t like my explanation, maybe you’ll like his better. https://channel9.msdn.com/Showpost.aspx?postid=233119
Comments
- Anonymous
November 09, 2006
Great article! Thank you!
- Anonymous
November 10, 2006
You are very welcome. :-)
Sue
- Anonymous
November 19, 2006
Hello, I have worked with a variety of operating systems for a long time, and the "buffer duplication" that has been introduced in CE 6 is something that I haven't seen elsewhere. I also feel that CE 6 has done a very good job with "API Registration", which also includes "Signature" identification, with "Pointer parameters" access checking being taken over inside trap-handlers. That certainly avoids a lot of code duplication and potential mistakes on the part of driver developers. No other OS provides such elaborate and general mechanisms, which can be seen in the "RPC" (Remote Procedure Call) area but not in system software. So a great job.
At the same time I feel that CE 6 trap-handlers taking care of "pointer parameters" is going to make some people feel that "no validation" is needed and all pointers are always valid inside drivers. Of course it's not your fault. But doing things "under the hood" may make some DD programmers forget that "someone" has to do it.
Regards,
Asang
- Anonymous
November 20, 2006
Hello Asang, thanks for your feedback!
You're right, by automatically access checking pointer parameters, we run the risk that people will mistakenly assume they don't have to access check embedded pointers. On the other hand, if we didn't automatically do those access checks, we'd be completely trusting everyone (including ourselves) to get buffer usage 100% right. This way we at least close off one possible mistake. It's definitely not perfect but I think it's the best choice to make.
One thing that makes it slightly better is that user-mode drivers cannot use embedded pointers at all without first calling CeOpenCallerBuffer. Hopefully that will be enough to get people into the habit of using CeOpenCallerBuffer on embedded pointers, even in kernel-mode drivers.
Sue
- Anonymous
November 20, 2006
Hello Sue, I agree with you completely. Overall, I am really happy to see the manner in which these changes were made in CE 6.0. At the same time, I am afraid the MSDN documentation about CE 6 appears to be quite inadequate as of now. Secondly, there needs to be one sample for each scenario for most people to make sense of it. Of course I can write some samples myself, but if they come from your team they are more authentic.
Regards,
Asang
- Anonymous
November 21, 2006
As to the documentation, I am told that there is a doc update shortly forthcoming. So look forward to lots of improvements in the near future. (You should get one of those notifications that updates are available.)
I am sure our drivers team has good sample drivers for users to look at as examples of how to do the memory marshalling. I would have to ask them for good examples though. That is an excellent suggestion for another post; I hope to have something soon.
Sue
- Anonymous
November 21, 2006
Good, good articles! Thanks Sue!
From your introduction, "duplication" is one of the approaches to marshal memory, as well as "aliasing". Obviously, aliasing has higher performance than duplication because it spares the memory copy. Therefore, in WinCE 6, is aliasing the de facto way to implement memory marshalling?
Besides, I cannot find the details for some APIs, such as CeAllocAsynchronousBuffer, in MSDN. Could you give me any suggestion about where to look them up? Thanks a lot!
- Anonymous
November 21, 2006
'Secure copy' is a specific instance of "duplication", right?
- Anonymous
November 22, 2006
Thanks!
Actually you might be surprised to learn that we found that aliasing is NOT always faster than duplication. On ARM at least, for small buffers (<8KB or so) duplication is faster than aliasing. ARM has the added issue that we have to make the source and destination buffers uncached while aliasing, to avoid cache coherency problems. (Most ARM devices today have a virtually tagged cache, and memory must be uncached if you're going to alias it to two different virtual addresses.) Maybe the boundary is smaller on other CPUs, but I believe that for pretty small buffers we may actually do best by duplicating, not aliasing. So please don't make any assumptions about CeAllocAsynchronousBuffer. It is the OS' job to make the right trade-offs for performance and security in these marshalling APIs.
I would say that 'secure copy' is duplication for the purpose of security, rather than for the purpose of marshalling.
I am sorry to hear that some of the APIs did not end up being documented properly in our shipping documentation. I've heard other people say that as well. What I also heard is that there is a documentation update coming soon; it will show up as a notification that updates are available inside Platform Builder / VS.