How can I find out which ranges of a memory-mapped file are "hot"?

Joe Amenta 1 Reputation point
2021-01-26T23:11:20.537+00:00

Background

We've developed an application that demands fast access to data that's contained within gigantic static files (the largest individual files are dozens of GB in size, and there are multiple different sets of these files).

For various reasons, we know that only a fraction of this data will really need to be "hot", but there's no clear way for us to determine what this subset is, and it could change over time.

So we map read-only views of these files into virtual memory and access the data through pointers. The first time a page is read, the page fault handler will (slowly) load it into physical memory so that subsequent accesses to that data will be fast.
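For anyone who wants to picture the access pattern, here is a minimal sketch using Python's standard `mmap` module purely for illustration. The real application maps views and reads through pointers; the helper names `map_read_only` and `read_record` are hypothetical and not part of our code.

```python
import mmap

def map_read_only(path):
    """Map the whole file read-only and return (file object, mapping)."""
    f = open(path, "rb")
    # Length 0 maps the entire file; ACCESS_READ forbids writes through the view.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, view

def read_record(view, offset, size):
    """Copy `size` bytes starting at `offset` out of the mapping.

    The first touch of a page may take a (slow) page fault; later reads of
    the same page are served from physical memory while it stays resident.
    """
    return view[offset:offset + size]
```

Nothing is read from disk when the mapping is created; the cost is paid lazily, page by page, on first access.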

Over time, the operating system will "learn" which pages should stay in physical memory throughout the lifetime of the process. However, when the process terminates (perhaps because the machine needs to restart to apply updates), all that information becomes "unlearned".

What I Really Need

Ideally, I would like to persist this "learned" information on a timer, so that whenever the application does a cold start, it can prefetch the same data that was resident in physical memory the last time it ran, without affecting the speed of the rest of the application (I'm OK with slowing down requests that come in while we're still doing this initial prefetch).

Edit: It would also be fantastic if we could prefetch this data in along with any internal state that NT / Win32 might use in order to choose which pages to evict from physical memory when it needs to, but I recognize that this is probably a really tough ask.
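To make the persist-and-replay half of this concrete, here is a rough sketch in Python. It assumes the missing piece already exists: something that hands me a list of `(offset, length)` ranges that were resident, which is exactly what I don't know how to obtain. The snapshot format (a JSON list) and the function names are made up for illustration.

```python
import json
import mmap
import os

PAGE_SIZE = mmap.PAGESIZE  # assumption: the mapping uses the system page size

def save_hot_ranges(snapshot_path, ranges):
    """Persist (offset, length) pairs; write-then-rename so a crash mid-write
    never leaves a truncated snapshot behind."""
    tmp = snapshot_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump([[start, length] for start, length in ranges], f)
    os.replace(tmp, snapshot_path)

def load_hot_ranges(snapshot_path):
    """Return the saved ranges, or an empty list on a first-ever start."""
    try:
        with open(snapshot_path, encoding="utf-8") as f:
            return [(int(start), int(length)) for start, length in json.load(f)]
    except FileNotFoundError:
        return []

def prefetch(view, ranges, page_size=PAGE_SIZE):
    """Touch one byte in every page covered by `ranges`; return pages touched."""
    touched = 0
    size = len(view)
    for start, length in ranges:
        first = (start // page_size) * page_size  # align down to a page boundary
        end = min(start + length, size)           # never read past the mapping
        for offset in range(first, end, page_size):
            view[offset]  # reading a single byte is enough to fault the page in
            touched += 1
    return touched
```

At startup this would run on a background thread as `prefetch(view, load_hot_ranges(path))`, and the timer would call `save_hot_ranges` periodically. Touching pages one at a time is the simplest possible replay; it does nothing to restore whatever eviction state the OS keeps.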

Things I've Looked For

  1. Something like VirtualQuery would be great, but MEMORY_BASIC_INFORMATION doesn't appear to include any details that I can act on.
  2. MmIsAddressValid (from ntddk.h) sounds like it would be viable if I had access to it, but it looks like this is only for drivers, and it has a bunch of really scary warnings around it.
  3. Interestingly enough, the SysInternals tool "RamMap" does present exactly the information that I need, but it appears to rely on internal undocumented APIs that can change (and, apparently, have changed), so I think this is a non-starter for me.
  4. The best viable option I can come up with is to try to figure out how to use ETW to watch for the page faults that I care about, record which pages are being brought into physical memory, and replay them in some order at the next restart. This sounds horrible to get up and running, and a royal pain to manage especially when this happens more than once, but I think it would get us some facsimile of the behavior that we're looking for, and it may even be better than doing nothing (though I haven't built out a proof-of-concept yet).
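
For option 4, the bookkeeping side would look something like the sketch below, assuming I can somehow get a stream of faulting file offsets out of a trace. The ETW collection itself is not shown and I have not built it; `coalesce_faults` is a hypothetical helper, and the 4096-byte page size is an assumption.

```python
def coalesce_faults(offsets, page_size=4096):
    """Turn raw faulting offsets into sorted, merged (start, length) ranges.

    Offsets are rounded down to page boundaries, de-duplicated, and adjacent
    pages are merged so the replay list stays small.
    """
    pages = sorted({offset // page_size for offset in offsets})
    ranges = []
    for page in pages:
        start = page * page_size
        if ranges and ranges[-1][0] + ranges[-1][1] == start:
            # This page directly follows the previous range: extend it.
            prev_start, prev_length = ranges[-1]
            ranges[-1] = (prev_start, prev_length + page_size)
        else:
            ranges.append((start, page_size))
    return ranges
```

For example, faults at offsets 0, 100, 4096 and 8192 collapse into a single 12288-byte range starting at 0. Note that this only records what was faulted in, not what is still resident, so it would overestimate the hot set unless old entries are aged out.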