SharePoint and SID History not playing well together

Hi,

I struck a problem at a customer site, and the impact, while it seemed minor on the surface, was actually a big deal for their migration project. In fact, the large team they had assembled to migrate users from one forest to a new forest had been halted while this issue was investigated.

It relates to SID History and the way Windows queries for, and caches, Name-to-SID and SID-to-Name lookups from AD. This cache was causing SharePoint to think that a user who wanted to log on was actually a user from the wrong domain, so SharePoint would create a new identity for that person.

The scenario is actually very close to this one:

https://blogs.technet.com/b/rgullick/archive/2010/05/15/sharepoint-people-picker.aspx

But the workaround we found, which resolved the problem while they were still migrating, was pretty cool, so I thought I’d save it for all eternity here as a blog post.

It boils down to this:

The LsaCache stores previously looked-up domain user names and their SIDs. When we ask a DC whose domain holds users that carry both their new SID and their migrated SID (in SID History) at the same time, that DC always links the migrated SID to the new user name, not the old one. If we can artificially fill the LsaCache on our servers with OLD USERNAME = OLD SID mappings, then we can act as though no resources have migrated yet.

Here’s the scenario, where users were migrated with SID History from child1.domainA.com to domainB.com:

[Diagram: cold-cache SID lookup]

  1. CHILD1\bob logs onto a workstation in CHILD1 and opens the SPS site in DOMAINB (intranet.domainB.com)
  2. SPS asks IIS, which asks Windows, which asks a local DC to resolve the remote SID: S-1-5-21-[SID_for_CHILD1]-1010
  3. The local DC finds that SID in the SID History of the migrated user in the global catalog
  4. The local DC returns the account name of the migrated user, DOMAINB\bob
  5. The SPS server adds the result to its LsaCache as a mapping of this SID to the DOMAINB account

So we can see from the picture above that the LsaCache (the table in the bottom right of the drawing) has a mapping for NEW USERNAME = OLD SID, but we want OLD USERNAME = OLD SID.
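To make the failure concrete, here is a small Python toy model of steps 1-5. Everything in it is hypothetical: the account records, the DOMAINB SID and the function names are made up, and the lookup order (search the local global catalog, including SID History, before going across the trust) is an assumption chosen to reproduce the behaviour described above, not a description of how LSA or a DC is actually implemented.

```python
# Toy model of the cold-cache lookup in steps 1-5.
# Assumption: this only mimics the behaviour described in the post; it is
# not how LSA or a domain controller is implemented.

OLD_SID = "S-1-5-21-[SID_for_CHILD1]-1010"
NEW_SID = "S-1-5-21-[SID_for_DOMAINB]-2020"  # hypothetical new SID

# What the DOMAINB DC sees in its own global catalog: the migrated
# account, carrying the old CHILD1 SID in sIDHistory.
DOMAINB_GC = [
    {"name": "DOMAINB\\bob", "objectSid": NEW_SID, "sIDHistory": [OLD_SID]},
]

# What only a CHILD1 DC (across the trust) can see.
CHILD1_DC = [
    {"name": "CHILD1\\bob", "objectSid": OLD_SID, "sIDHistory": []},
]


def dc_lookup_sid(sid):
    """Local DC: search its own GC first (objectSid and sIDHistory) and
    only ask the trusted CHILD1 domain if nothing matches."""
    for account in DOMAINB_GC:
        if sid == account["objectSid"] or sid in account["sIDHistory"]:
            return account["name"]  # step 3: matched via sIDHistory
    for account in CHILD1_DC:
        if sid == account["objectSid"]:
            return account["name"]
    return None


def sps_lookup_sid(sid, lsa_cache):
    """SPS server: answer from its LsaCache if it can, otherwise ask the
    DC and cache the answer (step 5)."""
    if sid in lsa_cache:
        return lsa_cache[sid]
    name = dc_lookup_sid(sid)
    lsa_cache[sid] = name
    return name


cache = {}
print(sps_lookup_sid(OLD_SID, cache))  # prints DOMAINB\bob
```

The OLD SID now maps to the DOMAINB account in the cache, which is exactly the mapping we don't want.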

So, let’s warm up the LsaCache so it looks the way we’d like it to:

[Diagram: warming up the LsaCache]

  1. SPS constantly runs a script to query for the name CHILD1\bob
  2. The local DC queries its Global Catalog and does NOT have a record for this username
  3. The local DC must do its own LSA query to a DC in the domain CHILD1 for this name
  4. The remote DC in CHILD1 finds the user and replies with the SID: S-1-5-21-[SID_for_CHILD1]-1010
  5. The CHILD1 DC returns this to the DOMAINB DC (the DOMAINB DC caches this result in its own LsaCache)
  6. The local DC returns this result to the SPS server
  7. The SPS server adds this entry to its LsaCache

Ah ha! Now our cache looks the way we’d like it, where OLD USERNAME = OLD SID. This way when a query for OLD SID is made, the result from cache will return OLD USERNAME.

[Diagram: SID lookup answered from the LsaCache]

  1. CHILD1\bob logs onto a workstation in CHILD1 and opens the SPS site in DOMAINB (intranet.domainB.com)
  2. SPS does NOT ask the local DC to resolve the remote SID; it uses its LsaCache
  3. The LsaCache on SPS replies that the username for SID S-1-5-21-[SID_for_CHILD1]-1010 is CHILD1\bob

The important step here is the red X, where there IS NO STEP. What I mean is that the SharePoint server never talked to a DC to get a result for the OLD SID lookup; we relied entirely on the warmed-up cache on the SPS server alone.
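The warm-up and the cached logon path can be sketched with the same kind of toy model. Again, this is a hypothetical Python simulation under stated assumptions: the LsaCache is reduced to a one-way SID-to-name dict, and the DC lookup order is assumed rather than taken from Windows internals. What it shows is that once the name query for CHILD1\bob has been cached, the SID lookup at logon is answered without any DC round trip.

```python
# Toy model of warming the LsaCache (steps 1-7) and then answering a SID
# lookup from the cache alone. Hypothetical names and data; a real
# LsaCache is not a Python dict.

OLD_SID = "S-1-5-21-[SID_for_CHILD1]-1010"
NEW_SID = "S-1-5-21-[SID_for_DOMAINB]-2020"  # hypothetical

DOMAINB_GC = {"DOMAINB\\bob": {"objectSid": NEW_SID, "sIDHistory": [OLD_SID]}}
CHILD1_DC = {"CHILD1\\bob": {"objectSid": OLD_SID}}


def dc_lookup_name(name):
    """Name -> SID on the local DC; CHILD1 names are chased to CHILD1."""
    if name in DOMAINB_GC:
        return DOMAINB_GC[name]["objectSid"]
    if name in CHILD1_DC:  # steps 2-5: not in the local GC, ask CHILD1
        return CHILD1_DC[name]["objectSid"]
    return None


def dc_lookup_sid(sid):
    """SID -> name on the local DC; sIDHistory in the local GC wins."""
    for name, account in DOMAINB_GC.items():
        if sid == account["objectSid"] or sid in account["sIDHistory"]:
            return name
    for name, account in CHILD1_DC.items():
        if sid == account["objectSid"]:
            return name
    return None


class SpsServer:
    def __init__(self):
        self.lsa_cache = {}  # SID -> account name (simplified: one direction)
        self.dc_calls = 0    # how many lookups had to go to a DC

    def lookup_name(self, name):
        """Warm-up query: resolve a name and cache the SID -> name mapping."""
        self.dc_calls += 1
        sid = dc_lookup_name(name)
        if sid is not None:
            self.lsa_cache[sid] = name  # step 7
        return sid

    def lookup_sid(self, sid):
        """Logon path: use the LsaCache first, only then ask a DC."""
        if sid in self.lsa_cache:
            return self.lsa_cache[sid]
        self.dc_calls += 1
        name = dc_lookup_sid(sid)
        self.lsa_cache[sid] = name
        return name


cold = SpsServer()
print(cold.lookup_sid(OLD_SID))  # prints DOMAINB\bob

warm = SpsServer()
warm.lookup_name("CHILD1\\bob")  # the warm-up query
print(warm.lookup_sid(OLD_SID))  # prints CHILD1\bob
print(warm.dc_calls)             # prints 1: the logon lookup never hit a DC
```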

This relies on the LsaCache on the SPS server ALWAYS having the entry for the SID from the CHILD1 domain matching the CHILD1 username, and never matching the DOMAINB username. The only way to ensure this is:

  1. Constantly query from the SPS server for the name CHILD1\username for every user in DOMAINB which has been migrated from CHILD1 and has its SIDHistory migrated with it. Use a tool which invokes LookupAccountName() to locate the SID for the username: CHILD1\username. LookupAccountName is explained here: https://msdn.microsoft.com/en-us/library/aa379159(v=vs.85). I had access to a private tool which would do these queries for us. I suspect that PsGetSid from Sysinternals would be able to help out here too, but we never tried it.
  2. The LsaCache on SPS must be large enough to ensure that the entries which are queried are never overwritten by entries from DOMAINB. Set the registry value HKLM\System\CurrentControlSet\Control\Lsa\LsaLookupCacheMaxSize (REG_DWORD) to 0x2000 (8192 decimal). If this value does not exist, the system uses a default cache size of 128 entries, which is overwritten too quickly on busy SPS servers. 8192 entries on a pair of load-balanced servers should be able to hold all SIDs for all users accessing the SPS site in the 2 forests (if your forests have more users, you’ll need to increase this).
  3. This is a workaround. The real fix is for users who are migrated from child1.domainA.com to domainB.com with SIDHistory to use their migrated accounts immediately. After the migration, their CHILD1 accounts should be disabled or deleted, and SIDHistory should be removed from the DOMAINB accounts. This is operationally very difficult, as it does not allow for an easy testing or roll-back path.
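Why the cache size matters can be illustrated with a bounded cache. This Python sketch assumes least-recently-used eviction; the post does not say which eviction policy the LsaCache actually uses, so treat the numbers as illustrative only. The user counts (500 migrated users, 1000 other DOMAINB lookups) and the placeholder SIDs are made up.

```python
from collections import OrderedDict


class BoundedLsaCache:
    """Hypothetical SID -> name cache with least-recently-used eviction.

    The real LsaCache eviction policy is not described in the post; LRU is
    an assumption made for this illustration."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.entries = OrderedDict()

    def get(self, sid):
        if sid not in self.entries:
            return None
        self.entries.move_to_end(sid)  # mark as most recently used
        return self.entries[sid]

    def put(self, sid, name):
        self.entries[sid] = name
        self.entries.move_to_end(sid)
        while len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used


def surviving_warm_entries(max_size, migrated_users, other_lookups):
    """Warm the cache once with CHILD1 entries, then replay unrelated
    DOMAINB lookups, and count how many CHILD1 entries are still cached.
    The SIDs below are placeholders, not real SID strings."""
    cache = BoundedLsaCache(max_size)
    for i in range(migrated_users):
        cache.put(f"S-1-5-21-CHILD1-{1000 + i}", f"CHILD1\\user{i}")
    for j in range(other_lookups):
        cache.put(f"S-1-5-21-DOMAINB-{5000 + j}", f"DOMAINB\\other{j}")
    return sum(1 for sid in cache.entries if "-CHILD1-" in sid)


print(surviving_warm_entries(128, 500, 1000))     # prints 0
print(surviving_warm_entries(0x2000, 500, 1000))  # prints 500
```

With the default of 128 entries, ordinary DOMAINB traffic pushes every warmed entry out of this model cache, which is also why the warm-up has to keep running rather than run once.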

To view the actions as they are performed by LSA Lookups, add these 2 DWORDs to the registry under HKLM\System\CurrentControlSet\Control\Lsa\:

  • LspDbgTraceOptions = 0x1 (1 means “log to a file”, the file is C:\Windows\Debug\Lsp.log)
  • LspDbgInfoLevel = 0x88888888 (all 8s in hex means “log as verbose as possible”)
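If you would rather script these registry changes than use regedit, a helper like the one below prints `reg add` command lines for the values named in this post: LsaLookupCacheMaxSize from the workaround and the two tracing values. It is only a sketch: it builds strings and writes nothing, passing the DWORD data in decimal is our own choice, and you should review the commands before running them from an elevated prompt.

```python
# Builds reg.exe command lines for the LSA values mentioned in this post.
# It only prints strings; nothing is written to the registry.

LSA_KEY = r"HKLM\System\CurrentControlSet\Control\Lsa"

LSA_VALUES = {
    "LsaLookupCacheMaxSize": 0x2000,  # 8192 cache entries (the workaround)
    "LspDbgTraceOptions": 0x1,        # 1 = log to C:\Windows\Debug\Lsp.log
    "LspDbgInfoLevel": 0x88888888,    # all 8s = most verbose logging
}


def reg_add_command(name, value):
    """Return a reg.exe command that sets name (REG_DWORD) under LSA_KEY."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{name}: {value} does not fit in a DWORD")
    # The DWORD data is written as a decimal number.
    return f'reg add "{LSA_KEY}" /v {name} /t REG_DWORD /d {value} /f'


for value_name, value in LSA_VALUES.items():
    print(reg_add_command(value_name, value))
```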

These keys are explained here:

https://technet.microsoft.com/en-us/library/ff428139(v=ws.10).aspx

So, all in all, it’s a little complicated, but the workaround of increasing LsaLookupCacheMaxSize and constantly running a script on the SPS server to query for the SIDs of usernames in CHILD1 (with a filter to target only users who had been migrated to domainB) worked well for the customer.
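The script itself was a private tool, so the following is only a hypothetical Python sketch of its shape: pick out users whose SID History contains a SID from the CHILD1 domain, then keep resolving their CHILD1 names. The user-record format, the CHILD1 domain SID and the five-minute interval are assumptions. The actual lookup is passed in as a function; on a Windows server that could be a thin wrapper around LookupAccountName(), for example pywin32's `win32security.LookupAccountName(None, name)`, but we have not tested that.

```python
import time

CHILD1_DOMAIN_SID = "S-1-5-21-111-222-333"  # hypothetical CHILD1 domain SID


def migrated_from_child1(users, child1_domain_sid):
    """Yield CHILD1\\<sAMAccountName> for every user whose sIDHistory holds
    a SID issued by the CHILD1 domain (the domain SID followed by a RID)."""
    prefix = child1_domain_sid + "-"  # trailing "-" avoids partial matches
    for user in users:
        if any(sid.startswith(prefix) for sid in user.get("sIDHistory", [])):
            yield "CHILD1\\" + user["sAMAccountName"]


def warm_lsa_cache(names, lookup_account_name, passes=None, interval_s=300):
    """Resolve every name with lookup_account_name, over and over.

    passes=None loops forever; interval_s between passes is arbitrary.
    Returns the results of the last completed pass."""
    names = list(names)
    last = {}
    done = 0
    while passes is None or done < passes:
        for name in names:
            last[name] = lookup_account_name(name)
        done += 1
        if passes is None or done < passes:
            time.sleep(interval_s)
    return last


# Example with made-up user records and a stand-in lookup function.
users = [
    {"sAMAccountName": "bob", "sIDHistory": [CHILD1_DOMAIN_SID + "-1010"]},
    {"sAMAccountName": "carol", "sIDHistory": []},
]
names = list(migrated_from_child1(users, CHILD1_DOMAIN_SID))
print(names)  # ['CHILD1\\bob']
print(warm_lsa_cache(names, lambda n: "<sid for " + n + ">", passes=1))
```

How often to repeat the passes is your call; the goal is simply to keep the warmed entries in the cache.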

Comments

  • Anonymous
    January 01, 2003
    That is great news Brandon. Thanks for sharing it. I've let my customer know that the stsadm tool has been updated with this workaround.

  • Anonymous
    January 01, 2003
    @Brandon: Yes, the idea is to keep the cache flooded with the entries you want, so "constantly" is up to you to decide. If you point the people who are working on your Premier case to this blog and give them my name, I may be able to help them along if needed. I won't be able to help directly just now though.

  • Anonymous
    January 01, 2003
    @MarkM: Yes.

  • Anonymous
    January 01, 2003
    @Bruce - Would you like to contact me so you can be that customer and start that process up?

  • Anonymous
    January 01, 2003
    @Jack Fruh: I'm no SharePoint person, but according to the comments above, this entire workaround has been superseded by the August 2012 Cumulative Update. The link to this for SharePoint 2007 is here: http://support.microsoft.com/kb/2687330

    You run this command: stsadm.exe -o setproperty -propertyname "HideInactiveProfiles" -propertyvalue "true"

  • Anonymous
    January 02, 2013
    When you say the process that must be run to constantly query for the SID of CHILD1 users, what do you mean by "constantly"?  Just long enough to keep them in cache?  What if you have an environment where a lot of lookups are done?  I think I am having this same problem on one of our customer's SharePoint farms, during a long-timeframe domain migration.  There are disabled accounts in the new domain due to Exchange migration.  I currently have an open Premier support case for this.  Can you help?

  • Anonymous
    March 18, 2014
    Thanks. So does that mean we should disregard the workaround in the article and go with the stsadm.exe setting described above?

  • Anonymous
    May 12, 2014
    Pingback from Active Directory Migration Woes (Part 2) | Jack Fruh's SharePoint blog

  • Anonymous
    June 08, 2016
    This sounds like a workaround we need, but it is unclear to me what you actually did to achieve this. You used some private tool or something? How can we "warm up" the LsaCache?

    • Anonymous
      June 13, 2016
      @Erik: From the text in the blog post: "I had access to a private tool which would do these queries for us. I suspect that PsGetSid from Sysinternals would be able to help out here too, but we never tried it." But check the comments above, as this was fixed via a customisable option in a Cumulative Update.

  • Anonymous
    September 22, 2016
    Hi Craig, how big did you set your LsaLookupCacheMaxSize? Is there any issue if you set it to something like 50,000?

  • Anonymous
    December 09, 2016
    Hi Craig - we ran into similar issues. However, to our understanding the issue never happened during the Pilot phase, only on launch day, which made us think the user load is one of the root causes. What we are not clear on is this: we rebooted the servers a couple of times during the Pilot phase (which would clear the LSA cache, so it starts filling from 0) and had no issues. Why didn't it pull in the wrong accounts at that point? For us it's not just a couple of users existing in multiple domains; we are merging legacy domains into one new domain (basically all users exist twice and share SID history). We cannot inactivate users in the future domain, as other applications are already using those accounts. We have now recreated the issue on a test server by setting the LSA cache max size to 1. However, it's not random: it always pulls the "wrong" account, and SharePoint gets completely confused. Excuse my technical description - I am not a subject matter expert, just the PM. Any thoughts would be highly appreciated, and thanks for the great post!

    • Anonymous
      December 15, 2016
      @Mark Dangel: I'm no SharePoint SME either, I'm afraid. But the responses in the comments from people who are SMEs assure us that there is now a fix within SharePoint itself and that the workarounds described in the blog post here aren't needed any longer. Have a read through the comments.