SQL Server on Linux: Kerberos Troubleshooting, Hints and Tips and Hard Code Debugging

SQL Server on Linux uses the GSSAPI and SSSD service for Active Directory (AD) authentication activities.   Thus, Kerberos is the path for success for AD authentication and just in case you have to troubleshoot a problem I have a few tips.

My Short Story

I was encountering an issue on my Ubuntu 16.04 system but not on my Centos 7 system joined to the same domain (realm.)   I followed the common guidance for using Active Directory with SQL Server on Linux – no luck.  In fact, at times I could get it working, reboot the machine and then it would fail.  I found I did almost everything I could wrong.

  • My host name was mixed case: RDORRLinux but entries in keytab all upper and lower, entries in krb5.conf upper and lower
  • Default DNS naming was incorrectly set causing the system to appear at RDORRLinux.DOMAINNAME.guest.microsoft.com instead of RDORRLinux.DOMAINNAME
  • I reversed the fully qualified and alias in my /etc/hosts entry so lookup of RDORRLinux would not qualify to RDORRLinux.DOMAINNAME
  • I mistyped the loopback address as 127.0.1.1 instead of 127.0.0.1
  • kvno kept changing as I was troubleshooting and running various commands in an attempt to resolve the problem
  • FILE cache type was the default leading to incorrect principal errors

Hopefully your experience is better but just in case you are having a problem here are ways to solve it.


Discovery

export KRB5_TRACE=/dev/stdout  (https://web.mit.edu/kerberos/krb5-current/doc/krb_admins/env_variables.html)

The environment variable ‘KRB5_TRACE’ can be set (be sure to set it for the SQL Server service) to output GSSAPI, Kerberos information. The following shows the output from kinit when the trace is enabled.  Notice that you can see the communication with the KDC, ports in use and other helpful information.

Hint: /dev/stdout can be any file (/var/opt/mssql/errorlog/krb5.log)

kinit username@DOMAINNAME

[54814] 1517843475.867862: Getting initial credentials for username@DOMAINNAME
[54814] 1517843475.868170: Sending request (175 bytes) to DOMAINNAME
[54814] 1517843475.868485: Sending initial UDP request to dgram 99.99.99.99:88
[54814] 1517843475.870309: Received answer (197 bytes) from dgram 99.99.99.99:88
[54814] 1517843475.870394: Response was from master KDC
[54814] 1517843475.870441: Received error from KDC: -1765328359/Additional pre-authentication required
[54814] 1517843475.870496: Processing preauth types: 16, 15, 19, 2
[54814] 1517843475.870525: Selected etype info: etype aes256-cts, salt "DOMAINNAMEusername", params ""
Password for username@DOMAINNAME:
[54814] 1517843478.777949: AS key obtained for encrypted timestamp: aes256-cts/50D4
[54814] 1517843478.778056: Encrypted timestamp (for 1517843478.695644): plain 301AA011180F32303138303230353135313131385AA10502030A9D5C, encrypted 8F0991BA378FA2BA109A3F16C15EBC78D42F3A75F98D1B7801A95EF5AF69798A75919C9548397C458B7C3A355661ADB431841F795EB26A00
[54814] 1517843478.778106: Preauth module encrypted_timestamp (2) (real) returned: 0/Success
[54814] 1517843478.778124: Produced preauth for next request: 2
[54814] 1517843478.778165: Sending request (253 bytes) to DOMAINNAME
[54814] 1517843478.778288: Sending initial UDP request to dgram 99.99.99.99:88
[54814] 1517843478.780566: Received answer (104 bytes) from dgram 99.99.99.99:88
[54814] 1517843478.780646: Response was from master KDC
[54814] 1517843478.780686: Received error from KDC: -1765328332/Response too big for UDP, retry with TCP
[54814] 1517843478.780707: Request or response is too big for UDP; retrying with TCP
[54814] 1517843478.780716: Sending request (253 bytes) to DOMAINNAME (tcp only)
[54814] 1517843478.780769: Initiating TCP connection to stream 99.99.99.99:88
[54814] 1517843478.781592: Sending TCP request to stream 99.99.99.99:88
[54814] 1517843478.783394: Received answer (1550 bytes) from stream 99.99.99.99:88
[54814] 1517843478.783424: Terminating TCP connection to stream 99.99.99.99:88
[54814] 1517843478.783506: Response was from master KDC
[54814] 1517843478.783577: Processing preauth types: 19
[54814] 1517843478.783603: Selected etype info: etype aes256-cts, salt "DOMAINNAMEusername", params ""
[54814] 1517843478.783614: Produced preauth for next request: (empty)
[54814] 1517843478.783636: AS key determined by preauth: aes256-cts/50D4
[54814] 1517843478.783717: Decrypted AS reply; session key is: aes256-cts/7EFA
[54814] 1517843478.783756: FAST negotiation: unavailable
[54814] 1517843478.783797: Initializing KEYRING:persistent:0:0 with default princ username@DOMAINNAME
[54814] 1517843478.783874: Storing username@DOMAINNAME -> krbtgt/DOMAINNAME@DOMAINNAME in KEYRING:persistent:0:0
[54814] 1517843478.783991: Storing config in KEYRING:persistent:0:0 for krbtgt/DOMAINNAME@DOMAINNAME: pa_type: 2
[54814] 1517843478.784027: Storing username@DOMAINNAME -> krb5_ccache_conf_data/pa_type/krbtgt\/DOMAINNAME\@DOMAINNAME@X-CACHECONF: in KEYRING:persistent:0:0

klist

The klist utility is helpful to show things like the currently cached Kerberos information. Especially when encountering ‘principal does not match’ errors.

klist

Ticket cache: KEYRING:persistent:0:0
Default principal: username@DOMAINNAME

Valid starting       Expires              Service principal
02/05/2018 09:11:18  02/05/2018 19:11:18  krbtgt/DOMAINNAME@DOMAINNAME
        renew until 02/06/2018 09:11:15

klist is also helpful for displaying keytab information (/etc/krb5.keytab, /var/opt/mssql/secrets/mssql.keytab)

klist -e -k /etc/krb5.keytab

Keytab name: FILE:/etc/krb5.keytab
KVNO Principal
---- --------------------------------------------------------------------------
8 MACHINE$@DOMAINNAME (des-cbc-crc)
8 MACHINE$@DOMAINNAME (des-cbc-md5)
8 MACHINE$@DOMAINNAME (arcfour-hmac)
8 MACHINE$@DOMAINNAME (aes128-cts-hmac-sha1-96)
8 MACHINE$@DOMAINNAME (aes256-cts-hmac-sha1-96)

kvno

The version number of the entry maintained on the KDC vs the kvno stored on the local system must match.  As an example, many systems rotate the machine account password on a regular basis and changing of the password updates the version number on the KDC.

Hint: On testing systems (use with extreme caution) I often set the KDC policy to disable password updates for machine accounts.  This reduces maintenance of my local keytab entries.

kvno machine@DOMAINNAME

[54970] 1517844447.764573: Getting credentials username@DOMAINNAME -> machine@DOMAINNAME using ccache KEYRING:persistent:0:0
[54970] 1517844447.764904: Retrieving username@DOMAINNAME -> machine@DOMAINNAME from KEYRING:persistent:0:0 with result: -1765328243/Matching credential not found
[54970] 1517844447.765100: Retrieving username@DOMAINNAME -> krbtgt/DOMAINNAME@DOMAINNAME from KEYRING:persistent:0:0 with result: 0/Success
[54970] 1517844447.765128: Starting with TGT for client realm: username@DOMAINNAME -> krbtgt/DOMAINNAME@DOMAINNAME
[54970] 1517844447.765137: Requesting tickets for machine@DOMAINNAME, referrals on
[54970] 1517844447.765188: Generated subkey for TGS request: aes256-cts/91DF
[54970] 1517844447.765261: etypes requested in TGS request: aes256-cts, aes128-cts, des3-cbc-sha1, rc4-hmac, camellia128-cts, camellia256-cts
[54970] 1517844447.765446: Encoding request body and padata into FAST request
[54970] 1517844447.765555: Sending request (1680 bytes) to DOMAINNAME
[54970] 1517844447.765878: Initiating TCP connection to stream 99.99.99.99:88
[54970] 1517844447.767266: Sending TCP request to stream 99.99.99.99:88
[54970] 1517844447.770017: Received answer (1697 bytes) from stream 99.99.99.99:88
[54970] 1517844447.770049: Terminating TCP connection to stream 99.99.99.99:88
[54970] 1517844447.770137: Response was from master KDC
[54970] 1517844447.770179: Decoding FAST response
[54970] 1517844447.770329: FAST reply key: aes256-cts/2D0F
[54970] 1517844447.770394: TGS reply is for username@DOMAINNAME -> machine@DOMAINNAME with session key aes256-cts/C56F
[54970] 1517844447.770439: TGS request result: 0/Success
[54970] 1517844447.770446: Received creds for desired service machine@DOMAINNAME
[54970] 1517844447.770456: Storing username@DOMAINNAME -> machine@DOMAINNAME in KEYRING:persistent:0:0

machine@DOMAINNAME: kvno = 8

Case Sensitivity

Pay close attention to your use of casing.  A mixed case name (UserName) will often lead to a failure when compared to username or USERNAME in the keytab, cache, /etc/hosts, or /etc/krb5.conf files.   Mixed case is valid as long as all locations maintain the same casing.

SQL Server (SQLPAL) Logging (/var/opt/mssql/logger.ini)

SQL Server on Linux can be configured (dynamically) to log additional information.  Here is an example logger.ini to enable SQL Server on Linux, Kerberos tracing.  (Helpful Levels: Silent, Error, Warning and Debug)

[Output:Console]
Type = StdOut

[Output:File]
Type = File
Filename = /var/opt/mssql/log/trace.log

[Logger]
Level = Debug
Outputs = File

[Logger.security.kerberos]
Level = Debug
Outputs = Console

Example Logger Output

01/16/2018 21:46:30.858159060 Error [security.kerberos] <35352/0x3c4> Error acquiring credentials in GetUserCredHandle MAJOR: 0x000D0000 Unspecified GSS failure. Minor code may provide more information

01/16/2018 21:46:30.858227860 Error [security.kerberos] <35352/0x3c4> Error acquiring credentials in GetUserCredHandle MINOR: 0x025EA100 Principal in credential cache does not match desired name

Major Error Code (Value << type_offset)

  • Value << 24 are caller issues  (incorrect password for example)
  • Value << 16 are internal routine based failures (Can’t contact KDC for example)

Minor Error Code (Reference: map_error)

  • Minor Error Code is an encoded value (think HRESULT) value.  When the value is small it represents a system error (Example: ENOMEM) and when larger the defined GSS error.

journalctl –all

When using the trace facilities ask for all journal outputs.

SQL Server Ring Buffers

A select * from sys.dm_os_ring_buffers provides the security, ring buffer entries.   These can be helpful in determining the source of a problem.

select * from sys.dm_os_ring_buffers where ring_buffer_type like'%SEC%'

<Record id = "762845" type ="RING_BUFFER_SECURITY_ERROR" time ="209127296"><Error><SPID>63</SPID>
<APIName>LookupAccountSid</APIName><CallingAPIName>LookupAccountSidInternal</CallingAPIName>...

realm

The realm utility is used to join a machine into the domain.  The list and discovery options can be helpful to see what the system thinks.

realm list

DOMAINNAME
type: kerberos
realm-name: DOMAINNAME
domain-name: DOMAINNAME
configured: kerberos-member
server-software: active-directory
client-software: sssd
required-package: sssd-tools
required-package: sssd
required-package: libnss-sss
required-package: libpam-sss
required-package: adcli
required-package: samba-common-bin
login-formats: %U@domainname
  login-policy: allow-realm-logins

hostname

The hostname utility outputs the name of the local system used when making GSSAPI calls.

hostname
MACHINE

hostname -A
MACHINE.guest.DOMAIN.com deknox-ubuntu.DOMAIN.com

Using the facilities previously described I usually solve my problem quickly.


Tips and Tricks

No troubleshooting effort goes without uncovering additional tips or tricks.

Kerberos Cache = KEYRING

Make sure your Kerberos cache is KEYRING (DIR works as well) and not FILE or MEMORY.  For example, the default (/etc/krb5.conf) on Ubuntu set the cache to the type = FILE.  FILE and MEMORY do not allow multiple principle activities as KEYRING and DIR do, causing SQL Server on Linux to encounter ‘mismatched principal’ errors.  You can use klist to explore the current cache on your system.

Entry for KEYRING from my Ubuntu, /etc/krb5.conf file

[librarydefaults]
    default_ccache_name = KEYRING:persistent:%{uid}

KEYRING is generally considered more secure than FILE or DIR and can provide a bit of a performance boost for high traffic systems.

Reduce UDP Traffic

When tracing (KRB5_TRACE) I often noticed a UDP call to the KDC that returned a failure due to the length of data returned.  Then GSSAPI falls back to TCP and completed the request successfully.  You can remove many of the UDP attempts by setting the udp_perference_limit in the /etc/krb5.conf file, removing the traffic and improving performance.

[libdefaults]
udp_preference_limit = 0


Go Hardcore

Caution: Speaking from personal experience, is easy to change or damage your system when engaging in advanced debugging activates.  I strongly recommend you attempt this on test system!

Debugging GSSAPI 

If you are really into understanding GSSAPI you can build the library and use gdb, lldb or your favorite debugger to investigate.   I tend to use Eclipse with gdb so I have advanced visualizations, find and other features handy and full control of gdb via the command window.

Hint: You may be able to download/update your current GSSAPI package with symbols instead of building yourself.  Reference dbgsym packages such as krb5-user-dbgsym.

  1. Install byacc (ex: apt-get installl byacc)
  2. Download source
  3. cd to the SRC directory for KRB5
  4. ./configure
  5. make -j "CFLAGS=-ggdb -DDEBUG"
  6. mkdir /home/rdorr/krb5test
  7. make install DESTDIR=/home/username/krb5test
  8. export LD_LIBRARY_PATH=/home/username/krb5test/usr/local/lib
  9. Start debugging

Helpful Breakpoints

To break near the location of an error

  • map_error
  • krb5_set_error_message

Others

  • resolve_server
  • scan_ccache
  • gss_init_sec_context
  • krb5_unpase_name
  • krb5_cc_cache_match
  • krb5_trace_callback
  • krb5_kt_file_get_entry | TRACE_KT_GET_ENTRY
  • acquire_int_cred
  • get_cache_for_name
  • kbr5_cc_get_principal
  • krb5_principal_compare

Bob Dorr - Principal Software Engineer SQL Server