Syslog troubleshooting guide for Azure Monitor Agent for Linux
Caution
This article references CentOS, a Linux distribution that is End Of Life (EOL) status. Please consider your use and planning accordingly. For more information, see the CentOS End Of Life guidance.
Overview of Azure Monitor Agent for Linux Syslog collection and supported RFC standards:
- Azure Monitor Agent installs an output configuration for the system Syslog daemon during the installation process. The configuration file specifies the way events flow between the Syslog daemon and Azure Monitor Agent.
- For
rsyslog
(most Linux distributions), the configuration file is/etc/rsyslog.d/10-azuremonitoragent-omfwd.conf
. Forsyslog-ng
, the configuration file is/etc/syslog-ng/conf.d/azuremonitoragent-tcp.conf
. - Azure Monitor Agent listens to a TCP port to receive events from
rsyslog
/syslog-ng
. The port for this communication is logged at/etc/opt/microsoft/azuremonitoragent/config-cache/syslog.port
.Note
Before Azure Monitor Agent version 1.28, it used a Unix domain socket instead of TCP port to receive events from rsyslog.
omfwd
output module inrsyslog
offers spooling and retry mechanisms for improved reliability. - The Syslog daemon uses queues when Azure Monitor Agent ingestion is delayed or when Azure Monitor Agent isn't reachable.
- Azure Monitor Agent ingests Syslog events via the previously mentioned socket and filters them based on facility or severity combination from data collection rule (DCR) configuration in
/etc/opt/microsoft/azuremonitoragent/config-cache/configchunks/
. Anyfacility
orseverity
not present in the DCR is dropped. - Azure Monitor Agent attempts to parse events in accordance with RFC3164 and RFC5424. It also knows how to parse the message formats listed in this website.
- Azure Monitor Agent identifies the destination endpoint for Syslog events from the DCR configuration and attempts to upload the events.
Note
Azure Monitor Agent uses local persistency by default. All events received from
rsyslog
orsyslog-ng
are queued in/var/opt/microsoft/azuremonitoragent/events
if they fail to be uploaded.
Issues
You might encounter the following issues.
Rsyslog data isn't uploaded because of a full disk space issue on Azure Monitor Agent for Linux
The next sections describe the issue.
Symptom
Syslog data is not uploading: When you inspect the error logs at /var/opt/microsoft/azuremonitoragent/log/mdsd.err
, you see entries about Error while inserting item to Local persistent storeā¦No space left on device similar to the following snippet:
2021-11-23T18:15:10.9712760Z: Error while inserting item to Local persistent store syslog.error: IO error: No space left on device: While appending to file: /var/opt/microsoft/azuremonitoragent/events/syslog.error/000555.log: No space left on device
Cause
Azure Monitor Agent for Linux buffers events to /var/opt/microsoft/azuremonitoragent/events
prior to ingestion. On a default Azure Monitor Agent for Linux installation, this directory takes ~650 MB of disk space at idle. The size on disk increases when it's under sustained logging load. It gets cleaned up about every 60 seconds and reduces back to ~650 MB when the load returns to idle.
Confirm the issue of a full disk
The df
command shows almost no space available on /dev/sda1
, as shown in the following output. Note that you should examine the line item that correlates to the log directory (for example, /var/log
or /var
or /
).
df -h
Filesystem Size Used Avail Use% Mounted on
udev 63G 0 63G 0% /dev
tmpfs 13G 720K 13G 1% /run
/dev/sda1 29G 29G 481M 99% /
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda15 105M 4.4M 100M 5% /boot/efi
/dev/sdb1 251G 61M 239G 1% /mnt
tmpfs 13G 0 13G 0% /run/user/1000
You can use the du
command to inspect the disk to determine which files are causing the disk to be full. For example:
cd /var/log
du -h syslog*
6.7G syslog
18G syslog.1
In some cases, du
might not report any large files or directories. It might be possible that a file marked as (deleted) is taking up the space. This issue can happen when some other process has attempted to delete a file, but a process with the file is still open. You can use the lsof
command to check for such files. In the following example, we see that /var/log/syslog
is marked as deleted but it takes up 3.6 GB of disk space. It hasn't been deleted because a process with PID 1484 still has the file open.
sudo lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
none 849 root txt REG 0,1 8632 0 16764 / (deleted)
rsyslogd 1484 syslog 14w REG 8,1 3601566564 0 35280 /var/log/syslog (deleted)
Rsyslog default configuration logs all facilities to /var/log/
On some popular distros (for example, Ubuntu 18.04 LTS), rsyslog ships with a default configuration file (/etc/rsyslog.d/50-default.conf
), which logs events from nearly all facilities to disk at /var/log/syslog
. RedHat/CentOS family Syslog events are stored under /var/log/
but in a different file: /var/log/messages
.
Azure Monitor Agent doesn't rely on Syslog events being logged to /var/log/
. Instead, it configures the rsyslog service to forward events over a TCP port directly to the azuremonitoragent
service process (mdsd).
Fix: Remove high-volume facilities from /etc/rsyslog.d/50-default.conf
If you're sending a high log volume through rsyslog and your system is set up to log events for these facilities, consider modifying the default rsyslog config to avoid logging and storing them under /var/log/
. The events for this facility would still be forwarded to Azure Monitor Agent because rsyslog uses a different configuration for forwarding placed in /etc/rsyslog.d/10-azuremonitoragent-omfwd.conf
.
For example, to remove
local4
events from being logged at/var/log/syslog
or/var/log/messages
, change this line in/etc/rsyslog.d/50-default.conf
from this snippet:*.*;auth,authpriv.none -/var/log/syslog
To this snippet (add
local4.none;
):*.*;local4.none;auth,authpriv.none -/var/log/syslog
sudo systemctl restart rsyslog