Disk Full in SPS 6.12

Hi Safeguard experts,

Our SPS disk reached 100% usage, and SPS now rejects all incoming connections. Is there a quick way to check which files are taking up the space and to free up the disk from the admin page? We have already defined a cleanup policy (delete data older than 4 days, scheduled daily) and enabled cleanup when the disk reaches 90% of its capacity. However, usage still grows beyond that limit. Any suggestions are appreciated. Thank you.

Ronald

  • We followed the KB article "Recover from full disk situation" (261510) on oneidentity.com, but we are unable to restart lighttpd.service. Could anyone help? Thank you.

    (core/master/test)root@localhost:~# df -h /mnt/drbd
    Filesystem  Size  Used Avail Use% Mounted on
    none         84G   56G   24G  71% /
    (core/master/test)root@localhost:~# systemctl restart lighttpd.service
    Failed to restart lighttpd.service: Unit lighttpd.service not found.

  • Hi Ronald,

    Please refer to the KB article below regarding the web service. In newer SPS versions, the web server is nginx rather than lighttpd:

    Try: systemctl restart nginx.service

    https://support.oneidentity.com/one-identity-safeguard-for-privileged-sessions/kb/333085/what-web-server-service-is-running-on-the-sps-appliance
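    A small sketch for confirming which web server unit the appliance actually has before restarting it. It assumes a root shell and that systemctl list-unit-files --no-legend prints the unit name in the first column; the helper name find_web_units is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a systemd unit-file listing on stdin and print any
# nginx or lighttpd service units it contains. Exits 1 if neither is present.
# Assumes the unit name is the first whitespace-separated column.
find_web_units() {
  awk '$1 ~ /^(nginx|lighttpd)\.service$/ { print $1; n++ }
       END { exit (n == 0) }'
}

# Example (on the appliance, as root):
#   systemctl list-unit-files --no-legend | find_web_units
```

    On the newer versions described in the KB, this would be expected to list nginx.service only.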

    Thanks!

  • Hi Tawfiq,

    Thanks for the information. I tried that command, and it returned the following:

    (boot/master/test)root@localhost:~# systemctl restart nginx.service
    Job for nginx.service failed because the control process exited with error code.
    See "systemctl status nginx.service" and "journalctl -xe" for details.

    (boot/master/test)root@localhost:~# systemctl status nginx.service
    ● nginx.service - A high performance web server and a reverse proxy server
    Loaded: loaded (/lib/systemd/system/nginx.service; disabled; vendor preset>
    Active: failed (Result: exit-code) since Tue 2022-06-07 08:51:58 AEST; 2mi>
    Docs: man:nginx(8)
    Process: 2042614 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_pr>
    Process: 2042618 ExecStart=/usr/sbin/nginx -g daemon on; master_process on;>

    Jun 07 08:51:55 localhost systemd[1]: Starting A high performance web server an>
    Jun 07 08:51:55 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
    Jun 07 08:51:56 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
    Jun 07 08:51:56 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
    Jun 07 08:51:57 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
    Jun 07 08:51:57 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
    Jun 07 08:51:58 localhost nginx[2042618]: nginx: [emerg] still could not bind()
    Jun 07 08:51:58 localhost systemd[1]: nginx.service: Control process exited, co>
    Jun 07 08:51:58 localhost systemd[1]: nginx.service: Failed with result 'exit-c>
    Jun 07 08:51:58 localhost systemd[1]: Failed to start A high performance web se>

    (boot/master/test)root@localhost:~# journalctl -xe
    -- The unit nginx.service has entered the 'failed' state with result 'exit-code>
    Jun 07 08:51:58 localhost systemd[1]: Failed to start A high performance web se>
    -- Subject: A start job for unit nginx.service has failed
    -- Defined-By: systemd
    -- Support: http://www.ubuntu.com/support
    --
    -- A start job for unit nginx.service has finished with a failure.
    --
    -- The job identifier is 647 and the job result is failed.
    Jun 07 08:51:58 localhost systemd[1]: bootfw-httpd.service: Succeeded.
    -- Subject: Unit succeeded
    -- Defined-By: systemd
    -- Support: http://www.ubuntu.com/support
    --
    -- The unit bootfw-httpd.service has successfully entered the 'dead' state.
    Jun 07 08:51:58 localhost systemd[1]: Stopped HTTPd on boot firmware to serve u>
    -- Subject: A stop job for unit bootfw-httpd.service has finished
    -- Defined-By: systemd
    -- Support: http://www.ubuntu.com/support
    --
    -- A stop job for unit bootfw-httpd.service has finished.
    --
    -- The job identifier is 700 and the job result is done.
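    The repeated "bind() to 0.0.0.0:443" errors usually mean another process was already listening on port 443 when nginx started, although the truncated lines do not show the exact reason. A minimal sketch for finding that listener, assuming ss from iproute2 is available on the appliance; the helper name listeners_on_port is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter `ss -ltnp` output (read from stdin) down to the
# sockets whose local address ends in :PORT. The first input line is the ss
# header; column 4 of `ss -ltn` output is "Local Address:Port".
listeners_on_port() {
  local port="$1"
  awk -v port="$port" 'NR > 1 && $4 ~ (":" port "$")'
}

# Example (as root, so -p can show the owning process):
#   ss -ltnp | listeners_on_port 443
```

    In the journal output above, bootfw-httpd.service stops in the same second that nginx gives up, so it may be worth checking whether that boot firmware HTTPd was the listener. This is a guess from the timestamps, not a confirmed cause.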

  • Hi Ronald,

    To check the disk usage, please run the commands below:

    - Size of all audit trail files
    du -sch /mnt/firmware/var/lib/zorp/audit

    - Size of all system logs
    du -sch /var/log

    - Size of the metadb
    du -sch /var/lib/postgresql
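    The three checks above can also be run in one pass. A minimal sketch, assuming GNU coreutils du; usage_report is a hypothetical name, and its default paths are the three listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the size of each path plus a grand total.
# With no arguments it checks the three SPS paths from this thread.
usage_report() {
  if [ "$#" -eq 0 ]; then
    set -- /mnt/firmware/var/lib/zorp/audit /var/log /var/lib/postgresql
  fi
  # -s: one summary line per path, -c: add a "total" line, -h: human-readable.
  du -sch -- "$@"
}

# Example (as root):
#   usage_report
```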

    If most of the space is used by the audit trail path, run the following command:

    du /mnt/firmware/var/lib/zorp/audit -hxa -t 1G | sort -rh | head -20

    This should list the 20 largest files and directories of at least 1 GiB under the audit trail path.

    If the issue is in the logs, you can run the same command for the log path:

    du /var/log -hxa -t 1G | sort -rh | head -20
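    Because du -a lists directories as well as files, that output can mix whole audit directories with individual trails. A sketch that lists only regular files, assuming GNU find (for -printf); large_files and its arguments are hypothetical, and sizes are printed in bytes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list regular files larger than MIN under DIR,
# largest first, printing "<size in bytes><TAB><path>".
#   $1 = directory, $2 = find -size threshold (default 1G), $3 = max lines (default 20)
large_files() {
  local dir="$1" min="${2:-1G}" count="${3:-20}"
  # -xdev: do not descend into other filesystems (similar to du -x).
  # -size +1G: larger than 1 GiB (find rounds sizes up to whole units).
  find "$dir" -xdev -type f -size +"$min" -printf '%s\t%p\n' \
    | sort -rn \
    | head -n "$count"
}

# Examples (as root):
#   large_files /mnt/firmware/var/lib/zorp/audit
#   large_files /var/log
```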

    From here, you can decide whether to delete any large files that are no longer needed.

    Once space is recovered, you can verify it using the command below:

    df -h

    Then reboot the appliance.
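    For the df -h check, a small sketch that prints only filesystems at or above a usage threshold. It assumes GNU df (for --output); over_threshold is a hypothetical name, and the 90% default simply mirrors the cleanup threshold mentioned in the original question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<use%> <mount point>" for each filesystem whose
# usage is at or above LIMIT percent.
#   $1 = limit (default 90), remaining arguments = files/mount points for df
over_threshold() {
  local limit="${1:-90}"
  if [ "$#" -gt 0 ]; then shift; fi
  # --output limits df to the two columns we need; tail drops the header row.
  df --output=pcent,target "$@" | tail -n +2 | awk -v limit="$limit" '
    { pct = $1; sub(/%/, "", pct) }   # "68%" -> "68"; "-" compares as 0
    pct + 0 >= limit + 0 { print $1, $2 }'
}

# Example:
#   over_threshold 90 /mnt/firmware /mnt/drbd
```

    In the df -h output later in this thread, /dev/loop0 and /dev/loop1 show 100% with nothing available; they appear to be read-only firmware images, so passing specific mount points such as /mnt/firmware avoids flagging them.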

  • Hi Tawfiq,

    Thanks for the reply. We removed some of the logs, and disk usage is now 68%. We rebooted the appliance from both the admin console and the SPS web interface. However, it is still not connecting.

    (boot/master/test)root@localhost:~# df -h
    Filesystem                 Size  Used Avail Use% Mounted on
    udev                       7.8G     0  7.8G   0% /dev
    tmpfs                      1.6G  740K  1.6G   1% /run
    none                       9.8G  885M  8.5G  10% /
    /dev/mapper/vg--root-boot  9.8G  885M  8.5G  10% /initrd/mnt
    /dev/loop0                 222M  222M     0 100% /initrd/mnt/root-ro
    tmpfs                      7.9G     0  7.9G   0% /dev/shm
    tmpfs                      5.0M     0  5.0M   0% /run/lock
    tmpfs                      7.9G     0  7.9G   0% /sys/fs/cgroup
    tmpfs                      7.9G   16K  7.9G   1% /tmp
    /dev/sdb1                   32G   49M   30G   1% /mnt/azure-resource
    /dev/mapper/vg--root-core   84G   54G   26G  68% /mnt/drbd
    /dev/loop1                 1.4G  1.4G     0 100% /mnt/firmware-ro
    none                        84G   54G   26G  68% /mnt/firmware

    One thing we noticed: after rebooting, the following message appears when I log in to the admin console. Is there anything we missed?

    +-------------------Error---------------------+
    |                                             |
    | Failed systemd units on core firmware:      |
    | close-active-sessions-in-elastic.service    |
    |                                             |
    +---------------------------------------------+
    |                   < OK >                    |
    +---------------------------------------------+
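    To see why the unit in this message failed, a sketch that pulls unit names out of systemctl --failed --no-legend output and shows the status and recent journal entries for each. It assumes the commands are run from a shell on the core firmware (the message refers to core firmware, while the prompts above show the boot firmware); failed_units and inspect_failed are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `systemctl --failed --no-legend` output on stdin
# and print the unit name from each line (skipping any leading status bullet).
failed_units() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /\.(service|socket|timer|mount|path|target)$/) { print $i; break }
  }'
}

# Show status and the last 50 journal lines from this boot for every failed unit.
inspect_failed() {
  local unit
  systemctl --failed --no-legend | failed_units | while read -r unit; do
    systemctl status --no-pager "$unit"
    journalctl -u "$unit" -b -n 50 --no-pager
  done
}

# Example (on the core firmware, as root):
#   inspect_failed
```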

  • Hi Ronald,

    If web access is now working but you are having a separate issue connecting through SPS, I recommend opening a service request with One Identity Support to troubleshoot further.

    Thanks!
