Disk Full in SPS 6.12

Hi Safeguard experts,

Our SPS disk reached 100% usage, and it now rejects all incoming connections. Is there a quick way to check which files are taking up the space and to free up disk space from the admin page? We have already defined a cleanup policy (delete data after 4 days, scheduled daily) and enabled cleanup when usage reaches 90% of the disk capacity. However, usage still grows beyond that limit. Any suggestions are appreciated. Thank you.

Ronald

0 ronald chui over 3 years ago

We followed the guide "Recover from full disk situation" (261510) (oneidentity.com), but we are not able to restart lighttpd.service. Could anyone help? Thank you.

(core/master/test)root@localhost:~# df -h /mnt/drbd
Filesystem      Size  Used Avail Use% Mounted on
none             84G   56G   24G  71% /
(core/master/test)root@localhost:~# systemctl restart lighttpd.service
Failed to restart lighttpd.service: Unit lighttpd.service not found.

0 Tawfiq.Ahmad over 3 years ago in reply to ronald chui

Hi Ronald,

Please refer to this KB regarding the web service. In newer SPS versions it should be nginx rather than lighttpd:

https://support.oneidentity.com/one-identity-safeguard-for-privileged-sessions/kb/333085/what-web-server-service-is-running-on-the-sps-appliance

Try: systemctl restart nginx.service
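If it is unclear which web server unit a given firmware version ships, a minimal shell sketch along these lines can check for both unit names mentioned in this thread. The function name find_web_unit is hypothetical (it is not part of SPS), and the sketch assumes systemctl is available in the shell you are logged in to.

```shell
# Print the first web server unit that systemd knows about.
# Candidates are the two unit names mentioned in this thread.
# find_web_unit is a hypothetical helper, not an SPS command.
find_web_unit() {
    for unit in nginx.service lighttpd.service; do
        # "systemctl cat" exits non-zero when no unit file exists
        if systemctl cat "$unit" >/dev/null 2>&1; then
            echo "$unit"
            return 0
        fi
    done
    # Neither unit was found
    return 1
}
```

It could then be used as: systemctl restart "$(find_web_unit)"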

Thanks!

0 ronald chui over 3 years ago in reply to Tawfiq.Ahmad

Hi Tawfiq,

Thanks for the information. I tried that command and it returned the following:

(boot/master/test)root@localhost:~# systemctl restart nginx.service
Job for nginx.service failed because the control process exited with error code.
See "systemctl status nginx.service" and "journalctl -xe" for details.

(boot/master/test)root@localhost:~# systemctl status nginx.service
● nginx.service - A high performance web server and a reverse proxy server
Loaded: loaded (/lib/systemd/system/nginx.service; disabled; vendor preset>
Active: failed (Result: exit-code) since Tue 2022-06-07 08:51:58 AEST; 2mi>
Docs: man:nginx(8)
Process: 2042614 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_pr>
Process: 2042618 ExecStart=/usr/sbin/nginx -g daemon on; master_process on;>

Jun 07 08:51:55 localhost systemd[1]: Starting A high performance web server an>
Jun 07 08:51:55 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
Jun 07 08:51:56 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
Jun 07 08:51:56 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
Jun 07 08:51:57 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
Jun 07 08:51:57 localhost nginx[2042618]: nginx: [emerg] bind() to 0.0.0.0:443 >
Jun 07 08:51:58 localhost nginx[2042618]: nginx: [emerg] still could not bind()
Jun 07 08:51:58 localhost systemd[1]: nginx.service: Control process exited, co>
Jun 07 08:51:58 localhost systemd[1]: nginx.service: Failed with result 'exit-c>
Jun 07 08:51:58 localhost systemd[1]: Failed to start A high performance web se>

(boot/master/test)root@localhost:~# journalctl -xe
-- The unit nginx.service has entered the 'failed' state with result 'exit-code>
Jun 07 08:51:58 localhost systemd[1]: Failed to start A high performance web se>
-- Subject: A start job for unit nginx.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit nginx.service has finished with a failure.
--
-- The job identifier is 647 and the job result is failed.
Jun 07 08:51:58 localhost systemd[1]: bootfw-httpd.service: Succeeded.
-- Subject: Unit succeeded
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit bootfw-httpd.service has successfully entered the 'dead' state.
Jun 07 08:51:58 localhost systemd[1]: Stopped HTTPd on boot firmware to serve u>
-- Subject: A stop job for unit bootfw-httpd.service has finished
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A stop job for unit bootfw-httpd.service has finished.
--
-- The job identifier is 700 and the job result is done.
lines 1129-1151/1151 (END)

0 Tawfiq.Ahmad over 3 years ago in reply to ronald chui

Hi Ronald,

To check the disk usage, please run the commands below:

- Size of all audit trail files
du -sch /mnt/firmware/var/lib/zorp/audit

- Size of all system logs
du -sch /var/log

- Size of the metadb
du -sch /var/lib/postgresql

If most of the data appears to be in the audit trail path, run the following command:

du /mnt/firmware/var/lib/zorp/audit -hxa -t 1G | sort -rh | head -20

This should show the 20 largest entries (files and directories) over 1 GiB.

If the issue is in the logs, you can run the same for the logs path:

du /var/log -hxa -t 1G | sort -rh | head -20

From here you can decide whether to delete any of the large files that are no longer needed.

Once space is recovered, you can verify it using the command below:

df -h

Then reboot the appliance.
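
The checks above can also be wrapped in a small helper that lists only regular files, largest first. This is a minimal sketch, not an SPS tool: the function name largest_files and its parameters are hypothetical, and it assumes GNU find (for -printf), which is typical on Linux-based appliances.

```shell
# List the largest regular files under a directory, biggest first.
# Usage: largest_files DIR [MIN_BYTES] [COUNT]
# largest_files is a hypothetical helper; it assumes GNU find.
largest_files() {
    dir=$1
    min_bytes=${2:-1073741824}   # default threshold: 1 GiB
    count=${3:-20}               # default: show the top 20
    # -xdev stays on one filesystem; -printf prints "<bytes><TAB><path>"
    find "$dir" -xdev -type f -size +"${min_bytes}c" -printf '%s\t%p\n' 2>/dev/null |
        sort -rn | head -n "$count"
}
```

For example, largest_files /mnt/firmware/var/lib/zorp/audit prints the size in bytes and the path of up to 20 files over 1 GiB. Unlike the du commands, it does not list directories.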

0 ronald chui over 3 years ago in reply to Tawfiq.Ahmad

Hi Tawfiq,

Thanks for the reply. We removed some of the logs and disk usage is now 68%. We rebooted the appliance from both the admin console and the SPS web interface. However, it is still not connecting.

(boot/master/test)root@localhost:~# df -h
Filesystem                 Size  Used Avail Use% Mounted on
udev                       7.8G     0  7.8G   0% /dev
tmpfs                      1.6G  740K  1.6G   1% /run
none                       9.8G  885M  8.5G  10% /
/dev/mapper/vg--root-boot  9.8G  885M  8.5G  10% /initrd/mnt
/dev/loop0                 222M  222M     0 100% /initrd/mnt/root-ro
tmpfs                      7.9G     0  7.9G   0% /dev/shm
tmpfs                      5.0M     0  5.0M   0% /run/lock
tmpfs                      7.9G     0  7.9G   0% /sys/fs/cgroup
tmpfs                      7.9G   16K  7.9G   1% /tmp
/dev/sdb1                   32G   49M   30G   1% /mnt/azure-resource
/dev/mapper/vg--root-core   84G   54G   26G  68% /mnt/drbd
/dev/loop1                 1.4G  1.4G     0 100% /mnt/firmware-ro
none                        84G   54G   26G  68% /mnt/firmware

One thing we noticed after rebooting: the following message appears when I log in to the admin console. Is there anything we missed?

+-------------------Error---------------------+
|                                             |
| Failed systemd units on core firmware:      |
| close-active-sessions-in-elastic.service    |
|                                             |
+---------------------------------------------+
|                   < OK >                    |
+---------------------------------------------+

0 Tawfiq.Ahmad over 3 years ago in reply to ronald chui

Hi Ronald,

If web access is now working OK but you still have an issue connecting via SPS, I would recommend opening a service request with One Identity support to troubleshoot this further.

Thanks!

0 s boyko over 3 years ago in reply to ronald chui

Hi Ronald.

I have a very useful Linux command for tracking session recording sizes. It produces CSV data with each recording's file size, full path, username, and server IP address. Data is sorted by file size, largest first.

echo "session_size,session_record_file,username,server_ip"; find /var/lib/zorp/audit/ -type f -size +1G -name '*.zat' | while IFS= read -r line; do size=$(ls -lh "$line" | awk '{print $5}'); row=$(psql -U scb scb -t --csv -c "select remote_username,server_ip from channels where audit LIKE '%$line%' LIMIT 1"); printf '%s,%s,%s\n' "$size" "$line" "$row"; done | sort -t, -k1,1 -rh

When the elastic service error appears, you have to look at the elastic service logs and then manually clear the broken data.
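
As a starting point for looking at those logs, here is a minimal sketch using standard systemd tooling. The helper name show_unit_logs is hypothetical, and the unit name in the usage line is taken from the error dialog quoted earlier in this thread. It assumes systemctl and journalctl are available in the shell you are logged in to.

```shell
# Show the status and the most recent journal lines of a systemd unit.
# Usage: show_unit_logs UNIT [LINES]
# show_unit_logs is a hypothetical helper, not an SPS command.
show_unit_logs() {
    unit=$1
    lines=${2:-50}   # default: last 50 journal lines
    # Current state and most recent result of the unit
    systemctl status "$unit" --no-pager
    # Journal entries of this unit from the current boot only
    journalctl -u "$unit" -b --no-pager -n "$lines"
}
```

For example: show_unit_logs close-active-sessions-in-elastic.service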

0 ronald chui over 3 years ago in reply to s boyko

Thank you, s boyko. We tried clearing up the disk space, but the connection could not be recovered. We eventually spun up another VM and reconfigured it.