Auto-restart crashed mining processes in ethOS 1.2.9

ethOS 1.2.9 brings a few changes which break my auto-restart script for ethOS 1.2.7. Since 1.2.9 contains improved GPU crash detection, I rewrote the restart script to use the built-in detection mechanisms. For the required cron job please see my initial post which is available here.

As long as the DRY_RUN variable is set to false, the script won’t take any action, it just logs what it would do. The REBOOT_ON_CRASH variable can be set to true if you want to auto-restart the mining rig once ethOS detects a hard GPU crash (defunct mining processes). By default, the script will only restart the mining processes once ethOS detects a soft GPU crash.

 

Auto-restart crashed mining processes in ethOS 1.2.7

I have been running a crypto-currency mining rig on the Linux based ethOS distro for quite some time now. While I realize that ethOS is problematic license-wise, it’s still a great distro to get a mining rig up and running in almost no time. The Nvidia GPUs in my rig are well tuned to operate at their optimum cost/hashrate ratio. However, due to bugs in the miner and/or the GPU drivers, every few days one of the GPUs stops mining. Sometimes ethOS is able to recover the GPU and gets it back to mining but sometimes it doesn’t seem to detect the crashed/hanging mining process at all. This is why I added a cron job that runs every minute and checks if all GPUs are still mining. If not, the miners will be re started using the minestop and minestart commands provided by ethOS.

The cron job starts the Bash script below. If it detects a problem, it writes to the console and additionally to /tmp/rigcheck.log. It’s been running smoothly on my ethOS v1.2.7 mining rig. I recommend putting it in /home/ethos/rigcheck.sh and don’t forget to add execute permissions using chmod +x /home/ethos/rigcheck.sh

The cron job can be created like this:

cat << EOF | sudo tee /etc/cron.d/rigcheck
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
* * * * *   root    /home/ethos/rigcheck.sh
EOF

Thanks to this script, crashed or hanging miners will be restarted fairly quickly and my rig’s pool-reported hashrate stopped dropping in such situations.