Auto-restart crashed mining processes in ethOS 1.2.9

ethOS 1.2.9 brings a few changes which break my auto-restart script for ethOS 1.2.7. Since 1.2.9 contains improved GPU crash detection, I rewrote the restart script to use the built-in detection mechanisms. For the required cron job please see my initial post which is available here.

As long as the DRY_RUN variable is set to true, the script won’t take any action; it just logs what it would do.
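For readers who want to see how such a dry-run switch can be wired up, here is a minimal sketch. It is not the restart script itself: the helper name run_action, the default log path and the echo stand-in are assumptions made for illustration. On a rig, the command passed to run_action would be the real reboot command.

```shell
#!/bin/bash
# Hypothetical sketch of a dry-run guard; run_action and the default
# log path are illustrative names, not taken from the original script.
DRY_RUN="${DRY_RUN:-true}"
LOG_FILE="${LOG_FILE:-/tmp/rigcheck-sketch.log}"

# Log the command; execute it only when DRY_RUN is exactly "false".
run_action() {
    if [ "${DRY_RUN}" = "false" ]; then
        echo "$(date) running: $*" >> "${LOG_FILE}"
        "$@"
    else
        echo "$(date) dry run, would run: $*" >> "${LOG_FILE}"
    fi
}

# Harmless stand-in for the reboot command.
run_action echo "reboot requested"
```

With the default DRY_RUN=true this only appends a "would run" line to the log; setting DRY_RUN=false makes the same call execute the command.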

 

46 thoughts on “Auto-restart crashed mining processes in ethOS 1.2.9”

  1. I’m probably the most stupid, because I’m not able to get it to work. ethOS 1.3.0. I followed all the steps:
    1) I created rigcheck.sh; I can see it via the ls command and also in the file explorer
    2) I did chmod +x
    3) I added the rows to the crontab
    (following SDMN’s instructions, I just added “root” in cron, so I have */15 * * * * root ….)

    But I’m not able to even run the script from the command line. I can edit it with nano ./rigcheck.sh (or nano /home/ethos/rigcheck.sh), but if I try to run it with sudo ./rigcheck.sh or sudo /home/ethos/rigcheck.sh I get the error
    sudo: unable to execute ./rigcheck.sh (the same with the full path), even when I enter just the first few letters of the filename and press TAB to let the system complete it.
    Any idea? Thanks

    • OK, I got it working, but the log doesn’t exist after a couple of hours. And even when one GPU (RX580) stopped mining, the script says “everything OK”. What am I doing wrong? Thanks

        • If ethos detects a crashed GPU, the script will reboot the rig. The script can’t help in situations where ethos doesn’t detect a crashed GPU. Detection works pretty well with Nvidia but I don’t know about AMD.

  2. Can I run /opt/ethos/bin/minestop instead of /opt/ethos/bin/r for clock problem?

    Is it sufficient to get it going again without needing a full reboot?

    Why did you decide to reboot rather than restart the mining process?

  3. Tried running this script. Even when a GPU has crashed and shows a hash of 0, the script still executes and provides a message of “Everything’s fine… exiting”

    What am I doing wrong? I followed all of the steps, made it executable, even restarted the whole rig after applying all the new modifications. So far doesn’t seem to function as intended.

  4. Thank you for this working script,

    I have noticed that sometimes a GPU’s effect goes down and stays down, but it still reports hashing (although it actually isn’t). Is it possible to implement a check for this, and reboot if it is lower than a “threshold value”?
    Regards
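A threshold check like the one requested could look roughly like this. It is only a sketch: the helper name low_gpus is hypothetical, the hashrates are made-up numbers, and how to read the per-GPU values on a real rig is left open here.

```shell
#!/bin/bash
# Hypothetical helper: prints the indexes of GPUs whose value is below a threshold.
# Usage: low_gpus THRESHOLD VALUE0 VALUE1 ...
low_gpus() {
    local threshold="$1"
    shift
    local i=0 rate result=""
    for rate in "$@"; do
        # awk does the decimal comparison; bash arithmetic is integer-only.
        if awk -v r="${rate}" -v t="${threshold}" 'BEGIN { exit !(r + 0 < t + 0) }'; then
            result="${result:+${result} }${i}"
        fi
        i=$((i + 1))
    done
    echo "${result}"
}

# Made-up hashrates in MH/s: GPU 1 and GPU 3 are below 20.
low_gpus 20 29.5 12.1 30.0 0.0
```

An empty result means no GPU is below the threshold; a non-empty result could be used to trigger the same action the script takes for a crashed GPU.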

  5. Hello, and first off I want to say thank you. I’m not sure I got this working properly, I type nano rigcheck.log and nothing is in there. Maybe I didn’t set it up properly? Please advise. Thank you

    • I had the same problem. I added this to the cron file.
      */15 * * * * /home/ethos/rigcheck.sh >> /home/ethos/rigcheck.log

      This echoed my output to the log file. I am new to Linux, so I don’t know if this is the best way to do it, but it worked for me.
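One thing to keep in mind with this approach: >> on its own only captures standard output, so error messages still go missing. The sketch below uses a stand-in function (fake_rigcheck is a made-up name, not the real script) to show that appending 2>&1 sends both streams to the log.

```shell
#!/bin/bash
# Stand-in for the real script: one line on stdout, one on stderr.
fake_rigcheck() {
    echo "status line on stdout"
    echo "error line on stderr" >&2
}

LOG_FILE="$(mktemp)"

# Same redirection as in the cron line, plus 2>&1 so errors are logged too.
# In a crontab the equivalent line would read:
# */15 * * * * /home/ethos/rigcheck.sh >> /home/ethos/rigcheck.log 2>&1
fake_rigcheck >> "${LOG_FILE}" 2>&1

cat "${LOG_FILE}"
```

Both lines end up in the log file; without 2>&1 only the first one would.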

  6. Thank you for the time and effort that you have put into developing and posting this script. I am using ethOS 1.2.9 and the script almost works for me. It echoes properly to the log file, and when I have a failed GPU, the log file reports the following: “GPU clock problem detected on GPU(s) ${CRASHED}, rebooting…”. That is the extent of what happens. The script does not restart the system; I need to restart it manually. Thank you very much if you have suggestions to resolve this issue.

      • Thanks Jan for the reply. I am not very familiar with Linux. I only know enough to take care of my basic needs. I don’t know if I am calling the script as root; I added the PATH= line to the cron file as per the instructions. The log file does not show the error message “please run as root.” The log file says it is running in dry-run mode, so yes, I forgot to disable dry-run mode.
        Thanks, hopefully disabling dry-run mode will make it work.

        What is required to run as root, if that is my problem?
        Thanks Again
        Lamar

        • Thanks again Jan. I saw the problem with a gpu last night and I waited for the script to run. It properly rebooted the system and was back to full capacity again.

  7. Hello, when I try and run this script manually (sudo /home/ethos/rigcheck.sh) my console is coming back with:
    Please run as root or, if calling it from a console, use sudo /home/ethos/rigcheck.sh

    Also I cannot see a log file being created at all.

    Can you help please.
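For context, a root check in a bash script is commonly written along these lines. This is a generic sketch, not the check from the restart script (which is not shown here); is_root is a made-up helper name.

```shell
#!/bin/bash
# Succeeds when the given user id (default: the current one) is 0, i.e. root.
is_root() {
    local uid="${1:-$(id -u)}"
    [ "${uid}" -eq 0 ]
}

if is_root; then
    echo "running as root"
else
    echo "not root (uid $(id -u)); run with sudo"
fi
```

Running this once directly and once with sudo shows which user id the shell actually sees in each case.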

  8. Hi,
    I am using 1.2.9.

    I added the script to rigcheck.sh, added the path in the cron file and saved, but

    it is not creating a log file. I tried to run rigcheck.sh manually, but the log file is still not created, so I am wondering whether it is working or not.

    I tried DRY_RUN=false.

    Can anyone help me step by step from the start?

    • You may have too many instances of crontab.
      Type crontab -l to list the jobs running under your name, and delete them by typing crontab -r.
      Type sudo crontab -l to see what is running under root. If you see your script, use the command I sent in my reply to restart it, or reboot.

    • One more thing. The echo command does not put anything in the log, so you need to add something to see it. The only time your log will create an entry is when it crashes. So instead of:
      echo "Everything's fine, exiting..."
      I used:
      echo "$(date) everything is ok..." | tee -a "${LOG_FILE}"

      • It’s good, but it still does not create the log file, so there are some other issues with the script. Even when I run the rigcheck script manually, it does not create the log file.

        • Create it manually: nano rigcheck.log, write something and try to save. If it does not allow you to save, you need to give it permission to edit, I think with something like chmod 777 rigcheck.log. Try that and let me know.

        • Dear sdmn,
          Same issue.
          I checked crontab; there is only one command.
          Creating rigcheck.log with nano is easy; I am not getting any permission problem.
          I even tried disconnecting one GPU to see if it restarts, but it does not restart the system.

  9. This is great, my rig is never restarting anymore since the 1.2.9 update. I have not tried this yet, is there anything I need to modify in this file to get it to restart every time a GPU stops hashing?

        • 1- copy the script to /home/ethos/rigcheck.sh
          use nano rigcheck.sh and paste the script there, then press Ctrl + X, Y, then Enter to save
          2- in console: sudo chmod +x /home/ethos/rigcheck.sh
          3- in console: sudo crontab -e
          4- choose 2 to edit crontab in nano
          5- add the following new lines:
          PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
          */15 * * * * /home/ethos/rigcheck.sh
          (this will run the script every 15 min)
          6- press Ctrl + X, Y, then Enter to save the new cron job
          7- execute this command: ‘sudo service cron restart’ [without apostrophes]
          To test, you can check rigcheck.log after each quarter of the hour by typing nano rigcheck.log

          I hope this helps. Just follow the steps exactly and you will be good.
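After following these steps, a quick sanity check can save some waiting. The sketch below tests whether the script is executable and whether the crontab contains a non-commented line mentioning it. The helper name has_cron_entry is made up for this example, and the path is the one used in the steps above.

```shell
#!/bin/bash
SCRIPT="${SCRIPT:-/home/ethos/rigcheck.sh}"

# Succeeds if crontab text on stdin has a non-comment line mentioning "$1".
has_cron_entry() {
    grep -v '^[[:space:]]*#' | grep -Fq "$1"
}

if [ -x "${SCRIPT}" ]; then
    echo "script is executable"
else
    echo "script missing or not executable"
fi

# 'crontab -l' prints the calling user's crontab; run this with sudo to inspect root's.
if crontab -l 2>/dev/null | has_cron_entry "${SCRIPT}"; then
    echo "cron entry found"
else
    echo "no cron entry found"
fi
```

Since the job was added with sudo crontab -e, the check only finds it when it is run with sudo as well.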

  10. I am wondering why the script for 1.2.9 reboots instead of using minestop and minestart
    like in the older script:
    /opt/ethos/bin/minestop
    sleep 5
    /opt/ethos/bin/minestart

  11. Hey, is there any tutorial for me to get this to work? I’m really not into Linux but prefer ethOS since it gives me a plus of 5 MH/s on each card. I would like to get some help.

  12. Got your script to run manually when calling it, but am not able to get it to run through a cronjob. I have the following in a file in /etc/cron.d/

    PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
    MAILTO=root
    */15 * * * * root /home/ethos/rigcheck.sh

    does spacing between the */15 * * * * and root and then root and /home/ethos/rigcheck.sh make a difference?

    I’ve also granted the script executable permissions with chmod +x.

    Anything else you can think of?

  13. Thanks for the great script however whenever i try and run it manually, i get the message bin/bash^M: bad interpreter: no such file or directory. Any suggestions would be greatly appreciated.

    • was able to fix it, if anyone else runs into this error;

      open script with vi

      hit esc and type in the following:
      :set fileformat=unix
      then save and quit.

      • Hi Alex,
        I got all the scripts saved. I also did the chmod and applied your :set fileformat=unix fix.

        How would I test it? You mentioned you run it manually. How did you do that?
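The ^M in that error message is a carriage return, which usually means the file was saved with Windows line endings. As an alternative to the vi fix, the line endings can be stripped from the command line. A minimal sketch, assuming GNU sed; strip_cr is a made-up helper name and the demo runs on a throwaway file:

```shell
#!/bin/bash
# Strip a trailing carriage return from every line of a file, in place.
# Assumes GNU sed (-i without a backup suffix).
strip_cr() {
    sed -i 's/\r$//' "$1"
}

# Demo on a throwaway file with Windows line endings.
demo="$(mktemp)"
printf '#!/bin/bash\r\necho hi\r\n' > "${demo}"
strip_cr "${demo}"
head -n 1 "${demo}"
```

On the rig this would be strip_cr /home/ethos/rigcheck.sh, followed by a manual run such as sudo /home/ethos/rigcheck.sh to confirm that the interpreter error is gone.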

  14. Hi, I just tested it. It seems it is not catching all the problems. I had “error: possible pool connection problem” when it crashed, and the script did not act on it. I was thinking of checking the file stats.file.
