Differences

This shows you the differences between two versions of the page.

condor:submit:checkpointing [2011/08/05 14:47] garrettheath4
condor:submit:checkpointing [2011/08/05 16:02] (current) – [Save a Checkpoint without Exiting] garrettheath4
Line 13: Line 13:
 to compile your ''HellowOrld'' program, simply put ''condor_compile'' in front of this command so that your program is built with the extra Condor features:
 <code bash>condor_compile gcc -o HellowOrld HellowOrld.c</code>
 +**Note** that this will overwrite any existing non-Condor compilation called ''HellowOrld''.  It may be a good idea to compile your program normally first and then compile it again with ''condor_compile'' under a different name, such as ''condor_compile gcc -o HellowOrldCondor HellowOrld.c''.  Although a program compiled with ''condor_compile'' should still have the same features and functionality as one compiled normally, keeping a normally compiled copy of your program may help with debugging.
  
-=====How to do Standalone Checkpointing=====
+=====Standalone Checkpointing=====
 When you compile your program with ''condor_compile'', you can still run it directly from the command line like you normally would.  In fact, you can even use checkpointing in your program without having to submit your job to Condor.  After you recompile your program with ''condor_compile'', a number of additional features become available to you.
  
Line 22: Line 23:
 simply run it with ''setarch x86_64 -R -L'' in front of it, like so:
 <code bash>setarch x86_64 -R -L ./HellowOrld arg1 arg2 ...</code>
 +
 +:!: **Note**: This page assumes you have a 64-bit processor.  If you have an older 32-bit processor architecture, use ''setarch i386'' instead of ''setarch x86_64'' when running your ''condor_compile''d program on your own 32-bit computer.  To find out what kind of processor you have, run ''uname -p'' and use whatever is printed as the argument for ''setarch'' commands.
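
The architecture choice described in the note can be sketched as a small shell function.  This is only an illustration: ''setarch_arg'' is a hypothetical helper name, and it reads ''uname -m'' rather than ''uname -p'' on the assumption that some Linux systems print ''unknown'' for ''uname -p''.

```shell
# Hypothetical helper: map a machine architecture name to a setarch argument.
# With no argument it inspects the current machine via `uname -m` (an assumption;
# the note above suggests `uname -p`).
setarch_arg() {
    machine=${1:-$(uname -m)}
    case "$machine" in
        x86_64) echo x86_64 ;;   # 64-bit x86
        i?86)   echo i386 ;;     # 32-bit x86 (i386, i486, i586, i686)
        *)      echo "unsupported architecture: $machine" >&2
                return 1 ;;
    esac
}
```

It could then be used as ''setarch "$(setarch_arg)" -R -L ./HellowOrld arg1 arg2 ...'' so that the same command line works on both 32-bit and 64-bit x86 machines.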
  
 Your program will run normally on your computer.  To kill it, press ''Ctrl-C'' like you normally do.  This sends the ''SIGINT'' signal to your program, which by default causes it to terminate immediately.
Line 28: Line 31:
 To make your program save a checkpoint and exit, first run your ''condor_compile''d program with checkpointing enabled (see above).  With your program running, simply press ''Ctrl-Z''.  This sends the ''SIGTSTP'' signal to the currently running program.  Usually, the ''SIGTSTP'' signal tells a process to freeze but not exit, but Condor has configured your program to interpret this signal to mean that it should create a checkpoint and then exit.  When this happens, your program will create a checkpoint file with the same name as the program but ending in ''.ckpt''.  So, if you checkpoint your ''HellowOrld'' program, the checkpoint file will be saved as ''HellowOrld.ckpt''.
  
 +====Save a Checkpoint without Exiting====
 +To make your program save a checkpoint but not exit, run your ''condor_compile''d program with checkpointing enabled.  With your program running in one terminal window, open up a separate terminal window and run the ''killall'' command with the ''-s USR2'' argument to send your program the ''SIGUSR2'' signal, like so:
 +<code bash>killall -s USR2 HellowOrld</code>
 +Your program will receive the signal and create a checkpoint but will continue running.  That way, if the program is killed later and not given the chance to make a new checkpoint, you can restart the program from that checkpoint instead of starting over completely.
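
Because ''killall'' signals every process with the given name, it may be safer to target a single run by its process ID.  The following is a minimal sketch of that idea; ''request_checkpoint'' is a hypothetical helper name, not part of Condor.

```shell
# Hypothetical helper: ask one specific process to checkpoint and keep running.
request_checkpoint() {
    pid=$1
    # `kill -0` sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if kill -0 "$pid" 2>/dev/null; then
        kill -USR2 "$pid"    # SIGUSR2: checkpoint without exiting
    else
        echo "no such process: $pid" >&2
        return 1
    fi
}
```

For example, ''request_checkpoint 12345'' would signal only the run whose process ID is 12345 (a made-up value here; ''pgrep -x HellowOrld'' is one way to look up the real one).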
 +
 +====Start from Checkpoint====
 +If you have checkpointed a program and so have the ''*.ckpt'' file, you can resume the program from the point that it was checkpointed.  If, for example, your program is called ''HellowOrld'' and your checkpoint file is called ''HellowOrld.ckpt'', run this to load the program from the checkpoint file:
 +<code bash>setarch x86_64 -R -L ./HellowOrld -_condor_restart HellowOrld.ckpt</code>
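
The two ways of starting the program can be combined into one wrapper that resumes when a checkpoint file is present and starts fresh otherwise.  This is a sketch under assumptions: ''run_or_resume'' is a hypothetical name, the program is assumed to live in the current directory, and the checkpoint file is assumed to use the default ''.ckpt'' name described above.

```shell
# Hypothetical wrapper: resume from <program>.ckpt if it exists,
# otherwise start the program normally with the given arguments.
run_or_resume() {
    prog=$1
    ckpt="$prog.ckpt"
    if [ -f "$ckpt" ]; then
        # A checkpoint exists: restart from it (original arguments are not passed).
        setarch x86_64 -R -L "./$prog" -_condor_restart "$ckpt"
    else
        # No checkpoint yet: drop the program name and pass the rest through.
        shift
        setarch x86_64 -R -L "./$prog" "$@"
    fi
}
```

Calling ''run_or_resume HellowOrld arg1 arg2'' would then do the right thing on both the first run and later runs.  Note that a stale checkpoint file left over from an earlier, unrelated run would also be picked up, so it may be worth deleting ''HellowOrld.ckpt'' once a run finishes successfully.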
  
  
condor/submit/checkpointing.1312555646.txt.gz · Last modified: 2011/08/05 14:47 by garrettheath4
CC Attribution-Noncommercial-Share Alike 4.0 International