Benefits of Using condor_compile
You must compile your program with condor_compile in order to run jobs in the “standard” universe, which supports special Condor features like checkpointing and I/O redirection.
“Checkpointing” means that if Condor has to stop your job and move it to another computer, for example because someone logged on to the machine it was running on, Condor can save the state of your program and use that state to resume it on another computer from where it left off. So, if you have a program that takes 8 hours to run to completion and Condor has to evict it from a computer after 6 hours, Condor will start your program on another computer at the 6-hour mark instead of from the beginning.
“I/O redirection” means that any data your program needs to run will be fetched from the computer you submitted the program on. So, if you submit a program from your home directory on your computer that needs a file called
input.txt in your home directory, the program will open the file located on your computer instead of trying to find it on the execute machine that it is actually running on.
How to Use condor_compile
Luckily, Condor makes it easy to recompile your code to incorporate these features. All you have to do is run
condor_compile with the command that you normally use to compile your program. For example, if you normally run
gcc -o HellowOrld HellowOrld.c
to compile your
HellowOrld program, simply stick
condor_compile in front of this command to compile your program to include the extra Condor features:
condor_compile gcc -o HellowOrld HellowOrld.c
Note that this will overwrite any existing non-Condor compilation called “
HellowOrld”. It may be a good idea to compile your program normally first and then compile it again with
condor_compile but with a different name, such as
condor_compile gcc -o HellowOrldCondor HellowOrld.c. Although a program compiled with
condor_compile should still have the same non-Condor features and functionality as one compiled normally, it may help with debugging to still have a copy of your program that was compiled normally.
When you compile your program with
condor_compile, you can still run it directly from the command line like you normally would. In fact, you can even use checkpointing without submitting your job to Condor: a program recompiled with
condor_compile can save a checkpoint and exit, save a checkpoint and keep running, and restart from a saved checkpoint.
When you run your program from the command line, you need to put a command in front of it to make it able to checkpoint properly. The purpose of this command is to disable memory address randomization which is enabled by default. If you normally run your program like so:
./HellowOrld arg1 arg2 ...
simply run it with
setarch x86_64 -R -L in front of it, like so:
setarch x86_64 -R -L ./HellowOrld arg1 arg2 ...
Note: This page assumes you have a 64-bit processor. If you have an older 32-bit processor, use
setarch i386 instead of
setarch x86_64 when running your
condor_compiled program on that computer. To find out what kind of processor you have, run
uname -p and use whatever is printed as the argument for
setarch.
Your program will run normally on your computer. To kill it, press
Ctrl-C like you normally do. This sends the
SIGINT signal to your program, which by default terminates it immediately.
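The choice of setarch argument described above can be wrapped in a small helper. This is only a sketch: pick_arch and run_checkpointable are hypothetical names, and the fallback to uname -m is an assumption added here because uname -p prints "unknown" on some systems.

```shell
#!/bin/sh
# Hypothetical helper: print the architecture name to pass to setarch.
pick_arch() {
    arch=$(uname -p 2>/dev/null)
    # Assumption: if -p gives nothing useful, use the machine name instead.
    case "$arch" in
        ""|unknown) arch=$(uname -m) ;;
    esac
    printf '%s\n' "$arch"
}

# Hypothetical helper: run a condor_compiled program with address
# randomization disabled, using the same setarch flags as above.
run_checkpointable() {
    setarch "$(pick_arch)" -R -L "$@"
}
```

With these functions defined in your shell, run_checkpointable ./HellowOrld arg1 arg2 would stand in for typing the full setarch command.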
Save a Checkpoint and Exit
To make your program save a checkpoint and exit, first run your
condor_compiled program with checkpointing enabled (see above). With your program running, simply press
Ctrl-Z. This sends the
SIGTSTP signal to the currently running program. Usually, the
SIGTSTP signal tells a process to suspend but not exit, but Condor has configured your program to interpret this signal to mean that it should create a checkpoint and then exit. When this happens, your program will create a checkpoint file with the same name as the program but ending in
.ckpt. So, if you checkpoint your
HellowOrld program, the checkpoint file will be saved as
HellowOrld.ckpt.
Save a Checkpoint without Exiting
To make your program save a checkpoint but not exit, run your
condor_compiled program with checkpointing enabled. With your program running in one terminal window, open up a separate terminal window and run the
killall command with the
-s USR2 argument to send your program the
SIGUSR2 signal, like so:
killall -s USR2 HellowOrld
Your program will receive the signal and create a checkpoint but will continue running. That way, if the program is killed later and not given the chance to make a new checkpoint, you can start the program from where it left off at the checkpoint instead of completely starting over.
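If you want checkpoints taken at regular intervals rather than by hand, the same signal can be sent from a loop. The following is a minimal sketch, not something Condor provides: checkpoint_every is a hypothetical function name, and the interval is whatever you choose.

```shell
#!/bin/sh
# Hypothetical helper: send SIGUSR2 to process $1 every $2 seconds
# until that process no longer exists.
checkpoint_every() {
    pid=$1
    interval=$2
    # kill -0 delivers no signal; it only checks that the process exists.
    while kill -0 "$pid" 2>/dev/null; do
        sleep "$interval"
        # Ask the program to save a checkpoint and keep running.
        kill -s USR2 "$pid" 2>/dev/null || break
    done
}
```

For example, you might start the program in the background with setarch x86_64 -R -L ./HellowOrld arg1 arg2 & and then run checkpoint_every $! 600 to request a checkpoint every ten minutes. This assumes setarch replaces itself with your program, so that $! is the program's process ID; if in doubt, look the ID up with ps first.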
Start from Checkpoint
If you have checkpointed a program and so have the
*.ckpt file, you can resume the program from the point that it was checkpointed. If, for example, your program is called
HellowOrld and your checkpoint file is called
HellowOrld.ckpt, run this to load the program from the checkpoint file:
setarch x86_64 -R -L ./HellowOrld -_condor_restart HellowOrld.ckpt
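The two ways of launching the program (fresh, or from a checkpoint) can be combined into one wrapper that checks whether a checkpoint file exists. This is a sketch under assumptions: start_or_resume and the DRY_RUN variable are hypothetical names introduced here, and the checkpoint is assumed to sit in the current directory next to the program.

```shell
#!/bin/sh
# Hypothetical helper: resume from <program>.ckpt when it exists,
# otherwise start the program fresh with the given arguments.
start_or_resume() {
    prog=$1
    shift
    ckpt="${prog}.ckpt"
    if [ -f "$ckpt" ]; then
        # Mirror the restart command above, which passes only the
        # checkpoint file and not the original arguments.
        set -- -_condor_restart "$ckpt"
    fi
    # Set DRY_RUN=echo to print the command instead of running it.
    ${DRY_RUN:-} setarch x86_64 -R -L "./$prog" "$@"
}
```

Running DRY_RUN=echo before calling start_or_resume HellowOrld arg1 arg2 shows which command would be used without actually starting the program.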