Note: This page is deprecated. The new TODO list is located on the Tasks To-Do List page.
john to be an email server so that condor-admin@condor.cs.wlu.edu can receive mail (and just forward it to me).
sendmail service on terras. That way, the sendmail service doesn't have to run all the time.
condor@condor.cs.wlu.edu to send Condor mail and condor-admin@condor.cs.wlu.edu to receive mail (and forward to me).
condor-admin@condor.cs.wlu.edu in mail map file, wiki homepage, Condor configuration file, and installation script.
root users from modifying files.
RANK function so that jobs are allocated in a breadth-first manner instead of a depth-first manner. Evenly distribute the work between machines. Ex: 4 submitted jobs on 4 machines with 2 slots each should run with one job on each of the 4 machines instead of all the jobs running on the first two machines and none on the second two machines. Use the TotalLoadAvg variable in the RANK function to take the total CPU usage into account.
Shell object __init__ method. (See TODO there.)
queueMany(argsLst, outLst, inLst, logLst, …) method in the Condor module with a variable number of arguments.
Condor.submit() is run so that the user can (manually or automatically) poll for the current status of their jobs, remove the jobs, etc.
babbage's configuration variables so that mouse and keyboard activity more readily boots a running job. Then, test the SUSPEND, VACATE, and KILL settings.
condor_compile (compile as python3condor and idle3condor, for example) to be able to checkpoint any Python program (if the user specifies python3condor as the executable).
condor_compile to be able to checkpoint basic (portable) MATLAB code.
condor_cod “Computing on Demand”, such as to run full graphical MATLAB in parallel
condor_credd for secure user credentials storage (for Windows machines only?).
yum? but VMware more user friendly)

Items listed here are completed items from the list above. Items should be listed in earliest-to-latest order (more or less) and appended with the signature of the user who completed the task. Note: Cross out unimportant items.
/mnt/config/hosts/condor_config_global
condor.cs.wlu.edu to DNS to refer to fred.cs.wlu.edu.
fred.cs.wlu.edu to condor.cs.wlu.edu?
LOCAL_CONFIG_FILE (comma separated?). Use ~/condor/Settings Tree.mm for reference.
AddHost.sh script in /mnt/config/hosts to create directories for a new host.
condor user on terras. Make the local condor users use that dummy UID and GID. Compare /etc/passwd from all pool machines and from terras to find a suitable UID and GID for condor. All machines should use the same CONDOR_IDS for the sake of file permission management on the NAS.
CONDOR_IDS config variable on all machines, whether they be globally or locally defined.
/etc/condor directories.
CONDOR_CONFIG environment variable for all members of the Condor pool. Global environment variables a bad idea?
/var/lib/condor/condor_config to link to the global configuration file on the NAS. This location is considered to be the home folder of the condor users.
/etc/init.d/condor file on the NAS? Networking should be started by the time it is run.
carl and fred by implementing user-based authentication instead of host-wide authentication. Also added password authentication to enhance security of system. This fixed the following error:
PERMISSION DENIED to unauthenticated@unmapped from host 137.113.118.64 for command 60008 (DC_CHILDALIVE), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 137.113.118.64,carl.cs.wlu.edu,carl
07/21/11 16:25:21 attempt to connect to <137.113.118.65:52652> failed: No route to host (connect errno = 113).
07/21/11 16:25:21 ERROR: SECMAN:2004:Failed to create security session to <137.113.118.65:52652> with TCP. |SECMAN:2003:TCP connection to <137.113.118.65:52652> failed.
07/21/11 16:25:21 Failed to initiate socket to send MATCH_INFO to slot1@fred.cs.wlu.edu
Opened port range on firewall on pool member machines for Condor to use.
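Whether the opened port range is actually reachable from another pool member can be checked with a short script. The following is a minimal sketch in Python and was not part of the original setup: the function name port_reachable and the default timeout are assumptions chosen for illustration.

```python
import socket


def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it is not part of Condor.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or
        # "No route to host" (errno 113), which is what a blocked port looks like.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling port_reachable("137.113.118.65", 52652) from the machine that logged the error above would report False for as long as the firewall blocks that port.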
KeyboardIdle condition to the RANK configuration. That way, the START condition won't depend on keyboard idle time but a longer idle time will make jobs more likely to run on that slot.
ALLOW_CONFIG access to john.cs.wlu.edu. I did this only for testing purposes, but it was a rather large security hole to allow anyone who could log into john to change the configuration settings. Before this, the condor_config_val program didn't want to authenticate no matter what I did and so always presented itself as “unauthenticated@unmapped” to the condor_master. Now that I changed the security settings (SEC_*_AUTHENTICATION), the condor_config_val program now wants to authenticate and properly presents itself as “koller@cs.wlu.edu”.
babbage.cs.wlu.edu to the Condor pool by installing Condor on it with the wlu-cs-condor-install.sh script. Because of this, I was able to test the script for a clean-install and fix some bugs that I noticed. The script didn't create the symbolic link for the shared configuration file, for example, so I fixed that. The script also needed to create the /var/run/condor directory for the boot script to store the pidfile there, so I added it to the script. — Garrett Koller 2011/07/26 12:09
/mnt/config/scripts/. — Garrett Koller 2011/07/27 15:30
/mnt/config/scripts/ to prompt for the operating system and processor architecture and then pick the appropriate release directory to install from. — Garrett Koller 2011/07/27 15:31
AddHost.sh script to the /mnt/config/scripts/ folder. uname -p command to just make the script universal and centralized.
wlu-cs-condor-uninstall.sh to undo everything in wlu-cs-condor-install.sh — Garrett Koller 2011/07/27 15:31
condor UID and GID to 64. Since 64 is in the system ID range, it will not be assigned to normal users. Furthermore, Condor might (or might not) be “well-known” to use 64 for its UID and GID. 64 is “well-known” to be used for the condor user and group according to the Red Hat Enterprise Linux 6 Deployment Guide. In order to change the ownership of all of the old condor user and group files, I ran the following script on all of the machines in the Condor pool:
    for file in `find / -uid 1344`
    do
        sudo chown -v condor "$file"
    done
    for file in `find / -gid 1610`
    do
        chown -v :condor "$file"
    done
— Garrett Koller 2011/07/28 15:03
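A backtick loop like the one above splits on whitespace, so paths containing spaces would be mishandled. As an alternative, here is a minimal sketch of the same re-owning step in Python. The function names find_owned and reown and the dry_run flag are assumptions introduced for illustration; the IDs 1344, 1610, and 64 are the ones mentioned above.

```python
import os


def find_owned(root, uid=None, gid=None):
    """Yield paths under root whose owner UID or group GID matches."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symbolic links
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if (uid is not None and st.st_uid == uid) or \
               (gid is not None and st.st_gid == gid):
                yield path


def reown(root, old_uid, old_gid, new_uid, new_gid, dry_run=True):
    """Move files from the old UID/GID to the new ones; return affected paths."""
    changed = []
    for path in find_owned(root, uid=old_uid, gid=old_gid):
        st = os.lstat(path)
        # -1 tells lchown to leave that ID unchanged.
        uid = new_uid if st.st_uid == old_uid else -1
        gid = new_gid if st.st_gid == old_gid else -1
        if not dry_run:
            os.lchown(path, uid, gid)  # needs root privileges in practice
        changed.append(path)
    return changed
```

Running reown("/", 1344, 1610, 64, 64) with the default dry_run=True only lists what would change. Note that os.walk does not report the root directory itself, so this sketch covers only the entries beneath it.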
PoolMembers variable in the global configuration file. I did this with a few simple sed commands. — Garrett Koller 2011/07/29 14:41
checkPrimes program on Condor and test checkpointing functionality.
condor_qedit to edit the classads of jobs on the Condor queue
condor checkpoint
condor_config_val to manually control preemptions and checkpoints
condor_status may say it has 1.5 Gigs, a job's RAM usage is not limited to this number. Instead, this number seems to come into play only during the negotiation phase to figure out where the job will run and only if the user who submitted the job bothered to mention the amount of RAM their job would need in the Requirements variable and possibly the ImageSize variable (which is more like the checkpoint file's size).
RANK expression error. RANK refers to the machine's preferences for jobs that run on it, so the Owner variable is applicable since it is defined in the job's ClassAd. DEFAULT_RANK, on the other hand, refers to how a job ranks the machines it wants to run on, so the SlotID, KeyboardIdle, and LoadAvg variables are applicable since they are defined in the machine's ClassAd.
condor_compile. Thus, checkpointing also works.
condor_compile command. — Garrett Koller 2011/08/05 17:34
DEFAULT_RANK is now configured to use the “last” CPUs before using the “first” CPUs on a machine, since “first” CPUs are used by the user first.
SetEffectiveOwner(kollerg) failed with errno=13: Permission denied. I enabled full debugging for the Shadow daemon, but couldn't find the source of the error.
“OwnerCheck(condor_pool) failed in SetAttribute for job 41.0”. I enabled full debugging for the Scheduler daemon, but no luck there either.
FS and FS_REMOTE. I then simply fixed the authentication to try full username authentication first in order to authenticate processes based on their user ID instead of their oh-I-happen-to-have-a-password-but-you-can't-verify-who-I-am-specifically-ness. For more info, see Authentication in Condor.
-r <MATLAB_Command>, “<MATLAB_Command>” must be surrounded by single quotes (they don't need to be escaped). So, to run the HellowOrld() function and exit, the Arguments variable in the submit file must be as follows:
Arguments = "-nodisplay -r 'HellowOrld(); exit;'"
/usr/local/bin/matlab). MATLAB doesn't like not running from deep within the bowels of the /usr/local/ directory, so to tell Condor to transfer input and output files normally but not transfer the executable (therefore assuming that it's present on the execute machine), the following variable must be set in the submit file:
transfer_executable = false
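The two MATLAB submit-file settings described here can be generated together. Below is a minimal sketch in Python, assuming a hypothetical helper named matlab_submit_description that is not part of the condor Python module. Only the Arguments and transfer_executable lines come from the notes above; the rest is the bare minimum needed to form a submit description.

```python
def matlab_submit_description(matlab_command, executable="/usr/local/bin/matlab"):
    """Return the text of a Condor submit description running one MATLAB command.

    Hypothetical helper for illustration only.
    """
    if "'" in matlab_command or '"' in matlab_command:
        # Quote escaping is out of scope for this sketch.
        raise ValueError("quotes inside the MATLAB command are not handled here")
    lines = [
        "executable = " + executable,
        # Single quotes keep the whole command together as the one argument of -r.
        "Arguments = \"-nodisplay -r '" + matlab_command + "'\"",
        # MATLAB must run from its installed location, so do not ship the binary.
        "transfer_executable = false",
        "queue",
    ]
    return "\n".join(lines) + "\n"


print(matlab_submit_description("HellowOrld(); exit;"))
```

The printed text could be saved to a file and passed to condor_submit. Output, error, and log file settings are left out and would be added as needed.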
condor.cs.wlu.edu/index.html redirect to the Condor Wiki homepage while still allowing access to CondorView.
COUNT_HYPERTHREAD_CPUS = False in the global configuration to fix this problem. Now, we have better response for the interactive user while also allowing a job running on a slot to run faster since it is less likely to be competing with another Condor job on a complementary hyperthread on the same CPU. — Garrett Koller 2011/08/17 11:33
Job.status(), Job.poll(), and Job.wait() in the condor Python module to let the user know if and when their jobs are done. — Garrett Koller 2011/08/18 19:10
NonCondorLoadAvg in global config file to include Cpus (LoadAvg - CondorLoadAvg / Cpus = NonCondorLoadAvg), assuming that everything that uses NonCondorLoadAvg has access to the Cpus Startd variable. This variable is used in the START expression so that a computer with a lot of cores that are each a tiny bit busy will still be able to accept jobs. — Garrett Koller 2011/08/27 22:48
ENABLE_BACKFILL = TRUE to local (host-specific) configuration file.
yum install boinc-client (newer version, x86_64 architecture) to the install script.
setRAM(), setCPUNum(), and setDisk() to condor.py to allow each job to request its resources. If the user doesn't specify, it is assumed that each job needs 1 CPU, 1 Gig of RAM, and 32 MB of disk space. — Garrett Koller 2011/09/13 17:45
help() function, such as by running help(Condor.Job). In Python, the first line of a docstring serves as the text in the pop-up tooltip that appears in Idle when a user starts typing in the arguments of the function. I made that text a quick snippet of each function's syntax. — Garrett Koller 2011/09/15 17:36
condor-py-scripting Google Code project. — Garrett Koller 2011/09/15 17:36
condor-7.6.4-x86_64_rhap_6.1-updated-stripped source and installed it into a new release directory. — Garrett Koller 2011/12/01 12:09