

Network Installation of Condor

This page describes how to install Condor in a network-shared location. Installing Condor on the network allows for centralized configuration and convenient upgrades. The shared location should be on a high-availability file server, preferably a high-bandwidth, RAID-enabled NAS, that every member of the Condor pool can access.

Our pool uses a dedicated NAS server with 2.7 TB of storage in a RAID 5+1 configuration, which gives faster data access than ordinary single-drive I/O, especially for reads.
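Because every pool member must be able to reach the shared location, a quick per-machine check can save debugging later. The following is a minimal sketch, assuming a Linux host (it reads /proc/mounts) and that the share is mounted at /mnt/config, the path used throughout this page. The function name check_shared_mount is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that a directory is a mount point on this machine by
# looking it up in /proc/mounts (Linux-specific). /mnt/config is the
# shared mount point used throughout this page.
check_shared_mount() {
    dir="${1:-/mnt/config}"
    # Field 2 of each /proc/mounts line is the mount point.
    if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' /proc/mounts; then
        echo "ok: $dir is mounted"
    else
        echo "missing: $dir is not mounted" >&2
        return 1
    fi
}

# Usage on each pool member:
#   check_shared_mount /mnt/config && ls /mnt/config/release
```

Matching the exact mount point, rather than just testing that the directory exists, catches the case where /mnt/config is an empty local directory because the NFS mount failed.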

Install Binaries

To install the binaries onto the tesla.cs.wlu.edu NAS, run these commands in a terminal:

cd /mnt/config/src/fedora64
sudo ./condor_configure --type=manager,submit,execute --central-manager=john.cs.wlu.edu --local-dir=/mnt/config/hosts/_default --install-dir=/mnt/config/release/x86_64_rhap_5 --install --verbose
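After condor_configure finishes, one way to sanity-check the result is to run condor_version straight out of the release directory. This is a sketch, assuming the usual layout in which user tools land under <install-dir>/bin; the helper name verify_release is hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that condor_configure left a runnable condor_version
# in the release directory. Assumes the usual layout with user tools
# under <install-dir>/bin.
verify_release() {
    release="${1:-/mnt/config/release/x86_64_rhap_5}"
    tool="$release/bin/condor_version"
    if [ -x "$tool" ]; then
        "$tool"
    else
        echo "not found or not executable: $tool" >&2
        return 1
    fi
}

# Usage (from any pool member that mounts /mnt/config):
#   verify_release /mnt/config/release/x86_64_rhap_5
```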

Add Machines to Condor Pool

FIXME

Add Local condor User

For the daemons to run correctly and for file permissions to be set properly, every member of the Condor pool must have a local condor user with the same IDs:

condor UID = 1344
condor GID = 1610
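Before assigning these IDs on a new machine, it can help to confirm that no other account already holds them. A minimal sketch, assuming the host resolves accounts through getent (which also consults network sources such as LDAP or NIS, not just /etc/passwd); uid_holder and gid_holder are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: show which account, if any, already holds a numeric UID or GID.
# getent consults every configured name service (local files and any
# network directory such as LDAP or NIS), so it also finds IDs taken
# by network accounts. Prints nothing when the ID is free.
uid_holder() {
    getent passwd "$1" | cut -d: -f1
}

gid_holder() {
    getent group "$1" | cut -d: -f1
}

# Usage: each line should print "condor" once the account exists,
# or nothing before it has been created.
#   uid_holder 1344
#   gid_holder 1610
```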

First, check whether the condor user already exists on the machine by running:

grep '^condor:' /etc/passwd
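A slightly more informative check reports not only whether the account exists but also whether its IDs already match the pool-wide values, which tells you which of the two paths below to follow. This is a sketch; the function name condor_user_status and its three outputs are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: classify the local condor account as absent, ok, or mismatched
# against the pool-wide IDs (UID 1344, GID 1610), which decides whether
# to reset the existing account or create a new one.
condor_user_status() {
    user="${1:-condor}"
    want="${2:-1344:1610}"
    if ! entry=$(getent passwd "$user"); then
        echo absent
        return 0
    fi
    # passwd fields: name:password:UID:GID:GECOS:home:shell
    ids=$(printf '%s\n' "$entry" | cut -d: -f3,4)
    if [ "$ids" = "$want" ]; then
        echo ok
    else
        echo "mismatch ($ids)"
    fi
}

# Usage:
#   condor_user_status    # prints absent, ok, or "mismatch (UID:GID)"
```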

If you get a match, reset the account's settings in case the user was not created correctly:

sudo groupmod -g 1610 condor
sudo usermod -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 1344 -g condor -s "/sbin/nologin" -L condor

Note: If usermod reports that the directory /var/lib/condor already exists, run this command next:

sudo chown -R condor:condor /var/lib/condor

If you do not get a match, create the group and user manually:

sudo groupadd -g 1610 condor

sudo useradd -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 1344 -g condor -s "/sbin/nologin" condor
sudo usermod -L condor
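The two branches above (reset an existing account, or create a new one) can be folded into a single re-runnable function. This is a sketch that reuses exactly the groupmod/groupadd, usermod/useradd, and chown commands from this section; the names run, ensure_condor_user, and the DRY_RUN switch are hypothetical conveniences, not part of Condor.

```shell
#!/bin/sh
# Sketch: create or repair the local condor account in one pass, using
# the pool-wide IDs from this page. Set DRY_RUN=1 to print each command
# instead of running it with sudo.
CONDOR_UID=1344
CONDOR_GID=1610
CONDOR_HOME=/var/lib/condor

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        sudo "$@"
    fi
}

ensure_condor_user() {
    # Group first, since useradd/usermod -g condor need it to exist.
    if getent group condor >/dev/null; then
        run groupmod -g "$CONDOR_GID" condor
    else
        run groupadd -g "$CONDOR_GID" condor
    fi
    if getent passwd condor >/dev/null; then
        run usermod -c "Owner of Condor Daemons" -d "$CONDOR_HOME" -m \
            -u "$CONDOR_UID" -g condor -s /sbin/nologin -L condor
    else
        run useradd -c "Owner of Condor Daemons" -d "$CONDOR_HOME" -m \
            -u "$CONDOR_UID" -g condor -s /sbin/nologin condor
        run usermod -L condor
    fi
    # Fixes ownership when the home directory already existed.
    run chown -R condor:condor "$CONDOR_HOME"
}

# Usage:
#   DRY_RUN=1 ensure_condor_user   # review the commands first
#   DRY_RUN=0 ensure_condor_user   # then apply them
```

Printing the commands first is a deliberate choice: it lets you review what will happen on a machine whose existing account may have been set up by hand.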

To confirm, run

ls -al /var/lib/condor

and verify that the entry for . (the directory itself) is owned by the condor user and the condor group. If it is not, the UID or GID probably conflicts with an existing account and must be set by hand: pick IDs that no local or network account uses, then set the CONDOR_IDS variable in that host's local Condor configuration file.1)
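The ownership check can also be done numerically, which avoids misreading ls output when a name resolves to an unexpected ID. A minimal sketch, assuming GNU stat (its -c '%u:%g' format prints the numeric owner and group); check_owner is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: compare a directory's numeric owner and group against the
# expected IDs, the same check as reading the "." line of ls -al.
# Uses GNU stat's -c format option.
check_owner() {
    dir="$1"
    want="$2:$3"
    got=$(stat -c '%u:%g' "$dir") || return 1
    if [ "$got" = "$want" ]; then
        echo "ok: $dir is owned by $got"
    else
        echo "mismatch: $dir is owned by $got, expected $want" >&2
        return 1
    fi
}

# Usage:
#   check_owner /var/lib/condor 1344 1610
# If the IDs had to be changed on this host, its local config file
# needs a matching line of the form
#   CONDOR_IDS = <UID>.<GID>
```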

Set Machine Variables

The problem with putting as much of Condor as possible on the NAS is that it adds a lot of NFS traffic to the network, especially while Condor jobs are running. If user executables were stored centrally on the NAS, every computer would read from the NAS almost constantly while those executables are opened and run.

The W&L Computer Science Department Systems Administrator, Steve Goryl, ran into a similar problem when all of the Linux applications on the lab computers were stored on and run from the central CS department server. This produced higher-than-expected network traffic, and the programs became sluggish. Installing the applications on the lab computers' local hard drives was more work to administer but gave much better overall performance.

We can keep Condor centrally located and still get good performance2) by splitting where files live: Condor's binaries and all of its configuration files stay on the NAS, which keeps upgrades easy, while the executables of running jobs are stored on and run from each execute machine's local hard drive.

To do this, every machine needs a local directory, owned by the local condor user, that serves as the working area for Condor jobs while they execute on that machine. We first create the directory and then tell Condor where it is. To create it, run:

sudo mkdir /var/lib/condor/execute
sudo chown -R condor:condor /var/lib/condor/execute
sudo chmod -R 755 /var/lib/condor/execute
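Telling Condor about the directory means setting its EXECUTE macro. In the stock configuration EXECUTE defaults to $(LOCAL_DIR)/execute, which with the --local-dir used above would point back at the NAS, so it needs to be overridden. The following is a sketch of a small helper that sets one macro in a config file, replacing any earlier definition; the helper name set_macro is hypothetical, and which config file you put EXECUTE in (a host's local file or a shared one) is left to you.

```shell
#!/bin/sh
# Sketch: set one macro in a Condor configuration file, replacing any
# earlier "NAME = ..." line for it. Writes back with cat so the file
# keeps its existing owner and permissions. Assumes NAME contains no
# regular-expression metacharacters.
set_macro() {
    conf="$1"
    name="$2"
    value="$3"
    scratch=$(mktemp) || return 1
    # Keep every line except a previous definition of this macro.
    grep -v "^[[:space:]]*$name[[:space:]]*=" "$conf" > "$scratch" 2>/dev/null || :
    printf '%s = %s\n' "$name" "$value" >> "$scratch"
    cat "$scratch" > "$conf"
    rm -f "$scratch"
}

# Usage, as a user who can write the file (<HOSTNAME> is a placeholder):
#   set_macro /mnt/config/hosts/<HOSTNAME>/config/condor_config.local \
#       EXECUTE /var/lib/condor/execute
```

Afterwards, condor_config_val EXECUTE on that host should print the new path. The running daemons still have to pick up the change, for example by restarting Condor on that machine.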
1)
A host's individual configuration file is located at /mnt/config/hosts/<HOSTNAME>/config/condor_config.local.
2)
hopefully…
Last modified: 2011/07/18 14:24 by garrettheath4
CC Attribution-Noncommercial-Share Alike 4.0 International