====== Network Installation of Condor ======

This page describes the procedure for installing Condor in a network-shared location. Installing Condor on the network allows for centralized configuration and convenient upgrades. This network-shared location should be on a high-availability file server, preferably a high-bandwidth [[wp>RAID]]-enabled [[wp>Network-attached_storage|NAS]], that can be accessed by all members of the Condor pool. Our pool uses a dedicated NAS server with a 2.7TB RAID 5+1 drive configuration, which provides faster access to the data, especially for reads, than ordinary local hard drive I/O.

===== Add Local condor User =====

In order for the daemons to run correctly and for permissions to be properly set, a local ''condor'' user must be present on all members of the Condor pool. The following must be set for the ''condor'' user:\\

  condor UID = 64
  condor GID = 64

First, check to see if the ''condor'' user exists on the machine. Do this by running:

  grep '^condor:' /etc/passwd

**If you get a match**, first reset its settings in case the user wasn't created correctly:

  sudo groupmod -g 64 condor
  sudo usermod -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 64 -g condor -s "/sbin/nologin" -L condor

:!: If you get a message saying that the directory ''/var/lib/condor'' already exists, run this command next:

  sudo chown -R condor:condor /var/lib/condor

**If you do not get a match**, you need to add the user manually. To do this, run:

  sudo groupadd -g 64 condor
  sudo useradd -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 64 -g condor -s "/sbin/nologin" condor
  sudo usermod -L condor

Just to be sure, run

  ls -al /var/lib/condor

and verify that the entry "''.''" is owned by ''condor'' and belongs to the ''condor'' group. If not, you probably have a conflicting UID or GID and will have to set it manually.
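The UID/GID check above can be sketched as a short script. It parses the colon-separated fields of a passwd entry (fields 3 and 4 are the UID and GID); the sample entry below is illustrative and shows what a correctly-created ''condor'' user looks like.

```shell
# Sample passwd entry for a correctly-created condor user (illustrative).
# On a real pool member you would read it instead with:
#   entry=$(grep '^condor:' /etc/passwd)
entry="condor:x:64:64:Owner of Condor Daemons:/var/lib/condor:/sbin/nologin"

# Fields 3 and 4 of a passwd entry are the UID and GID.
uid=$(echo "$entry" | cut -d: -f3)
gid=$(echo "$entry" | cut -d: -f4)

if [ "$uid" = "64" ] && [ "$gid" = "64" ]; then
    echo "condor user OK (UID=$uid GID=$gid)"
else
    echo "conflicting IDs: UID=$uid GID=$gid (expected 64/64)"
fi
```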
Set it to one that is not being used by the local user system or by the network, and then set the ''CONDOR_IDS'' variable in that individual host's Condor local configuration file((A host's individual configuration file is located at ''/mnt/config/hosts//config/condor_config.local''.)).

===== Install Binaries =====

The NAS is used to store all of the program files that Condor needs to run. **Installing these onto the NAS only needs to happen once**, but in order to recompile the binaries onto the ''tesla.cs.wlu.edu'' NAS anyway, run these commands in the terminal:

  cd /mnt/config/src/fedora64
  sudo ./condor_configure --type=manager,submit,execute --central-manager=john.cs.wlu.edu --local-dir=/mnt/config/hosts/_default --install-dir=/mnt/config/release/x86_64_rhap_5 --owner=condor --install --verbose

===== Set Machine Variables =====

Whenever the ''condor_master'' program starts, the first thing it does is look for the global configuration file. FIXME

The problem with putting as much of Condor as possible on the NAS is that it introduces a lot of NFS traffic onto the network, especially while Condor jobs are running. Storing the user executables centrally on the NAS would cause all of the computers to read from the NAS almost constantly whenever those executables are opened and run. The W&L Computer Science Department Systems Administrator, [[http://goryl.org/htm/steve.htm|Steve Goryl]], had a similar problem when all of the Linux applications on the lab computers were centrally located and run from the central CS department server. This produced higher-than-expected traffic on the network, and the programs became laggy. Installing the applications locally on the lab computers' hard drives proved to be more of an administrative burden but provided much better overall performance.
We can still have good performance((hopefully...)) while keeping Condor centrally located: Condor's binaries and all of the configuration files live on the NAS, while the user executables of currently-running Condor jobs are stored locally on the executing machine's hard drive. Condor's binaries stay on the NAS for the sake of easy upgrades, and job binaries are stored on, and run from, the execute machines' local hard drives.

In order to do this, we need to create certain directories on every machine that are owned by the (local) ''condor'' user. These directories will serve as the playground for Condor jobs while they are executing on a machine. Create the directories, and then tell Condor where they are and what to do with them:

  sudo mkdir /var/lib/condor/execute
  sudo chown -R condor:condor /var/lib/condor/execute
  sudo chmod -R 755 /var/lib/condor/execute

===== Configure Authentication =====

As specified in our Condor system's global configuration file, access to Condor is restricted to certain machines and usernames. Whenever Condor receives a request, it first checks whether the requester is allowed to make such a request. Unfortunately, the requesting machine can lie about who it is and thereby "spoof" Condor into thinking the request is coming from a valid source. To help prevent this from happening, Condor uses basic authentication to protect itself from computers disguised as valid members of its pool. This authentication takes the form of an encrypted password. When Condor starts, it reads the configuration files to figure out where the password is stored. As listed in the global configuration file under the ''SEC_PASSWORD_FILE'' configuration variable, the password is stored at ''/var/lib/condor/pool_password'' with root-only access. In order for a machine to be added to the Condor pool, this file __must be manually copied__ from an existing member of the pool to the new member.
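A minimal sketch of locking down the copied file, demonstrated on a temporary file so it can be run without root. On a real pool member, ''PASSFILE'' would be ''/var/lib/condor/pool_password'' (the path from ''SEC_PASSWORD_FILE''), the copy would come from an existing member (e.g. via ''scp''), and the ''chown root:root'' step would also apply.

```shell
# In practice: PASSFILE=/var/lib/condor/pool_password, copied from an
# existing pool member and then chown'd to root:root.
# Here the permission step is demonstrated on a temporary file.
PASSFILE=$(mktemp)

chmod 0600 "$PASSFILE"          # read/write for owner only

# Verify the mode: should print mode=600.
mode=$(stat -c '%a' "$PASSFILE")
echo "mode=$mode"

rm -f "$PASSFILE"
```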
Once copied, this file must be owned by ''root'' and have read and write access for the owner with all other permissions disabled (mode ''0600'').

===== Configure Firewall =====

Condor primarily uses **port 9618** for communication between the ''condor_master'' daemons on each of the members of the Condor pool. Because of this, the firewall on each member needs to have port 9618 open to accept incoming communication.

Condor also uses many other dynamically-chosen ports for direct communication between daemons that want to bypass the ''condor_master'' daemon (in order not to bog down the busy ''condor_master'' daemon, of course). If the daemons are configured to publish their port numbers publicly (in the filesystem), they should be allowed to communicate with each other directly. To allow this, a range of ports needs to be opened so that all of the Condor daemons can communicate freely with each other while still using dynamically-allocated ports. Specifically, the ''HIGHPORT'' and ''LOWPORT'' configuration variables in the global configuration file define the range of ports Condor is allowed to use. By default and/or by convention, this range is 9600-9700.

To open this port range, run ''system-config-firewall'' as ''root'' and add the **9600-9700** user-defined tcp and udp port ranges to the "Other Ports" section. Click "Apply" to finish the deed. Now the Condor daemons can communicate freely without being impeded by the firewall.
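For reference, the port-range settings described above look something like this in the global configuration file (values as given in this section; the exact layout of your configuration file may differ):

```
# Restrict Condor's dynamically-allocated daemon ports to the range
# the firewall has open: 9600-9700 by convention.
LOWPORT  = 9600
HIGHPORT = 9700
```

Keeping these two variables in sync with the firewall's open port range is what allows the daemons to pick their ports dynamically without being blocked.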