Differences

This shows you the differences between two versions of the page.

condor:installation:network [2011/07/15 13:58] – added "set machine variables" and "add machines to condor pool" (TODO) garrettheath4
condor:installation:network [2011/07/18 21:48] – [Set Machine Variables] garrettheath4
Line 4: Line 4:
  
 Our pool uses a dedicated NAS server with a 2.7TB RAID 5+1 drive configuration to provide faster access to the data, especially for reads, than normal hard drive I/O.
 +
 +===== Add Local condor User =====
 +
 +In order for daemons to run correctly and for permissions to be properly set, a local ''condor'' user must be present on all members of the Condor pool.  The following must be set for the ''condor'' user:\\
 +  condor UID = 1344
 +  condor GID = 1610
 +
 +First, check to see if the ''condor'' user exists on the machine.  Do this by running:
 +<code bash>grep '^condor:' /etc/passwd</code>
 +**If you get a match**, first reset its settings in case the user wasn't created correctly.
 +<code bash>sudo groupmod -g 1610 condor
 +sudo usermod -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 1344 -g condor -s "/sbin/nologin" -L condor</code>
 +:!: If you get a message that says that the directory ''/var/lib/condor'' already exists, run this command next:
 +<code bash>sudo chown -R condor:condor /var/lib/condor</code>
 +
 +**If you do not get a match**, you need to manually add the user.  To do this, run:
 +<code bash>sudo groupadd -g 1610 condor
 +sudo useradd -c "Owner of Condor Daemons" -d "/var/lib/condor" -m -u 1344 -g condor -s "/sbin/nologin" condor
 +sudo usermod -L condor</code>
 +
 +To be sure, run <code bash>ls -al /var/lib/condor</code> and verify that the entry "''.''" is owned by the ''condor'' user and belongs to the ''condor'' group.  If not, you probably have a conflicting UID or GID and will have to set it manually.  Choose one that is not already used by the local system or by the network, then set the ''CONDOR_IDS'' variable in that host's local Condor configuration file.((A host's individual configuration file is located at ''/mnt/config/hosts/<HOSTNAME>/config/condor_config.local''.))
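The ID check described above can be scripted. The following is only a sketch: ''check_account'' is a hypothetical helper, not part of Condor, and it assumes that ''CONDOR_IDS'' takes the usual ''uid.gid'' form. It reports whether the account is missing, matches the expected IDs, or needs a per-host ''CONDOR_IDS'' override.

```bash
#!/usr/bin/env bash
# Sketch: compare an account's UID/GID with the values the pool expects.
# check_account is a hypothetical helper name, not a Condor tool.
check_account() {
    local user="$1" want_uid="$2" want_gid="$3"
    local have_uid have_gid

    # id exits non-zero when the account does not exist.
    if ! have_uid="$(id -u "$user" 2>/dev/null)"; then
        echo "missing"
        return 0
    fi
    have_gid="$(id -g "$user")"

    if [ "$have_uid" = "$want_uid" ] && [ "$have_gid" = "$want_gid" ]; then
        echo "ok"
    else
        # Assumption: CONDOR_IDS is written as uid.gid in the local config.
        echo "CONDOR_IDS = ${have_uid}.${have_gid}"
    fi
}
```

Calling ''check_account condor 1344 1610'' on a pool member prints ''ok'' when the account matches the values above. Any other output is either ''missing'' or the line that would have to go into that host's ''condor_config.local''.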
  
 ===== Install Binaries =====
Line 9: Line 30:
 In order to install the binaries onto the ''tesla.cs.wlu.edu'' NAS, run this command in the terminal:
 <code bash>cd /mnt/config/src/fedora64
-sudo ./condor_configure --type=manager,submit,execute --central-manager=john.cs.wlu.edu --local-dir=/mnt/config/hosts/_default --install-dir=/mnt/config/release/x86_64_rhap_5 --install --verbose</code>
+sudo ./condor_configure --type=manager,submit,execute --central-manager=john.cs.wlu.edu --local-dir=/mnt/config/hosts/_default --install-dir=/mnt/config/release/x86_64_rhap_5 --owner=condor --install --verbose</code>
-
-===== Add Machines to Condor Pool =====
-
-FIXME
  
 ===== Set Machine Variables =====
 +
 +Whenever the ''condor_master'' program starts, the first thing it does is look for the global configuration file.  FIXME
  
 The problem with putting as much of Condor as possible on the NAS is that it introduces a lot of NFS traffic onto the network, especially when Condor jobs are running.  Storing the user executables centrally on the NAS would cause all of the computers to read from the NAS almost constantly while the executables are opened and run.
Line 21: Line 40:
 The W&L Computer Science Department Systems Administrator, [[http://goryl.org/htm/steve.htm|Steve Goryl]], had a similar problem when all of the Linux applications on the lab computers were actually centrally located and run from the central CS department server.  This proved to produce higher-than-expected traffic on the network and the programs became laggy.  Installing the applications locally on the hard drives of the lab computers proved to be more of a pain administratively but provided much better overall performance.
  
-We can still have good performance while having Condor centrally located by having Condor's binaries and all of the configuration files located on the NAS while storing currently-running Condor job user executables locally on the executing machine's hard drive.  In order to do this, we need to create certain directories on every machine owned by the (local) ''condor'' user.
+We can still have good performance((hopefully...)) while having Condor centrally located by having Condor's binaries and all of the configuration files located on the NAS while storing currently-running Condor job user executables locally on the executing machine's hard drive.  Condor's binaries will stay on the NAS for the sake of easy upgrades and job binaries will be stored on and run from the execute machines' local hard drives.
 + 
 +In order to do this, we need to create directories on every machine that are owned by the (local) ''condor'' user.  These directories will serve as the playground for Condor jobs while they execute on a machine.  We then need to tell Condor where the directories are and what to do with them.
 + 
 +<code bash>sudo mkdir /var/lib/condor/execute 
 +sudo chown -R condor:condor /var/lib/condor/execute 
 +sudo chmod -R 755 /var/lib/condor/execute</code>
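The page does not yet say how Condor is told about the new directory. The sketch below makes two assumptions: that the standard ''EXECUTE'' configuration macro is what is intended, and that GNU ''stat'' is available. ''execute_dir_ok'' is a hypothetical helper that checks the numeric owner, group, and mode before printing the candidate setting.

```bash
#!/usr/bin/env bash
# Sketch: confirm the execute directory has the expected numeric owner,
# group, and mode. execute_dir_ok is a hypothetical helper name.
execute_dir_ok() {
    local dir="$1" want_uid="$2" want_gid="$3" want_mode="$4"
    local have

    # GNU stat: %u = numeric owner, %g = numeric group, %a = octal mode.
    have="$(stat -c '%u:%g %a' "$dir" 2>/dev/null)" || return 1
    [ "$have" = "${want_uid}:${want_gid} ${want_mode}" ]
}

if execute_dir_ok /var/lib/condor/execute 1344 1610 755; then
    # Assumption: EXECUTE is the macro that points Condor at this directory.
    echo "EXECUTE = /var/lib/condor/execute"
else
    echo "execute directory is missing or has the wrong owner or mode" >&2
fi
```

If the check passes, the printed line is a candidate for the host's ''condor_config.local''. Whether it belongs there or in the shared default configuration is left open here.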
condor/installation/network.txt · Last modified: 2012/08/09 19:18 by garrettheath4
CC Attribution-Noncommercial-Share Alike 4.0 International