Differences

condor:submit:troubleshoot [2011/08/02 15:23] – created garrettheath4
condor:submit:troubleshoot [2011/08/02 15:49] (current) garrettheath4
Line 5: Line 5:
  
 You can find out what is going on by checking a chain of logs to figure out at what step your job failed.  First, check the Negotiator log to see if your job was ever successfully matched.  Run ''condor_q'' and note your job's ID.  Then, run
-<code bash>condor_fetchlog `hostname` MASTER | fgrep JOB_ID</code>
+<code bash>condor_fetchlog `hostname` NEGOTIATOR | fgrep JOB_ID</code>
 where ''JOB_ID'' is the ID of your job as listed in the Condor queue, such as "''26.0''".  If the Negotiator found a suitable computer to run your job on, you should see one or more lines indicating the match, such as:
 <code text>08/02/11 09:56:49     Request 00025.00000:
Line 17: Line 17:
 where RUN_MACHINE is the machine that your job was matched with and slotX is the slot on that machine where it may have run, such as
 <code bash>condor_fetchlog -master fred.cs.wlu.edu STARTER.slot1</code>
-If you tried to run your program recently, it should be listed at the bottom of the log.  To find where the log for a specific run starts, look for "''Submitting machine is ...''" and read the lines just below that to find out if any errors occurred while Condor tried to execute your job on the execute machine.
+If you tried to run your program recently, it should be listed at the bottom of the log.  To find where the log for a specific run starts, look for a section surrounded by asterisks and containing the line
+<code text>** condor_starter (CONDOR_STARTER) STARTING UP</code>
+and read the lines below that section to find out if any errors occurred while Condor tried to execute your job on the execute machine.
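
Because the most recent run sits at the bottom of the Starter log, it can help to cut the log down to that last run before reading it. The following is only a rough sketch, not part of Condor: ''last_starter_run'' is a hypothetical helper name, and it assumes every run begins with a line containing the ''STARTING UP'' marker quoted above.

```shell
# Hypothetical helper (not a Condor tool): print only the most recent
# starter run from a STARTER log supplied on standard input.
# Assumption: each run begins with a line containing the marker text
# "condor_starter (CONDOR_STARTER) STARTING UP".
last_starter_run() {
    awk '
        # A new run begins here: forget everything buffered so far.
        # The row of asterisks just above the marker is dropped with it.
        index($0, "condor_starter (CONDOR_STARTER) STARTING UP") { n = 0 }
        # Buffer every line, including the marker line itself.
        { buf[++n] = $0 }
        # At end of input, print the lines of the last run only.
        END { for (i = 1; i <= n; i++) print buf[i] }
    '
}
```

It could be used as ''condor_fetchlog -master fred.cs.wlu.edu STARTER.slot1 | last_starter_run''. If the marker never appears in the input, the whole log is printed unchanged.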
condor/submit/troubleshoot.1312298600.txt.gz · Last modified: 2011/08/02 15:23 by garrettheath4
CC Attribution-Noncommercial-Share Alike 4.0 International