Hi,
Since I installed Deadline 4.1, assigning jobs to slaves sometimes takes up to 10 minutes. A job is submitted from Maya, but idle machines don't grab it.
This happens randomly and is not machine specific.
Any ideas how to fix this?
Thanks in advance!
Hi there,
I just need to gather a bit more information:
Thanks,
Hi Ryan,
greetings
andy
Thanks! Next, can you send us a screen shot of your Slave Settings in the Repository options? We can take a look at it to see if any of the Polling intervals are larger than they should be.
Cheers,
Hi again,
Sure I can.
Screenshot attached.
I didn't touch any of these settings.
greetings
andy
Everything looks good there. This is quite strange…
I guess the next thing to check is what the slave is actually doing when it’s taking forever to pick up a job. First, we should enable Slave Verbose Logging (if it’s not enabled already). This can be done in the Logging section of the Repository Options. After making the change, restart all the slave applications so that they recognize the change immediately, and start fresh logs.
Now submit a new job, and wait a minute or two. Then go to a machine that hasn’t picked up the job yet and take a screen shot of the Slave application. This should help us see what it’s doing. Also, grab the most recent Slave log for that same slave. This can be found by selecting Help -> Explore Log Folder in the slave. Hopefully there is enough info here to help us figure out what is going on.
Thanks!
this is the log of one slave:
starting between task wait - seconds: 2
Scheduler Thread - slave initialization complete.
Scheduler Thread - performing house cleaning…
Scheduler - Pulse has not been configured. This can be done from the Repository Options in the Monitor.
Scheduler - Ignoring job with ID “999_010_000_72755ab5”. reason:
Scheduler - Ignoring job with ID “999_093_999_4cc6b6ce”. reason:
Scheduler - Ignoring job with ID “999_050_000_3a0789b1”. reason:
Scheduler - Ignoring job with ID “999_093_999_75d32e35”. reason:
Scheduler - Ignoring job with ID “999_093_999_6e37136a”. reason:
Scheduler - Ignoring job with ID “999_010_000_2ce1bde7”. reason:
Scheduler - Ignoring job with ID “999_093_999_693cea19”. reason:
Scheduler - Ignoring job with ID “999_050_000_5e731e6d”. reason:
Scheduler - Ignoring job with ID “999_094_999_77957abf”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_050_000_5e731e6d\999_050_000_5e731e6d.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_4cc6b6ce\999_093_999_4cc6b6ce.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_050_000_3a0789b1\999_050_000_3a0789b1.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_010_000_2ce1bde7\999_010_000_2ce1bde7.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_693cea19\999_093_999_693cea19.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_010_000_72755ab5\999_010_000_72755ab5.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_6e37136a\999_093_999_6e37136a.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_094_999_77957abf\999_094_999_77957abf.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_75d32e35\999_093_999_75d32e35.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Job chooser found no jobs.
starting between task wait - seconds: 20
Scheduler Thread - performing house cleaning…
Scheduler - Pulse has not been configured. This can be done from the Repository Options in the Monitor.
Scheduler - Ignoring job with ID “999_010_000_2ce1bde7”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_010_000_2ce1bde7\999_010_000_2ce1bde7.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_093_999_693cea19”. reason:
Scheduler - Ignoring job with ID “999_093_999_4cc6b6ce”. reason:
Scheduler - Ignoring job with ID “999_093_999_75d32e35”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_693cea19\999_093_999_693cea19.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_75d32e35\999_093_999_75d32e35.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_4cc6b6ce\999_093_999_4cc6b6ce.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_010_000_72755ab5”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_010_000_72755ab5\999_010_000_72755ab5.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_050_000_5e731e6d”. reason:
Scheduler - Ignoring job with ID “999_094_999_77957abf”. reason:
Scheduler - Ignoring job with ID “999_050_000_3a0789b1”. reason:
Scheduler - Ignoring job with ID “999_093_999_6e37136a”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_050_000_5e731e6d\999_050_000_5e731e6d.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_094_999_77957abf\999_094_999_77957abf.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_6e37136a\999_093_999_6e37136a.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_050_000_3a0789b1\999_050_000_3a0789b1.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Job chooser found no jobs.
starting between task wait - seconds: 20
Scheduler Thread - performing house cleaning…
Scheduler - Pulse has not been configured. This can be done from the Repository Options in the Monitor.
Scheduler - Ignoring job with ID “999_010_000_2ce1bde7”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_010_000_2ce1bde7\999_010_000_2ce1bde7.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_093_999_75d32e35”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_75d32e35\999_093_999_75d32e35.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_093_999_693cea19”. reason:
Scheduler - Ignoring job with ID “999_010_000_72755ab5”. reason:
Scheduler - Ignoring job with ID “999_050_000_5e731e6d”. reason:
Scheduler - Ignoring job with ID “999_050_000_3a0789b1”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_693cea19\999_093_999_693cea19.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_093_999_6e37136a”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_050_000_5e731e6d\999_050_000_5e731e6d.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Ignoring job with ID “999_094_999_77957abf”. reason:
Scheduler - Ignoring job with ID “999_093_999_4cc6b6ce”. reason:
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_010_000_72755ab5\999_010_000_72755ab5.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_050_000_3a0789b1\999_050_000_3a0789b1.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_6e37136a\999_093_999_6e37136a.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_093_999_4cc6b6ce\999_093_999_4cc6b6ce.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - minor exception: Fehler im XML-Dokument (6,4). (Die Zeichenfolge wurde nicht als gültiges DateTime erkannt.) (Job has been corrupted, it is recommended that this job be removed from the repository: //Archiv/DeadlineRepository\jobs\999_094_999_77957abf\999_094_999_77957abf.job) (Deadline.Jobs.JobCorruptedException)
Scheduler - Job chooser found no jobs.
…
It goes on and on like this until the slave finally grabs a job.
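For anyone facing a similar log, the repeated exceptions can be condensed into a short list of affected job files. The German part of the message translates to "Error in XML document (6,4). (The string was not recognized as a valid DateTime.)". Below is a minimal Python sketch that counts how often each .job file is reported as corrupted. It assumes only the line format visible in the log above; the function name and the regular expression are illustrative and not part of Deadline.

```python
import re
from collections import Counter

# Captures the .job path that follows "removed from the repository: "
# in a JobCorruptedException line (format assumed from the log above).
CORRUPTED_RE = re.compile(r"removed from the repository: (?P<path>.+?\.job)\)")


def count_corrupted_jobs(log_text):
    """Return a Counter mapping each reported .job path to its report count."""
    counts = Counter()
    for line in log_text.splitlines():
        # Skip everything that is not a corruption report.
        if "JobCorruptedException" not in line:
            continue
        match = CORRUPTED_RE.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

Calling `count_corrupted_jobs(open(log_path, encoding="utf-8", errors="replace").read())` on a saved slave log (where `log_path` is whatever file you exported) gives one entry per distinct job file, which makes it easier to see whether the same few jobs are reported in every scheduling pass.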
Thanks! There definitely seem to be a lot of problematic jobs in the queue. This leads me to a couple more questions:
Cheers,
Hey,
There are some queued jobs that threw errors (all of them eyeon Fusion jobs, where the problem was missing plugins on the slave machines or something similar).
Thanks for that additional info. I’m not seeing anything wrong with your setup that would explain this problem. The next thing we can try is to create a “fresh” job folder with nothing in it, and submit new jobs to see if this problem persists. We’ll back up your other jobs first though, so nothing will be lost.
First, stop any running slave applications on your farm. Then go to your repository root and rename the “jobs” folder to something like “jobs_backup”. Finally, create a new “jobs” folder. Now if you refresh your monitor, it shouldn’t contain any jobs.
Now submit a new job and see if the slaves still have problems picking it up.
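If you would rather script the folder swap than do it by hand, the steps above can be sketched in Python as follows. This is only an illustration: the function name is made up, the repository root is whatever path your repository uses, and it assumes all slaves have already been stopped as described.

```python
from pathlib import Path


def swap_in_fresh_jobs_folder(repository_root, backup_name="jobs_backup"):
    """Rename <root>/jobs to <root>/<backup_name> and create an empty jobs folder.

    Assumes every slave has been stopped first. Returns the backup path.
    """
    root = Path(repository_root)
    jobs = root / "jobs"
    backup = root / backup_name

    # Refuse to overwrite an earlier backup.
    if backup.exists():
        raise FileExistsError(f"Backup folder already exists: {backup}")

    jobs.rename(backup)  # keep the old jobs so nothing is lost
    jobs.mkdir()         # fresh, empty jobs folder
    return backup
```

To restore the old queue later, stop the slaves again, remove the new "jobs" folder, and rename the backup back to "jobs".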
Cheers,
Hi Ryan,
I created a new jobs directory, but the problem is still the same.
It would have been acceptable if it only affected the first task of a job.
But having machines search for 5 minutes after each task is really not time efficient…
Maybe I have to reinstall the repository completely…
I didn't encounter this problem with 4.0.
There definitely seems to be some sort of conflict between the machines that are submitting the jobs and the ones that are trying to render them, because they think the job is corrupted for some reason. Are the machines that you're submitting from updated to 4.1 as well?
It might be best to reinstall the 4.1 repository and clients (both slaves and workstations), just to make sure everything is up to date.
Cheers,
I'm always submitting from the same machine, which is definitely on 4.1 (although it was updated via the auto-update function from the repository).
I'm going to try a fresh install tomorrow.
Hopefully the problems will be solved then.
greetings
andy
I wonder if the auto-update failed to update properly in this case. Hopefully the reinstall will resolve these issues.
Cheers,
Hello Ryan,
It's been a while since I started this thread, but the problem isn't solved so far…
We've had busy weeks over here, so there was no time to "tweak" the render pipeline…
The problem mentioned above still persists.
I reinstalled the repository as well as the client on every machine (no auto-update), so everything is freshly installed.
If I hit F5 10 times in the slave window, it grabs the job instantly; otherwise it takes up to 10 minutes.
Are you still getting these warnings in the slave logs?
If so, can you send us one of the *.job files mentioned in the message (the full path to the file is included in the message)? We can take a look at it to see why the slaves think it’s corrupted.
It’s strange that pressing F5 multiple times on the slave eventually gets things going. It’s almost like the slave sees corrupted jobs most of the time, but then all of a sudden everything is good and it can dequeue a job. That could explain the long delay when you aren’t mashing F5, because your current settings have it so that the slave checks for jobs every 20 seconds. If it only “gets through” once in a while, that time can add up.
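To put rough numbers on that, here is a small Python sketch. It assumes each 20-second check succeeds independently with some fixed probability, which is a simplification and not something the logs confirm; the function names are illustrative only.

```python
def expected_pickup_delay(poll_interval_s, success_probability):
    """Average wait until a job is dequeued.

    Assumes every poll succeeds independently with the same probability,
    so the number of polls needed is geometric with mean 1 / p.
    """
    if not 0 < success_probability <= 1:
        raise ValueError("success_probability must be in (0, 1]")
    return poll_interval_s / success_probability


def implied_success_probability(poll_interval_s, observed_delay_s):
    """Per-poll success rate that would produce the observed average delay."""
    if observed_delay_s < poll_interval_s:
        raise ValueError("observed delay cannot be shorter than one interval")
    return poll_interval_s / observed_delay_s
```

With a 20-second interval, a 10-minute (600-second) wait corresponds to about 30 checks, so under this model only roughly one check in 30 would be getting through. Pressing F5 repeatedly simply packs many checks into a few seconds, which fits what you are seeing.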
You wouldn’t happen to have another machine you could install a repository on, just to see if the problem is related to your current server or not? Another Windows server box or a *nix (Linux, FreeBSD, etc.) box would work.
Finally, I know you’re running Windows 2003 server for your repository machine, but is it possible there is some limitation on the number of connections that can access a shared folder? I wonder if perhaps the new multithreaded data loading is resulting in a connection limitation being maxed out. That could explain why you didn’t see this problem in 4.0.
Cheers,
Hi again,
I don't get those warnings in the slave logs any longer.
They went away after cleaning the jobs directory.
The repository is currently on a Linux server.
You issued a trial license for a Windows machine, but due to multiple FlexLicense Manager installs and problems with that machine, you switched our final license to a Linux machine (Ubuntu).
We have another Windows 7 machine (Core i7 930, 12 GB RAM), which is one of our render slaves.
We could try to install the repository on this one.
(Would it be possible to use it as the repository and still let it participate in rendering?)
So to confirm, you’ve now experienced this problem when the repository has been hosted on both the Windows 2003 server machine and the Linux machine?
I would be hesitant to install on a Windows 7 machine, since I would expect it to have a 10-connection limit because it’s not a server OS.
Hi,
No, on the Windows Server 2003 machine there was a problem with the license manager.
The clients didn't obtain the license correctly (or the FlexManager didn't work properly due to an old Flex installation we use for our eyeon Fusion packages).
So we couldn't try rendering with the 2003 machine as the repository.
The repository and the license server don’t have to be on the same machine. So you can keep your license server on the Linux machine and try installing a new 4.1 repository on your Windows server to see if that makes a difference.