AWS Thinkbox Discussion Forums

Deadline Monitor update issues

Hi guys,

every once in a while, the Deadline Monitor tells me in the status bar at the bottom of the window that there was an issue updating the information from the database (I don't have the exact wording at hand right now, sorry). Waiting for the next update does not fix this. The “Last Update” counter in the lower right corner keeps counting the seconds since the last update and periodically resets, but the displayed information still doesn't update. Closing and restarting the Monitor does help, which is a bit annoying, though. As I couldn't find a button or menu entry that forces a proper refresh, I found that another workaround is to pretend to change the repository by selecting “Change Repository…” from the File menu and just hitting OK without actually changing the repository location.

cheers,
Holger

The Monitor log will contain the error message. You can also view it in the Console panel (View -> New Panel -> Console). If you can send us the Monitor log the next time this happens, we'll take a look.

Thanks!
Ryan

I attached the log file from yesterday. You can see that at 21:38:33 I applied the workaround of “changing” the repository without actually changing it. So if there's any information about the update problem, it must be in the log before that point. There is no other log file from that day, so I hope you'll find something relevant in this one.

cheers,
Holger
deadlinemonitor-cell-ws-03-2014-10-08-0000.log (54.5 KB)

Thanks! I think this error has something to do with it:

Error occurred while updating task cache: Object reference not set to an instance of an object. (System.NullReferenceException)

In beta 5, we’ll add stack trace information to these error messages so that it should be easier to track down what the problem is.

When you see this error, what no longer updates properly? Is it just the task list? If you click on another job, do the tasks for that job show up?

Thanks!
Ryan

One thing that didn't update properly was the status of (at least) one of the jobs. I didn't check what else may not have updated properly, but I will do so next time and let you know.

cheers,
Holger

Right now, the status of the Slave from this bug report

http://forums.thinkboxsoftware.com/viewtopic.php?f=205&t=12524

is still being shown as idle, even though I exited it manually quite a few minutes ago (around 10). And as I'm writing this, the status has just changed to ‘Stalled’. I exited the Slave at around 23:55 and noticed the ‘Stalled’ state at around 00:07. As this was around midnight, I attached the log files of the 13th and 14th of October.

cheers,
Holger
logs.zip (933 KB)

This might be related to your Idle Shutdown problem where the slave appears stalled. Again, the log indicates that the slave shut down properly, so there must be a bug.

How did you exit the slave manually? Did you just click the [X] button, select File -> Exit, etc?

Thanks!
Ryan

I did ‘File -> Exit’. I just replied in the other thread that this machine and the other affected machines were not in the ‘all’ group in the Power Management Options.
I just corrected that, so let's see if that changes things.

cheers,
Holger

A few minutes ago, I noticed that the job list in the Deadline Monitor on the machine cell-ws-16 was not correct: a few jobs that no longer existed were still in the list. I then applied the workaround of changing the repository location to the same path, and the list updated. I attached screenshots from right before (Clipboard01.jpg) and right after (Clipboard02.jpg) doing that, so you can see the difference in the job list. I also attached the Monitor log of that machine and the Pulse log. I think it mostly affects jobs between around 19:30 on Oct 28 and 0:44 on Oct 29.

Cheers,
Holger
deadline_logs_2014-10-29.zip (1.28 MB)


The problem with deleted jobs still showing in the Monitor should be fixed in beta 6, which was just released yesterday:
viewtopic.php?f=204&t=12593

There were other issues with updating that were fixed in this build as well, so please upgrade and let us know if you continue to see this problem.

Thanks!
Ryan

Downloading it now :wink:
I also see in the release notes that the “Slave Stalled” issue should be gone.
Looking forward to it!

cheers,
Holger

Unfortunately, I just had this issue again.
It happened around 2014-11-02 13:09. One of our artists submitted a job a second time with the same name while the first one was still rendering; after the second submission, he deleted the first one. This was to make sure that the machines rendering his job would jump onto the second submission right away and not start rendering someone else's job. As a result, the Deadline Monitor on my machine kept displaying both jobs until I applied the “Change Repository…” workaround.
Before/after screenshots of Monitor and Pulse and Monitor logs attached.
Related question: is there an option in Deadline to append something like “_2” to a job that's submitted with a name that already exists, so it's easier to distinguish the two submissions in the Monitor? I guess there are no real issues with submitting jobs with the same name, or else there'd be a built-in check in Deadline to prevent this, right?

Cheers,
Holger



Deadline_logs_2014-11-02.zip (1.06 MB)

Any chance this monitor was still from a previous beta version (the one that showed both jobs after one was deleted)?

Also, which monitor is this log from? The one that the job was deleted from or the one that showed both jobs after one was deleted?

Finally, there is nothing preventing jobs from having the same name. Each job has an ID that is used to uniquely identify the job.

Cheers,
Ryan

The Monitor was definitely already on beta 6 as well, as all the machines were updated at the same time.
The wrong display was on the Monitor of my machine, not on the one the job was deleted from.
I guessed that you use IDs internally; I just wanted to make sure. But how about appending an incremental number to the job name when a job with the same name already exists? Not that this is desperately needed, but it would help to distinguish those jobs when quickly looking over the job list.

Cheers,
Holger

Hmm, I’m not having any luck reproducing this with our internal version. I was able to reproduce it pretty easily before beta 6 though. We’ll keep an eye out for it!

There are currently no plans to introduce an incremental numbering system for jobs with the same name. Our biggest concern here is performance, since we’d have to check the name of every job in the queue to see if there is another job with the same name.

However, you could implement it yourself in an event plugin (similar to the one I sent you before that checks for specific limits). The event plugin would also handle the OnJobSubmitted event, and would then look at the names of all the other jobs in the farm to see if there is one with the same name, and if there is, it could append the next number.
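A minimal sketch of the naming logic such an event plugin might use, assuming the plugin can obtain the ID and name of every job in the queue. The helper `next_unique_name` is hypothetical, and the Deadline-specific parts (registering for OnJobSubmitted, fetching the job list, saving the renamed job) are deliberately left out and would need to come from the Deadline scripting documentation. Note that it scans every job once per submission, which is exactly the cost mentioned above.

```python
import re
from typing import Iterable, Tuple


def next_unique_name(job_id: str, name: str, jobs: Iterable[Tuple[str, str]]) -> str:
    """Hypothetical helper: return `name` unchanged if no other job uses it,
    otherwise append the next free numeric suffix ("_2", "_3", ...).

    `jobs` is an iterable of (job_id, job_name) pairs for the whole queue.
    """
    # The newly submitted job is already in the queue, so exclude it by its
    # unique ID (names are not unique, IDs are).
    others = {other_name for other_id, other_name in jobs if other_id != job_id}
    if name not in others:
        return name

    # Collect suffixes already taken by names of the form "<name>_<n>".
    pattern = re.compile(re.escape(name) + r"_(\d+)$")
    taken = set()
    for other in others:
        match = pattern.match(other)
        if match:
            taken.add(int(match.group(1)))

    # Pick the smallest free suffix, starting at 2.
    suffix = 2
    while suffix in taken:
        suffix += 1
    return "{}_{}".format(name, suffix)
```

For example, if the queue holds an earlier job "shot010" and the new submission is also named "shot010", the sketch returns "shot010_2"; if "shot010_2" also exists, it returns "shot010_3".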

Cheers,
Ryan

I understand the performance concerns. I need to take a deeper look at Deadline's customisation and scripting possibilities anyway, and I'll think about whether this is something we're going to implement here.
I'm more interested in getting that update bug fixed, so I'll keep my fingers crossed that you find it. If it happens again, I'll send new logs.

Cheers,
Holger

It just happened again. The job that's highlighted in the first screenshot must have been deleted at some point between 18:30 and 19:16, when it was re-submitted. It was deleted on the artist's machine, and the Monitor on my workstation was affected by it. Unfortunately, I didn't have the chance to check any other Monitors.
Hopefully the logs are more helpful this time.

Cheers,
Holger
deadline_logs_2014-11-13.zip (911 KB)

Thanks! We’ll keep trying to reproduce this one so that we can get it fixed!

Cheers,
Ryan

And another update bug. This time it didn't show the status of some Slaves correctly; to be precise, it didn't correctly show how long they had been in that status.
It was showing them as having been offline for just a few seconds, although they had been shut down for almost an hour. In fact, that counter was stuck and didn't change. After the usual “Change Repository…” workaround, it properly displayed them as having been offline for 54 minutes. This happened around 22:25; logs and screenshots attached.

Cheers,
Holger
Deadline_logs_2014-11-17_02.zip (1.08 MB)

That one is actually a known issue that we just discovered, and it happens because offline slaves don’t change their state, so the Monitor doesn’t update their corresponding rows. This was something new we added in 7.0, and I think we’ll have to remove it for now (otherwise we have to implement a way to update rows where the actual data hasn’t changed, and it’s too late in the game to make a significant change like that).
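To illustrate the mechanism described above, here is a toy model, not Deadline's actual Monitor code: a table that re-renders a row only when the underlying record changes. Because an offline slave's record never changes, its rendered "time in state" stays frozen at whatever it was on first render. All class and field names are hypothetical, and `force=True` loosely stands in for the “Change Repository…” workaround, which reloads everything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaveInfo:
    # Hypothetical record; equality compares all fields.
    name: str
    state: str
    state_changed_at: float  # timestamp in seconds


class ToyMonitorTable:
    """Toy model: rows are re-rendered only when their record changes."""

    def __init__(self):
        self.records = {}
        self.rendered = {}

    def _render(self, info, now):
        # The elapsed time is derived from the current clock at render time.
        elapsed_min = int((now - info.state_changed_at) // 60)
        return "{} {} ({} min)".format(info.name, info.state, elapsed_min)

    def refresh(self, snapshot, now, force=False):
        for info in snapshot:
            # Change detection: skip rows whose data is identical, unless forced.
            if force or self.records.get(info.name) != info:
                self.records[info.name] = info
                self.rendered[info.name] = self._render(info, now)


table = ToyMonitorTable()
offline = SlaveInfo("cell-ws-16", "Offline", 0.0)
table.refresh([offline], now=0.0)
table.refresh([offline], now=3240.0)              # data unchanged: row stays "(0 min)"
table.refresh([offline], now=3240.0, force=True)  # full reload: row shows "(54 min)"
```

Fixing this in a change-driven design means either re-rendering time-derived columns on every tick even when the data is unchanged, or not showing them, which matches the trade-off described above.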

Cheers,
Ryan
