If I remote into a machine running the Slave and click “Open Slave Log”, then exit/quit the Slave application, the Slave Log isn’t closed. You can also click “Open Slave Log” multiple times, which leaves multiple commandbg.exe processes running on the machine. Shouldn’t these be cleaned up on exit, with only one “Open Slave Log” allowed per slave running on a machine (obviously this shouldn’t affect the ability to remotely connect to a slave log)? - [Same issue with Pulse & Monitor -> multiple log windows not cleaned up on exit]
If a slave picks up a Max job with a single task (frame 0) that is a FumeFX sim job (frames 0-50), and I exit/quit the Slave halfway through the sim (around frame 25), the slave’s local STDout log never makes it to the Deadline Monitor as a log/error report. The only thing that appears in the job reports in the Monitor is a “re-queue” report. That doesn’t sound right?
In beta 7, the log windows will now be closed automatically when the main process (Slave, Monitor, Pulse) is closed. There still isn’t a limit on the number of windows you can pop up, but that’s probably not a big deal. Maybe there is even a situation where you want 2 log windows for the same slave (one might have its log suspended, and one might be live).
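Roughly speaking, the cleanup works along these lines; here’s a minimal sketch (not our actual code — LogWindowManager and the viewer command are just illustrative names) of tracking the spawned log-viewer processes and terminating them when the main application exits:

```python
import atexit
import subprocess

class LogWindowManager:
    """Tracks log-viewer child processes so they close with the parent app."""

    def __init__(self):
        self._viewers = []
        # Terminate any remaining viewers when the main process
        # (Slave, Monitor, Pulse) exits.
        atexit.register(self.close_all)

    def open_log_window(self, viewer_cmd):
        """Spawn a log viewer (a commandbg-style process) and remember it."""
        proc = subprocess.Popen(viewer_cmd)
        self._viewers.append(proc)
        return proc

    def close_all(self):
        """Terminate any viewers that are still running."""
        for proc in self._viewers:
            if proc.poll() is None:  # still running
                proc.terminate()
        self._viewers.clear()
```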
Yeah, this has always worked this way. The issue we would have to overcome is that the requeue report is usually created by an application other than the slave (i.e., the Monitor), which doesn’t have access to the current task log. Perhaps this could be changed so that the requeue report is always generated by the slave when it recognizes that its task has been requeued. Then the log could be stored in the requeue report.
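Something along those lines, just to pin the idea down (all of the names here are hypothetical, not Deadline’s actual API): when the slave notices its current task has been requeued out from under it, it writes the requeue report itself, attaching the local task log it already has on disk.

```python
def check_for_external_requeue(slave, repository):
    """Hypothetical slave-side check: if our task was requeued by someone else,
    generate the requeue report here, where the task log is actually available."""
    task = slave.current_task
    if task is None:
        return

    state = repository.get_task_state(task.job_id, task.task_id)
    if state == "Queued" and slave.is_rendering(task):
        # The task was requeued elsewhere: write the report ourselves
        # so the local STDout/task log isn't thrown away.
        repository.create_requeue_report(
            job_id=task.job_id,
            task_id=task.task_id,
            slave_name=slave.name,
            log=slave.read_local_task_log(task),
        )
        slave.stop_current_task()
```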
Sounds good. Yeah, agreed. I can see situations where you might want more than one slave log window open at the same time. At least force-closing any log windows on exit will help to keep things tidy.
Yeah, an error on a long-running ‘sim’ job will of course generate a report, so it’s really just the case of re-queues not having any easily accessible ‘traceability’. I see your point about the re-queue reports being generated on a different machine, and I assume this is important if the previously processing slave machine ‘drops off the radar’ for some reason, in which case the slave wouldn’t be able to self-generate the re-queue report. Perhaps the logic should be: first the slave tries to generate the re-queue report with access to its local slave log, and if this fails, fall back to another machine generating the re-queue report?
Maybe we could have the application that’s triggering the requeue compare the task info to the slave’s current state. If they match, then let the slave do it; otherwise have the other application do it. I guess there is the case where the slave has died on that task but hasn’t been marked as stalled yet. In those cases, though, you do eventually get a stalled slave report.
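Roughly this decision rule (again, hypothetical names, just sketching it out):

```python
def requeue_task(repository, job_id, task_id):
    """Hypothetical requeue path in the Monitor (or another app):
    decide who should write the requeue report."""
    task_info = repository.get_task_info(job_id, task_id)
    slave_state = repository.get_slave_state(task_info.slave_name)

    slave_still_on_task = (
        slave_state is not None
        and slave_state.status == "Rendering"
        and slave_state.current_job_id == job_id
        and slave_state.current_task_id == task_id
    )

    repository.set_task_state(job_id, task_id, "Queued")

    if not slave_still_on_task:
        # The slave has moved on (or died): write a log-less report here.
        # A dead slave eventually produces a stalled slave report anyway.
        repository.create_requeue_report(job_id, task_id, log=None)
    # Otherwise do nothing: the slave will notice the requeue and write
    # the report itself, with its local task log attached.
```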
Maybe another way would be to have the application create a requeue report immediately with no log, but then store some additional data in the report that the slave could check against when it discovers its task was requeued. If it finds a matching requeue report with no log, it adds its task log to it. That way you’re always guaranteed to get the report, and you’ll get a log in the report as long as the slave hasn’t died.
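A rough sketch of that two-phase scheme (the report fields and repository calls are made-up names, not the real schema):

```python
import uuid

def requeue_with_placeholder_report(repository, job_id, task_id):
    """Hypothetical requeuing app: always create the report up front, with no
    log but with a marker the slave can match against later."""
    marker = str(uuid.uuid4())
    repository.create_requeue_report(
        job_id=job_id, task_id=task_id, log=None, requeue_marker=marker)
    repository.set_task_state(job_id, task_id, "Queued", requeue_marker=marker)

def on_slave_noticed_requeue(slave, repository, task):
    """Hypothetical slave side: fill in the log if the placeholder still exists."""
    marker = repository.get_requeue_marker(task.job_id, task.task_id)
    report = repository.find_requeue_report(
        task.job_id, task.task_id, requeue_marker=marker, log_missing=True)
    if report is not None:
        # Placeholder report still exists with no log: attach our local task log.
        repository.append_log_to_report(
            report.report_id, slave.read_local_task_log(task))
    # If the report was deleted or cleared in the meantime, just skip it.
```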
This sounds more robust. BUT, what if the user deletes/clears the re-queue report before it’s been populated with the slave’s task log? Presumably the slave just doesn’t try to append its task log if it can’t find the corresponding “additional data” in the database?
From a high-level point of view, I’m just seeing lots of potentially useful STDout/info coming out of the slave before a re-queue, but it’s currently being thrown away (yeah, I know it’s stored locally) and hence not visible to a user via the Monitor.