The same task being picked up on multiple machines.

Hello,

We have some very quick Shotgun jobs (similar to your add Shotgun version jobs but our own flavour). They run as part of an event script on jobs that are tagged to be added to Shotgun. We only have two machines processing these jobs (because they are so quick to go through) so there are two slaves ready to accept them.

We’re increasingly seeing a problem where the same job is picked up and processed by both slaves on different machines. The end result is two new versions on Shotgun instead of one. I think this is probably down to network latency, i.e. Pulse is not keeping up with what’s been picked up. Is there any way we can get around it other than solving our latency issues?

Thanks,

Dan

Right, having done some further investigation I’ve found the slaves are not the culprits. At least I don’t think so.

It seems that the event plugin is the problem. It is running multiple times at the end of a job and thus creating these multiple Shotgun jobs. So, when a job finishes, the post event script runs more than once. I’m pretty sure that is not correct behaviour for an event script? I thought it ran on completion of a job, i.e. once.

If I look at a job’s Log report I get a report for each task and then 2 (although the number varies) event plugin reports for the same event plugin.

Is this down to latency again and is there a way around it?

Hi Dan,

It could very well be a latency issue. The slaves themselves mark a job as complete (Pulse isn’t involved), and it’s likely that latency is causing more than one slave to think it rendered the last task for a job. Honestly, I’m not sure how to work around the latency issue for now…

We are addressing latency issues like this in Deadline 6. The database backend we’re using prevents these kinds of race conditions from happening, so network latency shouldn’t cause these types of issues in the future.

Cheers,

- Ryan

Ok. It does seem like that is what is happening. We have ramped up the amount that the farm handles day-to-day quite a bit, so the increased network traffic could well be causing these latency issues.

I’ll add a check to the Shotgun script to see whether the version already exists before creating it, which should solve things for the time being.

Thanks, Dan
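
For anyone hitting the same thing, here is a rough sketch of the kind of "does the version already exist?" check described above. It is not the actual script from this thread. It assumes `sg` is a `shotgun_api3.Shotgun` connection (or anything with compatible `find_one` and `create` methods), that a Version is identified by its `code` within a project, and the function name `create_version_if_missing` is made up for illustration.

```python
def create_version_if_missing(sg, project, code, extra_fields=None):
    """Create a Shotgun Version only if none with this name exists yet.

    Assumptions (not from the original script):
      - `sg` behaves like a shotgun_api3.Shotgun connection.
      - `project` is an entity dict such as {"type": "Project", "id": 123}.
      - `code` (the Version name) is unique per project.

    Returns a (version, created) tuple; `created` is False when an
    existing Version was found and nothing new was added.
    """
    filters = [
        ["project", "is", project],
        ["code", "is", code],
    ]
    # Look for a Version that a previous run of the event script made.
    existing = sg.find_one("Version", filters, ["id", "code"])
    if existing is not None:
        return existing, False

    # Nothing found, so create the Version as usual.
    data = {"project": project, "code": code}
    data.update(extra_fields or {})
    return sg.create("Version", data), True
```

Note that this only narrows the window rather than closing it: if two copies of the event script run at almost exactly the same moment, both can pass the check before either has created the Version. It should still catch the common case where the second run trails the first by a noticeable margin.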