All nodes, and the server, are running DL 4.1 with Mono 2.6.7 under OS X 10.6.4. All nodes on the server are spitting out FileIO exceptions on the console (although they keep running despite them), and the corresponding slave folders in the repository on the server are full of orphaned ‘deleteMe’ slave info files. The nodes are engaged in lengthy renders: frames are currently averaging a 24-hour render time on each node (earlier frames rendered much more quickly; blame volumetrics).
I’m not sure what combination of factors is causing the failure to mop up these deleteMe files, but it’s making the console output very messy indeed. Is there anything that can be done to minimise the problem?
Deadline should periodically clean up the deleteMe slave info files. If you’re running Pulse, there is a 1 in 10 chance that these files will be purged during each Repository Cleanup operation, which by default should occur every minute. So if you’re running Pulse, these files should be purged about once every 10 minutes on average.
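The "once every 10 minutes on average" figure follows from treating each cleanup as an independent 1-in-10 draw: the number of cleanups until a purge is geometric with mean 1/p, so 1 minute / 0.1 = 10 minutes. The sketch below checks that arithmetic. It is only an illustration, assuming independence between cleanups; the function names are hypothetical and are not part of Deadline or Pulse.

```python
import random


def expected_wait_minutes(purge_probability: float, cleanup_interval_minutes: float) -> float:
    """Mean time until the first purge, assuming each cleanup independently
    purges with probability p (geometric distribution, mean = 1/p cleanups)."""
    if not 0.0 < purge_probability <= 1.0:
        raise ValueError("purge_probability must be in (0, 1]")
    return cleanup_interval_minutes / purge_probability


def simulate_wait_minutes(purge_probability: float, cleanup_interval_minutes: float,
                          trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same mean: count cleanups until one purges."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        cleanups = 1
        # Keep running cleanups until one of them "wins" the 1-in-10 draw.
        while rng.random() >= purge_probability:
            cleanups += 1
        total += cleanups * cleanup_interval_minutes
    return total / trials


if __name__ == "__main__":
    print(expected_wait_minutes(0.1, 1.0))  # prints 10.0
    print(round(simulate_wait_minutes(0.1, 1.0, trials=100_000), 1))
```

The simulated mean lands close to 10 minutes, although any single deleteMe file may linger noticeably longer, since the wait has a long tail.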
If you’re not running Pulse, then this operation needs to be performed by the slaves, which only happens “randomly” between job tasks. Since each task takes about 24 hours to render, the slaves rarely get the chance, which would explain the delay in cleaning them up.
Also, can you post a log that includes some of the FileIO exceptions you’re seeing?
Cheers,
If I could find them, I would. ‘grep’ doesn’t seem to be showing anything in the logs. Since the last reboot (due to the various OS X updates), the nodes have not been complaining.