Hello guys!
Just wanted to share a recent experience where we lost more than 1 TB of auxiliary files (~2000 jobs), in the hope it helps some of you avoid the same thing.
We recently began a new production where our 3dsmax files (our primary activity) increased significantly in size (200-500 MB each).
So we had to move all submitted jobs from our standard repository location (a 1 TB SSD) to more robust storage (a VNX array).
Thankfully, Deadline has this case covered.
So on a testing repository I tried submitting some jobs after activating the auxiliary alternate path.
Everything worked fine, so I decided to activate it on our main repo.
First mistake: I manually copied the jobs from repo\path to alternate\path, while it looks like Deadline manages that copy itself. So as soon as I activated it, the folder was wiped clean… first scare!
I didn't find any information in the docs about how this is managed:
Which machine does the copy? When? Is there an order (creation time, priority, next to be dequeued?...)
My idea was to avoid a network rush by copying the jobs before enabling the option, but that was a fail.
I also wanted to get activity back up quickly, and since my jobs had been deleted, I had problems with jobs that couldn't be found because their files weren't in the alternate path yet.
#1 - it would be nice to have more detailed information about how those copies are managed.
Anyway, a few hours later, activity had stabilized and everything was working fine.
No more space problems, except that now every slave must have access to the new path, but without mounting a network share (for obvious security reasons).
#2 - could you add authentication/login information, or use mapped paths, so that every slave can access the auxiliary path?
And the last big mistake: I had forgotten, in my secondary testing repo, to remove the auxiliary files path (which pointed to the same location).
So, 2 weeks after the transition, we started activity on this 2nd repo (to test something else), and the consequence was an immediate cleanup of the auxiliary folder (since my 2nd repo only had 3 or 4 jobs).
We lost nearly 1 TB of max files in seconds, corresponding to about 2k jobs.
Luckily, we were in a slow period, so we didn't lose too much work time / render time. We just re-submitted the jobs that weren't finished.
But how the hell does a repo with 3 jobs get to override my main repo when they were both pointing to the same path?
#3 - how can we get some protection against massive file deletion after a desync between the DB & the repository?
We have now entered full production submissions, and I'd like to be SURE that I won't lose all the work done by our artists.
How can I, as the overall administrator, receive a notification (like the repository space check) about massive job deletion?
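In the meantime, here is the kind of stopgap I'm considering: a small watchdog script run from a scheduled task on the repository machine, which watches the free space and the total size of the auxiliary folder and mails me if the folder suddenly shrinks. This is just a sketch for my own setup (none of it comes from Deadline itself), and all the paths, thresholds and mail settings below are placeholders:

```python
#!/usr/bin/env python3
# Stopgap watchdog for the auxiliary files folder (NOT a Deadline feature,
# just a plain script run from cron / Task Scheduler on the repo machine).
# All paths, thresholds and mail settings here are placeholders to adapt.
import json
import os
import shutil
import smtplib
from email.message import EmailMessage

AUX_PATH = r"\\vnx\deadline\aux"                   # placeholder: auxiliary alternate path
STATE_FILE = r"C:\deadline_watch\aux_state.json"   # where the last snapshot is kept
MIN_FREE_GB = 100            # alert if free space drops below this
MAX_DROP_RATIO = 0.25        # alert if total size shrinks by more than 25% between runs
MAIL_TO = "admin@example.com"            # placeholder addresses / SMTP host
MAIL_FROM = "deadline-watch@example.com"
SMTP_HOST = "smtp.example.com"

def folder_size_bytes(path):
    """Sum the size of every file under the auxiliary folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file removed while scanning
    return total

def send_alert(subject, body):
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = MAIL_FROM
    msg["To"] = MAIL_TO
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

def main():
    free_gb = shutil.disk_usage(AUX_PATH).free / 1024**3
    current_size = folder_size_bytes(AUX_PATH)

    previous_size = None
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            previous_size = json.load(f).get("size")

    alerts = []
    if free_gb < MIN_FREE_GB:
        alerts.append("Free space on aux storage is down to %.1f GB" % free_gb)
    if previous_size and current_size < previous_size * (1 - MAX_DROP_RATIO):
        alerts.append("Aux folder shrank from %d to %d bytes since last check"
                      % (previous_size, current_size))

    if alerts:
        send_alert("[Deadline aux watchdog] alert", "\n".join(alerts))

    # Save the current snapshot for the next run.
    os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
    with open(STATE_FILE, "w") as f:
        json.dump({"size": current_size}, f)

if __name__ == "__main__":
    main()
```

It obviously doesn't prevent a deletion, it just tells me quickly that something wiped the folder so I can stop activity before more jobs get purged.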
Thanks !
-p-