minor error while trying to dequeue

I’m having a problem I don’t think has been covered here before, but maybe it has under a different form.

I have the repository installed on a Linux machine, shared with Windows and another Linux machine through Samba. I have chmodded the entire repository to 777 and changed the owner to nobody.nogroup, and it shows up that way through the Samba share on the other Linux machine, just as expected. But when I submit a job, even a plain command line job, it gives me the error below, and it confuses the monitor into saying that the job is Active. I’ve tried giving 777 permissions to the /usr/local/Thinkbox folder too, but that doesn’t seem to help.
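For reference, the permission changes amount to something like this (the repository path here is just a placeholder for wherever the repository actually lives):

sudo chmod -R 777 /mnt/DeadlineRepository              # recursively open read/write/execute for everyone
sudo chown -R nobody:nogroup /mnt/DeadlineRepository   # hand ownership to nobody.nogroup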

Any ideas?

Here’s the error:

Since it’s throwing an ‘Invalid Parameter’ exception in System.IO, my guess is that the path may be the trouble.

Is this log from your Windows machine by any chance? Have you configured path mapping?
thinkboxsoftware.com/deadlin … Path_Setup

After discussions with Ryan, he has informed me that my guess was way off.

So! Second helpful piece of info might be to send the specific share segment of your smb.conf file. Are you forcing create modes to 777 as well?
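For comparison, a share section that forces everything created through it to 777 usually looks something like this (the share name and path are just placeholders, not what I expect yours to be):

[DeadlineRepository]
path = /mnt/DeadlineRepository
browseable = yes
read only = no
guest ok = yes
create mask = 0777
directory mask = 0777
force create mode = 0777
force directory mode = 0777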

Also, if you’re comfortable showing file names for everything in your repository, you can dump your repository permissions to a file for me to poke through by cd’ing into the repository and using

ls -Rl > /tmp/permissions.txt

and attach permissions.txt here.

New idea is that it might be permissions.

ok, here is the share info for smb.conf:

I also have

in the globals section - without it, the share always mounted with 644 permissions or similar, which definitely didn’t match the permissions the target folders had on the server. Though now I’m thinking it’s not Samba at all, because mounting it over NFS gave the same results. But who knows.
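For anyone trying to reproduce this, the permissions a client actually ends up with can be checked through the mount with something like the following (the mount point is just an example):

stat /mnt/repository                # mode and owner/group as seen from the client
touch /mnt/repository/perm_test     # create a file through the mount...
ls -l /mnt/repository/perm_test     # ...and see what permissions it actually gets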

I’ve attached the dump of my repository, made right after I got the error mentioned in my first post:

This was a command line job; the cmd_plugin_info.job file is just one line:

And the commandsfile.txt is:

On top of all that, I tried running the launcher as root (using sudo ./deadlinelauncher -nogui) and had the same error. I’m still a beginner with Linux, so I’m not 100% confident in setting up various configurations.

Finally, just to be sure, and because I’ll need it set up eventually, I created the entries for mapped paths; I’ve attached a screenshot of those as well.

Weird, right?


permissions.txt (149 KB)

I completely agree with the weird part :stuck_out_tongue:

Your permissions and samba config are exactly what we expect to see, so the problem isn’t there…

I want to talk with Ryan about this some more, but he’s escaped off into the wilderness and I don’t expect to see him before Tuesday.

I will poke at the code and report back!

Also, it’s always a good idea to set up bidirectional mapping. I noticed you have entries for
\\nexus and
/nexus/

but not for M:\

Mapping both ways avoids trouble further down the road.
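As a rough sketch (the exact entries depend on your setup, so treat these as placeholders), covering both roots in both directions conceptually looks like:

M:\       <->  /nexus/
\\nexus\  <->  /nexus/

so a path coming from either the Windows side or the Linux side gets translated correctly when it reaches the other.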

Well, they caught me, and now I’m back in the office…

Anyways, thanks for providing us with the additional info regarding this problem. I believe we’ve tracked the problem down to the point where we attempt to set the last write time of the task file that Deadline is trying to dequeue. The reason we do this is that we calculate the task’s render time based on when the file was last modified.
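If you’re curious whether this is failing at the OS level rather than somewhere inside Deadline, a quick sanity check from the Linux box is to try setting a file’s modification time by hand on the mounted repository (the mount point below is just an example):

cd /mnt/DeadlineRepository             # wherever the repository share is mounted
touch mtime_test                       # create a scratch file through the mount
touch -m -t 203001010000 mtime_test    # try to explicitly set its modification time
stat mtime_test                        # check whether the new mtime actually stuck

If that last step fails or the time doesn’t change, it points at the share/mount configuration rather than anything Deadline-specific.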

At this point, we’re not sure what actually causes the error. We think it’s Mono-related, but just to confirm, do you only see this error when the Linux machine tries to render, or do you see it on the PC as well?

The simple workaround would be to ignore errors that occur when setting the last write time. Or, perhaps we could catch that error and then actually write something to the file to allow the OS to handle updating the last write time. Unfortunately, either case requires changes to core code, so you wouldn’t be able to benefit from the changes until the 5.1 beta becomes available. Hopefully we can figure out why setting the write time is failing on your setup, and fix it from the server side.

Cheers,

  • Ryan

It only happens when using the client on Linux; running it on my Windows machine (the same one I’m submitting from in this case) works beautifully.

I’ve been wondering if it’s a problem with Mono, I’ll have a look around this weekend and see if I can discover anything.

Could it be that the file is locked because it’s being accessed by the monitor running on the other Windows machine, or that there’s a similar OS-level conflict on the server? I thought I had a pretty vanilla setup, but I guess it’s a little more like French Vanilla (no offense intended towards French people). Both Linux installations are running the latest Ubuntu 11.04 server release, in case that changes anything.

thanks for the help so far!

Thanks for letting us know the OS version and confirming that it only happens on Linux. I think that confirms our suspicions about Mono. We will try to get a similar setup (Ubuntu Slave and Repository) up and running next week to see if we can reproduce the problem. If we can, hopefully we can figure out a fix to deal with it now; worst case, at least we’ll be able to test our proposed solution to ensure the problem is fixed in 5.1.

Cheers,

  • Ryan

Victory is mine. I realized that I hadn’t tried an obvious avenue - I was always using a login for a user I created for the render nodes, which is not the same as the workstation user who submitted the job (and created the repository, etc.). When I log in as the workstation user instead, it works perfectly.

I’m still not sure why I must be logged in as that user; both logins belong to the same group and the repository has group permissions enabled, so in theory it should work. But for now it’s OK, because I can keep using the workstation login. It’s not ideal, but I think it’s a Linux problem that will have to wait to be fixed as I learn more about the OS.
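In case anyone else runs into this, comparing the two logins comes down to something like the following (the user names and mount point are just examples, not my actual ones):

id renderuser            # supplementary groups of the render-node login
id workstationuser       # supplementary groups of the workstation login
ls -ld /mnt/repository   # owner, group and mode on the mounted repository

One thing worth remembering is that adding a user to a group only takes effect on a new login session, so the render-node user may not have actually picked up the group yet.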

Thanks for all the help here, Ryan and eamsler

Glad to hear you got it working! We’ll still try to set this up here and reproduce it. At the very least, we should provide a better error message when this problem occurs, but hopefully we can work around it when the problem is detected as well.

Cheers,

  • Ryan