I’ve grown confused about Pulse’s role in the operation. While Pulse is running, I thought my slaves should not be communicating with my repository at all – they check in with Pulse every n seconds, and it tells them whether there’s a job for them to do.
But if that’s the case, then Pulse is updating the slaveInfo file in the repository, not the slave itself – is this right? We’re seeing a lot of NFS locking messages on the repository server from the individual slave machines, but no indication that they’re having any trouble talking to Pulse…
Does Monitor also read job & slave updates from Pulse? So when there’s a Pulse server running, that machine is the only thing making changes in the repository?
Or have I got it all wrong? Thanks in advance for any insight (or directions to a doc that spells this out).
Pulse just acts as a proxy to the repository in two areas:
When a slave wants a job
When the Monitor does a refresh
These are the “heaviest” operations in terms of bandwidth, so offloading these to Pulse helps improve the overall speed of the system. Other operations, like modifying job properties from the Monitor, or a Slave updating its slave info, still go to the repository, because they are considered “light” operations.
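To make the split concrete, here is a minimal Python sketch of the routing described above. It is only an illustration, not Deadline's actual code: the operation names, the `Router` class, and the fallback to the repository when Pulse is not running are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical operation names, invented for this sketch.
# The two "heavy" operations are the ones Pulse proxies.
HEAVY_OPS = {"request_job", "monitor_refresh"}


@dataclass
class Router:
    """Decides which backend handles an operation (illustrative only)."""
    pulse_available: bool
    log: list = field(default_factory=list)

    def route(self, op: str) -> str:
        # Heavy operations go through Pulse when it is running.
        # Everything else (and everything when Pulse is down, by
        # assumption) goes straight to the repository.
        if op in HEAVY_OPS and self.pulse_available:
            target = "pulse"
        else:
            target = "repository"
        self.log.append((op, target))
        return target


if __name__ == "__main__":
    router = Router(pulse_available=True)
    for op in ("request_job", "monitor_refresh", "update_slave_info", "modify_job"):
        print(op, "->", router.route(op))
```

Under these assumptions, a slave's slave info update still lands on the repository even with Pulse running, which would be consistent with slaves continuing to touch the repository share.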
Yes, that’s helpful. How often do the slaves update their slaveInfo files, then? Is that on the same interval that the slaves connect to the repo for jobs in a non-Pulse config?
So the Monitor app reads Pulse (when available) for slave & job info, never the repo directly? But job submissions, modifications, etc. through Monitor work directly with the repo? Have I got this right:
When I submit a job from Monitor or integrated script, it writes directly to the repo.
The Pulse server scans the repo periodically and records the new job.
Slaves poll the Pulse server periodically and are ASSIGNED tasks (vs. deciding for themselves which tasks to pick up?)
As Slaves complete tasks, do they update the task/job info in the Repo directly, or do they notify Pulse, which makes the updates in the Repo?
Does the Pulse server cache the entire repository for clients to refer to?
Oh, I see – so the job list in Monitor comes from Pulse, but if I click a job to see its tasks, then Monitor is reading that directly from the repo?
We’re seeing an NLM_FREE_ALL notice on our repository (Solaris) every 20-40 seconds from every slave (Windows 7 via NFS)… that must be the slaveInfo updates, and must be NFS-related. The repo in our other location uses SMB and doesn’t report the FREE_ALLs.
Yesterday, we suddenly began to see no jobs/no slaves in the Monitor application on clients for 15 minutes at a time every couple of hours. No messages in the Monitor log to indicate they’d lost the Pulse server, and nothing in the Pulse log to indicate it lost the repo.
Sorry to be pestering for details, but knowing the mechanics really helps us design how to best plug Deadline into our infrastructure. Is there a doc that details this stuff, or is asking here the right way to get it?
It could be that Pulse’s cache was corrupted. A restart of Pulse may have helped.
No problem! We don’t have the specifics documented anywhere, so this is the way to get the info you need. Plus, I’m sure others are benefiting from this too.