Being able to submit jobs using the REST API is great for speed and sanity, but right now, we are still forced to manually format all of the job and plugin info values as strings in the JSON dictionary before POSTing it (so the values match those in the INI-style submission files):
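To illustrate, here's a rough sketch of what a submission payload has to look like today: every value pre-formatted as a string, mirroring the INI-style files. (The key names follow the usual job info keys; the endpoint URL is illustrative, not a real address.)

```python
import json

# Current state of affairs: everything flattened to strings before POSTing.
job_info = {
    "Plugin": "MyPlugin",
    "Frames": "1-10",                   # a frame range, but still just a string
    "Priority": "50",                   # an int, manually stringified
    "UseJobEnvironmentOnly": "True",    # a bool, flattened to "True"
    "EnvironmentKeyValue0": "FOO=bar",  # numbered keys instead of a dict
    "EnvironmentKeyValue1": "BAR=spangle",
}
payload = json.dumps({"JobInfo": job_info, "PluginInfo": {}})
# requests.post("http://pulse:8082/api/jobs", data=payload)  # illustrative URL
```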
It would be great if Pulse(?) was able to properly handle the incoming JSON on the receiving side (proper type conversion, list handling, etc.), to make constructing a submission structure more intuitive. It would also be great to allow things like EnvironmentKeyValue# and ExtraInfoKeyValue# to just be handled as single dictionaries:
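Something like the following is what I have in mind: native types where they make sense, and single dicts for the environment and extra info mappings. (This shape is hypothetical, just to show the idea, not an existing API.)

```python
import json

# The structure I'd like to be able to send, with Pulse handling the
# type conversion on the receiving side.
job_info = {
    "Plugin": "MyPlugin",
    "Frames": "1-10",
    "Priority": 50,                      # a real int
    "UseJobEnvironmentOnly": True,       # a real boolean
    "EnvironmentKeyValue": {"FOO": "bar", "BAR": "spangle"},  # one dict
    "ExtraInfoKeyValue": {"show": "myshow", "shot": "sh010"},
}
payload = json.dumps({"JobInfo": job_info, "PluginInfo": {}})
```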
Since the move to the Mongo backend, the reasoning for the current formatting requirements seems questionable.
Additionally, has there been any consideration given to moving to something like YAML (or even JSON again) for the submission files in order to support data structures and typing?
I’d be very interested to hear thoughts on any of this.
The reason for the current format requirement is that our submission code is shared between the command line submission and the REST submission. The command line submission builds up a dictionary from the key/value pairs in the job info and plugin info files, and that’s why the REST API wants the same. However, one thing we could do is to convert basic types like int and bool to strings on Pulse’s end, instead of failing like it currently does. You still won’t be able to pass lists or dictionaries as values for some job properties, but at least it’s a start.
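A minimal sketch of that kind of server-side coercion, assuming it lives in a helper on Pulse's end (the function name and exact rules here are hypothetical):

```python
def coerce_job_info(job_info):
    """Convert basic types (bool, int, float) to the strings the shared
    submission code expects, instead of rejecting them outright."""
    coerced = {}
    for key, value in job_info.items():
        # bool must be checked before int, since bool is a subclass of int
        if isinstance(value, bool):
            coerced[key] = "True" if value else "False"  # INI-style booleans
        elif isinstance(value, (int, float)):
            coerced[key] = str(value)
        elif isinstance(value, str):
            coerced[key] = value
        else:
            # Lists/dicts still unsupported, as noted above
            raise TypeError("Unsupported value type for %s: %r" % (key, value))
    return coerced
```

So `coerce_job_info({"Priority": 50, "UseJobEnvironmentOnly": True})` would yield `{"Priority": "50", "UseJobEnvironmentOnly": "True"}`, while a list-valued key would still be rejected.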
No, there has been no discussion on moving away from the key/value pair files for submission, and honestly, this is the first time it’s even been brought up.
Any reason why you would want an alternative? We find the key/value system to be easy to read and easy to write, which is why we’ve never felt the need to change it. Yes, there are some oddities (like how you need to specify environment variables), but for the most part it’s pretty easy to use.
I guess it seems like it would be more intuitive for the internal interchange format to be proper JSON at this point, rather than a sort of string-ified hybrid. Ultimately, the DB ends up storing pretty much what I showed in my example (including “dicts” for the environment and extra info mappings).
Heh… not too surprising. Leave it to me to request things no one else cares about…
Honestly, I think YAML would make things even easier, for a few reasons:
1) I think it’s actually more readable than the current style:
Plugin: MyPlugin
Frames: 1-10
Blacklist: [rd-206, rd-207, rd-208]
EnvironmentKeyValue: {BAR: spangle, FOO: bar}
UseJobEnvironmentOnly: true
2) It’s technically a superset of JSON and thus supports proper typing. Plus it would be easy to convert to JSON for internal use.
3) There are existing APIs for reading and writing well-formed YAML with various languages.
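On point 2, the YAML sample above maps one-to-one onto JSON, so the internal conversion would be trivial (this fragment is just the hand-translated equivalent of the sample, for comparison):

```json
{
  "Plugin": "MyPlugin",
  "Frames": "1-10",
  "Blacklist": ["rd-206", "rd-207", "rd-208"],
  "EnvironmentKeyValue": {"BAR": "spangle", "FOO": "bar"},
  "UseJobEnvironmentOnly": true
}
```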
When a job is submitted, we don’t just convert the key/values to JSON and dump it in the database. There are many sanity checks in place to make sure the data provided is valid, and then we build up an actual Job object, in addition to other objects like the tasks and machine limit. These objects then get serialized into JSON and sent to the database. So there really isn’t much to be gained from an internal point of view by switching the format of the submission input.
I do understand though that since the web service uses JSON, being able to submit a job in JSON format might make more sense than building up a string dictionary. On the other hand, having a unified format for submitting from the command line and submitting from the web service has its benefits as well (one code base, one set of documentation, etc).
Yeah, that makes sense in hindsight… I glossed over quite a bit there. However, if that’s the case, it seems like it would almost make it easier to support multiple job definition syntaxes… you would just need a sort of pluggable interface for converting a block of text to Job and Task definitions, etc. I realize that’s a very simplistic way of putting it (and one that overlooks the work that would be necessary to establish said interface), but as easy as the current INI-style files may be to hand-write (which I would still argue is less readable and harder to write than YAML), I’ve found them awkward to generate programmatically.
Yeah, I guess that’s a bit of a judgment call. However, for the sake of conversation, I would argue that the format right now isn’t really unified (only the structure of the values), so moving to properly-structured JSON values for web service submissions only seems like less of a jump.
Just to be clear, I’m not dying for these changes to be made or anything… I just like speculating on various possibilities for potential improvement.
The idea of supporting multiple formats is an interesting one, and if we had to choose another, JSON would likely be that choice. I’m going to put this on the wishlist, although it will be a lower priority item for now.
I know this is a low-priority item, but I just wanted to add one more example of why proper JSON value handling would be advantageous for REST submissions.
We submit our own JSON document with every job to store various pipeline-specific information. Right now, I have to dump this document to a JSON string manually, and then include that string in the job’s pluginInfo, which makes it awkward to work with. The awkwardness is multiplied when you consider that it’s being stored in Mongo…
If I were able to just include a sub-document in the pluginInfo JSON, pass the whole thing through the REST API, and have it stored as-is, it would be dead-simple to query a value directly using any Mongo driver, rather than requiring the complete JSON document to be deserialized in order to be queried.
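Here's a small sketch of the difference in pure Python terms (the key names and pipeline data are illustrative):

```python
import json

# A pipeline-specific document we attach to every job.
pipeline_data = {"show": "myshow", "shot": "sh010", "department": "lighting"}

# Today: the document has to be dumped to a string first, so what lands in
# Mongo is a JSON string inside a JSON document. Querying a single value
# means deserializing the whole thing:
plugin_info_now = {"PipelineData": json.dumps(pipeline_data)}
shot = json.loads(plugin_info_now["PipelineData"])["shot"]

# What I'd like: the sub-document passes through and is stored as-is, so
# any Mongo driver could query it directly with dot notation, e.g.
# {"PluginInfo.PipelineData.shot": "sh010"}:
plugin_info_wanted = {"PipelineData": pipeline_data}
shot_direct = plugin_info_wanted["PipelineData"]["shot"]
```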