I proposed cmd line args as the communication method because I thought it was the most portable way.
What are passwords needed for? (The DB password is static and can be stored in a conf file.)
Here comes a very long text... sorry about that. I'm not a professional programmer, so I'm a little unsure about the approach and want to discuss it before jumping into the work. Also: my time is precious, so I don't want to do unneeded things.
Because of the store-and-forward nature of the upload/mirror process, I had the idea to mimic my beloved Postfix a bit.
Every SMTP server has to deal with the following:
* at any time a new job can come in
* a job can result in lots of work
* while a job is running more jobs can arrive
* each job can result in multiple new jobs
* process count and bandwidth must be throttleable
* a lot of things can go wrong
* data must never ever get lost
* after a (forceful) shutdown the server must continue where it left off (consistent and persistent states)
The Postfix model (as opposed to sendmail == monolithic):
* separate processes for different tasks (well maintainable code, processes can have diff. permissions, processes can run inside/outside a chroot, processes can run on diff. hosts)
* simple interfaces
* IPC over sockets (transparently local file sockets or network sockets)
* queues are used to control the server load
* full architecture overview: http://www.postfix.org/OVERVIEW.html
OK, so here is the queue model for the upload system:
socket   \
or        >--- incoming-mgr ----- outgoing-mgr
cmd line /          |              |    |    |
                meta-extr         FTP  MD5   DB
Queues are directories that contain text files with the job info. The "data" is never passed around, as it is a file already stored on disk, and there is no need to move it (and if there is, then make that a job).
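To make that concrete, here is a minimal sketch of what enqueuing could look like. The queue path and the key/value job-file format are just assumptions for illustration, nothing about them is decided:

import os, time, uuid

QUEUE_DIR = "/var/spool/upload/incoming"            # hypothetical queue directory

def enqueue(job_fields):
    """Write a job description as a small text file into the incoming queue."""
    job_id = "%d-%s" % (int(time.time()), uuid.uuid4().hex[:8])
    tmp = os.path.join(QUEUE_DIR, "." + job_id + ".tmp")
    final = os.path.join(QUEUE_DIR, job_id)
    with open(tmp, "w") as f:
        for key, value in job_fields.items():
            f.write("%s: %s\n" % (key, value))
    os.rename(tmp, final)                           # atomic on the same filesystem
    return final

# e.g. enqueue({"file": "/data/uploads/foo.iso", "uploader": "joe"})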
From left to right:
IMO it doesn't really matter whether the input is communicated to the socket of a running daemon or via a cmd line call (starting a prog); either way a program ends up doing exactly the same thing :)
The result in both cases: a job description gets written to a file in the incoming-queue-directory. Then the incoming-queue-mgr wakes up (via a socket call) and runs the appropriate incoming hook: an external program.
The external program moves the job-file to its working-queue-directory (AFAIK filesystem moves are atomic) and does its job (metadata extraction, DB entry). Upon completion it exits with an exit code (success/failure), possibly appending info to the job-file before moving it back to the incoming-queue-directory.
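A rough sketch of that hook pattern, assuming the mgr passes the job-file name on the cmd line; the directory names and the do_metadata_extraction() stub are invented for illustration:

import os, sys

INCOMING = "/var/spool/upload/incoming"             # assumed queue location
WORKING  = "/var/spool/upload/work-meta-extr"       # this hook's own working queue

def do_metadata_extraction(job_file):
    pass                                            # placeholder for the real work

def main(job_name):
    src  = os.path.join(INCOMING, job_name)
    mine = os.path.join(WORKING, job_name)
    os.rename(src, mine)                            # atomic claim (same filesystem)
    try:
        do_metadata_extraction(mine)
        with open(mine, "a") as f:
            f.write("meta-extr: done\n")            # append result info to the job-file
        status = 0                                  # success
    except Exception as exc:
        with open(mine, "a") as f:
            f.write("meta-extr: failed (%s)\n" % exc)
        status = 1                                  # failure
    os.rename(mine, src)                            # hand the job-file back to the queue
    sys.exit(status)

if __name__ == "__main__":
    main(sys.argv[1])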
The incoming-queue-mgr reacts according to the exit code:
success -> run next ext.prog or move job to outgoing-queue(-mgr)
failure -> move job to incoming-deferred-queue, contact admin / retry after x min / call SF error func
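Roughly like this in code; the directory names and the notify_admin() stub are made up, and the retry-after-x-min part is left out:

import os, shutil, subprocess

INCOMING = "/var/spool/upload/incoming"
DEFERRED = "/var/spool/upload/incoming-deferred"
OUTGOING = "/var/spool/upload/outgoing"

def notify_admin(job_name, rc):
    print("job %s failed with exit code %d" % (job_name, rc))    # placeholder

def run_hook(hook_cmd, job_name):
    rc  = subprocess.call([hook_cmd, job_name])                  # wait for the ext. prog
    job = os.path.join(INCOMING, job_name)
    if rc == 0:
        # success: here the job goes straight to the outgoing queue; running
        # the next ext. prog first would work the same way
        shutil.move(job, os.path.join(OUTGOING, job_name))
    else:
        # failure: park the job and tell someone
        shutil.move(job, os.path.join(DEFERRED, job_name))
        notify_admin(job_name, rc)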
The outgoing-queue-mgr starts the external programs for FTP-transfer, md5 local-remote-diff, DB entry for the job in this order. It reacts according to the exit code much like the incoming-queue-mgr.
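The ordered chain on the outgoing side might be as simple as this sketch; the hook names are placeholders:

import subprocess

OUTGOING_HOOKS = ["ftp-transfer", "md5-remote-diff", "db-entry"]   # run in exactly this order

def process_outgoing(job_name):
    for hook in OUTGOING_HOOKS:
        if subprocess.call([hook, job_name]) != 0:
            return False          # stop the chain; the defer/retry handling takes over
    return True                   # all hooks succeeded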
That makes 2 running daemons that know how many ext. progs they have started. When the system starts, it checks all queues (incl. the working-queues of the ext. progs) for jobs and advances any it finds.
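The startup scan could look like this; again the directory layout and the schedule() stub are just assumptions:

import os, shutil

INCOMING  = "/var/spool/upload/incoming"
WORK_DIRS = ["/var/spool/upload/work-meta-extr"]    # one working dir per ext. prog

def schedule(job_name):
    pass                                            # placeholder: hand the job to its mgr

def recover():
    # jobs stuck in a working dir were interrupted mid-flight: put them back
    for wdir in WORK_DIRS:
        for job in os.listdir(wdir):
            shutil.move(os.path.join(wdir, job), os.path.join(INCOMING, job))
    # everything now sitting in the queue gets advanced again
    for job in os.listdir(INCOMING):
        schedule(job)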
I have never programmed such a thing, but I'd like to try. If it works, we get a generic, extensible, configurable, workflow-oriented job manager(TM).
The manager would be generic in that it doesn't care what kind of jobs it handles; it just has an input, and an output that either feeds another mgr or is the end of the line.
It'd be extensible in that more mgrs can be appended "to the right" and more ext. progs can be started in a defined order by each mgr.
It'd be configurable with regard to success/failure actions and the number of concurrent jobs.
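Just to illustrate what such a configuration might contain (every key and value here is made up):

CONFIG = {
    "incoming-mgr": {
        "hooks": ["meta-extr", "db-entry"],                      # run in this order
        "max_concurrent_jobs": 4,
        "on_success": {"action": "move", "to": "outgoing"},
        "on_failure": {"action": "defer", "retry_after_min": 10, "notify": "admin"},
    },
    "outgoing-mgr": {
        "hooks": ["ftp-transfer", "md5-remote-diff", "db-entry"],
        "max_concurrent_jobs": 2,
        "on_success": {"action": "done"},
        "on_failure": {"action": "defer", "retry_after_min": 30},
    },
}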
WOW... now I've written this super long posting and I wonder if something like this already exists as OSS
... I also wonder if it's not total overkill for "uploading a bunch of files"