Berkeley CSUA MOTD:Entry 13963
1998/4/15 [Computer/SW/Languages/Perl] UID:13963 Activity:very high
4/15    Let me clarify the Perl question.  I have two commands that need
        to be spawned from a Perl script, $cmd1 and $cmd2.  $cmd2 depends
        on $cmd1.  However, $cmd1 and $cmd2 are called millions (no, not
        an exaggeration) of times in a foreach loop, with different
        arguments each time, so performance suffers.  Now the question is,
        how do I run each set of $cmd1 and $cmd2 in parallel while still
        running $cmd1 and $cmd2 in serial within a set?  I tried system
        but failed.  Each set of $cmd1 and $cmd2 waits for the previous
        set to finish...
        \_ run fork, and have each child system $cmd1 then $cmd2, then
           exit.  The parent process just forks a million times,
           then exits.  Now, make sure to run this on a big box, or
           a million perls will bring you to your knees. - seidl
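A minimal sketch of that fork-per-pair approach.  The echo commands are hypothetical stand-ins for the real $cmd1/$cmd2, and the loop is cut to four iterations for illustration:

```perl
use strict;
use warnings;

# Fork one child per argument set; each child runs its pair in serial
# and exits, so the pairs run in parallel with each other.
sub run_pairs {
    my @arg_sets = @_;
    for my $arg (@arg_sets) {
        # Hypothetical stand-ins for the real $cmd1 and $cmd2.
        my $cmd1 = "echo cmd1-$arg >/dev/null";
        my $cmd2 = "echo cmd2-$arg >/dev/null";
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {          # child: serial within the pair
            system($cmd1);
            system($cmd2);
            exit 0;
        }
    }
    my $reaped = 0;               # parent: reap every child
    $reaped++ while wait() != -1;
    return $reaped;
}

print run_pairs(1 .. 4), " pairs finished\n";
```

The parent only pays for the forks and the final reaping; each child does the serial $cmd1-then-$cmd2 work on its own.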
            \_ Is there an alternative?  fork-ing a million times
               is expensive even on a big box.  (I just ran a simple
               script that forks 10,000 times; it increased the system
               load 10-fold and kept any more processes from being
               spawned.)  Can we execute each set in parallel w/o fork?
               Thanks.  Or maybe it wasn't practical to try to run these
               millions of sets of commands in parallel in the first place?
              \_ depending on the speed needed, spacing out the forkings
                 might be a good idea (just sleep 0.1 secs between each
                 pair or something like that)
                 \_ So that's 100k seconds per million iterations, which
                    roughly equals 27 hours per run.  3 million = 81 hours.
                    I think not.  That 27 hours is only the added delay
                    between runs and does not include actual runtime for
                    the binaries.
                   \_ doh. well again, it depends on the binaries'
                      runtime. 27 hours might be acceptable if it means
                       not killing an overloaded server. Adjust the figure
                       as appropriate for a faster box
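One way to space out the forkings without a fixed sleep (a sketch, with a hypothetical echo standing in for the pair and a made-up cap of 8): limit the number of live children and wait() for one to exit before forking the next, so the pacing adapts to the binaries' actual runtime instead of adding 27 hours of pure delay:

```perl
use strict;
use warnings;

my $MAX_KIDS = 8;    # hypothetical cap; tune to the box

# Runs one child per argument set, but never more than $MAX_KIDS at once.
sub throttled_run {
    my @arg_sets = @_;
    my $live = 0;
    for my $arg (@arg_sets) {
        if ($live >= $MAX_KIDS) {   # block until some child exits
            wait();
            $live--;
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # $cmd1 then $cmd2 would go here, in serial.
            system("echo pair-$arg >/dev/null");
            exit 0;
        }
        $live++;
    }
    1 while wait() != -1;           # reap the stragglers
    return scalar @arg_sets;
}

print throttled_run(1 .. 20), " pairs dispatched\n";
```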
              \_ Uh, how do you plan on running different commands without
                 forking separate processes?
                 \_ errrr...good point.  So fork is the only option when
                    you want to run things in parallel?  How does UNIX
                    shell execute two background commands?  By forking
                    twice?
                     \_ Yep.  That's how you get a new process: fork.
                       Now, you might be able to avoid the extra perl
                       processes with creative use of system("$cmd1 ; $cmd2");
                       but a million jobs in parallel is a lot. - seidl
                        \_ That spawns an sh instead.  Is sh necessarily
                           faster to spawn than perl itself?
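One way to keep extra perls from lingering: have the child exec a single sh that runs the pair, replacing the child's perl image right away.  A sketch, with hypothetical echo commands standing in for $cmd1/$cmd2:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real commands.
my ($cmd1, $cmd2) = ("echo one >/dev/null", "echo two >/dev/null");

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    # exec replaces this child's perl with a single /bin/sh, which then
    # runs the pair in serial; no perl interpreter lingers per job.
    exec "/bin/sh", "-c", "$cmd1; $cmd2";
    die "exec failed: $!";    # only reached if exec itself fails
}
waitpid($pid, 0);
print "pair exited with status ", $? >> 8, "\n";
```

The fork itself is still paid once per pair, but between fork and exit the child is a lightweight sh rather than a full perl.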
                    \_ Fork isn't the only way to run tasks in parallel --
                       you can thread things too.  But unix command-line
                       programs can't be run in threads.
        \_ you can also call system("$cmd1;$cmd2 &");
           \_ buy a clue. system forks implicitly
              \_ it forks, but it also waits till completion, you dumbass.
                try the following in perl
                     system("echo foo; sleep 10");
                     system ("echo bar");
                note that it prints foo, then waits for 10 seconds, and then
                prints bar.   -aspolito
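Note that in "$cmd1;$cmd2 &" the sh only backgrounds $cmd2 ($cmd1 still runs in the foreground, so system waits for it).  Grouping the pair in a subshell backgrounds both, and system() returns as soon as sh has launched the group.  A sketch, with sleep as a stand-in for a slow command:

```perl
use strict;
use warnings;

my $t0 = time;
# The ( ... ) subshell groups the pair; the trailing & backgrounds the
# whole group, so the outer sh -- and hence system() -- return at once.
system("(sleep 2; echo done >/dev/null) &");
my $elapsed = time - $t0;
print "system() returned after ${elapsed}s\n";
```

system() here comes back in well under the 2 seconds the backgrounded group takes, which is the parallelism the original poster was after.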
        \_ Ok... can you rewrite the unix binaries so that you're only doing
           a single system call from perl and the binary does the looping
           without forking?  Are the command line parameters known
           beforehand, or non-deterministic based on user input, the time,
           or grains of sand on the beach?  I agree with the previous
           person that doing millions of calls to a binary from perl is a
           bad way to go.  Rewrite the C, if possible, to relieve Perl and
           the system of this forking burden.
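If the binaries can be taught to read their argument sets from stdin, the Perl side then pays for exactly one fork: open a pipe to a single worker and stream every argument line down it.  A sketch, with a hypothetical sh read-loop standing in for the rewritten C binary:

```perl
use strict;
use warnings;

# Hypothetical worker: an sh read-loop standing in for a rewritten C
# binary that loops over argument sets itself instead of being re-exec'd.
open(my $worker, "|-", "/bin/sh", "-c",
     'while read a; do echo "got $a"; done >/dev/null')
    or die "can't spawn worker: $!";

for my $arg (1 .. 10) {       # stands in for the millions of iterations
    print $worker "$arg\n";   # one line per set of arguments
}

close($worker) or die "worker failed: $!";   # waits for the lone child
print "streamed all argument sets through one process\n";
```

This gives up the parallelism across pairs, but it removes the per-iteration fork entirely, which may be the bigger win on an overloaded box.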