11/15 So if I have a process running in Linux and kill -9 isn't
killing it, and killing its parent process didn't kill it,
and now it's reading as though its parent process is 1,
is there any way to kill it short of rebooting the machine?
\_ Sure, but you don't want to be mucking around in kernel data
structures. Most practical way is to reboot the machine.
\_ Ah, good ol' Linux reliability. Just out of curiosity,
what kind of process is this anyway?
\_ This has nothing to do with "Linux reliability"; it has to
do with trying to kill a process that's blocked. The most
common scenario is a disk wait; NFS server goes away or
physical I/O error on hard disk hangs up the process. -tom
\_ I believe some low range of PIDs is reserved for
kernel processes only, and the kernel will not let you kill
those with the conventional kill method.
\_ no, you can kill any process with kill signals from the
root user. However, kill is just a signal, and there
are paths (e.g. device wait) where processes are too
wedged to process the signal and die nicely. -ERic
\_ The process in question is referencing a SCSI device.
Is there a way to kill the device wait? -op
\- the signal dispatcher will not deliver a signal to a
process in a disk wait. you could try while 1
kill -9 PID or force an umount of the fs ... but
that's all unpredictable. there are some super
hairy things you can do but they are beyond the
scope of the motd and are OS-dependent.
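\_ For what it's worth, the "while 1 kill -9 PID" idea looks roughly
like this in Python. Only a sketch: kill_until_gone and its
attempts/delay/send parameters are made-up names, not anything
standard. Re-sending SIGKILL can't unwedge a process stuck in a
device wait; the loop mostly tells you whether it ever went away.

```python
import os
import signal
import time


def kill_until_gone(pid, attempts=10, delay=0.5, send=os.kill):
    """Send SIGKILL repeatedly; return True once the PID no longer exists.

    `send` is a hypothetical hook (defaults to os.kill) so the loop can be
    exercised without a real wedged process.
    """
    for _ in range(attempts):
        try:
            send(pid, signal.SIGKILL)
        except ProcessLookupError:
            return True  # ESRCH: no such process, so it is gone
        time.sleep(delay)
    # Final probe: signal 0 checks for existence without sending anything.
    try:
        send(pid, 0)
    except ProcessLookupError:
        return True
    return False
```

Run it as root against the stuck PID. A False result after a
reasonable number of attempts suggests the process is still blocked.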
\_ Plug the SCSI device back in. -tom
\_ Never unplugged it. Turned it off and turned it
back on. No love.
\- it might be interesting to see what happens
if you change run levels or unload the
scsi kernel module [rmmod on AssOS].
But as I said earlier, if you don't know
why you are going into a disk wait ... it
could be something obvious like removable
media, networking going out etc ... or it
could be unclear [failing disk?] ... that's
what you should be trying to figure out.--psb
\_ Anyway, your fundamental problem is that
your process is waiting on the SCSI device,
and it won't go away until the SCSI device
unblocks. Look at rescan-scsi-bus or
something. -tom
\- You should try to understand why "kill -9" isn't
killing the process.
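\_ One way to look at this on Linux: /proc/PID/status has SigPnd and
ShdPnd lines, hex masks of signals that were sent but not yet
delivered (bit n-1 is signal n). A rough sketch for decoding them;
pending_signals and kill_is_pending are made-up helper names, and
whether a blocked process actually shows SIGKILL there may depend on
the kernel version, so treat that part as an assumption.

```python
import signal


def pending_signals(status_text):
    """Return the set of signal numbers listed as pending in the text of
    /proc/<pid>/status (thread-level SigPnd plus process-level ShdPnd)."""
    pending = set()
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in ("SigPnd", "ShdPnd"):
            mask = int(value.strip(), 16)
            signum = 1
            while mask:
                if mask & 1:
                    pending.add(signum)  # bit n-1 set means signal n
                mask >>= 1
                signum += 1
    return pending


def kill_is_pending(pid):
    """True if SIGKILL is queued for the process (Linux /proc only)."""
    with open(f"/proc/{pid}/status") as f:
        return signal.SIGKILL in pending_signals(f.read())
```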
\_ Short answer: No. You're waiting on the drive. If your proc is
locked in 'D' state, you're hosed. Figure out what is wrong
with your drive.
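\_ To check for 'D' state without eyeballing ps output, the state
letter is the field right after the parenthesized command name in
/proc/PID/stat. A minimal sketch, assuming a Linux /proc; the
function names here are made up for illustration.

```python
def parse_stat_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so take what follows the LAST closing parenthesis.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    return rest[0]


def process_state(pid):
    """Read the current state letter of a live process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat_state(f.read())


def is_uninterruptible(pid):
    """True if the process is in 'D' (uninterruptible sleep)."""
    return process_state(pid) == "D"
```

If is_uninterruptible(PID) keeps returning True, no signal is going
to get through until whatever it is waiting on comes back.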
\_ Followup: Rebooted the machine, and now all is well. Will explore
other methods of downing this particular process in case this
arises again. Thanks to all for suggestions and information. -op
\_ Depending on exactly what is going on you may not be able to
kill the proc. At my first job I was 'tape back up guy' among
other things. We had first gen crappy tape drives that often
just stopped responding to commands. Usually power cycling the
tape drive would clear the procs, but very rarely that wasn't
good enough and a full reboot was required. These were Suns.
YMMV, but you may find there is no answer beyond 'reboot'.
\- tape drive device drivers look more like disk device drivers
than tapes looked like disks. --psb
\_ Uhm, ok. Yes. Are you supporting what I said or
disagreeing in some way or ...? --confused