2/28 I have a few hundred GB of data that exists in two locations
connected by a slow network. When a change is made in one place,
I want it to update the second location (one way only with a 'master'
copy and a 'mirror'). I use rsync for this now, but it is just too
slow. Products like SnapMirror exist if you want to spend money,
but they are also hardware dependent (Netapp). Is there some better
way to solve this problem? I am not opposed to coding my own solution,
but I'm not sure where the innovation needs to be made. --dim
\_ we are currently considering a product by availl that does this.
we are a huge netapp shop and use snapmirror but needed it to be
r/w on both ends and platform independent.
\_ how much data is changing? Which part of rsync is the bottleneck?
rsync only transfers the parts that have changed, afaik... are you
using compression (if text-ish data)?
\_ the problem with rsync is that, at run time, it scans the entire
set of files, both local and remote, looking for differences
between them. So it is reading and checksumming every file,
whether or not it has changed, and communicating this information
across the link between source and destination.
What you need is something that hooks into the file/operating
system and only notifies of changes when they happen, and
propagates them over. This is what SnapMirror does, and does quite
well. Maybe you need to look for coding a similar solution. As a
quick hack, you could just have rsync go over the files/directories
whose timestamps have changed since the last run, instead of
scanning the whole directory tree. -EricM
\_ actually, i think if the files are the same size and same
modification date, it'll skip them, even if the files are
actually different.
\_ There are options in rsync to do this, but in general usage
you're wrong.
\_ Actually, I'm pretty sure (s)he's right; look at the docs
for the --size-only option and what it says about the
Normal behavior. --dbushong
\_ even if you tell rsync to only check file sizes/timestamps,
it *still* has to talk to the remote rsyncd at runtime to
determine whether local and remote size+timestamp differ, for
each file. This will eat time and
bandwidth galore, unless you're trying to only sync a few
large files. -EricM
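\_ EricM's "hook into the OS and push changes as they happen" idea
can be approximated on a modern Linux box with inotify-tools; a
rough sketch (host/paths are hypothetical placeholders, and a real
version would batch events and handle deletes):

```shell
#!/bin/sh
# Sketch only: watch the master tree and re-sync each file as the
# kernel reports a change, instead of rescanning the whole tree.
# Requires inotify-tools; SRC/DEST are hypothetical placeholders.
SRC=/data/master
DEST=mirror-host:/data/mirror

inotifywait -m -r -e close_write,create,moved_to --format '%w%f' "$SRC" |
while read -r changed; do
    rel="${changed#$SRC/}"                     # path relative to master root
    (cd "$SRC" && rsync -aR "$rel" "$DEST/")   # -R preserves the subpath
done
```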
\_ If you are willing to do a bit of hacking this isn't so hard. You
want a custom NFS server that journals modifications and ships them
to the slave. The slave then applies the journal. If the master is
a linux box you should look into FUSE (Filesystem in Userspace),
as this would be pretty easy with FUSE + clue
--twohey
\_ Veritas has a tool called Veritas Volume Replicator that does
this. I have never used it, but considering my previous experience
with Veritas products, I would expect it to work fine. It is
also cross platform. -ausman
\_ We used VVR (as well as Cluster FS) at Walmart. It works very
well and I'd definitely use it again. I love Veritas' file
system and volume management products. (Not cheap!) -- Marco
\_ How about giving rsync a list of files that have been modified
since the last rsync? First find recently modified stuff on the
master, then rsync only that stuff to the mirror. You might
be able to hack rsync to do this for you, like:
rsync --more-recent-than 24 hours
--PeterM
\_ This is a good idea.
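\_ A sketch of that idea with stock rsync, using a marker file in
place of the (nonexistent) --more-recent-than flag; paths and host
are hypothetical placeholders:

```shell
#!/bin/sh
# Sync only files modified since the previous run, tracked by a
# timestamp marker file. SRC/DEST/STAMP are placeholders.
SRC=/data/master
DEST=mirror-host:/data/mirror
STAMP=/var/tmp/last-sync

[ -f "$STAMP" ] || touch -d @0 "$STAMP"  # first run: everything is "new"
touch /tmp/sync-start                    # record when this run began
find "$SRC" -newer "$STAMP" -type f > /tmp/changed-files

# --files-from makes rsync visit only the listed files (absolute
# paths, taken relative to /) instead of walking the whole tree.
rsync -a --files-from=/tmp/changed-files / "$DEST"

mv /tmp/sync-start "$STAMP"              # next run picks up from here
```

Note this never propagates deletions; an occasional full rsync run
can catch those.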
\_ How about OpenAFS or Coda? They both have mirrored modes, IIRC.