
cvcpSync
version 1.0
This script was written and tested on Mac OS X 10.5.5 with Xsan 1.4.2. You'll need a working Xsan or StorNext environment. You'll also need cvcp itself, because the script calls this command many times. You can run the script automatically via launchd; I recommend using Lingon to set that up.
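If you'd rather not use Lingon, a launchd job can also be written by hand. The following is only a sketch: the function name write_plist, the label local.cvcpsync.nightly, the 01:00 start time and the /usr/bin/cvcpSync path are all assumptions for illustration, not something the script ships with.

```shell
# write_plist FILE SRC DST
# Writes a launchd job description that runs cvcpSync every night at 01:00.
# Label, schedule and script path are hypothetical; adjust them to your setup.
write_plist() {
  cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>local.cvcpsync.nightly</string>
  <key>ProgramArguments</key>
  <array>
    <string>/usr/bin/cvcpSync</string>
    <string>$2</string>
    <string>$3</string>
  </array>
  <key>StartCalendarInterval</key>
  <dict>
    <key>Hour</key>
    <integer>1</integer>
    <key>Minute</key>
    <integer>0</integer>
  </dict>
</dict>
</plist>
EOF
}
```

You would then call something like write_plist /Library/LaunchDaemons/local.cvcpsync.nightly.plist /Volumes/mySan/ /Volumes/myRedundantSan/ and load the file with launchctl load.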
The problem
We have a large Xsan volume of 28 TB. We want to sync this volume every night to another Xsan volume of the same size. In the beginning, when the volume was smaller, rsync did the job. But now, with 28 TB, rsync is simply too slow to get the job done in one night.
The solution
I've made a script that uses rsync to get a list of all the changes and then uses cvcp to actually copy the data. This dramatically improves the speed. I tested with 145 GB of data: rsync did it in 70 minutes and the script did it in 40.
While testing with this I noticed that the CPU and fibre could easily handle another cvcp command at the same time. So I changed the script to do just that. After that the script did the same 145 GB in around 20 minutes.
I now run a sync every night, and it takes three hours on average. The amount of data is of course different every night. At most it needs to transfer 1.5 TB. That might take a little longer, but it is still done by morning.
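To give an idea of the "rsync lists, cvcp copies" approach, here is a minimal sketch. It is not the actual cvcpSync code. The function names list_changes and copy_changes and the COPY_CMD variable are made up for this example, and it assumes that your rsync prints one path per line with --dry-run --out-format='%n' and that cvcp accepts a source file followed by a destination folder.

```shell
# Command used for the real copy. Defaults to cvcp; override it for testing.
COPY_CMD=${COPY_CMD:-cvcp}

# Print the paths (relative to the source) that rsync would transfer,
# without copying anything.
list_changes() {
  rsync -a --dry-run --out-format='%n' "$1" "$2"
}

# Read relative paths on stdin and copy each regular file from SRC to DST.
copy_changes() {
  src=${1%/}
  dst=${2%/}
  while IFS= read -r rel; do
    # Skip folders and anything that disappeared since the list was made.
    [ -f "$src/$rel" ] || continue
    # Make sure the target folder exists before handing the file over.
    target_dir=$(dirname "$dst/$rel")
    mkdir -p "$target_dir"
    "$COPY_CMD" "$src/$rel" "$target_dir"
  done
}
```

The two parts would be chained like this: list_changes /Volumes/mySan/ /Volumes/myRedundantSan/ | copy_changes /Volumes/mySan/ /Volumes/myRedundantSan/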
The script
If you put the script in /usr/bin/ or /usr/sbin/, you can simply call the command cvcpSync instead of typing the full path every time.
The script has the following options (flags):
-H Help.
-d Debug mode.
-r Do not perform a final rsync afterwards.
-t Do not copy the .Trashes folder.
-c Maximum number of simultaneous cvcp commands. Default is 3.
-s Number of seconds to sleep between process checks. Default is 1.
EXAMPLE: cvcpSync -c5 -s2 /Volumes/MySan/Folder/ /Volumes/MyOtherLocation/Folder/
This would run at most 5 cvcp commands at the same time. While all 5 commands are running, it checks every 2 seconds whether one of them has finished.
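For readers who want to see how flags like these are usually handled in a shell script, here is a sketch using getopts. It is an illustration only: the function name parse_args and the variable names are assumptions, and the real script may parse its options differently.

```shell
# parse_args "$@"
# Sets DEBUG, FINAL_RSYNC, COPY_TRASH, MAX_CMDS, SLEEP_SECS, SRC and DST.
parse_args() {
  DEBUG=0          # -d switches debug output on
  FINAL_RSYNC=1    # -r switches the final rsync off
  COPY_TRASH=1     # -t skips the .Trashes folder
  MAX_CMDS=3       # -c[number], default 3
  SLEEP_SECS=1     # -s[number], default 1
  OPTIND=1
  while getopts "Hdrtc:s:" opt; do
    case $opt in
      H) echo "usage: cvcpSync [-H] [-d] [-r] [-t] [-cN] [-sN] source destination"
         return 0 ;;
      d) DEBUG=1 ;;
      r) FINAL_RSYNC=0 ;;
      t) COPY_TRASH=0 ;;
      c) MAX_CMDS=$OPTARG ;;
      s) SLEEP_SECS=$OPTARG ;;
      *) return 1 ;;
    esac
  done
  # Whatever is left after the flags is the source and the destination.
  shift $((OPTIND - 1))
  SRC=$1
  DST=$2
}
```

With getopts, -c5 and -c 5 are both accepted, so the example command above would end up with MAX_CMDS=5 and SLEEP_SECS=2.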
-d : Debug mode
In debug mode you'll see a lot more messages on the screen. This is of course meant for debugging while writing the script, but it's very important for configuration as well.
-r : Do not perform a final rsync afterwards.
There is one problem with cvcp: it doesn't create folders.
So when I say copy /Volumes/mySan/myFolder to /Volumes/myRedundantSan/myRedundantFolder and myRedundantFolder doesn't exist, cvcp gives an error instead of creating the folder itself.
Because of this, the script has to create the folder before it can copy the data. This brings up another problem. The folder is created as root (or whichever user runs the script), so it ends up with a different owner, different permissions and a different date stamp. To solve this, I run another rsync after cvcp is finished. If all went well, this final rsync only has to fix the owner, permissions and date stamp of these folders, as well as some resource forks.
The downside is that rsync runs twice. Those of us who know rsync know that it can take a while to build its file list, and it has to do that a second time.
If owner, permissions and date stamp don't matter to you for your redundant SAN, you can turn this off with -r. If you, like me, need an exact copy of the data, simply don't use -r.
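The final pass could look roughly like the sketch below. The function name final_sync and the RSYNC_BIN variable are assumptions for this example. It uses plain rsync -a, which preserves owner (when run as root), permissions and modification times. The option for resource forks is left out on purpose, because it differs between rsync builds; check the man page of the rsync on your own machine.

```shell
# rsync binary to use; can be overridden for testing.
RSYNC_BIN=${RSYNC_BIN:-rsync}

# final_sync SKIP SRC DST
# SKIP is 1 when -r was given, anything else otherwise.
final_sync() {
  if [ "$1" = 1 ]; then
    echo "final rsync skipped"
    return 0
  fi
  # The data is already in place, so this pass should mostly
  # fix owner, permissions and date stamps on the folders.
  "$RSYNC_BIN" -a "$2" "$3"
}
```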
-t : Do not copy the .Trashes folder
When copying a whole volume you copy all the trash as well. In our case, with the 28 TB volume, the .Trashes folder sometimes contains more than 1 TB. I figure that when people have thrown stuff away, I don't need to copy it. This saves me space and time. If that goes for you too, use the -t option.
If you really want an exact copy, trash included, don't use the -t option.
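One simple way to leave the trash out is to filter the list of changed paths before anything gets copied. The sketch below shows the idea; the function name filter_trash is made up, and the real script may well do this differently (for instance by passing --exclude='.Trashes' to rsync).

```shell
# filter_trash SKIP
# Reads paths on stdin. When SKIP is 1 (the -t case) every path that is
# inside a .Trashes folder is dropped; otherwise the list passes through.
filter_trash() {
  if [ "$1" = 1 ]; then
    grep -v -E '(^|/)\.Trashes(/|$)'
  else
    cat
  fi
}
```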
-c[number] : Multiple commands
By default, cvcpSync runs at most three cvcp commands at the same time.
If you want more or fewer, use -c[number] with however many your server can handle, for example -c2. See the configuration part for more details.
-s[number] : Sleep
When all the commands are running, the script has to check whether any of them has finished. By default it checks every second.
If you only have huge files, you could set this to a couple of seconds. If you have a lot of small files, I suggest you keep it at 1. See the configuration part for more details.
So it's simply -s4 to wait 4 seconds.
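How -c and -s work together can be sketched as a small job limiter. This is a simplified illustration, not the code from cvcpSync, and the function name run_parallel is an assumption. It starts one background command per input line, and once the maximum is reached it sleeps and checks again until a slot is free.

```shell
# run_parallel MAX PAUSE CMD [ARGS...]
# Reads one item per line from stdin and runs "CMD ARGS... item" in the
# background, with never more than MAX of them alive at the same time.
run_parallel() {
  max=$1
  pause=$2
  shift 2
  pids=""
  while IFS= read -r item; do
    while :; do
      # Keep only the PIDs that are still alive and count them.
      alive=""
      running=0
      for pid in $pids; do
        if kill -0 "$pid" 2>/dev/null; then
          alive="$alive $pid"
          running=$((running + 1))
        fi
      done
      pids=$alive
      # A slot is free: stop waiting and start the next command.
      [ "$running" -lt "$max" ] && break
      # All slots are busy: sleep (the -s value) and check again.
      sleep "$pause"
    done
    "$@" "$item" </dev/null &
    pids="$pids $!"
  done
  # Let the last commands finish before returning.
  wait
}
```

The trade-off is visible in the inner loop: a long pause means fewer checks and less overhead, but a free slot can sit idle for up to that many seconds.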
Configuration
It's important to configure the script. This is how I do it.
If you have a test volume, use that. Needless to say, the server must have at least one Xsan volume mounted and have cvcp installed.
Please do not put this script into production right away.
First run the script in debug mode with only one command.
So like this:
cvcpSync -d -c1 /Volumes/mySan/ /Volumes/myRedundantSan/myFolder
It's best to have Activity Monitor open while running the script. When running it for the first time, look at the CPU load and disk activity.
As said before, debug mode gives you a lot of information on the screen. Look for this string in the debug output: all_running : All available commands are running.
This string means that the maximum number of commands has been reached and the script is waiting for one of them to finish. If you see it a lot, consider adding an extra command.
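Instead of watching the screen, you could also count how often that string shows up. The helper below is only a suggestion; the name count_all_running is made up, and it assumes you have captured the debug output so it can be piped in.

```shell
# count_all_running
# Reads debug output on stdin and prints how many lines mention all_running.
count_all_running() {
  # grep -c exits non-zero when the count is 0, so don't treat that as an error.
  grep -c 'all_running' || true
}
```

For example: cvcpSync -d -c1 /Volumes/mySan/ /Volumes/myRedundantSan/myFolder 2>&1 | count_all_running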
Most people will never use just one command (that's why the default is 3), so in the beginning you will always need an extra command. But it's important to see how your server is doing while running cvcpSync.
Remember: before adding another command, always check whether your CPU and fibre can handle it.
When your CPU or fibre can't handle any more commands, use the -s option to wait longer. In most cases this helps to reduce the CPU load as well. But there is a downside. Say you have large and small files on your volume and three commands running simultaneously with a sleep of 5 seconds. When the maximum number of commands is reached, the script waits 5 seconds before it checks whether any process has finished.
So it's possible that cvcpSync was only copying small files that finished within a second, yet we told it to wait 5. Repeat that across a whole volume and it adds up to a lot of wasted time.
You might also think that, because this script runs at night, you can kick your server and fibre on its tail and go full speed. I don't advise this. I reckon that when you don't need to kick your server on its tail, you shouldn't. Be nice to him, and he might be nice to you!
Finally a word to the wise...
Although I haven't had any problems with the script, the risk is still all yours. Whatever may happen while running this script: I'm not responsible!
I'm curious if this script works for you. Let me know!
