Why Perforce is more scalable than Git

Posted on: 2009-02-23 21:31:22

Okay, say you work at a company that uses Perforce (on Windows). So you're happily tapping away using Perforce for years and years. Perforce is pretty fast -- I mean, it has this "nocompress" option that you can tweak and turn on and off depending on where you are, and it generally lets you get your work done. If you change your client spec, it synchronizes only the files it needs to. Perforce is great -- why would you ever need anything else?

Once you've experienced git, there is no going back, man. You might have checked out firefox -- but have you checked out firefox ooon GIT? Want to store your changes temporarily to work on something else? "git stash". Want to automatically detect out-of-bounds array accesses and add missing semicolons to all your code? "git umm-nice-try".

Branching on git is like opening a new tab in a browser. And you wrote the code, so you get to merge it back in, because you are the expert. Branching on Perforce is kind of like performing open heart surgery: it should only be done by professionals, experts in the art who really know what they are doing. You have to create a "branch spec" file using a special syntax, and if you screw up, the entire company will know and forever deride you as the idiot who deleted "//depot/main".

Now, if you have been using git for a few days, you might discover this tool called "git-p4". "I can import from my company's p4 server into git and work from that, and then submit the changes back when I am done," you might say. But git-p4 is just a big python script, and it works by downloading the entire p4 repository into a python object, then writing it into git. If your repo is more than a couple of gigs, you'll be out of memory faster than you can skim reddit.
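The fix is conceptually simple, though: stream one file revision at a time into git's fast-import plumbing instead of slurping the whole depot. Here's a rough sketch of the idea -- this is not the real git-p4 code; the depot path, branch name, and committer identity are all made up, it assumes you run it inside a freshly initialized git repo with p4 and git on your PATH, and a real importer would also have to handle deleted files, file types, symlinks, and so on:

    import subprocess

    DEPOT_ROOT = "//depot/main/"        # hypothetical depot path
    DEPOT = DEPOT_ROOT + "..."

    def p4_lines(*args):
        # run a p4 command and return its output as a list of text lines
        out = subprocess.check_output(("p4",) + args)
        return out.decode("utf-8", "replace").splitlines()

    # git fast-import consumes a stream of commits on stdin, so we only
    # ever hold one file's contents in memory at a time.
    fi = subprocess.Popen(["git", "fast-import"], stdin=subprocess.PIPE)

    def emit(data):
        fi.stdin.write(data if isinstance(data, bytes) else data.encode("utf-8"))

    # walk submitted changelists oldest-first ("p4 changes" lists newest-first)
    for line in reversed(p4_lines("changes", "-s", "submitted", DEPOT)):
        change = line.split()[1]            # "Change 1234 on ..." -> "1234"
        msg = ("p4 change %s" % change).encode("utf-8")
        emit("commit refs/heads/p4-import\n")
        emit("committer p4-import <p4@example.com> 0 +0000\n")
        emit("data %d\n" % len(msg)); emit(msg); emit("\n")
        # fetch each file revision in this changelist one at a time
        for entry in p4_lines("files", "%s@=%s" % (DEPOT, change)):
            depot_path = entry.split("#")[0]   # "//depot/main/a.c#3 - edit ..."
            blob = subprocess.check_output(
                ["p4", "print", "-q", "%s@=%s" % (depot_path, change)])
            emit("M 100644 inline %s\n" % depot_path[len(DEPOT_ROOT):])
            emit("data %d\n" % len(blob)); emit(blob); emit("\n")

    fi.stdin.close()
    fi.wait()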
In practice you can hack up git-p4 to do things a file at a time in about an hour. The real problem is: Git can't handle large repositories.

Okay, this is subjective, because it depends on your definition of large -- but whatever your definition is, your company's source tree is probably that large. Maybe you check in binaries of all your build tools, or maybe for some reason you need to check in the object files of the nightly builds, or something silly like that. P4 can handle this because it runs on a cluster of servers somewhere in the bowels of your company's IT department, administered by an army of drones tending to its every need.
Google famously runs on Perforce, and when it started to show its strain, Larry Page personally went to Perforce's headquarters and threatened to direct large amounts of web traffic up their executives' whazzoos until they did something about it. The typical git user, meanwhile, considers the linux kernel to be a "large project".
Go find Linus's git rant on Google code and take a listen to see how he sidesteps the question of scalability. Then go ahead and wait a minute after every git command while it scans your entire repo.
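Don't take my word for it -- time a do-nothing operation in each system on a big tree (your numbers will vary, obviously):

    time p4 sync          # no-op: the server just answers "up-to-date"
    time git status       # stats every file in your working tree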
I don't think many people really use distributed source control. Most git users (especially those using Github) use the centralized model anyway. Ask yourself this: Is it really that important to duplicate the entire history on every single PC? Do you really need to peruse changelist 1 of KDE from an airplane? What you really want is the other stuff: easy branching, clean, and stash, and the ability to transfer changes to another client (see the examples below). The distributed stuff isn't really asked for, or needed.
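For the record, here's what "the other stuff" costs you in git -- the branch name is invented:

    git checkout -b wild-idea    # branching: like opening a new tab
    git stash                    # shelve half-finished work...
    git stash pop                # ...and take it back out later
    git clean -n                 # preview sweeping out untracked junk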
Just give me a version control system that lets me do these things and I'll be happy:

* Let me merge changes into my coworker's repos, without having to check them in first. (Git already grants this one -- see the example after the list.)
* Don't make me duplicate the entire history in my .git folder, when this could be stored on a central server.
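On that first wish, git already obliges -- the hostname and paths here are invented:

    # merge straight from a coworker's machine; nothing is checked in
    git pull ssh://alice-box/home/alice/project master

    # or pass the change around as a plain patch file
    git format-patch -1 HEAD     # writes 0001-something.patch
    git am 0001-*.patch          # your coworker applies it on their end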
Holger Schurig 2009-03-14 19:14:09
Try "git clone --depth 0".
zzz 2009-03-14 20:04:08
Git was designed to be a version control tool - not a quasi-file-server 'repository', which is how most other tools like Subversion and Perforce are actually used. It was also not designed to track a whole set of unrelated projects - say, a team's entire code-base - something that both Linus and then Randall made pretty clear. Track each project as a single Git repository, and if you need to tie them together, create a master repository that includes each one as a sub-module. The flexibility you gain from 'setting free' your individual projects is enormous, as is the smart use of a master repository that uses branches to create different mash-ups of your overall code-base.
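(For anyone who hasn't seen the sub-module trick zzz describes, the setup looks roughly like this -- the project names and URLs are invented:)

    mkdir master-repo && cd master-repo && git init
    git submodule add git://example.com/libfoo.git libfoo
    git submodule add git://example.com/app.git app
    git commit -m "tie the projects together"

    # a fresh clone of master-repo then pulls in the pieces with:
    git submodule init && git submodule update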
Stephen Waits 2009-03-14 20:04:40
I've done some testing on big p4 repositories. A p4 sync takes less than one second, if no files have changed on the server. Git, plain and simple, does not scale to large repositories. That's OK, I guess, it's not really designed to handle that use case.
Stephen Waits 2009-03-14 20:07:27
@zzz, FWIW, we store all of our code and data in p4 because it's the Right thing to do. If I need to sync back a month to look at some issue, I need the specific data to be sync'ed back too. It stays out of our way, it's faster than anything out there.
zzz 2009-03-14 20:35:28
"we store all of our code and data in p4 because it's the Right thing to do" Well, you may have identified another use case where Git is not ideal - really large binary blobs. I think the problem is that Git has to checksum (sorry, SHA1) all the files it scans - and that would take some time on a 36GB file. To be fair, Git has always been advertised as an SCM - i.e. a source-code management system - and for that use-case it absolutely rocks, IMO. Personally I would still investigate a hybrid approach where you have the option of pulling just the source down to your lappy with Git, so if you are on the plane and you DO want to look at change-set 1, at least you can!
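(That checksum cost is easy to measure yourself, since git exposes its blob hashing as a plumbing command -- the file name here is made up:)

    time git hash-object some-36GB-asset.bin    # SHA-1 over the entire blob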
wcoenen 2009-03-14 20:36:51
Let me get this straight: you're saying perforce is faster than git for large projects? This surprises me because most git operations are completely off-line since all the data is local. I thought that operations which require network I/O are the slower ones. Care to back up your claim with a specific use case and some data?
jeremydw 2009-03-14 20:45:45
@Jason P, an internal wrapper script for p4 that sends changelists to reviewers for reviews and approvals. To merge changes into a coworker's repo, why can't they just patch a CL? You don't have to submit a CL for a coworker to grab the changes.
albert 2009-03-14 20:50:40
"Let me merge changes into my coworker's repos, without having to check them in first." Would that be your coworker's distributed repository, by any chance?
NotAFan 2009-03-14 21:08:17
Perforce may scale well with regards to data size. In my experience, it doesn't scale well over a distributed network. Between having to check files out to work on them and the tight integration with Visual Studio, if your link to the Perforce server goes down you practically have to stop work. My one experience of Perforce was doing work with another company remotely. Combining that with Perforce led to an incredibly frustrating experience.
SJS 2009-03-14 22:21:27
The one feature of a DVCS that I really really really like is the ability to use it as a sneakernet. Not all of the machines I develop on are connected to a network, or connected to the same network that the central/blessed repository is on. Merging changes directly into a coworker's repo, though, I do not see as a feature -- if there's a central repository, it should be used as the mechanism of communication between developers. As for duplicating the whole history, sometimes I care, sometimes I don't (disk is cheap, but disk fills up faster still). Having an option for git to use either a local or a remote (central/blessed) repository would be nice. Disclaimer: I still use CVS, I've used Perforce (and liked it), and I use git (and like it), and I don't currently have any repositories that approach the sizes discussed in the article.