www.secretgeek.net/csv_trouble.asp
And sometimes there's double quotes, sometimes single quotes.
Step 4 -- The Descent into Chaos
You start to adopt a 'test-driven' approach, only it's more of a 'panic driven' approach. You write numerous test cases for your unwieldy csv parser. Every new file breaks your existing code, and you need a new test case or two. You begin to add new test cases, always trying to do the simplest thing that will get the code to work. You have grown a long beard, which is particularly annoying as you are a woman. You have lost all boundaries in regard to personal hygiene. Managers circle your desk like vultures circling a wounded leopard.
You lift your head from the keyboard for just a moment when a thought strikes you. You download code samples from the internet, and use your test cases to try them out. The downloaded code is much worse than what you've written yourself. When you try to contact the developers of each library to see how they work, you find that the developers have generally retired and/or passed away and/or quit working in the IT industry.
The resulting code is so readable that you'll survive your next code inspection without getting your arms and legs torn off by Terry (Head Code Nazi and leader of the local chapter of The Programming Gestapo). You can stop re-inventing the wheel and get on with your day job: cranking out more bugs, faster.
'lb' on Wed, 13 Sep 2006 01:52:50 GMT, sez: i was gonna do a side track about the ODBC text drivers. they're excellent and worthwhile -- but they're still a potential dead end.
'lb' on Wed, 13 Sep 2006 04:36:55 GMT, sez: hey i do it too. in the last week i've worked with three different groups of people who have all done it too. this is more of a go at myself than anyone, jeb -- though the stringbuilder lesson we learnt today certainly *did* bring you to mind.
The latest innovation was to write a wrapper around Split to put its results in a List instead of an array (to facilitate Removing subarrays of data efficiently). It's certainly not the first time I've done this and it's not quite as bad as you make it out to be :) I'd never consider regular expressions for this so maybe that's why I don't have as many problems. Historically, my issues tend to be with the people on the other end who like to randomly decide to move columns around, add new columns, etc.
'Marcos' on Wed, 13 Sep 2006 11:29:49 GMT, sez: Leon, thanks a lot for spreading FileHelpers to the community and for your comments about it =) Dave: "Historically, my issues tend to be with the people on the other end who like to randomly decide to move columns around, add new columns, etc." If you use FileHelpers you only need to move a field up or down, and you can add some checks of types and lengths to ensure that your files are not modified. Cheers
'b0n' on Wed, 13 Sep 2006 12:19:29 GMT, sez: This sounds fantastic. I was always too humble/indignant to write my own CSV parser, but instead wasted those 15 months searching for decent code on the Internet. Now if you could get Microsoft to package this with every crappy copy of SQL Server Express, you'd be up for a Nobel peace prize.
'Haacked' on Wed, 13 Sep 2006 23:28:04 GMT, sez: Why write a CSV parser when you can write a domain specific programming language for parsing CSV files?
Any delimited format holding arbitrary data needs quoting, which rules out pure regex and implies a need for escapes. Someone who's been around the block a few times will see all of the above in a single glance and go straight to a parser generator.
I hope all beginning programmers see this webpage and decide whether they understand the rules or not. How to write it: Write the parser until it starts to run on simple cases. Figure out what few simple homemade parsing functions would help make your parser handle all official rules. Then start all over and write the parser so that it is relatively simple due to calling your smart parsing functions. If you control the data being input, you can restrict it to a subset of CSV rules. But if you are reading in major amounts of data from outside of your control, you should buy (with money) a real full-blown CSV parser library. Or expect to spend months on all the degenerate cases for poorly formed input data. Otherwise, what happens when your client Foobar Industries sends you a gigabyte file of pseudo CSV and you must load it *today*?
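To make the quoting point above concrete, here is a minimal sketch in Python (not the language anyone in this thread is using, so treat it purely as illustration). The sample line is made up. It compares a naive split on commas with the standard library's csv module, which understands quoted fields and doubled-quote escapes.

```python
import csv
import io

# Made-up sample line: the second field contains a comma,
# the third contains escaped (doubled) quotes.
line = 'Foobar Industries,"Smith, John","He said ""hi"""'

# Naive approach: split on every comma, ignoring quotes entirely.
naive = line.split(",")

# Quote-aware approach: let the standard library csv module do the work.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), naive)    # the quoted comma is split into two fields
print(len(parsed), parsed)  # three fields, quotes unescaped
```

The naive split yields four fragments with stray quote characters still attached, while the csv module yields the three intended fields. That gap is roughly where the months go.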
But then I found ODBC text drivers, which allow csv to be queried using SQL -- so i stuck with them. But all in all I give a big thumbs up for Sebastien's piece of excellent work -- and it's free too! I found Seb's CSV parser about 5 months ago and it was the best (and fastest) by far after going through weeks of testing the others. However, I moved to ODBC text drivers only because of their SQL query capabilities. I am keeping Seb's CSV parser close at hand in case I may require it in future.
'lb' on Sat, 03 Mar 2007 03:15:52 GMT, sez: Thanks Mandar -- i like the approach of your code.
Well, the vultures (managers) wouldn't get near my desk. But the "Error: unknown error" is really freaking me out, man!
'Jeff Zanooda' on Sat, 10 Mar 2007 00:28:41 GMT, sez: I'm sorry, I don't get it. It's just a finite state machine (one state variable + one large switch statement).
this is not talking about data that you can control, this is about handling data which you can't and which is not (always) following rules.
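The finite state machine described above might look something like the following in Python. This is a rough sketch, not Jeff's code: the state names and the parse_line function are invented for illustration, it handles one line at a time (no newlines inside quoted fields), and its lenient treatment of text after a closing quote is just one possible choice. An if/elif chain stands in for the switch statement.

```python
from enum import Enum, auto


class State(Enum):
    FIELD_START = auto()      # at the beginning of a field
    UNQUOTED = auto()         # inside a plain field
    QUOTED = auto()           # inside a quoted field
    QUOTE_IN_QUOTED = auto()  # just saw a quote while inside a quoted field


def parse_line(line, delimiter=",", quote='"'):
    """Split one CSV line into fields using a single state variable."""
    fields, buf = [], []
    state = State.FIELD_START
    for ch in line:
        if state is State.FIELD_START:
            if ch == quote:
                state = State.QUOTED
            elif ch == delimiter:
                fields.append("")  # empty field
            else:
                buf.append(ch)
                state = State.UNQUOTED
        elif state is State.UNQUOTED:
            if ch == delimiter:
                fields.append("".join(buf))
                buf = []
                state = State.FIELD_START
            else:
                buf.append(ch)
        elif state is State.QUOTED:
            if ch == quote:
                state = State.QUOTE_IN_QUOTED
            else:
                buf.append(ch)  # delimiters are literal in here
        else:  # State.QUOTE_IN_QUOTED
            if ch == quote:
                buf.append(quote)  # doubled quote is an escaped quote
                state = State.QUOTED
            elif ch == delimiter:
                fields.append("".join(buf))
                buf = []
                state = State.FIELD_START
            else:
                # Malformed input (text after a closing quote): keep it.
                buf.append(ch)
                state = State.UNQUOTED
    if state is State.QUOTED:
        raise ValueError("unterminated quoted field")
    fields.append("".join(buf))  # flush the last field
    return fields


print(parse_line('a,"b,c",d'))
print(parse_line('"say ""hi""",,x'))
```

The well-formed cases really are this small. The reply above is about everything else: each new kind of rule-breaking input means another branch, or another state.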
'lb' on Sat, 10 Mar 2007 09:48:00 GMT, sez: "escape the commas before they go into the csv file" sorry Zepolen -- but generally when you have a nasty csv parsing problem, you're dealing with someone else's csv -- so how you would write it is irrelevant. i like (and am a little amused by) your code -- replacing the commas with a weird string, then replacing the weird string later. lb
'pop' on Sat, 10 Mar 2007 12:28:16 GMT, sez: @zepolen check out fgetcsv in the PHP manual. About the only thing that makes a mess is storing file locations on Windows.
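For anyone curious, the "weird string" trick lb mentions can be sketched like this in Python. This is a guess at the general idea, not Zepolen's actual code: the placeholder value and the write_row and read_row helpers are hypothetical. It also shows lb's objection, since the trick only works on files you wrote yourself.

```python
# Hypothetical placeholder; any string that never occurs in the data would do.
PLACEHOLDER = "<<COMMA>>"


def write_row(fields):
    """Hide commas inside fields, then join the fields with real commas."""
    return ",".join(field.replace(",", PLACEHOLDER) for field in fields)


def read_row(line):
    """Split on commas, then restore the hidden commas."""
    return [field.replace(PLACEHOLDER, ",") for field in line.split(",")]


# Round trip on a file we wrote ourselves: works.
own = write_row(["Smith, John", "42"])
roundtrip = read_row(own)
print(roundtrip)

# Someone else's csv, using ordinary quoting: the quoted comma still splits.
foreign = read_row('"Smith, John",42')
print(foreign)
```

The round trip recovers both fields, but the quoted line from elsewhere comes back as three fragments. The trick also breaks if the placeholder itself ever appears in the data.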