Berkeley CSUA MOTD:Entry 48537

2007/11/5-6 [Computer/SW/Unix] UID:48537 Activity:kinda low
11/5    sed/awk question.  Let's say I have a CSV file.  I need to do two
        things:
        1. insert a comma after the first character in the first column
        2. merge the 2nd column with the first, separated by a comma.

        So the original file may look like this:
        abc, def, ghi, jkl ...
        and it should become
        a\,bc\,def, ghi, jkl

        Any ideas?
        \_ Stop rolling your own CSV parser!
           http://www.secretgeek.net/csv_trouble.asp
           \_ Sadly the solution he has is .NET, but yeah.  Libraries exist
              for this kind of stuff in every language you would ever use.
              There's no need to reinvent the wheel.
              \_ The advice is good.  I used FasterCSV for Ruby and it's great.
        \_ Hypothetical: the above csv libraries don't exist.  Why use sed/awk
           instead of perl?
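        \_ For what it's worth, here is a minimal sketch of the library route
           for the original question.  It assumes Python's standard csv
           module, and it assumes the backslashes in the desired output are
           escapes for the commas added inside the merged field.  The function
           name merge_first_two is made up for illustration.  Note that the
           writer does not put the space back after each delimiter.

```python
import csv
import io


def merge_first_two(text):
    """Insert a comma after the first character of column 1, then fold
    column 2 into column 1, escaping the new commas with a backslash."""
    # skipinitialspace drops the blank that follows each delimiter on input.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    out = io.StringIO()
    # QUOTE_NONE plus an escapechar makes the writer emit "\," for commas
    # that occur inside a field instead of wrapping the field in quotes.
    writer = csv.writer(out, quoting=csv.QUOTE_NONE, escapechar="\\",
                        lineterminator="\n")
    for row in reader:
        if len(row) >= 2:
            first, second = row[0], row[1]
            merged = first[:1] + "," + first[1:] + "," + second
            row = [merged] + row[2:]
        # Rows with fewer than two columns are written back unchanged.
        writer.writerow(row)
    return out.getvalue()


print(merge_first_two("abc, def, ghi, jkl\n"), end="")
# prints: a\,bc\,def,ghi,jkl
```

           Letting the writer do the escaping means any comma already sitting
           inside a quoted input field gets the same treatment, which a plain
           sed substitution would not give you.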

Cache (5932 bytes)
www.secretgeek.net/csv_trouble.asp
And sometimes there's double quotes, sometimes single quotes.

Step 4 -- The Descent into Chaos

You start to adopt a 'test-driven' approach, only it's more of a 'panic driven' approach. You write numerous test cases for your unwieldy csv parser. It breaks your existing code and you need a new test case or two. You begin to add new test cases, trying always to do the simplest thing that will get the code to work. You have grown a long beard, which is particularly annoying as you are a woman. You have lost all boundaries in regard to personal hygiene. Managers circle your desk like vultures circling a wounded leopard. You lift your head from the keyboard for just a moment when a thought strikes you. You download a code sample from the internet, and use your test cases to try them out. The downloaded code is much worse than what you've written yourself. When you try to contact the developers of each library to see how they work, you find that the developers have generally retired and/or passed away and/or quit working in the IT industry.

The resulting code is so readable that you'll survive your next code inspection without getting your arms and legs torn off by Terry (Head Code Nazi and leader of the local chapter of The Programming Gestapo). You can stop re-inventing the wheel and get on with your day job: cranking out more bugs, faster.

'lb' on Wed, 13 Sep 2006 01:52:50 GMT, sez: i was gonna do a side track about the ODBC text drivers. they're excellent and worthwhile -- but they're still a potential dead end.

'lb' on Wed, 13 Sep 2006 04:36:55 GMT, sez: hey i do it too. in the last week i've worked with three different groups of people who have all done it too. this is more of a go at myself than anyone jeb -- though the stringbuilder lesson we learnt today certainly *did* bring you to mind.

The latest innovation was to write a wrapper around Split to put its results in a List instead of an array (to facilitate Removing subarrays of data efficiently). It's certainly not the first time I've done this and it's not quite as bad as you make it out to be :) I'd never consider regular expressions for this so maybe that's why I don't have as many problems. Historically, my issues tend to be with the people on the other end who like to randomly decide to move columns around, add new columns, etc.

'Marcos' on Wed, 13 Sep 2006 11:29:49 GMT, sez: Leon Thanks a lot for spreading the FileHelpers to the community and for your comments about it =) Dave: "Historically, my issues tend to be with the people on the other end who like to randomly decide to move columns around, add new columns, etc." If you use the FileHelpers you only need to move a field up or down and you can make some checks of types and length to ensure that your files are not modified. Cheers

'b0n' on Wed, 13 Sep 2006 12:19:29 GMT, sez: This sounds fantastic. I was always too humble/indignant to write my own CSV parser, but instead wasted those 15 months searching for decent code on the Internet. Now if you could get Microsoft to package this with every crappy copy of SQL Server Express, you'd be up for a Nobel peace prize.

'Haacked' on Wed, 13 Sep 2006 23:28:04 GMT, sez: Why write a CSV parser when you can write a domain specific programming language for parsing CSV files?

Any delimited format holding arbitrary data needs quoting, which rules out pure regex and implies a need for escapes. Someone who's been around the block a few times will see all of the above in a single glance and go straight to a parser generator.

I hope all beginning programmers see this webpage and decide whether they understand the rules or not. How to write it: Write the parser until it starts to run on simple cases. Figure out what few simple homemade parsing functions would help make your parser handle all official rules. Then start all over and write the parser so that it is relatively simple due to calling your smart parsing functions. If you control the data being input, you can restrict it to a subset of CSV rules. But, if you are reading in major amounts of data from outside of your control, you should buy (with money) a real full-blown CSV parser library. Or, expect to spend months on all the degenerate cases for poorly formed input data. Otherwise, what happens when your client Foobar Industries sends you a gigabyte file of pseudo CSV and you must load it *today*?

But then I found Odbc Text drivers and it allows csv to be queried using SQL - so i stuck with it. But all in all I give a big thumbs up for Sebastien's piece of excellent work -- and it's free too! I found Seb's CSV parser about 5 months ago and it was the best (and fastest) by far after going through weeks of testing the others. However, I moved to Odbc text drivers only because of SQL query capabilities. I am keeping Seb's CSV parser close at hand in case I may require it in future.

'lb' on Sat, 03 Mar 2007 03:15:52 GMT, sez: Thanks Mandar -- i like the approach of your code.

Well, the vultures (managers) wouldn't get near my desk. But the "Error: unknown error" is really freaking me out man!

'Jeff Zanooda' on Sat, 10 Mar 2007 00:28:41 GMT, sez: I'm sorry, I don't get it. It's just a finite state machine (one state variable + one large switch statement).

this is not talking about data that you can control, this is about handling data which you can't and which is not (always) following rules.

'lb' on Sat, 10 Mar 2007 09:48:00 GMT, sez: "escape the commas before they go into the csv file" sorry Zepolen -- but generally when you have a nasty csv parsing problem, you're dealing with someone else's csv -- so how you would write it is irrelevant. i like (and am a little amused by) your code -- replacing the commas with a weird string, then replacing the weird string later. lb

'pop' on Sat, 10 Mar 2007 12:28:16 GMT, sez: @zepolen check out fgetcsv in the PHP manual.
About the only thing that makes a mess is storing file locations on Windows.
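
The point made in the comments above, that a delimited format holding arbitrary data needs quoting and so defeats naive splitting, can be seen in a few lines. This is only an illustrative sketch using Python's standard csv module; the sample row is made up and does not come from the article.

```python
import csv
import io

# Hypothetical row: the second field contains a comma, so it is quoted.
line = 'Foobar Industries,"Smith, Jane",42'

# Naive approach: split on every comma, ignoring the quotes.
naive = line.split(",")

# Library approach: the reader tracks quoting state while scanning.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive))  # 4 pieces: the quoted field was cut in two
print(parsed)      # ['Foobar Industries', 'Smith, Jane', '42']
```

The naive split yields four pieces for a three-field row, which is the kind of failure that sends people down the path the article describes.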