Berkeley CSUA MOTD:Entry 28831

2003/6/25-26 [Politics/Foreign, Politics/Domestic] UID:28831 Activity:high
6/26    Does anyone know where I can get a copy of the following:
        http://analyst.sourceforge.net
        This is a parser which parses 10-K SEC filings, and it was taken
        down because of PWC's patent on parsing numbers or something
        equally obvious.
        I am hoping that the project simply went underground instead of
        being completely dead.                  -kngharv
        \_ I want a copy of this software partly for parsing 10-K filings,
           but now I want it mostly for the sake of violating what I
           consider an abuse of intellectual property laws.
                        -- OP
        \_ PWC should be allowed to defend its patents by destroying
           the computers of anyone running this 'analyst' theft
        \_ If they weren't in violation of the law the project would still be
           there.  Are you one of those "the information wants to be free!"
           fanatics who think other people's work should be made available to
           you in exchange for *nothing* from you?  So selfish, so stupid.
           Patents expire.  Childish stupidity does not.
           \_ It is heartening to see one with such abject faith in the system.
              btw, you are right about childish stupidity. -crebbs
           \_ excuse me? patenting parsing?  please.
              \_ nonsense.  did you *read* the patent?  are you qualified to
                 do so?  maybe you've seen LA Law a few times?
           \_ "It was never the object of patent laws to grant a
              monopoly for every trifling device, every shadow of a
              shade of an idea, which would naturally and spontaneously
              occur to any skilled mechanic or operator in the ordinary
              progress of manufactures. Such an indiscriminate creation
              of exclusive privileges tends rather to obstruct than
              to stimulate invention. It creates a class of speculative
              schemers who make it their business to watch the advancing
              wave of improvement, and gather its foam in the form of
              patented monopolies, which enable them to lay a heavy
              tax on the industry of the country, without contributing
              anything to the real advancement of the arts. It embarrasses
              the honest pursuit of business with fears and apprehensions
              of unknown liability lawsuits and vexatious accounting
              for profits made in good faith." --U.S. Supreme Court,
              Atlantic Works vs. Brady, 1882
              \_ motdformatd
              \_ Yes and this is incorporated into the "obviousness" clause
                 used by the PTO to deny many patents today.
           \_ Patents used to expire. The current crop is getting extended
              faster than it is expiring. This does nothing but enrich
              the patent holder, usually long after the inventor is dead.
              \_ I do not think that word means what you think it means.
                 \_ which word? they look good 2 me. -phuqm/nottheposter
                    \_ patent.
              \_ No. Sorry.  Just wrong.  Patents used to be 17 years from the
                 date of issue.  A few years ago they changed to be 20 years
                 from the date of application.  Since it can easily take 3 or
                 more years to issue, patents are actually shorter now.  I'd
                 *love* to see a URL backing up your false and ignorant claim
                 that patents are "getting extended".
                 \_ he's probably getting trademarks and patents confused
                    \_ probably.  that doesn't excuse his clueless scribblings
                       and droolings all over the motd.
                    \_ I do not think that word means what you think it means.
                       In this case, the word would be trademark.
                 \_ Given the pace of (I).technology, particularly things like
                    that which is being discussed, it would be reasonable to
                    cut the patent protection time, not increase it.
                    (even if it is only a nominal increase)  Of course, there
                    is no perfect solution.  It is all a kludge -phuqm

Cache (8192 bytes)
analyst.sourceforge.net
Please look at the bottom of this page for more information.

Suppose, for example, you wanted to find out what a particular company's Days of Sales Outstanding is. Without this program, you would have to retrieve the last four quarterly statements from the SEC's EDGAR database, find the balance sheets, average all the receivables listed under the current assets, find the income statements, add all the per-quarter revenues, divide the average receivables by the revenues, and multiply by 365.

Now imagine doing this not only for DSO, but Days of Inventory Outstanding, Days of Payables Outstanding, Cash Conversion Cycle, capital turnover, gross profit margin, operational profit margin, net profit margin, net cash flow, flow ratio, return on invested capital, etc. Now imagine performing all those calculations, except you also want the history of all those measures quarter-by-quarter to spot trends. Maybe even compare the numbers to other companies in the same industry.

The Analyst will automatically extract all statements for a given company from the EDGAR database, search each statement for the relevant numbers, and perform the fundamental calculations, leaving you the time to sift through the final analyses for your portfolio picks.

All of these packages are released under the GPL or the LGPL. Thanks to the power of open source and specifically the above projects, The Analyst was able to be written faster and with higher quality.

Enter the name of the corporation you want to look at (not the ticker symbol). For example, the code for Learning Tree (LTRE) is 0001002037.

If you need to go through an HTTP proxy, then after java but before -classpath, put -DproxyHost=host -DproxyPort=port, where host is the hostname or IP address of the HTTP proxy host, and port is the port that the HTTP proxy accepts requests on.
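As a rough illustration of the Days of Sales Outstanding arithmetic described above (average the receivables from four balance sheets, divide by the sum of four quarters of revenue, multiply by 365), here is a minimal Python sketch. It is not The Analyst's own code, which is a Java program; the function name and the sample figures are made up for the example.

```python
from statistics import mean


def days_sales_outstanding(receivables, revenues):
    """Compute DSO from four quarters of data, following the manual recipe.

    receivables: accounts receivable from each of the last four balance sheets
    revenues:    revenue from each of the last four quarterly income statements
    """
    if len(receivables) != 4 or len(revenues) != 4:
        raise ValueError("expected exactly four quarters of data")
    annual_revenue = sum(revenues)
    if annual_revenue <= 0:
        raise ValueError("total revenue must be positive")
    # Average receivables divided by four-quarter revenue, scaled to days.
    return mean(receivables) / annual_revenue * 365


# Made-up figures (in millions), purely for illustration.
receivables = [120.0, 130.0, 125.0, 125.0]
revenues = [250.0, 250.0, 250.0, 250.0]
dso = days_sales_outstanding(receivables, revenues)
print(f"DSO: {dso:.1f} days")
```

The other measures listed above would presumably follow the same pattern with different balance sheet and income statement rows, but the cached page does not give their formulas.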
This will download the forms to the current directory if they haven't already been downloaded.

Extracting and saving income and balance information from all files

You must have at least four, and usually five, quarters of statements downloaded. The Analyst will extract the income and balance information from all the downloaded forms, and the results, provided the forms are parsed by the income and balance experts properly, will be placed in a CSV file suitable for importing into spreadsheets.

Performing simple fundamental analysis

This is the most fun. This does the same thing as the previous command -- parsing those files which have not already been parsed and placing the results in a CSV file -- but a series of charts will also be displayed showing various useful fundamental measures for each quarter. The ratios displayed are based on four-quarter averages to eliminate cyclical effects. If you're not familiar with some of these measures, you can look at my "Money Machine" web page, which gives my own wacky take on fundamental measures. For even more in-depth information, you will probably want to browse through The Motley Fool.

It is also used to collect data for the data mining training effort.

Enter the name of the corporation you want to look at (not the ticker symbol). For example, the code for Learning Tree (LTRE) is 0001002037.

If you need to go through an HTTP proxy, then after java but before -jar, put -DproxyHost=host -DproxyPort=port, where host is the hostname or IP address of the HTTP proxy host, and port is the port that the HTTP proxy accepts requests on.

You can use the -t option to download more than one form at a time. This works well on fast connections but not very well on dialup connections. This will download the forms to the current directory if they haven't already been downloaded.

EDGARGrabber: This section of the program requires a Central Index Key for the company whose statements you want to download.
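The page says the displayed ratios are based on four quarters of data and that the results land in a CSV file, but it does not spell out how. The Python sketch below shows one plausible reading: a trailing four-quarter window over per-quarter figures, written out as CSV. The window arithmetic, the column names, and the sample data are all assumptions for illustration, not The Analyst's actual behavior.

```python
import csv
import io


def trailing_margins(quarters, window=4):
    """Return (label, net margin) for each quarter with `window` quarters of history.

    quarters: list of (label, revenue, net_income) tuples, oldest first.
    The margin is summed net income over summed revenue for the window,
    which smooths out seasonal swings in any single quarter.
    """
    rows = []
    for end in range(window, len(quarters) + 1):
        span = quarters[end - window:end]
        revenue = sum(q[1] for q in span)
        income = sum(q[2] for q in span)
        rows.append((span[-1][0], income / revenue))
    return rows


def to_csv(rows):
    """Render the rows as CSV text suitable for a spreadsheet import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["quarter", "net_margin"])
    for label, margin in rows:
        writer.writerow([label, f"{margin:.4f}"])
    return buf.getvalue()


# Made-up quarterly figures, oldest first.
quarters = [
    ("2002Q1", 100.0, 10.0),
    ("2002Q2", 100.0, 10.0),
    ("2002Q3", 100.0, 10.0),
    ("2002Q4", 100.0, 30.0),
    ("2003Q1", 200.0, 10.0),
]
rows = trailing_margins(quarters)
print(to_csv(rows))
```

With five quarters of input the window yields two data points, so a trend is visible only once more than four quarters are available.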
This directory consists of subdirectories, txt files, htm files, sgml files, and html files. The first things we are interested in are the sgml, htm, and html files. These are index files, which tell you what the corresponding txt file contains. The only difference between the three types of files is the format in which the information is kept, ha ha. We are interested in only three pieces of information from each index file: the form type, the fiscal year end, and the date of the statement. In addition, we are only interested in form types 10-Q, 10-QSB, 10QSB, 10-Q405, 10QSB40, 10-K, 10-KSB, 10KSB, 10-K405, and 10KSB40. The Q's are quarterly statements, and the K's are the annual statements.

Once we have downloaded all the index files, filtered for the quarterly and annual statements, we can download the txt files corresponding to the index files. We can also rename the files to something more readable. Based on the fiscal year end and statement date, we can determine the fiscal year and quarter of the statement. The filenames sort lexicographically the way you would expect the statements to sort, making it easy to spot missing statements.

Analyst: Once all the statements are downloaded, the Analyst portion of the program can get to work. We get to contend with inconsistent terminologies, misspellings, nonstandard table formats, differing row, column, and table orders, and data errors, ha ha. One wonders when there will be a standard reporting format. Or if there will ever be one: chances are the corporate accountants and CFO's wouldn't want a standard, since that would mean less opportunity to massage data for more favorable-appearing results, ha ha!

There are three statements we are interested in: the income statement, the balance sheet, and the cash flow statement. For each statement, the Analyst follows the same general procedure:

1. Determine the number of columns in the statement, and the dates for each column
3. Parse the statement to extract the values into tabular form
5. Scan the numbers and row labels to guess the values for specific financial categories.

Steps 1 through 3 are made a little more difficult since a given txt file can contain a pure-text representation of the statement, or an HTML representation, ha ha. The text representation is excruciating to parse, but the program manages it. Very few seemingly useful features in the text representation are reliable, such as <TABLE> table indicators or <C> column indicators. Most of the time they are either not present, or present in the wrong positions, ha ha. We just have to learn to parse the text without them -- and that's no ha ha.

Locating the beginning and end of each statement

We are looking for lines in the statement which indicate the beginning of a particular statement. The regular expressions are called income_start, balance_start, and cash_flow_start. We can't do that with an HTML representation, since the statement title is normally encased in HTML codes, so we dispense with that requirement.

Note several of the features of the above regular expression. A given statement can be "CONSOLIDATED", "CONDENSED CONSOLIDATED ", or "COMBINED". It can be referred to in the singular (STATEMENT) or the plural (STATEMENTS). There can be junk between "CONSOLIDATED" and "STATEMENT ". Only by running the program on different files could we come up with this expression, and even then the expression isn't guaranteed to work with all files. If there is a file on which the expression does not work, then the expression needs to be modified to take that file into account. We don't want the expression to be too compressed, because otherwise it becomes difficult to modify.

Once we find the beginning of a statement, we need to find the end of the statement, which we do in the same way, except with regular expressions income_end, balance_end, and cash_flow_end.
These are somewhat more complicated than the corresponding start expressions, since the statements could come in any order, ha ha. Sometimes the statement finder can be fooled by an annual statement's table of contents or index of tables, ha ha. In this case we rely on the date finder or the table parser to report a bad statement, and then we can reject the bogus statement start to find the "real" statement sta...
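
The actual income_start expression is not reproduced in this cached copy, so the Python sketch below is only a hypothetical stand-in showing the features the page describes: "CONSOLIDATED", "CONDENSED CONSOLIDATED", or "COMBINED"; singular or plural "STATEMENT"; and some junk allowed in between. The trailing words (INCOME, OPERATIONS, EARNINGS), the 40-character junk limit, and the helper name find_statement_start are assumptions, and The Analyst itself is written in Java, not Python.

```python
import re

# Hypothetical stand-in for income_start; the project's real expression is
# not shown in the cached text. Written in verbose mode so that it stays
# easy to modify when a new filing breaks it.
INCOME_START = re.compile(
    r"""
    (?:CONDENSED\s+)?                 # optional CONDENSED prefix
    (?:CONSOLIDATED|COMBINED)         # CONSOLIDATED or COMBINED
    \b.{0,40}?                        # a little junk may precede STATEMENT
    \bSTATEMENTS?\s+OF\s+             # singular or plural
    (?:INCOME|OPERATIONS|EARNINGS)    # assumed income-statement wording
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_statement_start(lines, pattern=INCOME_START):
    """Return the index of the first line matching the pattern, or None."""
    for i, line in enumerate(lines):
        if pattern.search(line):
            return i
    return None


# Made-up excerpt of a filing, for illustration only.
sample = [
    "ITEM 1. FINANCIAL STATEMENTS",
    "",
    "CONDENSED CONSOLIDATED STATEMENTS OF OPERATIONS",
    "(in thousands, except per share data)",
]
start = find_statement_start(sample)
print(start)
```

A pattern like this would also match a table-of-contents entry such as "Consolidated Statements of Income ..... 3", which is the kind of false start the text says has to be rejected by a later stage.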