analyst.sourceforge.net
Please look at the bottom of this page for more information.

Suppose, for example, you wanted to find out a particular company's Days of Sales Outstanding (DSO). Without this program, you would have to retrieve the last four quarterly statements from the SEC's EDGAR database, find the balance sheets, average the receivables listed under current assets, find the income statements, add up the four quarterly revenues, divide the average receivables by the total revenue, and multiply by 365. Now imagine doing this not only for DSO, but also for Days of Inventory Outstanding, Days of Payables Outstanding, Cash Conversion Cycle, capital turnover, gross profit margin, operational profit margin, net profit margin, net cash flow, flow ratio, return on invested capital, and so on. Now imagine performing all those calculations, except that you also want the history of every measure quarter by quarter to spot trends, and maybe even to compare the numbers with those of other companies in the same industry. The Analyst will automatically extract all statements for a given company from the EDGAR database, search each statement for the relevant numbers, and perform the fundamental calculations, leaving you the time to sift through the final analyses for your portfolio picks.

All of these packages are released under the GPL or the LGPL. Thanks to the power of open source, and specifically the above projects, The Analyst could be written faster and with higher quality.

Enter the name of the corporation you want to look at (not the ticker symbol). For example, the code (Central Index Key) for Learning Tree (LTRE) is 0001002037. If you need to go through an HTTP proxy, then after java but before -classpath, put -DproxyHost=host -DproxyPort=port, where host is the hostname or IP address of the HTTP proxy and port is the port on which the proxy accepts requests.
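To make the DSO arithmetic above concrete, and to show what a quarter-by-quarter history of it looks like, here is a minimal Java sketch. It is not The Analyst's own code: it assumes the quarterly receivables and revenues have already been extracted into two parallel lists (oldest quarter first), and it computes DSO = average receivables / trailing four-quarter revenue × 365 for every quarter that has four quarters of history. The class and method names and the sample figures are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not The Analyst's code): Days of Sales Outstanding for
 * every quarter that has four quarters of history, using
 * DSO = average receivables / trailing four-quarter revenue * 365.
 */
public class DsoHistory {

    static final int QUARTERS = 4;

    /** Inputs are parallel lists ordered oldest quarter first. */
    static List<Double> dsoHistory(List<Double> receivables, List<Double> revenues) {
        if (receivables.size() != revenues.size()) {
            throw new IllegalArgumentException("receivables and revenues must cover the same quarters");
        }
        List<Double> history = new ArrayList<>();
        // Each window ends at quarter `end` and covers the three quarters before it.
        for (int end = QUARTERS - 1; end < receivables.size(); end++) {
            double receivablesSum = 0.0;
            double revenueSum = 0.0;
            for (int q = end - QUARTERS + 1; q <= end; q++) {
                receivablesSum += receivables.get(q);
                revenueSum += revenues.get(q);
            }
            double averageReceivables = receivablesSum / QUARTERS;
            history.add(averageReceivables / revenueSum * 365.0);
        }
        return history;
    }

    public static void main(String[] args) {
        // Made-up figures for five quarters, oldest first.
        List<Double> receivables = List.of(100.0, 110.0, 90.0, 100.0, 140.0);
        List<Double> revenues = List.of(300.0, 320.0, 280.0, 300.0, 300.0);
        for (double dso : dsoHistory(receivables, revenues)) {
            System.out.printf("%.1f days%n", dso);
        }
    }
}
```

With these made-up numbers the first window gives about 30.4 days, and the second rises to about 33.5 days because receivables grew while trailing revenue stayed flat, which is the kind of trend the history is meant to expose.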
This will download the forms to the current directory if they haven't already been downloaded.

Extracting and saving income and balance information from all files

You must have at least four, and usually five, quarters of statements downloaded. The Analyst will extract the income and balance information from all the downloaded forms and, provided the income and balance experts parse the forms properly, place the results in a CSV file suitable for importing into spreadsheets.

Performing simple fundamental analysis

This is the most fun. It does the same thing as the previous command (parsing those files that have not already been parsed and placing the results in a CSV file), but it also displays a series of charts showing various useful fundamental measures for each quarter. The ratios displayed are based on four-quarter averages to eliminate cyclical effects. If you're not familiar with some of these measures, you can look at my "Money Machine" web page, which gives my own wacky take on fundamental measures. For even more in-depth information, you will probably want to browse through The Motley Fool. It is also used to collect data for the data mining training effort.

Enter the name of the corporation you want to look at (not the ticker symbol). For example, the code (Central Index Key) for Learning Tree (LTRE) is 0001002037. If you need to go through an HTTP proxy, then after java but before -jar, put -DproxyHost=host -DproxyPort=port, where host is the hostname or IP address of the HTTP proxy and port is the port on which the proxy accepts requests. You can use the -t option to download more than one form at a time; this works well on fast connections but not very well on dial-up connections. This will download the forms to the current directory if they haven't already been downloaded.

EDGARGrabber: This section of the program requires a Central Index Key for the company whose statements you want to download.
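EDGAR writes a Central Index Key as ten digits with leading zeros, as in the Learning Tree example (0001002037). The sketch below assumes you might have the key without its leading zeros and want it in that canonical form before handing it to a grabber; it is only an illustration, and the class and method names are hypothetical, not part of The Analyst.

```java
/**
 * Hypothetical helper: normalizes a Central Index Key to the 10-digit,
 * zero-padded form shown on this page (e.g. 0001002037).
 */
public class CikFormat {

    static final int CIK_DIGITS = 10;

    static String normalizeCik(String raw) {
        String digits = raw.trim();
        // A CIK is purely numeric; reject ticker symbols or company names.
        if (!digits.matches("\\d{1," + CIK_DIGITS + "}")) {
            throw new IllegalArgumentException("a CIK is 1 to 10 decimal digits: " + raw);
        }
        // Left-pad with zeros up to ten digits; already-padded keys pass through unchanged.
        return "0".repeat(CIK_DIGITS - digits.length()) + digits;
    }

    public static void main(String[] args) {
        System.out.println(normalizeCik("1002037"));      // 0001002037
        System.out.println(normalizeCik(" 0001002037 ")); // 0001002037
    }
}
```

Padding the string rather than formatting a number avoids any locale-dependent digit formatting and keeps already-padded keys exactly as typed.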
This directory consists of subdirectories, txt files, htm files, sgml files, and html files. The first things we are interested in are the sgml, htm, and html files. These are index files, which tell you what the corresponding txt file contains. The only difference between the three types of files is the format in which the information is kept, ha ha. We are interested in only three pieces of information from each index file: the form type, the fiscal year end, and the date of the statement. In addition, we are only interested in form types 10-Q, 10-QSB, 10QSB, 10-Q405, 10QSB40, 10-K, 10-KSB, 10KSB, 10-K405, and 10KSB40. The Q's are quarterly statements, and the K's are annual statements. Once we have downloaded all the index files and filtered them for the quarterly and annual statements, we can download the txt files corresponding to those index files. We can also rename the files to something more readable: based on the fiscal year end and the statement date, we can determine the fiscal year and quarter of each statement. The new filenames sort lexicographically the way you would expect the statements to sort, making it easy to spot missing statements.

Analyst: Once all the statements are downloaded, the Analyst portion of the program can get to work. We get to contend with inconsistent terminology, misspellings, nonstandard table formats, differing row, column, and table orders, and data errors, ha ha. One wonders when there will be a standard reporting format. Or if there will ever be one: chances are the corporate accountants and CFOs wouldn't want a standard, since that would mean less opportunity to massage data for more favorable-appearing results, ha ha!

There are three statements we are interested in: the income statement, the balance sheet, and the cash flow statement. For each statement, the Analyst follows the same general procedure:

1. Determine the number of columns in the statement, and the dates for each column
3. Parse the statement to extract the values into tabular form
5. Scan the numbers and row labels to guess the values for specific financial categories

Steps 1 through 3 are made a little more difficult because a given txt file can contain either a pure-text representation of the statement or an HTML representation, ha ha. The text representation is excruciating to parse, but the program manages it. Very few of the seemingly useful features in the text representation, such as <TABLE> table indicators or <C> column indicators, are reliable. Most of the time they are either not present or present in the wrong positions, ha ha. We just have to learn to parse the text without them, and that's no ha ha.

Locating the beginning and end of each statement

We are looking for lines that indicate the beginning of a particular statement. The regular expressions are called income_start, balance_start, and cash_flow_start. We can't do that with an HTML representation, since the statement title is normally encased in HTML codes, so we dispense with that requirement. Note several of the features of the above regular expression. A given statement can be "CONSOLIDATED", "CONDENSED CONSOLIDATED", or "COMBINED". It can be referred to in the singular (STATEMENT) or the plural (STATEMENTS). There can be junk between "CONSOLIDATED" and "STATEMENT". Only by running the program on different files could we come up with this expression, and even then the expression isn't guaranteed to work with all files. If there is a file on which the expression does not work, the expression needs to be modified to take that file into account. We don't want the expression to be too compressed, because otherwise it becomes difficult to modify. Once we find the beginning of a statement, we need to find its end, which we do in the same way, except with the regular expressions income_end, balance_end, and cash_flow_end.
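The exact income_start, balance_start, and cash_flow_start expressions are not shown here, so the following Java sketch is only a hypothetical approximation built from the features just listed: "CONSOLIDATED", "CONDENSED CONSOLIDATED", or "COMBINED"; junk before the word STATEMENT; and singular or plural STATEMENT(S). The fragments are kept separate and named, in the spirit of not compressing the expression too much. The pattern names, the method name, and the sample lines are invented for illustration.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Pattern;

/** Hypothetical approximation of statement-start patterns; not The Analyst's real expressions. */
public class StatementStart {

    // Named fragments, so each piece can be adjusted when a new filing breaks the match.
    static final String PREFIX = "(?:CONDENSED\\s+CONSOLIDATED|CONSOLIDATED|COMBINED)";
    static final String JUNK = ".*?";              // anything between the prefix and STATEMENT(S)
    static final String STATEMENT = "STATEMENTS?"; // singular or plural

    static final Pattern INCOME_START = Pattern.compile(
            PREFIX + JUNK + STATEMENT + "\\s+OF\\s+(?:OPERATIONS|INCOME|EARNINGS)");
    static final Pattern CASH_FLOW_START = Pattern.compile(
            PREFIX + JUNK + STATEMENT + "\\s+OF\\s+CASH\\s+FLOWS?");

    /** Index of the first line containing a match, or empty if no line matches. */
    static OptionalInt findStart(List<String> lines, Pattern start) {
        for (int i = 0; i < lines.size(); i++) {
            if (start.matcher(lines.get(i)).find()) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<String> filing = List.of(
                "PART I - FINANCIAL INFORMATION",
                "CONDENSED CONSOLIDATED (UNAUDITED) STATEMENTS OF OPERATIONS",
                "(in thousands, except per share data)",
                "CONSOLIDATED STATEMENT OF CASH FLOWS");
        System.out.println(findStart(filing, INCOME_START));    // OptionalInt[1]
        System.out.println(findStart(filing, CASH_FLOW_START)); // OptionalInt[3]
    }
}
```

The patterns are case-sensitive on the assumption that statement titles appear in capitals, which keeps mixed-case cross-references in the notes from matching; a capitalized table-of-contents line would still match, which is the false start discussed below. Finding the end of each statement would use a second set of patterns (income_end, balance_end, and cash_flow_end), searched from the start line onward.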
The end expressions are somewhat more complicated than the corresponding start expressions, since the statements could come in any order, ha ha. Sometimes the statement finder can be fooled by an annual statement's table of contents or index of tables, ha ha. In this case we rely on the date finder or the table parser to report a bad statement, and then we can reject the bogus statement start to find the "real" statement sta...