Anthony@SpreadsheetDetective.com

Peter Hoare, KPMG

Published in

The example in exhibit 1 has been used to model the profitability of producing a product, and the Excel Solver has then been used to determine the optimum Price. This would be difficult to do accurately using paper based calculations or calculus. Based on this model one can confidently begin a marketing campaign advertising a price of $31.50. One could then just as confidently sustain a significant loss because like many spreadsheet models, this one contains serious errors!

Exhibit 1

Errors can be hard to find, as illustrated by Galletta et al 1996 who performed an experiment in which 113 MBA students were asked to validate a single model seeded with eight errors, but on average only half the errors were identified. Galletta also noted that subjects were more accurate when working with printed reports rather than debugging spreadsheets interactively. Davis & Ikin 1987 audited 19 operational models and found that 4 had serious errors despite the fact that the subjects were very confident that the spreadsheets were error free. Panko 1998, also performed an experiment in which subjects stated they considered that there was an 18% chance that their completed spreadsheets contained errors, while in fact 86% contained errors. Moreover, there was no correlation between each subject's level of confidence and the integrity of their completed spreadsheets.

This overconfidence is perhaps the most serious aspect of spreadsheet errors because it reduces the extent to which people validate their models before using them to make important decisions. Mistakes that make large changes to output values are relatively easy detect. It is the errors that lead to subtle but important changes that are most dangerous because they can be very hard to detect but still lead to seriously flawed conclusions.

Another common and subtle source of error results from formulas that have not been copied consistently throughout a spreadsheet. This can checked in Excel by using the Go To / Special dialog or the "Ctrl+\" and "Ctrl+|" keys to select cells that are inconsistent with the first column (or row) within a range of cells. Exhibit 2 shows the result of selecting the range C5:F11 and then pressing "Ctrl+\". This shows that cell F9 is inconsistent with cell C9. Excel has also selected the range D11:F11 even though these cells are consistent because they are not consistent with the empty cell C11, and so this range needs to be checked separately by first selecting D11:F11 and then pressing "Ctrl+\" again. Likewise the range G6:G10 needs to be selected before "Ctrl+|" is pressed. Applying this option repeatedly throughout a spreadsheet and interpreting the somewhat cryptic results is tedious, but inconsistent formulas are a potentially serious source of error.

Exhibit 2

Another problem in larger spreadsheets it that it can be difficult to validate "A1" cell references. In the example, it is difficult to understand the formula "=Price - Prod!E56" in cell F2 because "E56" does not describe the meaning of the cell being referred to. To address this problem, a range may be given a Name which can then be used within the formulas. In the example, the Name "Price" has been defined to refer to the range C2, and then the Name "Price" has been used in the formula in cell F2. The use of Names makes formulas easier to understand but can be very misleading if a Name's definition is incorrect. In the example, cell C10 contains the formula "= C7 - PlantEquip - C9", but the Name PlantEquip actually refers to row 9 and not row 8. For this reason Names should only be used for cells that are referenced by several different formulas, and then the definitions of these Names needs to be checked carefully.

Excel 97 has also introduced a new "natural language" capability which enables cell labels such as "Plant & Equipment" to be entered directly into formulas. However, this requires Excel to use heuristics to infer which cells are actually being referenced, which can occasionally produce misleading results. We therefore recommend that if this feature is used at all that extreme care be taken.

The following guidelines should also be used when writing spreadsheets:-

- Never place constants in formulas. In the example the formula in C6 is "=C5*(1000-20*Price)", which hides the assumptions about the elasticity of the market and so makes it more difficult for others to use or review the model.
- Follow a simple input-output structure. Real world businesses convert inputs to outputs and deliver a financial result. Likewise, a financial model should be structured to utilize business inputs and emulate the business processes to produce the financial outputs. For example, inputs may include market data, production data, overhead drivers to produce the model outputs of sales, operating costs and earnings.
- Repeat important input values from other worksheets. In the example, a cell labeled "Unit Cost" could have been included with the formula "= Prod!E56", and then this cell could be referenced by the Markup formula. This would make it easier to understand how the Markup was calculated, and means that the printed worksheet tells a more complete story.
- Break up long, complex formulas into small chunks that can be verified independently. In the example, it would be possible to calculate the Gross Profit in one step as "= C5 * (1000 - 20 * Price) * (Price - Prod!B56), but this is difficult to follow. If adding intermediate results makes a worksheet too big for a report, then create a second summary worksheet which just contains references to the important cells in primary worksheet that calculates them.
- Avoid having formulas that refer to other cells below them or to their right. Apart from avoiding circular references, this clarifies the meaning of the spreadsheet by suggesting the dependency relationships between cells.
- Document important formulas in the worksheet. In the example, the Units Sold formula has been shown explicitly. However, care must be taken that the documentation actually reflects the formulas, otherwise it can be extremely misleading. The example also demonstrates the use of "+"s and "-"s to the right of the labels to show how the net profit was calculated.
- Add redundant verification formulas to the model. In the example, cell G12 contains the validation formula "=IF(G7 - G8 - G9 <> G10, "ERROR", "ok")" which highlights one of the formula errors in this model.
- Collect redundant input values and use validation formulas to check the results. In the example, the user is expected to redundantly enter the total cost for Plant & Equipment and the formula in cell H8 checks that this is consistent. Excel 97 enables additional validation formulas and conditional formatting formulas to be added to cells, but these cannot be easily seen in the formula bar and so can add unnecessary complexity.
- Be careful when using the Solver because it works like a blind man trying to feel his way to the top of a mountain and so can think that it has reached the top when in fact it has only reached the top of a foothill along the way.

Exhibit 3

Exhibit 4 shows annotations that describe the formulas more accurately. The blue text in each red box contains the text of the formula in the corresponding cell, thus cell C9 contains the formula " =30%*C7". The dots and lines show how the formula has been copied throughout the spreadsheet, so cell E9 contains " =30%*E9" but F9 contains a different formula. The circle in E7 indicates that it does not contain a formula, while the lines with "R"s highlights the fact that the ranges being used by the Sum in cell G6 does not include C6.

A1 references have also been clarified by inserting abbreviations of labels that describe them in the annotation. Thus the annotation in cell C9 is actually "= 30% * C7`GrossProfit". The green annotation in B9 indicates that the Named range "PlantEquip" has been defined to be the range C9:G9, which makes it easy to see the error with this definition.

The annotated spreadsheet can be printed and soberly reviewed. User experience with FormScheme indicates that in many instances review time can be dramatically reduced (by the order of one half to two thirds of the time) and the quality of the review enhanced.

Exhibit 4

An appropriate response to these risks is to:-

- develop and strictly follow good practice modelling protocols, a number of which have been outlined in this article;
- require an independent review by someone equipped to do so; and
- consider the use of software tools to improve the effectiveness of any review.

Freeman, D. 1996. How to Make Spreadsheets Error-Proof, Journal of Accountancy, Vol 181 Nr 5 May 1996.

Davis, N. and Ikin, C. 1987. Auditing Spreadsheets, Australian Accountant Dec 1987.

Galletta D.; Hartzel, K.; Johnson, S.; Joseph, J. and Rustagi, S. 1996. An Experimental Study of Spreadsheet Presentation and Error Detection. Proc. Hawaii Intl. Conf. on System Sciences.

Dr Anthony Berglas has a doctorate in computer science which investigated technology for end user computing, and can be contacted by E-Mail at ABerglas@uq.net.au.

Peter Hoare, B Bus MBA ACA ASIA is a senior consultant with KPMG Management Consulting.