Parametric Modeling Using High-Level Objects (HLOs)
At the core, there are fundamentally three approaches to estimating:
- Bottom up, in which the project is decomposed into individual components that can be estimated by some other means, often expert judgment for labor;
- Analogy, in which historical budgets or actual expenditures are used as the basis for current estimates. These numbers are normally adjusted to account for known differences between the historical project and the current project, including, at a minimum, an allowance for inflation based on the relative timeframes;
- Parametric, in which models are used to forecast cost based on some representation of project size combined with adjusting factors.
Traditional project management guidance is that bottom up is the most accurate, followed by analogy, followed by parametric. Whether this is true depends heavily on the maturity of the parametric models in the domain you are trying to estimate. In the software domain, for example, we have had the opportunity to track budgets versus actuals for more than 12,000 projects that were estimated using the three approaches above. What we have found is quite the opposite.
In the software domain, parametric is the most accurate, followed by analogy, followed by bottom up. The standard deviation of parametric estimates is 55% smaller than that of estimates by analogy and 64% smaller than that of bottom-up estimates. These ratios hold more-or-less true for estimates prepared at any stage of the project lifecycle (budgeting, feasibility, planning). A value that is often more critical, at least when dealing with project portfolios or clusters of estimates for a given scope change, is estimation bias. If you look at a statistically significant sample of estimates (e.g., 20 projects in a portfolio) and total up both the estimates and the actuals for that collection, the bias is the difference between those two totals. With parametric estimates and a properly calibrated model, this bias approaches zero (we consistently see it under 5% for large organizations, with a random direction). With estimates by analogy, this number is typically closer to 10%, also in a random direction. But with bottom-up estimates, this number is typically between 15% and 20%, with a bias toward under-estimating. In the remainder of this section we’ll discuss parametric estimation in more detail.
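To make the bias calculation concrete, here is a minimal sketch in Python. The project figures are purely illustrative; the function simply totals the estimates and actuals for a portfolio and reports the relative difference.

```python
# Minimal sketch of portfolio-level estimation bias.
# The figures below are illustrative, not real project data.

def portfolio_bias(estimates, actuals):
    """Relative bias: (total actual - total estimate) / total estimate.

    Positive values indicate under-estimation; negative values over-estimation.
    """
    total_estimate = sum(estimates)
    total_actual = sum(actuals)
    return (total_actual - total_estimate) / total_estimate

# Hypothetical portfolio of five projects (effort in person-months).
estimates = [120, 85, 240, 60, 150]
actuals = [132, 80, 265, 58, 170]

print(f"Portfolio bias: {portfolio_bias(estimates, actuals):+.1%}")  # about +7.6%
```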
Figure 1: Core Estimating Concept
As shown in Figure 1, the core requirements for effective parametric estimation in any domain are relatively simple. Step one in the process is to identify one or more high-level objects (HLOs) that have a direct correlation with effort. The appropriate HLOs are domain specific, although there is sometimes overlap. Examples of HLOs include yards of carpet to lay, reports to create, help desk calls to field, or claims to process. In activity-based costing (ABC), these would be the cost drivers. HLOs are often assigned a value based on their relative implementation difficulty, allowing them to be totaled into a single numeric value. An example is function points, which are a total of the values for the function point HLOs (EQ, EI, EO, ILF, and EIF). Don’t worry if you’re not familiar with those terms; what matters here is the idea they represent.
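As a rough illustration of totaling HLOs into a single number, the sketch below weights function-point-style components and sums them. The weights approximate commonly published average-complexity values, but they are placeholders here rather than a prescribed counting standard.

```python
# Sketch: totaling HLO counts into a single size value, function-point style.
# The weights approximate commonly published average-complexity values and are
# placeholders; a real count would classify each item by complexity first.

HLO_WEIGHTS = {
    "EI": 4,    # external inputs
    "EO": 5,    # external outputs
    "EQ": 4,    # external inquiries
    "ILF": 10,  # internal logical files
    "EIF": 7,   # external interface files
}

def unadjusted_size(counts):
    """Sum weighted HLO counts into one numeric size measure."""
    return sum(HLO_WEIGHTS[hlo] * n for hlo, n in counts.items())

# Example: a hypothetical small application.
counts = {"EI": 12, "EO": 8, "EQ": 5, "ILF": 6, "EIF": 2}
print(unadjusted_size(counts))  # 48 + 40 + 20 + 60 + 14 = 182
```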
HLOs may have an assigned complexity or other defining characteristics that cause an adjustment in effort (e.g., simple report versus average report). It’s also typically necessary to have a technique for handling work that involves new development, modifications or extensions of existing components, or testing/validation only of existing components. Various formulas or simplifying assumptions may be used for this purpose. For example, in the case of reuse, the original COCOMO I model reduced the HLO size to:
adjusted HLO = HLO × (0.4 × DM + 0.3 × CM + 0.3 × IT)
where DM is the percent design modification (1% to 100%); CM is the percent code modification (1% to 100%); and IT is the percent integration and test effort (1% to 100%).
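A small sketch of that reuse adjustment, assuming DM, CM, and IT are supplied as fractions between 0.01 and 1.0:

```python
# Sketch of the reuse adjustment above: the effective size of a reused
# component is its original size scaled by a weighted blend of design
# modification (DM), code modification (CM), and integration/test (IT),
# each supplied here as a fraction between 0.01 and 1.0.

def reused_hlo_size(hlo_size, dm, cm, it):
    """Equivalent HLO size for a reused or modified component."""
    return hlo_size * (0.4 * dm + 0.3 * cm + 0.3 * it)

# Example: 20% design change, 50% code change, full integration and test.
print(reused_hlo_size(100, dm=0.20, cm=0.50, it=1.0))  # 100 * 0.53 = 53.0
```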
Step two is to define adjusting variables that affect either productivity or economies (and diseconomies) of scale. The productivity variables tend to be things like the characteristics of the labor who will be performing the work or the tools they will be working with; the characteristics of the products to be created (e.g., quality tolerance) or the process used to create them; and the characteristics of the environment in which the work will be performed. The variables that affect economies or diseconomies of scale are typically those that drive the need for communication and coordination, and the efficiency of those activities. These adjusting variables are important both to improve the accuracy of any given estimate and to normalize data to support benchmarking across companies or between application areas.
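One common way to apply such adjusting variables is as multiplicative factors, as in the sketch below. The factor names and values are purely illustrative; a calibrated model would derive them from observed data or published tables, and they could equally be applied to effort rather than size.

```python
# Sketch: applying adjusting variables as multiplicative factors.
# The factor names and values are illustrative placeholders.

def apply_factors(base_value, factors):
    """Scale a base size (or effort) by the product of its adjustment factors."""
    adjustment = 1.0
    for value in factors.values():
        adjustment *= value
    return base_value * adjustment

factors = {
    "team_experience": 0.90,     # experienced team: less effort
    "tool_support": 0.95,        # strong tooling: less effort
    "quality_tolerance": 1.15,   # stringent reliability needs: more effort
}
print(round(apply_factors(182, factors), 1))  # 182 * 0.983 ~= 179.0
```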
Step three involves defining productivity curves, which convert adjusted HLO size counts into resultant effort. They are typically curves (rather than lines) because of the economies or diseconomies of scale that are present. Curves may be determined empirically or approximated using industry-standard data for similar domains. They may also be adjusted based on the degree to which the project is rushed. In any event, procedures are put in place to collect the data needed to support periodic adjustment of the curves to match observed results, a process called calibration.
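A frequently used curve form is a power law, effort = a × size^b, where b > 1 produces diseconomies of scale and b < 1 produces economies of scale. The sketch below uses illustrative coefficients; in practice, a and b would come from calibration against observed results.

```python
# Sketch: a power-law productivity curve converting adjusted size to effort.
# The coefficients a and b are illustrative; calibration against observed
# results would normally determine them. b > 1 models a diseconomy of scale.

def effort_person_months(size, a=0.05, b=1.10):
    """Convert an adjusted HLO size into estimated effort."""
    return a * size ** b

for size in (100, 200, 400):
    print(size, round(effort_person_months(size), 1))
# Doubling the size more than doubles the effort when b > 1.
```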
The outputs of the process are driven by the needs of the organization. These outputs can be broken down into three major categories:
- Cost (or effort, which is equivalent for this purpose): In addition to the obvious total value, most organizations are interested in some form of breakdown. Typical breakdowns include breakdowns by organizational unit for budgetary or resource planning purposes, breakdowns by type of money from a GAAP perspective (e.g., opex versus capex), or breakdowns by WBS elements in a project plan. These outputs will also typically include labor needed over time, broken down by labor category. They are generated using a top-down allocation (see the allocation sketch after this list).
- Non-Cost Outputs: Non-cost outputs are quantitative predictions of either intermediate work product size or non-cost deliverable components. Examples include the number of test cases (perhaps broken down by type), the engineering documents to be created along with their page counts, the number of use-case scenarios to be created, or the estimated help desk calls broken down by category. These outputs are typically created using curves similar to the productivity curves, operating either on the HLOs or on the total project effort.
- Lifecycle Costs: If the estimate is for a product to be created, delivered, and accepted, then the cost and non-cost items above would typically cover the period through acceptance. In most cases there would then be an ongoing cost to support and maintain the delivered product throughout its lifecycle. These support costs are relatively predictable, both in terms of the support activities that are required and the curves that define the effort involved. For many of them, the effort will be high immediately following acceptance, drop off over the course of one to three years to a low plateau, then climb again as the product nears the end of its design life (see the lifecycle sketch below).
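To illustrate the top-down allocation mentioned in the cost bullet above, the sketch below spreads a total effort figure across hypothetical phase and labor-category profiles. The percentages are placeholders, not standard values.

```python
# Sketch of a top-down allocation: total estimated effort is spread across
# WBS phases and labor categories using percentage profiles. The percentages
# are illustrative placeholders, not standard values.

PHASE_SPLIT = {"requirements": 0.15, "design": 0.20, "build": 0.40,
               "test": 0.20, "deploy": 0.05}
LABOR_SPLIT = {"analyst": 0.25, "developer": 0.55, "tester": 0.20}

def allocate(total_effort):
    """Break total effort down by phase and by labor category."""
    by_phase = {phase: total_effort * share for phase, share in PHASE_SPLIT.items()}
    by_labor = {role: total_effort * share for role, share in LABOR_SPLIT.items()}
    return by_phase, by_labor

by_phase, by_labor = allocate(36.4)  # person-months, e.g., from a productivity curve
print(by_phase)
print(by_labor)
```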
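And as a sketch of the lifecycle support profile just described, the following piecewise function produces effort that starts high at acceptance, declines to a plateau over a few years, and rises again near the end of the design life. The parameter values are illustrative only.

```python
# Sketch: a lifecycle support-effort profile matching the shape described in
# the lifecycle bullet: high right after acceptance, declining to a plateau
# over a few years, then rising as the product nears the end of its design
# life. All parameter values are illustrative.

def support_effort(year, initial=10.0, plateau=3.0, ramp_years=3, design_life=12):
    """Annual support effort (person-months) for a given year after acceptance."""
    if year <= ramp_years:
        # Linear decline from the post-acceptance peak to the plateau.
        return initial - (initial - plateau) * (year / ramp_years)
    if year >= design_life - 2:
        # Climb as the product approaches the end of its design life.
        return plateau + (year - (design_life - 2)) * 2.0
    return plateau

print([round(support_effort(y), 1) for y in range(0, 13)])
```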
In the next few sections of this article we’ll focus on the application of parametric modeling techniques to IGCEs in support of the three procurement phases.