Scheme Boston: April 2003 meeting

Notes by Ed Kademan <kademan@phz.com>

The April 2003 meeting of Scheme Boston took place at Northeastern University Monday night April 14, 2003. Nine people were in attendance to hear Tudor Antoniu and Paul Steckler describe a Scheme program that studies Microsoft Excel spreadsheets and flags certain types of inconsistencies in the way the calculations are layed out.

Paul Steckler who talked first did not really address XeLda itself but rather the library that he developed that makes XeLda possible. This library---called MysterX---is a toolkit that allows Scheme to interact with Windows applications via the Component Object Model or COM. COM specifies protocols or interfaces that allow programs to communicate and is the glue that holds the Windows desktop together. Since almost all Microsoft programs expose and use COM interfaces and since COM is designed to be language independent, programmers can automate applications like Word and Excel in a variety of ways. The newer .NET architecture that Microsoft has also introduced has vaguely similar and maybe superior functionality but so far COM is keeping up with it in the sense that there exist compatibility wrappers enabling applications to harness .NET features using COM calls. In other words, COM is not going away any time soon.

Thanks to COM, Scheme/MysterX can instantiate classes, invoke methods belonging to those classes, and retrieve and set class properties. It also has "reflection" capabilities that make it possible for programs to query classes for their methods, properties, and types. MysterX simplifies the writing of handlers for the events that interactive classes can generate and uses COM's automation feature to ease the job of dispatching method calls. The bad news is that the sorts of values that one can send and accept through COM is limited to very simple things like strings, numbers, and arrays. Windows applications that want to invoke code in the Scheme library must send strings to be evaled, from an input form for example. Also the COM automation method of dispatch is very slow and inefficient. There is a faster but more cumbersome direct method that Paul mentioned as a target of future development. He has reported on the experiences that he had writing this library in Scheme Workshops for 2000 and 2001 and in Software Practice and Experience for 2002.

Tudor Antoniu started off by making the comments that computer spreadsheets are actually a type of functional programming language, that they are very widely used, and that there is little or nothing in the way of diagnostic tools to insure their consistency or meaningfulness, even though the models built in them are often quite complex. Most of these models come from the financial world or the physical sciences where it is natural to associate units (dollars, meters, seconds, etc.) with the cell values. An obvious way to make this association is to use the optional comment fields that Microsoft Excel can attach to its cells. And once these comment fields contain unit information XeLda can analyze them.

XeLda has three components, the COM interface layer that uses MysterX, a graphical user interface, and the unit checker that does the real work. To begin with, one of the simplest but most useful features of XeLda is its ability to easily add or edit units in entire sections of spreadsheets. Any cell containing a simple value without a unit is considered dimensionless. A cell containing a formula might or might not have a unit associated with it but if it does XeLda can check that that unit is consistent with the units of the other cells entering into the calculation.

XeLda can detect at least three types of errors

  1. Computation Errors. These result from formulas whose inputs have inconsistent units. For example, a formula that added a dimensionless value to one with a unit would suffer this type of error, and XeLda would color its cell green.
  2. Propagation Errors. These result when one cell references another cell that was found to be in error. XeLda colors propagation error cells purple.
  3. Mismatch Errors. These are the sorts of errors described above that occur in those cells whose annotated formulas specify units that do not agree with what XeLda determines to be the correct units based on the inputs. These cells are blue.

A critical component of XeLda is the set of unit transformer rules that stipulate what the unit of a function's output should be based on the units of its inputs. XeLda represents units as combinations of unit name/integer exponent pairs or as errors, and it considers formulas to be either simple values, cell references, formulas joined together by operators, or function invocations. With this base the unit checker can go a long way toward checking a spreadsheet. However there are two serious complications.

First, Excel allows for embedded tables and matrices. Tables don't necessarily describe formulas explicitly but only indirectly so that the program has to trace back to find them. Matrices complicate the whole notion of propagating errors since for operations like matrix multiplication the unit checker has to make sure all the units in a row or column are uniform.

Second, Excel allows for circular formulas. Solving this problem requires resorting to a type of unification in which the unit checker

  1. generates constraints containing unit variables and unit expressions,
  2. builds equivalence classes from these constraints,
  3. maps the simplified constraints into a system of linear equations and solves it.

Tudor gave a little demonstration on some spreadsheets and described the results of running XeLda on a larger collection that had been published in the literature. On the whole the program behaves in a way that an Excel user would probably find understandable. In addition to coloring the cells according to their error types XeLda can also paint cells red in order to highlight dependency relationships among them. (In other words it is easy for a user to identify all the cells that contribute values to a single specific one.)

The system is not complete yet. Tasks that still need addressing are

  1. Identifying more unit transformer rules. Excel has many hundreds of functions and XeLda does not have rules for all of them yet.
  2. Identifying ways of making the program easier to use.
  3. Automatically converting between systems of units. (centimeters to inches for example)
  4. Making the program faster. The unit checker itself is fast but the COM automation feature mentioned above that MysterX uses makes XeLda intolerably slow for large spreadsheets.