Protected methods (by alaric)
In object-oriented programming, things defined in a class can have one of (usually) three access levels: public, private, or protected. Public things are accessible to all users of the class, private things are only accessible from other things defined within the same class, and protected things are accessible from within the class and its subclasses.
However, I have noticed that protected methods are often rather insufficiently documented.
The problem is not the behaviour of the method itself; it's the nature of the abstraction layers within the class implementation. Namely, it's often not specified what will happen if you override the method. Do the other methods defined in the same class actually invoke the protected method where applicable, as they should, or do they go dipping into the private fields of the class to have their effect directly? If the latter, then overriding the protected method may not alter the behaviour of the class as you might expect.
For example, a dodgy statistics toolkit might have a DataSet class, with public methods to add a datum to the set, and to extract various means and deviations. Generally, most of these can be computed from a handful of properties such as the number of data in the set, their sum, the sum of their squares, the sum of their inverses, and so on; therefore, the author of DataSet might provide protected methods to obtain these 'internal' sums, so that subclasses may add extra types of mean in future.
However, say you'd like to use these statistics tools, but it would be impractical for you to iterate through every datum telling the DataSet about it. Perhaps there are a few million of them in an SQL database, so it's very easy for you to execute a query of the form:
SELECT COUNT(*), SUM(val), SUM(val*val), SUM(1.0/val) FROM table;
...and get the intermediate values immediately. The author of the DataSet class made the mistake of tightly binding two independent things: a tally of various sums with a method to update them all for a new datum, and a bunch of methods that generate useful statistics from those sums. So you're forced to subclass DataSet, ignore the addDatum(float datum) method (and the private fields it uses to store the sums in), and override the protected getCount(), getSum(), getSumSquares(), getSumInverses(), and similar methods with ones that return your precomputed sums, so that the public getArithmeticMean(), getHarmonicMean(), and getStandardDeviation() methods can be used.
But until you test it, you've no way of knowing if getArithmeticMean() actually calls the protected getCount() and getSum(), or just refers directly to the private fields updated by addDatum(float datum). getCount() and friends might have been provided purely to let you define your own public getX() methods, rather than intended as an override point to change the behaviour of the class.
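To make the ambiguity concrete, here is a minimal Java sketch. It is not taken from any real toolkit: the class bodies are my own guess at what such a DataSet might look like, and PrecomputedDataSet, getArithmeticMeanViaGetters() and OverrideDemo are hypothetical names introduced purely for illustration. One mean reads the private fields directly, the other goes through the protected getters, and from the outside the two look the same.

```java
// Hypothetical reconstruction of the kind of DataSet class described above.
class DataSet {
    private int count;
    private double sum;

    public void addDatum(float datum) {
        count++;
        sum += datum;
    }

    protected int getCount() { return count; }
    protected double getSum() { return sum; }

    // Reads the private fields directly, so overriding the getters has no effect here.
    public double getArithmeticMean() {
        return sum / count;
    }

    // Hypothetical alternative: goes through the protected getters, so overrides take effect.
    public double getArithmeticMeanViaGetters() {
        return getSum() / getCount();
    }
}

// Hypothetical subclass supplying sums computed elsewhere (e.g. by an SQL query).
class PrecomputedDataSet extends DataSet {
    private final int count;
    private final double sum;

    PrecomputedDataSet(int count, double sum) {
        this.count = count;
        this.sum = sum;
    }

    @Override protected int getCount() { return count; }
    @Override protected double getSum() { return sum; }
}

class OverrideDemo {
    public static void main(String[] args) {
        DataSet ds = new PrecomputedDataSet(4, 10.0);
        // The base class's private fields were never updated: 0.0 / 0 is NaN.
        System.out.println(ds.getArithmeticMean());           // prints NaN
        // The getter-based version sees the overridden values: 10.0 / 4.
        System.out.println(ds.getArithmeticMeanViaGetters()); // prints 2.5
    }
}
```

Nothing in the method signatures distinguishes the two means; only reading the source or running a test like this reveals which one respects the overrides.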
And since the author of DataSet is clearly - like most programmers - not the sort of person to have actually sat down and designed a properly future-proof class hierarchy, you've no way of telling if the fact they've failed to mark the protected methods as static, sealed, or final (depending on your language) is an intentional invitation to override them, or just laziness.
How can we make this better in practice? Is it worth worrying about, or should we just educate programmers to design better interfaces?
I don't know, myself, but I'm wondering if it'd be worth defining some kind of internal boundary in a class. A bunch of private fields, along with the public, private, and protected methods that touch them, could be sealed within such a boundary, and the compiler would enforce that access to private things could not cross such boundaries, and that private things could only be declared within such a boundary. In this case, the lazy DataSet developer would be forced to create a boundary to enclose the private count, sum, etc. fields, and they'd be forced to put the public addDatum(float datum) method as well as the protected getCount() et al within the same boundary. But hopefully, this would jog their mind into thinking there's no reason to put getArithmeticMean() within the same boundary, since it does not need direct access to the count and sum fields - it could go outside the boundary and use the protected getter methods instead. But as a user of the class, you would be able to see from its public and protected method list that getArithmeticMean() was outside the boundary, and thus know that overriding the protected methods would correctly alter its behaviour, since it could not be directly accessing the private fields.
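Java has no such intra-class boundary, but the idea can be roughly approximated today by splitting the class in two, so that the ordinary per-class meaning of private does the enforcing. The sketch below is only that approximation, not the language feature proposed above; SumTally, BoundedDataSet, QueryBackedDataSet and BoundaryDemo are hypothetical names, and the standard deviation shown is assumed to be the population form.

```java
// Plays the role of the "boundary": the private fields plus the only methods that touch them.
class SumTally {
    private int count;
    private double sum;
    private double sumSquares;

    public void addDatum(float datum) {
        count++;
        sum += datum;
        sumSquares += (double) datum * datum;
    }

    protected int getCount() { return count; }
    protected double getSum() { return sum; }
    protected double getSumSquares() { return sumSquares; }
}

// Outside the boundary: the compiler rejects any reference to SumTally's
// private fields from here, so these methods must use the protected getters.
class BoundedDataSet extends SumTally {
    public double getArithmeticMean() {
        return getSum() / getCount();
    }

    // Population standard deviation (an assumption; the post does not say which form).
    public double getStandardDeviation() {
        double mean = getArithmeticMean();
        return Math.sqrt(getSumSquares() / getCount() - mean * mean);
    }
}

// A subclasser can now override the getters knowing the statistics will follow.
class QueryBackedDataSet extends BoundedDataSet {
    private final int count;
    private final double sum;
    private final double sumSquares;

    QueryBackedDataSet(int count, double sum, double sumSquares) {
        this.count = count;
        this.sum = sum;
        this.sumSquares = sumSquares;
    }

    @Override protected int getCount() { return count; }
    @Override protected double getSum() { return sum; }
    @Override protected double getSumSquares() { return sumSquares; }
}

class BoundaryDemo {
    public static void main(String[] args) {
        // Sums for the data 1, 2, 3, 4, as if returned by the SQL query.
        BoundedDataSet ds = new QueryBackedDataSet(4, 10.0, 30.0);
        System.out.println(ds.getArithmeticMean()); // prints 2.5
        System.out.println(ds.getStandardDeviation());
    }
}
```

The cost of this workaround is an extra class in the hierarchy for every boundary, which is part of why a lighter-weight, in-class construct might be more likely to get used.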
A bad programmer can still just create one boundary per class and put everything in, but the private-boundary system might discourage unnecessary access to private fields when protected or public getters and setters can be used instead, and expose useful information about dependencies to subclassers so they can tell what the effect of overriding a method is upon other methods they do not override.
By Alex B, Mon 4th Sep 2006 @ 7:15 pm
For more mind-bending complexity of 'protected' in Java, see http://www.cs.ru.nl/ftfjp/2004/Protected.pdf