Test whether using the template parameter makes any difference to speed, and rationalise which units do and don't need one. Template objects are messy when you try to use pointers to them: you have to include the whole template shebang in the pointer handling.
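A minimal sketch of the pointer problem, for illustration only. TableOsc, next_via_pointer, next_via_alias and Osc4 are hypothetical names, not classes or functions from the library, and the template arguments are made-up values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a templated unit (not the library's own class).
template <uint16_t NUM_CELLS, uint16_t UPDATE_RATE>
class TableOsc {
public:
  explicit TableOsc(const int8_t* table) : table_(table), index_(0) {
    assert(table != nullptr);
  }

  // Return the current cell, then step to the next one, wrapping at the end.
  int8_t next() {
    const int8_t out = table_[index_];
    index_ = static_cast<uint16_t>((index_ + 1) % NUM_CELLS);
    return out;
  }

private:
  const int8_t* table_;
  uint16_t index_;
};

// Without an alias, every pointer declaration repeats the template arguments.
int8_t next_via_pointer(TableOsc<4, 16384>* osc) { return osc->next(); }

// An alias hides the argument list, but each distinct instantiation
// still needs an alias of its own.
typedef TableOsc<4, 16384> Osc4;
int8_t next_via_alias(Osc4* osc) { return osc->next(); }
```

Both functions accept the same object; a TableOsc<8, 16384> would be a different type and would need its own pointer type, which is the mess described above.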
Possible option for output to an R/2R DAC circuit (for example http://blog.makezine.com/2008/05/29/makeit-protodac-shield-fo/). This would limit dynamic range to 8 bits, but would remove the 16384 Hz PWM carrier frequency noise, which can be a problem in some applications and requires filtering to remove (see the Mozzi wiki for filter schematics).
Have a parameter to set whether it's single or repeating, so start() doesn't have to be called for repeats. Pro: simpler user programming. Con: would require an if..then every time ready() is called.
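A rough sketch of what the single/repeating option could look like. DelayTimer and its members are hypothetical, not the library's class, and the current time is passed in as an argument (an assumption made here so the example does not depend on any hardware clock).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timer for illustration only.
class DelayTimer {
public:
  DelayTimer(uint32_t interval_ms, bool repeating)
      : interval_ms_(interval_ms), repeating_(repeating), deadline_ms_(0) {
    assert(interval_ms > 0);  // a zero interval would always be ready
  }

  // Arm the timer relative to the given time.
  void start(uint32_t now_ms) { deadline_ms_ = now_ms + interval_ms_; }

  // True once the interval has elapsed. In repeating mode the deadline is
  // moved forward here, so the caller never has to call start() again.
  bool ready(uint32_t now_ms) {
    // The signed difference keeps the comparison valid across counter wrap.
    if (static_cast<int32_t>(now_ms - deadline_ms_) < 0) return false;
    if (repeating_) deadline_ms_ += interval_ms_;  // the extra if..then
    return true;
  }

private:
  uint32_t interval_ms_;
  bool repeating_;
  uint32_t deadline_ms_;
};
```

In this arrangement the test on repeating_ is only reached once the interval has elapsed; whether that keeps the cost acceptable would need measuring on the target.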
Find out if there is a way to calculate and predict the output range for a particular input range and filter length, so that it can be used precisely, or whether it requires experimenting to find useful values in each application. Specialise templates for unsigned types.
Check if 8 bit templates can work efficiently with a higher-resolution smoothness; as is, they don't have enough resolution to work well at audio rate. See if Line might be more useful in most cases.
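A small sketch of why the resolution matters, under stated assumptions: smoothness is treated as a one-pole filter coefficient a in [0, 1) stored as a fixed-point fraction, and the time constant is approximated as 1 / (1 - a) samples. quantise_smoothness and time_constant_samples are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstdint>

// Quantise a smoothness value to frac_bits fractional bits,
// clamping it to stay strictly below 1.0.
uint32_t quantise_smoothness(double smoothness, unsigned frac_bits) {
  assert(frac_bits > 0 && frac_bits < 32);
  const uint32_t one = 1u << frac_bits;
  uint32_t q = static_cast<uint32_t>(smoothness * one + 0.5);
  if (q >= one) q = one - 1;
  return q;
}

// Approximate time constant in samples, 1 / (1 - a), for a quantised a.
uint32_t time_constant_samples(uint32_t q, unsigned frac_bits) {
  const uint32_t one = 1u << frac_bits;
  return one / (one - q);
}
```

With 8 fractional bits, a requested smoothness of 0.999 clamps to 255/256 and the time constant tops out at 256 samples, about 15.6 ms if the audio rate is assumed to be 16384 Hz. With 16 fractional bits the same request quantises to 65470/65536, for a time constant of roughly 992 samples.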