November 14, 2011

New Version of Anti-patterns Parser

I have a new version of the anti-patterns parser.

New Features
1. Rules Library: Anti-pattern rules have been moved out of the parser into an external file. Users can therefore add, modify and delete rules in the library without having to touch the parser. This also lets individuals customize the rules and eliminate false positives that the parser, being generic, tends to catch.

2. Hints: The old parser could only identify anti-patterns in Perl and Shell scripts; it did not say how to fix them. That made its output hard to understand and act upon for users new to the tool, especially those who are not Perl and Shell experts. The new version provides one or more hints for fixing each anti-pattern (partly based on Programming Anti-patterns in Perl and Programming Anti-patterns in Shell). For the complete picture, however, the user may still have to refer to the and scripts.

3. Level: Each anti-pattern also has a level associated with it: IGNORE, WARN or ERROR. The parser itself does not use it yet, but it should help users filter anti-patterns by the severity levels they care about.

The new version also benefits from a few bug fixes, dead code removals, and miscellaneous improvements. Most of all, I finally learnt how to apply Perl's map and nested map, yay! I think the parser is more readable now, and I hope I haven't abused map. You tell me.

An anti-pattern rule is defined using the keywords LANG, REGEX, HINT and LEVEL. Lines not starting with a keyword are ignored as comments, and consecutive rules should be separated by a comment line. LANG can be either "perl" or "shell". REGEX is a Perl regular expression. LANG and REGEX are required; HINT and LEVEL are optional. The parser doesn't check the rules library for errors, so that part is up to the user.
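The post does not show the exact syntax of anti-patterns.lib. The sketch below therefore assumes one keyword per line, followed by a space and its value (for example "LANG perl"). Under that assumption, a bash function like the hypothetical read_rules could do three things:

- collect each rule,
- drop rules missing LANG or REGEX,
- optionally keep only one LEVEL, the kind of severity filtering mentioned above.

The real parser is written in Perl and is not reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical reader for a rules library, ASSUMING one "KEYWORD value"
# pair per line (the real anti-patterns.lib syntax may differ):
#   LANG perl|shell            required
#   REGEX <regex>              required
#   HINT <text>                optional, may repeat
#   LEVEL IGNORE|WARN|ERROR    optional
# Any other line is a comment and closes the rule being read.

# read_rules FILE [LEVEL]
# Prints one tab-separated "lang level regex hints" record per complete
# rule; with LEVEL, prints only the rules at that level.
read_rules() {
  local file=$1 want=${2:-}
  local lang="" regex="" level="" hints="" line
  while IFS= read -r line; do
    case $line in
      LANG\ *)  lang=${line#* } ;;
      REGEX\ *) regex=${line#* } ;;
      HINT\ *)  hints+="${hints:+; }${line#* }" ;;
      LEVEL\ *) level=${line#* } ;;
      *)
        # Comment line: emit the pending rule if LANG and REGEX were both
        # seen and it passes the level filter. Nothing else is validated.
        if [[ -n $lang && -n $regex && ( -z $want || $level == "$want" ) ]]; then
          printf '%s\t%s\t%s\t%s\n' "$lang" "${level:--}" "$regex" "$hints"
        fi
        lang="" regex="" level="" hints=""
        ;;
    esac
  done < <(cat -- "$file"; printf '\n#\n')  # trailing comment flushes the last rule
}
```

With this sketch, "read_rules anti-patterns.lib ERROR" would list only the ERROR-level rules. A rule without a LEVEL is shown with "-" when no filter is given.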

Because of the rules library, and the option for users to supply their own custom library, the parser now needs an extra input: the rules library file.
$ ./  file|dir|pkg

I have deleted all previous files related to anti-patterns. You can download the latest zipped folder here. It contains updated versions of,, and a new anti-patterns.lib. I am inexplicably hesitant about hosting this on GitHub. Please note that this is being released under the MIT License, with the approval of Symantec Corporation.

The ability to suggest one or more hints comes at a price: performance. The parser now checks against every anti-pattern instead of stopping after the first match, so it is several times slower than the old version. This is noticeable when running it on a large directory or package, though probably not on a single file. Until I figure out a solution (please help!), you can fall back to a single hint if that is enough for you: comment out the only "else" block in each of the parsePerl and parseShell subroutines.
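The trade-off described above is between stopping at the first match and collecting every hint. It can be sketched in bash. This is not the parser's code, which is Perl, in parsePerl and parseShell. The rule arrays, the hypothetical check_line function and its "first" mode are illustrative assumptions. The patterns are bash extended regexes, not Perl regexes.

```bash
#!/usr/bin/env bash
# Illustrative rule table: ERE patterns (not Perl regexes) with one hint
# each, kept in two parallel arrays. Both are hypothetical.
patterns=('echo .*[|] *sed( |$)' 'echo .*[|] *grep( |$)' '[|] *sed( |$)')
hints=('use ${var//find/replace}' 'use [[ $var == *text* ]]' 'fold the sed into a neighbouring awk')

# check_line LINE [first]
# Prints the hint of every rule whose pattern matches LINE.
# With "first", returns after the first match: cheaper, but one hint only.
check_line() {
  local line=$1 mode=${2:-all} i
  for i in "${!patterns[@]}"; do
    if [[ $line =~ ${patterns[i]} ]]; then
      printf '%s\n' "${hints[i]}"
      [[ $mode == first ]] && return 0
    fi
  done
  return 0
}
```

Passing "first" behaves like the old single-hint parser. The default keeps testing every pattern against every line, and that is where the extra cost comes from.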

The new parser hasn't yet been extensively tested, so there are bound to be bugs. Feel free to report bugs, and suggest new anti-patterns and features. Thank you in advance.

All credit for this release goes to Rocky Ren, a Symantec colleague. Many people have made feature requests, but Rocky went several steps further. He not only requested specific features, but also suggested how those features could be implemented -- using his own prototype -- making me guilty enough to include his suggestions. The rules file and its format are also entirely his idea. Many thanks to Rocky.

June 23, 2011

The Fuss About Programming Anti-patterns

Andrew Koenig started talking about programming anti-patterns more than 15 years ago. The idea probably never became as popular as it deserved. It deals with mistakes in syntactically correct software, and the list of wrong ways to do something is practically inexhaustible compared to the right ways. I still think it is a useful concept. Even experts make mistakes, all the time. In my limited experience, the majority of the performance enhancements I have come across are in fact bug fixes, whether because the solution wasn't designed well enough or wasn't implemented well enough.

Given a system, improving its performance generally means improving the performance of its slowest component. That first requires identifying the slowest component, which takes profiling at some level. Profiling is time-consuming, so it often isn't done until a system (or an operation on it) feels slow. Such impressions also become muddier and slower to form as the system grows more complex.

For this reason I like the approach of checking for known programming anti-patterns using static code analysis. Static code analysis may take longer than a build, but it is faster than running a test suite on built images. For now. If the list of anti-patterns ever grows very long, the static analysis cycle time will grow with it, and the approach will stop being feasible (though tiering anti-patterns by severity and history of occurrences seems promising).

Hence my interest in static code analyzers like Coverity and FindBugs, though I have yet to explore them well enough to really know them. I am not aware of any major work along similar lines for Shell and Perl. I think it's worth exploring: a significant amount of software is written in them (certainly at Symantec Corporation), and they're already slower than compiled languages.

Shell Scripts
The famous todo software is written entirely in Bash. It has a lot of anti-patterns, among them various uses of "echo | grep", "echo | sed" and "echo | tr". The longest piping anti-pattern in is of the form "echo | sed | eval sort | sed | awk | sed | eval cat". It works in seven steps:

- takes a list of todo items (echo),
- pads them with leading zeroes (sed),
- sorts them (eval sort),
- color-codes the done items (sed),
- does more color-coding work (awk),
- nullifies a few strings (sed),
- produces the final list (usually eval cat).

That doesn't seem like the most natural way to do it. At the very least, if my superficial understanding isn't totally off, all the seds could be merged into the awk.
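Each of the "echo | grep", "echo | sed" and "echo | tr" forms above forks at least one extra process, and bash can usually do the same job with built-in expansions. Here is a minimal sketch. It assumes bash 4 or later (for ${var^^}), uses a made-up todo item, and is not taken from the todo script itself. Note that ${var//find/replace} matches glob patterns, not regular expressions, so it only covers simple sed substitutions. The last pair illustrates folding a sed into a neighbouring awk.

```bash
#!/usr/bin/env bash
# Each "old" line forks a subshell and external command(s);
# each "new" line uses bash built-ins (the awk pair still forks once).
item="x buy milk"   # made-up todo item; "x " marks it as done

# echo | grep  ->  glob match inside [[ ]]
grep_old=no; if echo "$item" | grep -q milk; then grep_old=yes; fi
grep_new=no; if [[ $item == *milk* ]]; then grep_new=yes; fi

# echo | sed  ->  ${var//find/replace} (glob pattern, not a regex)
sed_old=$(echo "$item" | sed 's/milk/bread/g')
sed_new=${item//milk/bread}

# echo | tr  ->  case-modification expansion (bash 4+)
tr_old=$(echo "$item" | tr '[:lower:]' '[:upper:]')
tr_new=${item^^}

# echo | sed | awk  ->  one awk that also does the sed's job
awk_old=$(echo "$item" | sed 's/^x //' | awk '{ print toupper($0) }')
awk_new=$(awk '{ sub(/^x /, ""); print toupper($0) }' <<< "$item")
```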

The Shell scripts in the Cygwin installation on my computer have over a hundred "echo | sed" anti-patterns alone, plus another hundred uses of "expr", even without all packages installed. Bash Completion is one package that might benefit significantly from fixing Shell anti-patterns, which could translate into better response times. For example, completing "gcc --" with gcc v3.4.4-999 and bash-completion v1.3-1 creates 12 new processes. Six of them are a sed, a sort and a tr, plus the bash subshells that run them.
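For the "expr" cases, arithmetic expansion and ${#var} do the same work inside the shell. Both are POSIX, so they also suit #!/bin/sh scripts. Here is a minimal sketch with made-up values:

```bash
#!/usr/bin/env bash
# expr is an external program, so every call below forks a process;
# the built-in equivalents are evaluated by the shell itself.
count=7
str="some string"

# Anti-pattern: one expr process per evaluation
old_next=$(expr "$count" + 1)
old_rem=$(expr "$count" % 3)
old_len=$(expr "$str" : '.*')    # POSIX expr idiom for a string's length

# Alternatives: arithmetic expansion and ${#var}, no fork
new_next=$(( count + 1 ))
new_rem=$(( count % 3 ))
new_len=${#str}

# Incrementing a loop counter without expr
count=$(( count + 1 ))
```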

Perl Scripts
Fedora users will be familiar with dvd::rip, which is largely written in Perl. Even if only one-tenth of its $command lines invoke Unix commands, they account for a large number of anti-patterns. I'm not familiar with the multimedia command-line utilities, but there seem to be several avoidable uses of "cd", "convert", "echo", "ffmpeg", "ls", "mkdir", "rm", "umask" and "which". (I haven't gone through the source code, only glanced at the parser's output, which I'll come to soon, so I may well be mistaken.)

On Solaris, the SUNWwebminu package (the 11.10.0 version that shipped with Solaris 10) has a surprising number and variety of Perl anti-patterns. They use cat, chown, cd, cp, echo, find, grep, hostname, mv, ps, pwd, rm, rsh, sed, ssh, uname, you name it. It could be the package that benefits most from an overhaul in this direction.

The above are only examples of programming anti-patterns in the vast universe of software being written, shipped and used. That is understandable, because even experts make mistakes, all the time. What we need to work on are mechanisms that minimize them.

Below is a table of counts of common Unix commands found in scripts across various OS installations that I had access to. The counts are incomplete and likely full of false negatives, and the OSs were not full installations. I'm sure you understand my preference not to hit you with the OS versions, package names, package versions, etc. They don't mean much. They don't mean nothing either.

OS / Language   Total Files   Size (MB)
AIX Perl        2880          27
AIX Shell       1832          13.5
Cygwin Perl     2361          25
Cygwin Shell    1563          8.5
HPUX Perl       7630          57.75
HPUX Shell      7533          84.25
RHEL Perl       1262          11.5
RHEL Shell      2518          13
Solaris Perl    5600          28.25
Solaris Shell   5271          20.5
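The post doesn't say how scripts were identified or measured for this table. One plausible approach is to classify files by their "#!" line and sum their sizes. It is sketched here as an assumption, not as the method actually used. The hypothetical script_stats function prints a file count and total bytes for Perl and for Shell. It has two known limits:

- it misses scripts without a shebang, such as Perl modules or sourced shell files;
- it calls wc -c per matched file, because bash has no portable built-in for file size.

```bash
#!/usr/bin/env bash
# script_stats DIR
# Hypothetical sketch: classify regular files under DIR by their "#!"
# line and print "perl <files> <bytes>" and "shell <files> <bytes>".
script_stats() {
  local dir=$1 f first size
  local perl_n=0 perl_b=0 sh_n=0 sh_b=0
  while IFS= read -r -d '' f; do
    first=""
    IFS= read -r first < "$f" || true   # first line, even without a newline
    case $first in
      '#!'*perl*)
        size=$(wc -c < "$f")
        perl_n=$(( perl_n + 1 )); perl_b=$(( perl_b + size )) ;;
      '#!'*/sh | '#!'*/sh\ * | '#!'*bash* | '#!'*ksh*)
        size=$(wc -c < "$f")
        sh_n=$(( sh_n + 1 )); sh_b=$(( sh_b + size )) ;;
    esac
  done < <(find "$dir" -type f -print0)
  printf 'perl %d %d\nshell %d %d\n' "$perl_n" "$perl_b" "$sh_n" "$sh_b"
}
```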

Anti-patterns Parser
Symantec Corporation gave me permission to share this study with the community. I'm sharing it in the hope of widening this discussion and learning something. As part of it, here is a dirty parser that I wrote and used extensively along the way, and am somewhat embarrassed of. I'm sharing it only to convey how easy it is to catch some programming anti-patterns. Before using it, understand that the parser is very incomplete, incorrect (defined how?), without any warranties or guarantees, and all that. I wouldn't even recommend using it, but do take a look.
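To give a sense of how little it takes to catch some of these patterns, here is a much-reduced sketch using grep -E. It is not the parser described above. The scan_file function name and its four patterns are illustrative assumptions, and such rough regular expressions will produce both false positives and false negatives.

```bash
#!/usr/bin/env bash
# scan_file FILE
# Hypothetical, much-reduced stand-in for the parser: report lines of a
# shell script matching a few well-known anti-patterns (rough EREs).
scan_file() {
  local file=$1 i hit
  local -a names=('echo | grep' 'echo | sed' 'echo | tr' 'expr')
  local -a regexes=(
    'echo .*[|] *grep( |$)'
    'echo .*[|] *sed( |$)'
    'echo .*[|] *tr( |$)'
    '(^|[^[:alnum:]_])expr '
  )
  for i in "${!regexes[@]}"; do
    # grep -n prefixes each matching line with "LINENO:"
    while IFS= read -r hit; do
      printf '%s:%s: %s\n' "$file" "${hit%%:*}" "${names[i]}"
    done < <(grep -nE -- "${regexes[i]}" "$file")
  done
  return 0
}
```

Hits are reported grouped by pattern rather than sorted by line number.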

From all the code that I've read so far, I can see this barely scratches the surface. Apart from continuing to find and add programming anti-patterns in Shell and Perl, my next steps are to move on to Java and C in line with my personal and company interests. Please point me in possible directions and reach out to me if you share my interests.

June 10, 2011

Programming Anti-patterns in Perl

I will spare you another round of the first half of this post and jump straight to the table of anti-patterns and alternatives in Perl. Check out for detailed examples. These barely scratch the surface for a language as rich as Perl, whose motto is TMTOWTDI (There's More Than One Way To Do It). My Perl knowledge is still amateurish, and these focus almost exclusively on the "Accessing memory << Calling a function << Forking a process" guideline. They might be useful for anybody switching from Shell to Perl. I hope you'll have a lot more to contribute.
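The table below is about Perl, but the same "Accessing memory << Calling a function << Forking a process" ordering holds in Shell scripts. The minimal bash sketch here computes a path's last component in three ways. The path is made up, and REPLY is just a conventional variable name.

```bash
#!/usr/bin/env bash
# Three ways to get the last component of a path, cheapest last.
path=/usr/local/lib/perl5/File/Copy.pm

# Forking a process: command substitution plus the external basename
fork_base=$(basename "$path")

# Calling a function: no fork, the result comes back in REPLY
# (capturing it with $(base_of ...) would fork a subshell again)
base_of() { REPLY=${1##*/}; }
base_of "$path"
func_base=$REPLY

# Accessing memory: parameter expansion on the variable itself
mem_base=${path##*/}

# dirname likewise; ${path%/*} differs from dirname for inputs such as
# "file" (no slash) or "/", so it only suits well-formed paths
fork_dir=$(dirname "$path")
mem_dir=${path%/*}

# Rough timing of the two extremes (output goes to stderr)
time for i in {1..100}; do x=$(basename "$path"); done
time for i in {1..100}; do x=${path##*/}; done
```

The two timed loops give only a rough sense of the fork cost. Absolute numbers depend heavily on the system.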

30th June, 2011: TimeRatio is the ratio of the time taken by the anti-pattern to that taken by the alternative, from two different trials (the two columns). The timings used Perl 5.8 on Solaris 10 (Sun T5120). A ratio above 1 means the alternative is faster. I hope these ratios better highlight why some of the anti-patterns in particular should be avoided. My suggestion is not to take these numbers at face value.

Anti-pattern  Alternative                    TimeRatio #1  TimeRatio #2
awk           open, split, close             0.78          0.79
cat           open, close                    28.48         28.72
cp            use File::Copy "cp"            1.85          1.87
chmod         chmod (Perl)                   335.75        343.30
find          use File::Find                 0.55          0.55
grep          open, grep (Perl), close       2.56          2.57
head          open, break, close             42.12         42.60
hostname      use Sys::Hostname              10361.94      10451.25
kill          kill (Perl)                    55.04         50.84
ln -s         symlink                        1010.70       995.73
ls            opendir, closedir              32.87         24.17
mkdir         mkdir (Perl)                   13.21         12.93
mkdir -p      use File::Path                 9.85          9.87
mv            use File::Copy "move"          32.53         32.65
ping          use Net::Ping                  1.04          0.65
ps -elf       use Proc::ProcessTable         3.84          3.84
rmdir         rmdir (Perl)                   20.13         19.88
rm -r         use File::Path                 9.00          9.02
sed           open, s/find/replace/, close   25.54         25.76
sleep         sleep (Perl)                   2825.82       2824.30
sort          open, sort (Perl), close       4.31          4.18
tail          open, close                    2.71          2.69
touch         open, close                    65.41         65.41
umask         umask (Perl)                   7749.43       7742.40
uname         use Config                     13662.56      13652.62
uniq          open, unless seen, close       2.33          2.35
wc -l         open, close                    6.79          6.82