November 14, 2011

New Version of Anti-patterns Parser


I have a new version of the anti-patterns parser.

New Features
1. Rules Library: Anti-pattern rules have been moved out of the parser into an external file. Users can therefore add, modify and delete rules in the library without the inconvenience of having to touch the parser. This can also help individuals customize the rules and eliminate the false positives that the parser, being generic, usually reports.

2. Hints: The old parser could only identify anti-patterns in Perl and Shell scripts; it did not say how to fix them. This made the output difficult to understand and act upon for users new to the tool, and especially for non-experts in Perl and Shell. The new version provides one or more hints for fixing each anti-pattern (partly based on Programming Anti-patterns in Perl and Programming Anti-patterns in Shell). For the complete picture, however, the user may still have to refer to the alternatives.pl and alternatives.sh scripts.

3. Level: Each anti-pattern also has a level associated with it -- IGNORE, WARN, ERROR. This is currently not being used by the parser itself, but it will hopefully help users filter anti-patterns by the severity levels they are interested in.

The new version also benefits from a few bug fixes, dead code removals, and miscellaneous improvements. Most of all, I finally learnt how to apply Perl's map and nested map, yay! I think the parser is more readable now, and I hope I haven't abused map. You tell me.

Rules
An anti-pattern rule can be defined using the keywords LANG, REGEX, HINT and LEVEL. Lines not starting with a keyword are ignored as comments, and anti-patterns should be separated by comments. LANG can either be "perl" or "shell". REGEX is a Perl regular expression. LANG and REGEX are required keywords. HINT and LEVEL are optional. The anti-patterns parser doesn't try to check for errors in the rules library, so the user will have to take care of that aspect.
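To make the four keywords concrete, here is a minimal Bash sketch of what one rule carries and how it might be applied to a single line of code. It is not the parser's code and not the syntax of anti-patterns.lib: the variable names, the regex (a bash ERE standing in for a Perl regex), the hint text and the idea of skipping IGNORE rules are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical rule: the values are illustrative, not copied from anti-patterns.lib.
rule_lang='shell'
rule_regex='echo[^|]*\|[[:space:]]*sed'   # bash ERE standing in for a Perl regex
rule_hint='use ${var/pattern/replacement} instead of echo | sed'
rule_level='WARN'

# report_line LINE: print "LEVEL [LANG]: HINT" when LINE matches the rule,
# unless the rule's level is IGNORE (an assumed use of LEVEL).
report_line() {
    local line=$1
    if [[ $rule_level != IGNORE && $line =~ $rule_regex ]]; then
        printf '%s [%s]: %s\n' "$rule_level" "$rule_lang" "$rule_hint"
    fi
    return 0
}

report_line 'new=$(echo "$old" | sed "s/a/b/")'
```

A line that already uses parameter expansion, such as new=${old/a/b}, produces no output.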

Usage
Because of the addition of a rules library, and the option for users to use their own custom library, the parser now needs an extra input.
$ ./anti-patterns.pl  file|dir|pkg

Download
I have deleted all previous files related to anti-patterns. You can download the latest zipped folder here. It contains updated versions of anti-patterns.pl, alternatives.pl, alternatives.sh and a new anti-patterns.lib. I am inexplicably hesitant about hosting this on GitHub. Please note that this is being released under the MIT License, thanks to the approval of Symantec Corporation.

Notes
The feature of suggesting one or more hints comes at a price: performance. Because the parser must now check against every anti-pattern, rather than break immediately after finding the first one, it is several times slower than the old version. This is noticeable while running the parser on a large directory or package, though it may not be while running on a file. Until I figure out a solution to this (please help!), if you want to disable multiple hints (and are satisfied with just one hint), comment out the only "else" block in each of the parsePerl and parseShell subroutines.
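The trade-off can be sketched in a few lines of Bash. This is only an illustration, not the logic of parsePerl or parseShell: the two arrays, the three rules and the scan_line function are hypothetical, and the patterns are bash EREs rather than Perl regexes. In "all" mode every rule is tried against the line; in "first" mode the loop breaks at the first match, which is where the old version saved its time.

```bash
#!/usr/bin/env bash
# Hypothetical rules as parallel arrays: regexes[i] pairs with hints[i].
# The patterns are bash EREs standing in for the library's Perl regexes.
regexes=(
    'echo[^|]*\|[[:space:]]*sed'
    'echo[^|]*\|[[:space:]]*grep'
    'expr[[:space:]]'
)
hints=(
    'replace echo | sed with parameter expansion'
    'replace echo | grep with a [[ ]] or case match'
    'replace expr with $(( ))'
)

# scan_line LINE [MODE]: print the hint of every rule that matches LINE.
# MODE "first" stops after the first match; the default "all" checks every rule.
scan_line() {
    local line=$1 mode=${2:-all} i
    for i in "${!regexes[@]}"; do
        if [[ $line =~ ${regexes[i]} ]]; then
            printf '%s\n' "${hints[i]}"
            if [[ $mode == first ]]; then
                break
            fi
        fi
    done
    return 0
}

line='n=$(expr $(echo "$s" | sed "s/x//") + 1)'
scan_line "$line" all     # prints the sed hint and the expr hint
scan_line "$line" first   # prints only the sed hint
```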

The new parser hasn't yet been extensively tested, so there are bound to be bugs. Feel free to report bugs, and suggest new anti-patterns and features. Thank you in advance.

Credits
All credit for this release goes to Rocky Ren, a Symantec colleague. Many people have made feature requests, but Rocky went several steps further. He not only requested specific features, but also suggested how those features could be implemented -- using his own prototype -- making me guilty enough to include his suggestions. The rules file and its format are also entirely his idea. Many thanks to Rocky.

June 23, 2011

The Fuss About Programming Anti-patterns

Andrew Koenig started talking about programming anti-patterns more than 15 years ago. The idea probably never became as popular as it should have because it addresses mistakes in syntactically correct software, and the list of wrong ways to do something may well be inexhaustible compared to the right ways. I still think it is a useful concept. Even experts are known to make mistakes, all the time. The majority of the performance enhancements I have come across in my limited experience in fact fall under bug fixes, whether because the solution wasn't designed well enough or wasn't implemented well enough.

Given a system, improving its performance generally comes down to improving the performance of its slowest component. That requires identifying the slowest component first, using profiling at some level. Profiling is time-consuming, so it often isn't done until a system (or an operation on it) feels slow, and that feeling becomes muddled and harder to perceive as the thickness (complexity) of the system increases.

For this reason I like the approach of checking for known programming anti-patterns using static code analysis. Static code analysis may get slower than build times, but is faster than test suite execution on built images. For now. The day the list of anti-patterns grows infinitely long, and along with it the static code analysis cycle time, the approach won't be feasible (though tiering of anti-patterns based on severity and history of occurrences seems promising).

Hence my interest in static code analyzers like Coverity and FindBugs, though I have yet to explore them well enough to actually know them. I am not aware of any major work along similar lines for Shell and Perl. I think it's worth exploring because there's a significant amount of software written in them (definitely at Symantec Corporation) and they're already slower than compiled languages.

Shell Scripts
The famous todo software is written entirely in Bash. It has a lot of anti-patterns, among them various usages of "echo | grep", "echo | sed" and "echo | tr". The longest piping anti-pattern in todo.sh is of the form: "echo | sed | eval sort | sed | awk | sed | eval cat". It takes a list of todo items (echo), pads them appropriately with leading zeroes (sed), sorts them (eval sort), color codes the done items (sed), does something more related to color coding (awk), nullifies a few strings (sed), and then gets the final list (eval cat usually). Doesn't seem like the most natural way to do it. If nothing else, and if my superficial understanding isn't totally off, all the seds can be merged into the awk.
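For the three short forms, Bash can usually do the work without forking anything. The sketch below shows one possible in-shell equivalent for each; the todo item is made up for illustration and none of these lines are taken from todo.sh.

```bash
#!/usr/bin/env bash
# Hypothetical todo item; not taken from todo.sh.
item='(A) 2011-06-23 call mom +family'

# Instead of: echo "$item" | grep -q '+family'
if [[ $item == *+family* ]]; then tagged=yes; else tagged=no; fi

# Instead of: retagged=$(echo "$item" | sed 's/+family/+home/')
retagged=${item/+family/+home}

# Instead of: compact=$(echo "$item" | tr -d '-')
compact=${item//-/}

printf '%s\n%s\n%s\n' "$tagged" "$retagged" "$compact"
```

Each original form costs at least one subshell plus one external process per call, while the replacements are evaluated inside the running shell. Note that the expansions use shell patterns rather than regular expressions, so they only cover the simpler sed and grep usages.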

The Shell scripts that are part of the Cygwin installation on my computer have over a hundred anti-patterns of type "echo | sed" alone, and another hundred of "expr", even without all the packages installed. Bash Completion is one of the packages which might benefit significantly from fixing Shell anti-patterns, and it might translate into better response times. For example, command completion for "gcc --" with gcc v3.4.4-999 and bash-completion v1.3-1 creates 12 new processes, 6 of them due to a sed, sort and tr and their prerequisite bashes.
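As a rough illustration of what fixing the "expr" kind looks like, here is a small sketch with made-up values. The path and the counter are hypothetical, and basename and dirname are included only because they show up high in the command counts further down.

```bash
#!/usr/bin/env bash
# Hypothetical values, chosen only for illustration.
count=41
path=/opt/example/bin/tool.sh

# Instead of: count=$(expr "$count" + 1)
count=$((count + 1))

# Instead of: base=$(basename "$path") and dir=$(dirname "$path")
base=${path##*/}
dir=${path%/*}

# Instead of: len=$(expr length "$base")
len=${#base}

printf '%s %s %s %s\n' "$count" "$base" "$dir" "$len"
```

This prints "42 tool.sh /opt/example/bin 7" without creating a single new process. The two path expansions are not drop-in replacements in every case: for a path with no slash or with a trailing slash, they behave differently from basename and dirname.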

Perl Scripts
Fedora users will be familiar with dvd::rip, largely written in Perl. Even if only one-tenth of its $command lines refer to Unix commands, they account for a large number of anti-patterns. I'm not familiar with the command-line utilities related to multimedia, but there seem to be several avoidable usages of "cd", "convert", "echo", "ffmpeg", "ls", "mkdir", "rm", "umask", "which". (I've not gone through the source code, but glanced at the parsed code -- which I'll soon come to -- so I'm likely to be mistaken.)

On Solaris, the SUNWwebminu package (11.10.0 version that shipped with Solaris 10) has a surprising number and a wide variety of Perl anti-patterns using -- cat, chown, cd, cp, echo, find, grep, hostname, mv, ps, pwd, rm, rsh, sed, ssh, uname -- you name it. It could be the package that benefits the most from an overhaul in this direction.

The above are only examples of the programming anti-patterns out there in the vast universe of software that is being written, shipped and used. That is understandable, because even experts are known to make mistakes, all the time. What we need to work on are mechanisms that can minimize them.

Below is a table of counts of common Unix commands found in scripts across the various OS installations that I had access to. The counts are incomplete and likely full of false negatives, and the OSs were not full installations. I'm sure you understand my preference to not hit you with the versions, package names, their versions, etc. They don't mean much. They don't mean nothing either.

             AIX Perl  AIX Shell  Cygwin Perl  Cygwin Shell  HPUX Perl  HPUX Shell  RHEL Perl  RHEL Shell  Solaris Perl  Solaris Shell
Packages     25        103        19           95            27         56          86         175         42            227
Total Files  2880      1832       2361         1563          7630       7533        1262       2518        5600          5271
Size (MB)    27        13.5       25           8.5           57.75      84.25       11.5       13          28.25         20.5
Files        421       759        285          403           796        692         313        673         684           963
Total        1935      20684      257          2370          841        23137       255        9733        1566          13907

AIX Perl        AIX Shell       Cygwin Perl     Cygwin Shell    HPUX Perl       HPUX Shell      RHEL Perl       RHEL Shell      Solaris Perl    Solaris Shell
Command Count   Command Count   Command Count   Command Count   Command Count   Command Count   Command Count   Command Count   Command Count   Command Count
/dev/null 410   /dev/null 5281  /dev/null 89    /dev/null 531   /dev/null 248   echo 5776       /dev/null 105   echo 2539       /dev/null 1088  echo 2823
grep 326        cat 5105        cat 41          sed 381         cat 105         cat 3948        cat 26          /dev/null 2037  rsh 190         /dev/null 2605
echo 137        grep 2607       pwd 30          echo 371        echo 85         sed 3869        ifconfig 22     sed 1582        mount 167       grep 1431
rm 99           echo 2414       echo 27         cat 226         rm 81           /dev/null 3628  echo 20         grep 1232       cat 161         sed 1419
mkdir 93        awk 2383        cp 15           basename 193    grep 50         grep 2349       find 18         cat 1112        cd 131          cat 1224
hostname 92     cut 1348        hostname 12     grep 185        hostname 46     awk 1011        uname 15        awk 281         umount 100      expr 742
ls 89           sed 1087        grep 12         awk 137         pwd 44          eval 977        ls 14           uname 257       uname 73        awk 733
awk 85          rm 579          date 12         expr 116        cp 44           expr 607        grep 14         expr 244        df 72           cut 528
cp 81           expr 466        cd 12           dirname 68      find 41         uname 401       pwd 12          eval 237        ps 61           basename 466
uname 71        ls 344          chmod 11        sort 62         ps 31           basename 389    ps 12           sort 211        rm 59           uname 406
date 69         sort 290        bc 10           uname 59        uname 29        rm 377          cp 12           ls 181          ifconfig 59     cd 327
cat 62          egrep 290       ls 9            ssh 52          cd 27           cut 363         date 11         cut 175         hostname 59     dirname 310
cd 58           basename 277    rm 8            date 50         ls 18           egrep 324       sort 8          basename 174    pwd 39          ls 305
find 55         cp 233          find 8          cut 48          eval 18         ps 262          hostname 7      find 161        touch 35        pwd 277
egrep 49        wc 228          tr 6            pwd 44          date 18         cp 249          mount 6         egrep 148       echo 35         rm 272
pwd 43          tail 212        eval 6          eval 42         bc 18           ssh 193         mkdir 6         date 142        ssh 34          egrep 268
ifconfig 40     head 198        uname 5         ps 40           ssh 16          ls 182          basename 5      tail 105        find 34         eval 259
basename 38     cd 186          ps 5            hostname 27     tr 15           pwd 161         mv 4            pwd 87          cp 31           mount 251
mount 35        tr 172          sed 4           cd 27           netstat 15      find 157        eval 4          cd 81           ping 25         sort 197
rsh 34          ps 167          netstat 4       egrep 26        kill 15         sort 154        cd 4            dirname 75      grep 18         date 192
head 34         mount 165       cut 4           wc 22           sed 14          tr 146          wc 3            ps 74           tr 17           find 120
sed 30          hostname 161    wc 3            chmod 19        awk 13          date 142        tr 3            tr 72           ls 16           hostname 119
cut 28          uname 155       sort 3          ls 18           sort 10         wc 122          ping 3          rm 71           date 16         ps 91
tail 26         df 146          head 3          tr 17           touch 9         cd 104          ln 3            wc 65           chown 15        tr 89
netstat 26      date 144        dirname 2       kill 17         ln 9            tail 98         df 3            head 54         pkginfo 13      head 73
umount 25       mv 135          chown 2         rm 15           chown 9         head 96         cut 3           uniq 49         mv 12           pkginfo 70
mv 25           mkdir 123       ping 1          find 15         mv 8            dirname 82      awk 3           kill 36         chmod 11        mkdir 70
ps 21           fgrep 122       mv 1            tail 13         mkdir 7         kill 70         rsh 2           hostname 31     tail 8          wc 65
chmod 21        dirname 121     df 1            umask 12        cut 7           hostname 68     rm 2            mount 30        sort 8          tail 65
kill 16         kill 118        -               cp 12           head 4          bc 65           head 2          netstat 27      prtvtoc 8       fgrep 65
dirname 16      find 94         -               touch 9         du 4            fgrep 49        chown 2         mkdir 26        kill 7          cp 60
ping 13         pwd 93          -               mkdir 8         dirname 4       ln 48           umount 1        ln 21           sed 6           kill 55
ln 12           umount 76       -               head 8          chmod 4         uniq 45         touch 1         umask 19        eval 6          df 49
wc 11           eval 71         -               mount 4         tail 3          mount 36        kill 1          cp 19           mkdir 5         mv 48
sort 11         ln 65           -               fgrep 4         ifconfig 3      mkdir 35        egrep 1         mv 17           ln 5            svccfg 42
tr 10           uniq 57         -               uniq 3          egrep 3         touch 26        dirname 1       touch 16        head 5          chmod 40
touch 10        du 43           -               sleep 3         scp 2           sleep 24        chmod 1         pkginfo 15      dirname 5       pgrep 39
df 9            chmod 43        -               rsh 3           ping 2          mv 21           -               ssh 14          cut 5           ssh 38
ssh 7           netstat 42      -               pgrep 3         mount 2         chmod 17        -               scp 10          wc 4            uniq 36
eval 7          umask 24        -               ln 3            df 2            umask 13        -               chmod 7         netstat 4       ln 32
chown 6         touch 21        -               bc 3            cksum 2         ifconfig 11     -               umount 6        expr 2          umask 29
rcp 5           ifconfig 19     -               mv 2            basename 2      chown 11        -               ifconfig 5      du 2            ifconfig 27
vmstat 2        rsh 13          -               chown 2         wc 1            scp 10          -               sleep 4         uniq 1          chown 23
uniq 2          ping 12         -               ifconfig 1      umount 1        ping 9          -               fgrep 4         scp 1           touch 22
scp 2           ssh 7           -               -               -               netstat 9       -               psrinfo 3       basename 1      svcadm 18
fgrep 2         sleep 7         -               -               -               du 6            -               ping 3          -               umount 16
bc 2            bc 6            -               -               -               umount 5        -               pgrep 2         -               ping 13
-               svcs 4          -               -               -               pkginfo 4       -               df 2            -               svcs 11
-               pkginfo 4       -               -               -               df 4            -               vmstat 1        -               cksum 11
-               cksum 3         -               -               -               cksum 4         -               sar 1           -               bc 10
-               chown 3         -               -               -               svcs 3          -               isainfo 1       -               sleep 8
-               vmstat 1        -               -               -               psrinfo 1       -               du 1            -               isainfo 7
-               -               -               -               -               bc 1            -               cksum 1         -               netstat 5
-               -               -               -               -               basename 1      -               chown 1         -               du 4
-               -               -               -               -               awk 1           -               bc 1            -               iostat 3
-               -               -               -               -               /dev/null 1     -               -               -               prtvtoc 2
-               -               -               -               -               -               -               -               -               sar 1
-               -               -               -               -               -               -               -               -               psrinfo 1

Anti-patterns Parser
Symantec Corporation gave me permission to share this study with the community. Here I am, hoping to widen this discussion and learn something. As part of it, anti-patterns.pl is a dirty parser that I wrote and used extensively along the way, and am somewhat embarrassed by. I'm sharing it only to convey a better idea of how easy it is to catch some of the programming anti-patterns. Before using it, understand that the parser is very incomplete, incorrect (defined how?), without any warranties or guarantees, and all that. I won't even recommend using it, but do take a look.

From all the code that I've read so far, I can see this barely scratches the surface. Apart from continuing to find and add programming anti-patterns in Shell and Perl, my next steps are to move on to Java and C in line with my personal and company interests. Please point me in possible directions and reach out to me if you share my interests.

June 10, 2011

Programming Anti-patterns in Perl

I will spare you another round of the first half of this post and jump straight to the table of anti-patterns and alternatives in Perl. Check out alternatives.pl for detailed examples. These barely scratch the surface for a rich language like Perl, whose motto itself is TMTOWTDI. My Perl knowledge is still amateurish, and these concentrate almost exclusively on the "Accessing memory << Calling a function << Forking a process" guideline. This might be useful for anybody switching from Shell to Perl. So I hope you'll have a lot more to contribute.

30th June, 2011: TimeRatio is the ratio of the time taken by the Anti-pattern to the time taken by the Alternative (so a value above 1 means the Alternative is faster), as taken from two different trials, using Perl 5.8 on Solaris 10 (Sun T5120). I hope these time ratios will better highlight why some of the anti-patterns in particular are to be avoided. My suggestion is not to take these numbers at face value.

Anti-pattern  Alternative                   TimeRatio1  TimeRatio2
awk           open, split, close            0.78        0.79
basename      s/.*\///                      859.47      872.42
cat           open, close                   28.48       28.72
cd            chdir                         1588.34     1577.63
cp            use File::Copy "cp"           1.85        1.87
chmod         chmod (Perl)                  335.75      343.30
cut           split                         306.35      307.05
date          localtime                     200.32      214.04
dirname       s/\/[^\/]*$//                 325.41      324.56
echo          print                         4008.82     4006.22
find          use File::Find                0.55        0.55
grep          open, grep (Perl), close      2.56        2.57
head          open, last, close             42.12       42.60
hostname      use Sys::Hostname             10361.94    10451.25
id            getpwnam                      43.38       43.28
kill          kill (Perl)                   55.04       50.84
ln -s         symlink                       1010.70     995.73
ls            opendir, closedir             32.87       24.17
mkdir         mkdir (Perl)                  13.21       12.93
mkdir -p      use File::Path                9.85        9.87
mv            use File::Copy "move"         32.53       32.65
ping          use Net::Ping                 1.04        0.65
ps -elf       use Proc::ProcessTable        3.84        3.84
pwd           $ENV{'PWD'}                   3575.62     3742.53
rm            unlink                        75.05       74.07
rmdir         rmdir (Perl)                  20.13       19.88
rm -r         use File::Path                9.00        9.02
sed           open, s/find/replace/, close  25.54       25.76
sleep         sleep (Perl)                  2825.82     2824.30
sort          open, sort (Perl), close      4.31        4.18
tail          open, close                   2.71        2.69
touch         open, close                   65.41       65.41
umask         umask (Perl)                  7749.43     7742.40
uname         use Config                    13662.56    13652.62
uniq          open, unless seen, close      2.33        2.35
wc -l         open, close                   6.79        6.82