I
Unique like U all, or so I wish to believe.
November 14, 2011
New Version of Anti-patterns Parser
I have a new version of the anti-patterns parser.
New Features
1. Rules Library: Anti-pattern rules have been moved out of the parser into an external file, so users can add, modify and delete rules in the library without touching the parser. This also lets individuals customize the rules and eliminate false positives that the parser, being generic, tends to catch.
2. Hints: The old parser could only identify anti-patterns in Perl and Shell scripts, which made its output difficult to understand and act upon for users new to the tool, especially those who are not Perl or Shell experts. The new version provides one or more hints for fixing each anti-pattern (partly based on Programming Anti-patterns in Perl and Programming Anti-patterns in Shell). For the complete picture, however, users may still have to refer to the alternatives.pl and alternatives.sh scripts.
3. Level: Each anti-pattern also has a level associated with it: IGNORE, WARN or ERROR. The parser itself doesn't use the level yet, but it will hopefully help users filter anti-patterns by the severity levels that interest them.
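Until the parser acts on LEVEL, the same effect could be approximated by trimming a rules library before a run. This is a minimal sketch under assumptions: it assumes each keyword line is the keyword, a space and its value (for example `LEVEL WARN`), with rules separated by comment lines, which may not be the exact anti-patterns.lib syntax; `filter_by_level` is a hypothetical name. Rules without a LEVEL are kept, since LEVEL is optional.

```bash
#!/usr/bin/env bash
# Sketch: keep only the rules whose LEVEL is one of the levels given as
# arguments. ASSUMED library syntax: "KEYWORD value" lines, with rules
# separated by comment lines (any line not starting with a keyword).
filter_by_level() {
  local wanted=" $* " rule="" level="" line
  # The appended blank and "#" lines act as a final separator, so the last
  # rule is flushed even if the library does not end with a comment.
  { cat; printf '\n#\n'; } | while IFS= read -r line; do
    case $line in
      LANG\ * | REGEX\ * | HINT\ * | LEVEL\ *)
        rule+=$line$'\n'
        if [[ $line == LEVEL\ * ]]; then level=${line#LEVEL }; fi
        ;;
      *)
        # A comment line closes the current rule. Print it, followed by a
        # "#" separator, if its level was requested or if it has no LEVEL.
        if [[ -n $rule && ( -z $level || $wanted == *" $level "* ) ]]; then
          printf '%s#\n' "$rule"
        fi
        rule="" level=""
        ;;
    esac
  done
}

# Usage (hypothetical file names):
#   filter_by_level WARN ERROR < anti-patterns.lib > warn-and-error.lib
```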
The new version also benefits from a few bug fixes, dead code removals, and miscellaneous improvements. Most of all, I finally learnt how to apply Perl's map and nested map, yay! I think the parser is more readable now, and I hope I haven't abused map. You tell me.
Rules
An anti-pattern rule can be defined using the keywords LANG, REGEX, HINT and LEVEL. Lines not starting with a keyword are ignored as comments, and anti-patterns should be separated by comments. LANG can either be "perl" or "shell". REGEX is a Perl regular expression. LANG and REGEX are required keywords. HINT and LEVEL are optional. The anti-patterns parser doesn't try to check for errors in the rules library, so the user will have to take care of that aspect.
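Since checking the rules library is left to the user, a small validator can catch the most common mistakes before a run. This is a sketch, not part of the released parser: it assumes each keyword line is written as the keyword, a space and its value (for example `LANG shell`), which may differ from the actual anti-patterns.lib syntax, and `validate_rules` is a hypothetical name. It checks only the constraints stated above and does not compile the Perl regexes.

```bash
#!/usr/bin/env bash
# Sketch: report rules that break the stated constraints. ASSUMED syntax:
# "KEYWORD value" lines, rules separated by comment (non-keyword) lines.
# Returns 0 if every rule is valid, 1 otherwise.
validate_rules() {
  local line lang="" regex="" level="" seen=0 rule=0 errors=0
  # The appended blank and "#" lines make sure the last rule gets checked.
  { cat; printf '\n#\n'; } | {
    while IFS= read -r line; do
      case $line in
        LANG\ *)  lang=${line#LANG };   seen=1 ;;
        REGEX\ *) regex=${line#REGEX }; seen=1 ;;
        HINT\ *)  seen=1 ;;
        LEVEL\ *) level=${line#LEVEL }; seen=1 ;;
        *)
          # A comment line ends the current rule; check it if there was one.
          if (( seen )); then
            rule=$((rule + 1))
            if [[ -z $lang ]]; then
              echo "rule $rule: missing LANG"; errors=$((errors + 1))
            elif [[ $lang != perl && $lang != shell ]]; then
              echo "rule $rule: LANG must be perl or shell, got '$lang'"; errors=$((errors + 1))
            fi
            if [[ -z $regex ]]; then
              echo "rule $rule: missing REGEX"; errors=$((errors + 1))
            fi
            if [[ -n $level && $level != IGNORE && $level != WARN && $level != ERROR ]]; then
              echo "rule $rule: LEVEL must be IGNORE, WARN or ERROR, got '$level'"; errors=$((errors + 1))
            fi
          fi
          lang="" regex="" level="" seen=0
          ;;
      esac
    done
    (( errors == 0 ))   # becomes the function's exit status
  }
}
```

For example, `validate_rules < anti-patterns.lib || echo "fix the rules first"` would print one line per problem and refuse to continue.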
Usage
Because of the addition of a rules library, and the option for users to use their own custom library, the parser now needs an extra input.
$ ./anti-patterns.pl file|dir|pkg
Download
I have deleted all previous files related to anti-patterns. You can download the latest zipped folder here. It contains updated versions of anti-patterns.pl, alternatives.pl, alternatives.sh and a new anti-patterns.lib. I am inexplicably hesitant about hosting this on GitHub. Please note that this is being released under the MIT License, thanks to the approval of Symantec Corporation.
Notes
The feature of suggesting one or more hints comes at a price: performance. Because the parser must now check against every anti-pattern instead of stopping at the first match, it is several times slower than the old version. This is noticeable when running the parser on a large directory or package, though perhaps not on a single file. Until I figure out a solution (please help!), if you want to disable multiple hints and are satisfied with just one, comment out the only "else" block in each of the parsePerl and parseShell subroutines.
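One possible mitigation, offered as an untested suggestion rather than a change to the actual parser: test each line once against a single alternation of all rules, and run the per-rule loop only for lines that match something, which should help when most lines are clean. The sketch uses bash's `=~` (POSIX extended regexes) as a stand-in for the parser's Perl regexes; the patterns, hints and `scan_line` function are all hypothetical. Its `first` mode mimics the old single-hint behavior.

```bash
#!/usr/bin/env bash
# Hypothetical rules: parallel arrays of patterns and hints. The patterns are
# POSIX extended regexes (bash =~), standing in for the parser's Perl regexes.
patterns=('echo .*[|] *sed' 'echo .*[|] *tr' 'expr ')
hints=('use ${var//old/new}' 'use ${var^^} or ${var,,}' 'use $(( ))')

# One alternation of every pattern, built once: a single test that lets
# clean lines skip the per-rule loop.
combined=""
for p in "${patterns[@]}"; do
  combined+="${combined:+|}($p)"
done

# scan_line LINE [all|first]
# Prints the hint of each matching rule. "all" reports every match (one or
# more hints); "first" stops at the first match, like the old parser.
scan_line() {
  local line=$1 mode=${2:-all} i
  [[ $line =~ $combined ]] || return 0   # fast path: no rule can match
  for i in "${!patterns[@]}"; do
    if [[ $line =~ ${patterns[i]} ]]; then
      printf '%s\n' "${hints[i]}"
      if [[ $mode == first ]]; then break; fi
    fi
  done
}

# Usage: while IFS= read -r l; do scan_line "$l"; done < script.sh
```

Lines that do match still pay for the full loop, so this only helps to the extent that matching lines are rare.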
The new parser hasn't yet been extensively tested, so there are bound to be bugs. Feel free to report bugs, and suggest new anti-patterns and features. Thank you in advance.
Credits
All credit for this release goes to Rocky Ren, a Symantec colleague. Many people have made feature requests, but Rocky went several steps further: he not only requested specific features but also showed, with his own prototype, how they could be implemented, which made me feel guilty enough to include his suggestions. The rules file and its format are also entirely his idea. Many thanks to Rocky.
July 27, 2011
Silence: A Virtue or a Vice? - II
NOTE: I have noticed for some time now that Silence: A Virtue or a Vice? is the most popular post on this blog and also ranks very highly for a search engine query. Given its popularity and its implied importance I thought of revisiting it after six years. My punditry is as disputable as ever.
First, semantics. I won't address whether silence is a virtue or a vice. Verbs are less confusing than nouns. To be, or not to be silent, that is the question: Whether 'tis nobler in the mind to ...
"Being silent" is an action (or sometimes an inaction). For convenience, I will classify actions into two types: context-free and context-sensitive. I can't think of good examples of context-free actions: eating chocolate, parents admonishing, managers complaining. Examples of context-sensitive actions: compiling source code, wiping mouth or nose against the sleeve, getting married. For the second category, the what (the actions) can be related to how, when, where, who(m), why.
"Being silent" is highly context-sensitive and its virtuousness or viciousness "depends". A few examples of its context-sensitivity:
How: indifferently, angrily, empathically
When: while watching a movie, after a fight
Where: in a party, in a meeting, in private
Who(m): who I am, who you are (to me)
Why: to hurt, to tolerate, to support
The complexity and subjectivity are evident.
The resolution is known in many ordinary contexts: answering a phone, being silent in the library, being yourself at home. I guess we do what we know. The resolution seems undecidable in some contexts: meeting a grieving friend, greeting a stranger seen daily in the elevator, getting caught making fun of a colleague. Without a strong reason to do the opposite, I would stick to being what I am, lest I lose my balance further in an awkward situation. In some other contexts there is a clear conflict between the resolution that we think is more appropriate and the one we tend to choose because that is who we are. The conflict leads to the unpleasant dilemma of whether it is right or wrong, virtuous or vicious.
There is a part of this dilemma that is not rational (I think) but central to it: personality. It is difficult to choose, much less accept, a resolution that goes against one's personality. For example, various psychological profiles have described me as introverted, reserved, private, etc., often with qualifiers like strong, primary or very clear. It is fair to say that I have a predilection for silence.
There is, however, one distinction I realized recently (this is starting to read like an infomercial) that diminished, for me, the magnitude of this conflict (and others of this kind). It is the distinction between personality and behavior. The former is about who a person is; the latter is about how a person is.
Predilection for silence may be part of my personality, but to be or not to be silent is behavioral. Today, as an introvert who speaks up in group meetings far more often than I used to, I don't think my personality has changed much (in this aspect), but my behavior has (to a certain extent).
I know it is easier said than done. That is the reason why I'm saying it.
June 23, 2011
The Fuss About Programming Anti-patterns
Andrew Koenig started talking about programming anti-patterns more than 15 years ago. The idea probably never became as popular as it should have, because it addresses mistakes in syntactically correct software, and the list of wrong ways to do something is likely inexhaustible compared to the right ways. I still think it is a useful concept. Even experts are known to make mistakes, all the time. In my limited experience, the majority of performance enhancements I came across were in fact bug fixes, whether because the solution wasn't designed well enough or wasn't implemented well enough.
Improving a system's performance generally means improving the performance of its slowest component, which first requires identifying that component through profiling at some level. Because profiling is time-consuming, it often isn't done until a system (or an operation on it) feels slow, and such feelings get muddled and take longer to perceive as the thickness (complexity) of a system increases.
For this reason I like the approach of checking for known programming anti-patterns using static code analysis. Static code analysis may take longer than a build, but it is faster than running test suites on built images. For now. The day the list of anti-patterns grows infinitely long, and with it the static code analysis cycle time, the approach won't be feasible (though tiering anti-patterns by severity and history of occurrences seems promising).
Hence my interest in static code analyzers like Coverity and FindBugs, though I have yet to explore them well enough to really know them. I am not aware of any major work along similar lines for Shell and Perl. I think it's worth exploring, because a significant amount of software is written in them (certainly at Symantec Corporation) and they're already slower than compiled languages.
Shell Scripts
The famous todo software, todo.sh, is written entirely in Bash. It has a lot of anti-patterns, among them various uses of "echo | grep", "echo | sed" and "echo | tr". The longest piping anti-pattern in todo.sh has the form "echo | sed | eval sort | sed | awk | sed | eval cat". It takes a list of todo items (echo), pads them with leading zeroes (sed), sorts them (eval sort), color-codes the done items (sed), does something more related to color coding (awk), nullifies a few strings (sed), and then gets the final list (usually eval cat). That doesn't seem like the most natural way to do it. At the very least, if my superficial understanding isn't totally off, all the seds can be merged into the awk.
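To make the "echo | grep", "echo | sed" and "echo | tr" patterns concrete, here is a sketch of bash built-in replacements that do the same work without starting new processes. It is an illustration, not code from todo.sh. Caveats: `${var/pattern/replacement}` takes glob patterns rather than regexes, so it only replaces simple seds, and `${var^^}` needs bash 4 or later (it matches `tr 'a-z' 'A-Z'` for ASCII text).

```bash
#!/usr/bin/env bash
# Each comment shows an "echo | command" form; the line below it is a bash
# built-in equivalent that forks no new process.
s="Hello, World"

# if echo "$s" | grep -q World; then ...
if [[ $s == *World* ]]; then found=yes; else found=no; fi

# replaced=$(echo "$s" | sed 's/World/Bash/')    first match only
replaced=${s/World/Bash}

# zeroed=$(echo "$s" | sed 's/o/0/g')            every match
zeroed=${s//o/0}

# upper=$(echo "$s" | tr 'a-z' 'A-Z')            needs bash 4+
upper=${s^^}

printf '%s\n' "$found" "$replaced" "$zeroed" "$upper"
```

Running it prints `yes`, `Hello, Bash`, `Hell0, W0rld` and `HELLO, WORLD`, one per line.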
The Shell scripts in the Cygwin installation on my computer, which doesn't even have all the packages, contain over a hundred "echo | sed" anti-patterns alone, and another hundred uses of "expr". Bash Completion is one package that might benefit significantly from fixing Shell anti-patterns, which could translate into better response times. For example, command completion for "gcc --" with gcc v3.4.4-999 and bash-completion v1.3-1 creates 12 new processes, 6 of them for a sed, a sort and a tr and the bash subshells they require.
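Similarly, "expr", basename and dirname can usually be replaced by shell arithmetic and parameter expansion, each saving a process. A sketch with arbitrary example values; note that `${path##*/}` and `${path%/*}` are not exact equivalents of basename and dirname for edge cases such as a trailing slash or a path without any slash.

```bash
#!/usr/bin/env bash
# Arbitrary example values.
path=/home/user/bin/todo.sh
count=7

# count=$(expr $count + 1)
count=$((count + 1))

# name=$(basename "$path")     differs from basename for a trailing slash
name=${path##*/}

# dir=$(dirname "$path")       differs from dirname when there is no slash
dir=${path%/*}

# base=$(basename "$path" .sh)
base=${name%.sh}

printf '%s\n' "$count" "$name" "$dir" "$base"
```

This prints `8`, `todo.sh`, `/home/user/bin` and `todo`.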
Perl Scripts
Fedora users will be familiar with dvd::rip, which is largely written in Perl. Even if only one-tenth of its $command lines refer to Unix commands, they account for a large number of anti-patterns. I'm not familiar with the command-line utilities related to multimedia, but there seem to be several avoidable uses of "cd", "convert", "echo", "ffmpeg", "ls", "mkdir", "rm", "umask" and "which". (I haven't gone through the source code, only glanced at the parsed code, which I'll come to soon, so I may well be mistaken.)
On Solaris, the SUNWwebminu package (version 11.10.0, shipped with Solaris 10) has a surprising number and variety of Perl anti-patterns using cat, chown, cd, cp, echo, find, grep, hostname, mv, ps, pwd, rm, rsh, sed, ssh, uname, you name it. It could be the package that benefits most from an overhaul in this direction.
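A sketch of how Perl shell-outs like these might be flagged, each with a hint pointing at a Perl built-in or core module that avoids the extra process. Assumptions: bash 4+ (associative arrays); only backticks, `qx` and `system` with a quoted string are recognized; only the first shell-out on each line is checked; and the hint table and `check_perl` function are illustrative, not the parser's actual rules.

```bash
#!/usr/bin/env bash
# Illustrative hint table: command -> Perl built-in or core-module alternative.
declare -A perl_hint=(
  [hostname]='Sys::Hostname (hostname())'
  [pwd]='Cwd (getcwd())'
  [mkdir]='mkdir() built-in or File::Path (make_path())'
  [rm]='unlink() built-in or File::Path (remove_tree())'
  [chown]='chown() built-in'
  [cp]='File::Copy (copy())'
  [mv]='rename() built-in or File::Copy (move())'
)

# Matches `cmd ...`, qx(cmd ...) / qx{cmd ...} / qx/cmd .../, and
# system("cmd ...") / system('cmd ...'); group 2 captures the command name.
shellout_re='(`|qx[({/]|system *\(? *["'\''])[[:space:]]*([[:alnum:]_]+)'

# Reads Perl source on stdin and prints "line: command -> consider hint".
check_perl() {
  local line n=0 cmd
  while IFS= read -r line; do
    n=$((n + 1))
    if [[ $line =~ $shellout_re ]]; then
      cmd=${BASH_REMATCH[2]}
      if [[ -n ${perl_hint[$cmd]+set} ]]; then
        printf '%d: %s -> consider %s\n' "$n" "$cmd" "${perl_hint[$cmd]}"
      fi
    fi
  done
}

# Usage: check_perl < script.pl
```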
The packages above are only examples of the programming anti-patterns out there in the vast universe of software being written, shipped and used. That is understandable, because even experts are known to make mistakes, all the time. What we need to work on are mechanisms that can minimize them.
Below is a table of counts of common Unix commands found in scripts across various OS installations that I had access to. The counts are incomplete and likely full of false negatives, and the OSs were not full installations. I'm sure you understand my preference not to hit you with the OS versions, package names, package versions, etc. They don't mean much. They don't mean nothing either.
| AIX Perl | | AIX Shell | | Cygwin Perl | | Cygwin Shell | | HPUX Perl | | HPUX Shell | | RHEL Perl | | RHEL Shell | | Solaris Perl | | Solaris Shell | |
| Packages | 25 | Packages | 103 | Packages | 19 | Packages | 95 | Packages | 27 | Packages | 56 | Packages | 86 | Packages | 175 | Packages | 42 | Packages | 227 |
| Total Files | 2880 | Total Files | 1832 | Total Files | 2361 | Total Files | 1563 | Total Files | 7630 | Total Files | 7533 | Total Files | 1262 | Total Files | 2518 | Total Files | 5600 | Total Files | 5271 |
| Size (MB) | 27 | Size (MB) | 13.5 | Size (MB) | 25 | Size (MB) | 8.5 | Size (MB) | 57.75 | Size (MB) | 84.25 | Size (MB) | 11.5 | Size (MB) | 13 | Size (MB) | 28.25 | Size (MB) | 20.5 |
| Files | 421 | Files | 759 | Files | 285 | Files | 403 | Files | 796 | Files | 692 | Files | 313 | Files | 673 | Files | 684 | Files | 963 |
| Total | 1935 | Total | 20684 | Total | 257 | Total | 2370 | Total | 841 | Total | 23137 | Total | 255 | Total | 9733 | Total | 1566 | Total | 13907 |
| Command | Count | Command | Count | Command | Count | Command | Count | Command | Count | Command | Count | Command | Count | Command | Count | Command | Count | Command | Count |
| /dev/null | 410 | /dev/null | 5281 | /dev/null | 89 | /dev/null | 531 | /dev/null | 248 | echo | 5776 | /dev/null | 105 | echo | 2539 | /dev/null | 1088 | echo | 2823 |
| grep | 326 | cat | 5105 | cat | 41 | sed | 381 | cat | 105 | cat | 3948 | cat | 26 | /dev/null | 2037 | rsh | 190 | /dev/null | 2605 |
| echo | 137 | grep | 2607 | pwd | 30 | echo | 371 | echo | 85 | sed | 3869 | ifconfig | 22 | sed | 1582 | mount | 167 | grep | 1431 |
| rm | 99 | echo | 2414 | echo | 27 | cat | 226 | rm | 81 | /dev/null | 3628 | echo | 20 | grep | 1232 | cat | 161 | sed | 1419 |
| mkdir | 93 | awk | 2383 | cp | 15 | basename | 193 | grep | 50 | grep | 2349 | find | 18 | cat | 1112 | cd | 131 | cat | 1224 |
| hostname | 92 | cut | 1348 | hostname | 12 | grep | 185 | hostname | 46 | awk | 1011 | uname | 15 | awk | 281 | umount | 100 | expr | 742 |
| ls | 89 | sed | 1087 | grep | 12 | awk | 137 | pwd | 44 | eval | 977 | ls | 14 | uname | 257 | uname | 73 | awk | 733 |
| awk | 85 | rm | 579 | date | 12 | expr | 116 | cp | 44 | expr | 607 | grep | 14 | expr | 244 | df | 72 | cut | 528 |
| cp | 81 | expr | 466 | cd | 12 | dirname | 68 | find | 41 | uname | 401 | pwd | 12 | eval | 237 | ps | 61 | basename | 466 |
| uname | 71 | ls | 344 | chmod | 11 | sort | 62 | ps | 31 | basename | 389 | ps | 12 | sort | 211 | rm | 59 | uname | 406 |
| date | 69 | sort | 290 | bc | 10 | uname | 59 | uname | 29 | rm | 377 | cp | 12 | ls | 181 | ifconfig | 59 | cd | 327 |
| cat | 62 | egrep | 290 | ls | 9 | ssh | 52 | cd | 27 | cut | 363 | date | 11 | cut | 175 | hostname | 59 | dirname | 310 |
| cd | 58 | basename | 277 | rm | 8 | date | 50 | ls | 18 | egrep | 324 | sort | 8 | basename | 174 | pwd | 39 | ls | 305 |
| find | 55 | cp | 233 | find | 8 | cut | 48 | eval | 18 | ps | 262 | hostname | 7 | find | 161 | touch | 35 | pwd | 277 |
| egrep | 49 | wc | 228 | tr | 6 | pwd | 44 | date | 18 | cp | 249 | mount | 6 | egrep | 148 | echo | 35 | rm | 272 |
| pwd | 43 | tail | 212 | eval | 6 | eval | 42 | bc | 18 | ssh | 193 | mkdir | 6 | date | 142 | ssh | 34 | egrep | 268 |
| ifconfig | 40 | head | 198 | uname | 5 | ps | 40 | ssh | 16 | ls | 182 | basename | 5 | tail | 105 | find | 34 | eval | 259 |
| basename | 38 | cd | 186 | ps | 5 | hostname | 27 | tr | 15 | pwd | 161 | mv | 4 | pwd | 87 | cp | 31 | mount | 251 |
| mount | 35 | tr | 172 | sed | 4 | cd | 27 | netstat | 15 | find | 157 | eval | 4 | cd | 81 | ping | 25 | sort | 197 |
| rsh | 34 | ps | 167 | netstat | 4 | egrep | 26 | kill | 15 | sort | 154 | cd | 4 | dirname | 75 | grep | 18 | date | 192 |
| head | 34 | mount | 165 | cut | 4 | wc | 22 | sed | 14 | tr | 146 | wc | 3 | ps | 74 | tr | 17 | find | 120 |
| sed | 30 | hostname | 161 | wc | 3 | chmod | 19 | awk | 13 | date | 142 | tr | 3 | tr | 72 | ls | 16 | hostname | 119 |
| cut | 28 | uname | 155 | sort | 3 | ls | 18 | sort | 10 | wc | 122 | ping | 3 | rm | 71 | date | 16 | ps | 91 |
| tail | 26 | df | 146 | head | 3 | tr | 17 | touch | 9 | cd | 104 | ln | 3 | wc | 65 | chown | 15 | tr | 89 |
| netstat | 26 | date | 144 | dirname | 2 | kill | 17 | ln | 9 | tail | 98 | df | 3 | head | 54 | pkginfo | 13 | head | 73 |
| umount | 25 | mv | 135 | chown | 2 | rm | 15 | chown | 9 | head | 96 | cut | 3 | uniq | 49 | mv | 12 | pkginfo | 70 |
| mv | 25 | mkdir | 123 | ping | 1 | find | 15 | mv | 8 | dirname | 82 | awk | 3 | kill | 36 | chmod | 11 | mkdir | 70 |
| ps | 21 | fgrep | 122 | mv | 1 | tail | 13 | mkdir | 7 | kill | 70 | rsh | 2 | hostname | 31 | tail | 8 | wc | 65 |
| chmod | 21 | dirname | 121 | df | 1 | umask | 12 | cut | 7 | hostname | 68 | rm | 2 | mount | 30 | sort | 8 | tail | 65 |
| kill | 16 | kill | 118 | cp | 12 | head | 4 | bc | 65 | head | 2 | netstat | 27 | prtvtoc | 8 | fgrep | 65 | ||
| dirname | 16 | find | 94 | touch | 9 | du | 4 | fgrep | 49 | chown | 2 | mkdir | 26 | kill | 7 | cp | 60 | ||
| ping | 13 | pwd | 93 | mkdir | 8 | dirname | 4 | ln | 48 | umount | 1 | ln | 21 | sed | 6 | kill | 55 | ||
| ln | 12 | umount | 76 | head | 8 | chmod | 4 | uniq | 45 | touch | 1 | umask | 19 | eval | 6 | df | 49 | ||
| wc | 11 | eval | 71 | mount | 4 | tail | 3 | mount | 36 | kill | 1 | cp | 19 | mkdir | 5 | mv | 48 | ||
| sort | 11 | ln | 65 | fgrep | 4 | ifconfig | 3 | mkdir | 35 | egrep | 1 | mv | 17 | ln | 5 | svccfg | 42 | ||
| tr | 10 | uniq | 57 | uniq | 3 | egrep | 3 | touch | 26 | dirname | 1 | touch | 16 | head | 5 | chmod | 40 | ||
| touch | 10 | du | 43 | sleep | 3 | scp | 2 | sleep | 24 | chmod | 1 | pkginfo | 15 | dirname | 5 | pgrep | 39 | ||
| df | 9 | chmod | 43 | rsh | 3 | ping | 2 | mv | 21 | ssh | 14 | cut | 5 | ssh | 38 | ||||
| ssh | 7 | netstat | 42 | pgrep | 3 | mount | 2 | chmod | 17 | scp | 10 | wc | 4 | uniq | 36 | ||||
| eval | 7 | umask | 24 | ln | 3 | df | 2 | umask | 13 | chmod | 7 | netstat | 4 | ln | 32 | ||||
| chown | 6 | touch | 21 | bc | 3 | cksum | 2 | ifconfig | 11 | umount | 6 | expr | 2 | umask | 29 | ||||
| rcp | 5 | ifconfig | 19 | mv | 2 | basename | 2 | chown | 11 | ifconfig | 5 | du | 2 | ifconfig | 27 | ||||
| vmstat | 2 | rsh | 13 | chown | 2 | wc | 1 | scp | 10 | sleep | 4 | uniq | 1 | chown | 23 | ||||
| uniq | 2 | ping | 12 | ifconfig | 1 | umount | 1 | ping | 9 | fgrep | 4 | scp | 1 | touch | 22 | ||||
| scp | 2 | ssh | 7 | netstat | 9 | psrinfo | 3 | basename | 1 | svcadm | 18 | ||||||||
| fgrep | 2 | sleep | 7 | du | 6 | ping | 3 | umount | 16 | ||||||||||
| bc | 2 | bc | 6 | umount | 5 | pgrep | 2 | ping | 13 | ||||||||||
| svcs | 4 | pkginfo | 4 | df | 2 | svcs | 11 | ||||||||||||
| pkginfo | 4 | df | 4 | vmstat | 1 | cksum | 11 | ||||||||||||
| cksum | 3 | cksum | 4 | sar | 1 | bc | 10 | ||||||||||||
| chown | 3 | svcs | 3 | isainfo | 1 | sleep | 8 | ||||||||||||
| vmstat | 1 | psrinfo | 1 | du | 1 | isainfo | 7 | ||||||||||||
| bc | 1 | cksum | 1 | netstat | 5 | ||||||||||||||
| basename | 1 | chown | 1 | du | 4 | ||||||||||||||
| awk | 1 | bc | 1 | iostat | 3 | ||||||||||||||
| /dev/null | 1 | prtvtoc | 2 | ||||||||||||||||
| sar | 1 | ||||||||||||||||||
| psrinfo | 1 | ||||||||||||||||||
Anti-patterns Parser
Symantec Corporation gave me permission to share this study with the community. I'm sharing it in the hope of widening this discussion and learning something. As part of it, anti-patterns.pl is a dirty parser that I wrote and used extensively along the way, and am somewhat embarrassed by. I'm sharing it only to convey a better idea of how easy it is to catch some of the programming anti-patterns. Before using it, understand that the parser is very incomplete, incorrect (defined how?), without any warranties or guarantees, and all that. I wouldn't even recommend using it, but do take a look.
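To give a sense of how little it takes, here is a sketch (not the anti-patterns.pl logic) that tallies "echo ... | command" pipelines in shell script text, grouped by the command on the right of the pipe. It is deliberately crude: it looks only at the first pipe after an echo on each line, skips whole-line comments, needs bash 4+, and will have both false positives and false negatives. `count_echo_pipes` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: count "echo ... | command" pipelines in shell script text read from
# stdin, grouped by the command on the right of the first pipe.
count_echo_pipes() {
  local -A tally=()
  local line cmd
  local comment_re='^[[:space:]]*#'
  local pipe_re='(^|[^[:alnum:]_])echo[^|]*[|][[:space:]]*([[:alnum:]_]+)'
  while IFS= read -r line; do
    [[ $line =~ $comment_re ]] && continue        # skip whole-line comments
    if [[ $line =~ $pipe_re ]]; then              # "||" never matches: [^|]* stops at it
      cmd=${BASH_REMATCH[2]}
      tally[$cmd]=$(( ${tally[$cmd]:-0} + 1 ))
    fi
  done
  # Most frequent first; ties broken alphabetically.
  for cmd in "${!tally[@]}"; do
    printf '%d %s\n' "${tally[$cmd]}" "$cmd"
  done | sort -k1,1nr -k2,2
}

# Usage: cat *.sh | count_echo_pipes
```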
From all the code that I've read so far, I can see this barely scratches the surface. Apart from continuing to find and add programming anti-patterns in Shell and Perl, my next steps are to move on to Java and C in line with my personal and company interests. Please point me in possible directions and reach out to me if you share my interests.