Andrew Koenig started talking about programming anti-patterns more than 15 years ago. The concept probably never became as popular as it deserved, because it deals with mistakes in syntactically correct software, and the list of wrong ways to do something is likely inexhaustible compared to the list of right ways. I still think it is a useful concept. Even experts make mistakes, all the time. Most of the performance enhancements I have come across in my limited experience were in fact bug fixes, whether because the solution wasn't designed well enough or wasn't implemented well enough.
For any given system, improving its performance generally means improving the performance of its slowest component. That first requires identifying the slowest component, through profiling at some level. Because profiling is time-consuming, it often isn't done until a system (or an operation on it) feels slow, and such perceptions get muddier and slower to form as the system grows thicker (more complex).
This is why I like the approach of checking for known programming anti-patterns with static code analysis. Static analysis may take longer than a build, but it is faster than running a test suite on built images. For now. The day the list of anti-patterns grows arbitrarily long, and the static analysis cycle time with it, the approach will stop being feasible (though tiering anti-patterns by severity and history of occurrence seems promising).
Hence my interest in static code analyzers like Coverity and FindBugs, though I have yet to explore them well enough to really know them. I am not aware of any major work along similar lines for Shell and Perl. I think it is worth exploring, because a significant amount of software is written in them (certainly at Symantec Corporation) and they are already slower than compiled languages.
Shell Scripts
The popular todo software, todo.sh, is written entirely in Bash. It contains a lot of anti-patterns, among them various uses of "echo | grep", "echo | sed" and "echo | tr". The longest piped anti-pattern in todo.sh has the form "echo | sed | eval sort | sed | awk | sed | eval cat". It takes a list of todo items (echo), pads them with leading zeroes (sed), sorts them (eval sort), color-codes the done items (sed), does more color-coding work (awk), blanks out a few strings (sed), and finally emits the list (usually eval cat). That doesn't seem like the most natural way to do it. If nothing else, and if my superficial understanding isn't totally off, all the seds could be merged into the awk.
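A rough sketch of what merging the seds into the awk could look like, shown on a made-up input format (one item per line, done items prefixed with "x ") rather than on todo.sh's real data or edits. The function names `format_piped` and `format_merged` are hypothetical. Both functions print the same thing; the second replaces three external filters with a single awk program.

```shell
#!/usr/bin/env bash
# Hypothetical todo list: one item per line, done items start with "x ".
# The second item deliberately carries trailing blanks.
items=$'buy milk\nx call bob  \nwrite report'

# Chained filters: each sed and the awk runs as a separate process.
format_piped() {
  printf '%s\n' "$1" \
    | sed 's/^x /DONE: /' \
    | sed 's/ *$//' \
    | awk '{ printf "%02d %s\n", NR, $0 }'
}

# The same edits folded into one awk program.
format_merged() {
  printf '%s\n' "$1" | awk '{
    sub(/^x /, "DONE: ")         # first sed: mark done items
    sub(/ +$/, "")               # second sed: strip trailing blanks
    printf "%02d %s\n", NR, $0   # awk: zero-padded item number
  }'
}

format_merged "$items"
# 01 buy milk
# 02 DONE: call bob
# 03 write report
```

Whether such a merge pays off in todo.sh itself would need measuring; the point is only that sed-style substitutions map directly onto awk's `sub()` and `gsub()`.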
The Shell scripts in the Cygwin installation on my computer contain over a hundred "echo | sed" anti-patterns alone, and another hundred uses of "expr", even without all the packages installed. Bash Completion is one package that might benefit significantly from fixing Shell anti-patterns, which could translate into better response times. For example, command completion for "gcc --" with gcc v3.4.4-999 and bash-completion v1.3-1 creates 12 new processes, 6 of them due to a sed, a sort and a tr plus the bash subshells they require.
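Most of the "echo | sed", "echo | grep", "echo | tr" and "expr" uses mentioned above have fork-free equivalents in the shell itself. The sketch below assumes Bash: `${var##pattern}`, `${var%pattern}` and `$(( ))` are POSIX, while `${var//pattern/replacement}` and `[[ ]]` are Bash extensions. The path and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
file="/usr/share/doc/readme.txt"   # hypothetical path
path="/bin:/usr/bin"               # hypothetical PATH-like list
count=41

# Forking versions (at least one extra process each):
#   ext=$(echo "$file" | sed 's/.*\.//')
#   base=$(basename "$file")
#   dir=$(dirname "$file")
#   words=$(echo "$path" | tr ':' ' ')
#   count=$(expr $count + 1)
#   if echo "$file" | grep -q doc; then ...; fi

# Builtin versions (no extra process):
ext=${file##*.}          # drop everything up to the last "."
base=${file##*/}         # like basename, for a path without a trailing "/"
dir=${file%/*}           # like dirname, except for edge cases such as /file
words=${path//:/ }       # replace every ":" with a space (Bash)
count=$((count + 1))     # arithmetic expansion instead of expr
if [[ $file == *doc* ]]; then   # substring test instead of grep (Bash)
  echo "$base in $dir: ext=$ext, words=$words, count=$count"
fi
# prints: readme.txt in /usr/share/doc: ext=txt, words=/bin /usr/bin, count=42
```

Wrapping each variant in a loop of a few thousand iterations under `time` is a quick way to see the cost of the extra processes; on Cygwin, where fork is emulated and notably slow, the gap should be larger.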
Perl Scripts
Fedora users will be familiar with dvd::rip, which is largely written in Perl. Even if only one-tenth of its $command lines refer to Unix commands, they account for a large number of anti-patterns. I'm not familiar with the multimedia command-line utilities, but there seem to be several avoidable uses of "cd", "convert", "echo", "ffmpeg", "ls", "mkdir", "rm", "umask" and "which". (I haven't gone through the source code, only glanced at the parsed code, which I'll come to shortly, so I may well be mistaken.)
On Solaris, the SUNWwebminu package (version 11.10.0, shipped with Solaris 10) has a surprising number and variety of Perl anti-patterns that invoke cat, chown, cd, cp, echo, find, grep, hostname, mv, ps, pwd, rm, rsh, sed, ssh, uname; you name it. It may be the package that would benefit most from an overhaul in this direction.
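Perl has built-ins or core modules for many of these commands: `chdir`, `mkdir`, `unlink`, `rename`, `chmod`, `chown` and `umask` are built-in functions, `Cwd` provides `getcwd`, `Sys::Hostname` provides `hostname`, and `File::Copy` provides `copy` and `move`. A crude way to find candidates is to grep for backticks, `qx` and `system` calls that start with one of those commands. The helper below (`flag_perl_shellouts`) is hypothetical and is not the parser described later; its regular expression is an assumption that misses many forms (`exec`, piped `open`, list-form `system`) and may flag false positives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag Perl lines that shell out (backticks, qx, system)
# to commands with Perl built-in or core-module equivalents.
# Illustrative only: misses exec, piped open(), list-form system, etc.
perl_shellout_re='(`|qx.|system[[:space:]]*\(?[[:space:]]*.)[[:space:]]*(cd|chmod|chown|cp|hostname|mkdir|mv|pwd|rm|umask)([^[:alnum:]_]|$)'

flag_perl_shellouts() {
  # -n: prefix each match with its line number; -E: extended regex
  grep -nE "$perl_shellout_re" "$@"
}

# Usage (file names are placeholders): flag_perl_shellouts script.pl lib/*.pm
```

Each flagged line is only a candidate; whether the built-in is a drop-in replacement (for example `rename` versus `mv` across file systems) still needs a human look.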
The examples above are only a few of the programming anti-patterns out there in the vast universe of software being written, shipped and used. That is understandable, because even experts make mistakes, all the time. What we need to work on are mechanisms that minimize them.
Below is a table of counts of common Unix commands found in scripts across the various OS installations I had access to. The counts are incomplete and probably full of false negatives, and the OSs were not full installations. I hope you understand my preference not to burden you with the OS versions, package names, package versions, etc. They don't mean much. They don't mean nothing either.
| | AIX Perl | AIX Shell | Cygwin Perl | Cygwin Shell | HPUX Perl | HPUX Shell | RHEL Perl | RHEL Shell | Solaris Perl | Solaris Shell |
|---|---|---|---|---|---|---|---|---|---|---|
| Packages | 25 | 103 | 19 | 95 | 27 | 56 | 86 | 175 | 42 | 227 |
| Total Files | 2880 | 1832 | 2361 | 1563 | 7630 | 7533 | 1262 | 2518 | 5600 | 5271 |
| Size (MB) | 27 | 13.5 | 25 | 8.5 | 57.75 | 84.25 | 11.5 | 13 | 28.25 | 20.5 |
| Files | 421 | 759 | 285 | 403 | 796 | 692 | 313 | 673 | 684 | 963 |
| Total | 1935 | 20684 | 257 | 2370 | 841 | 23137 | 255 | 9733 | 1566 | 13907 |

Command counts per column, shown as command (count), in descending order:

| AIX Perl | AIX Shell | Cygwin Perl | Cygwin Shell | HPUX Perl | HPUX Shell | RHEL Perl | RHEL Shell | Solaris Perl | Solaris Shell |
|---|---|---|---|---|---|---|---|---|---|
| /dev/null (410) | /dev/null (5281) | /dev/null (89) | /dev/null (531) | /dev/null (248) | echo (5776) | /dev/null (105) | echo (2539) | /dev/null (1088) | echo (2823) |
| grep (326) | cat (5105) | cat (41) | sed (381) | cat (105) | cat (3948) | cat (26) | /dev/null (2037) | rsh (190) | /dev/null (2605) |
| echo (137) | grep (2607) | pwd (30) | echo (371) | echo (85) | sed (3869) | ifconfig (22) | sed (1582) | mount (167) | grep (1431) |
| rm (99) | echo (2414) | echo (27) | cat (226) | rm (81) | /dev/null (3628) | echo (20) | grep (1232) | cat (161) | sed (1419) |
| mkdir (93) | awk (2383) | cp (15) | basename (193) | grep (50) | grep (2349) | find (18) | cat (1112) | cd (131) | cat (1224) |
| hostname (92) | cut (1348) | hostname (12) | grep (185) | hostname (46) | awk (1011) | uname (15) | awk (281) | umount (100) | expr (742) |
| ls (89) | sed (1087) | grep (12) | awk (137) | pwd (44) | eval (977) | ls (14) | uname (257) | uname (73) | awk (733) |
| awk (85) | rm (579) | date (12) | expr (116) | cp (44) | expr (607) | grep (14) | expr (244) | df (72) | cut (528) |
| cp (81) | expr (466) | cd (12) | dirname (68) | find (41) | uname (401) | pwd (12) | eval (237) | ps (61) | basename (466) |
| uname (71) | ls (344) | chmod (11) | sort (62) | ps (31) | basename (389) | ps (12) | sort (211) | rm (59) | uname (406) |
| date (69) | sort (290) | bc (10) | uname (59) | uname (29) | rm (377) | cp (12) | ls (181) | ifconfig (59) | cd (327) |
| cat (62) | egrep (290) | ls (9) | ssh (52) | cd (27) | cut (363) | date (11) | cut (175) | hostname (59) | dirname (310) |
| cd (58) | basename (277) | rm (8) | date (50) | ls (18) | egrep (324) | sort (8) | basename (174) | pwd (39) | ls (305) |
| find (55) | cp (233) | find (8) | cut (48) | eval (18) | ps (262) | hostname (7) | find (161) | touch (35) | pwd (277) |
| egrep (49) | wc (228) | tr (6) | pwd (44) | date (18) | cp (249) | mount (6) | egrep (148) | echo (35) | rm (272) |
| pwd (43) | tail (212) | eval (6) | eval (42) | bc (18) | ssh (193) | mkdir (6) | date (142) | ssh (34) | egrep (268) |
| ifconfig (40) | head (198) | uname (5) | ps (40) | ssh (16) | ls (182) | basename (5) | tail (105) | find (34) | eval (259) |
| basename (38) | cd (186) | ps (5) | hostname (27) | tr (15) | pwd (161) | mv (4) | pwd (87) | cp (31) | mount (251) |
| mount (35) | tr (172) | sed (4) | cd (27) | netstat (15) | find (157) | eval (4) | cd (81) | ping (25) | sort (197) |
| rsh (34) | ps (167) | netstat (4) | egrep (26) | kill (15) | sort (154) | cd (4) | dirname (75) | grep (18) | date (192) |
| head (34) | mount (165) | cut (4) | wc (22) | sed (14) | tr (146) | wc (3) | ps (74) | tr (17) | find (120) |
| sed (30) | hostname (161) | wc (3) | chmod (19) | awk (13) | date (142) | tr (3) | tr (72) | ls (16) | hostname (119) |
| cut (28) | uname (155) | sort (3) | ls (18) | sort (10) | wc (122) | ping (3) | rm (71) | date (16) | ps (91) |
| tail (26) | df (146) | head (3) | tr (17) | touch (9) | cd (104) | ln (3) | wc (65) | chown (15) | tr (89) |
| netstat (26) | date (144) | dirname (2) | kill (17) | ln (9) | tail (98) | df (3) | head (54) | pkginfo (13) | head (73) |
| umount (25) | mv (135) | chown (2) | rm (15) | chown (9) | head (96) | cut (3) | uniq (49) | mv (12) | pkginfo (70) |
| mv (25) | mkdir (123) | ping (1) | find (15) | mv (8) | dirname (82) | awk (3) | kill (36) | chmod (11) | mkdir (70) |
| ps (21) | fgrep (122) | mv (1) | tail (13) | mkdir (7) | kill (70) | rsh (2) | hostname (31) | tail (8) | wc (65) |
| chmod (21) | dirname (121) | df (1) | umask (12) | cut (7) | hostname (68) | rm (2) | mount (30) | sort (8) | tail (65) |
| kill (16) | kill (118) | | cp (12) | head (4) | bc (65) | head (2) | netstat (27) | prtvtoc (8) | fgrep (65) |
| dirname (16) | find (94) | | touch (9) | du (4) | fgrep (49) | chown (2) | mkdir (26) | kill (7) | cp (60) |
| ping (13) | pwd (93) | | mkdir (8) | dirname (4) | ln (48) | umount (1) | ln (21) | sed (6) | kill (55) |
| ln (12) | umount (76) | | head (8) | chmod (4) | uniq (45) | touch (1) | umask (19) | eval (6) | df (49) |
| wc (11) | eval (71) | | mount (4) | tail (3) | mount (36) | kill (1) | cp (19) | mkdir (5) | mv (48) |
| sort (11) | ln (65) | | fgrep (4) | ifconfig (3) | mkdir (35) | egrep (1) | mv (17) | ln (5) | svccfg (42) |
| tr (10) | uniq (57) | | uniq (3) | egrep (3) | touch (26) | dirname (1) | touch (16) | head (5) | chmod (40) |
| touch (10) | du (43) | | sleep (3) | scp (2) | sleep (24) | chmod (1) | pkginfo (15) | dirname (5) | pgrep (39) |
| df (9) | chmod (43) | | rsh (3) | ping (2) | mv (21) | | ssh (14) | cut (5) | ssh (38) |
| ssh (7) | netstat (42) | | pgrep (3) | mount (2) | chmod (17) | | scp (10) | wc (4) | uniq (36) |
| eval (7) | umask (24) | | ln (3) | df (2) | umask (13) | | chmod (7) | netstat (4) | ln (32) |
| chown (6) | touch (21) | | bc (3) | cksum (2) | ifconfig (11) | | umount (6) | expr (2) | umask (29) |
| rcp (5) | ifconfig (19) | | mv (2) | basename (2) | chown (11) | | ifconfig (5) | du (2) | ifconfig (27) |
| vmstat (2) | rsh (13) | | chown (2) | wc (1) | scp (10) | | sleep (4) | uniq (1) | chown (23) |
| uniq (2) | ping (12) | | ifconfig (1) | umount (1) | ping (9) | | fgrep (4) | scp (1) | touch (22) |
| scp (2) | ssh (7) | | | | netstat (9) | | psrinfo (3) | basename (1) | svcadm (18) |
| fgrep (2) | sleep (7) | | | | du (6) | | ping (3) | | umount (16) |
| bc (2) | bc (6) | | | | umount (5) | | pgrep (2) | | ping (13) |
| | svcs (4) | | | | pkginfo (4) | | df (2) | | svcs (11) |
| | pkginfo (4) | | | | df (4) | | vmstat (1) | | cksum (11) |
| | cksum (3) | | | | cksum (4) | | sar (1) | | bc (10) |
| | chown (3) | | | | svcs (3) | | isainfo (1) | | sleep (8) |
| | vmstat (1) | | | | psrinfo (1) | | du (1) | | isainfo (7) |
| | | | | | bc (1) | | cksum (1) | | netstat (5) |
| | | | | | basename (1) | | chown (1) | | du (4) |
| | | | | | awk (1) | | bc (1) | | iostat (3) |
| | | | | | /dev/null (1) | | | | prtvtoc (2) |
| | | | | | | | | | sar (1) |
| | | | | | | | | | psrinfo (1) |
Anti-patterns Parser
Symantec Corporation gave me permission to share this study with the community. I'm sharing it in the hope of widening this discussion and learning something. As part of it, anti-patterns.pl is a dirty parser that I wrote and used extensively along the way, and am somewhat embarrassed by. I'm sharing it only to give a better idea of how easy it is to catch some programming anti-patterns. Before using it, understand that the parser is very incomplete, incorrect (defined how?), comes without any warranties or guarantees, and all that. I wouldn't even recommend using it, but do take a look.
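To give a flavor of how little it takes, here is a sketch in the same spirit, though not a port of anti-patterns.pl: the helper name `count_shell_antipatterns` and its four patterns are assumptions for illustration. It counts, per file, the lines that pipe echo into sed, grep or tr, or that call expr. Being line-based, it misses pipelines split across lines and also matches inside comments and quoted strings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count a few shell anti-patterns per file.
# Counts matching lines (not occurrences); misses pipelines split across
# lines and also matches inside comments or quoted strings.
count_shell_antipatterns() {
  local file n i
  local -a labels=('echo | sed' 'echo | grep' 'echo | tr' 'expr')
  local -a regexes=(
    'echo[^|]*\|[[:space:]]*sed([^[:alnum:]_]|$)'
    'echo[^|]*\|[[:space:]]*e?grep([^[:alnum:]_]|$)'
    'echo[^|]*\|[[:space:]]*tr([^[:alnum:]_]|$)'
    '(^|[^[:alnum:]_])expr[[:space:]]'
  )
  for file in "$@"; do
    for i in "${!labels[@]}"; do
      # grep -c prints the number of matching lines (0 if none)
      n=$(grep -cE "${regexes[i]}" "$file")
      printf '%s\t%s\t%s\n' "$file" "${labels[i]}" "$n"
    done
  done
}
```

The output is one tab-separated line per file and pattern; summing the third column over a directory of scripts gives totals in the spirit of the table above, though not necessarily computed the same way.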
From all the code I've read so far, I can see this barely scratches the surface. Apart from continuing to find and add programming anti-patterns in Shell and Perl, my next step is to move on to Java and C, in line with my personal and company interests. Please point me in possible directions, and reach out to me if you share my interests.