May 30, 2011

Use vxperf2 to Analyze Text Databases

1st July, 2011: GitHub is the new home of vxperf2. Download vxperf2.

In my job I deal daily with many different kinds of logs, mostly generated by OS monitoring utilities, and I have never found a single tool that summarizes and visualizes data from all of them. For example, spreadsheets are great for typeperf logs on Windows, ksar for sar logs on Linux, and Esxplot for esxtop logs on ESX, but I don't know of one tool that works well for all of them.

I initially wrote a shell-based tool called plotmon to extract, summarize, and visualize (using Gnuplot) fields of specific interest from a few particular logs (mainly pidstat, sar, top). I then ported it to Perl for better performance. Its drawback was that it handled each log on a case-by-case basis. As the number of OSs (ESX, RHEL, Solaris, Windows), the number of OS monitoring utilities (esxtop, iostat, mpstat, netstat, pidstat, prstat, sar, typeperf, vmstat), and the number of fields of interest that I had to deal with grew, the naive plotmon approach was no longer viable. So I discarded it and switched to vxperf2.

vxperf2 is a Perl-based tool that I wrote, maintain, and use a lot for "analyzing" logs. vxperf2 can process any log as long as it is a text database. (Text database: imagine an SQL database with several different tables in it. For each table, run a "select * from" query and redirect all the output to a single text file. That text file is a text database.)
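To make the text database analogy concrete, here is a minimal Python sketch that dumps every table of an in-memory SQLite database into one block of text. The table names, the column names, and the exact layout (a header line, space-separated fields, a blank line between tables) are assumptions made for illustration; they are not vxperf2's documented input format.

```python
import sqlite3


def dump_text_database(conn):
    """Concatenate the output of "select * from <table>" for every table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    lines = []
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        # Assumed layout: one header line with the column names per table.
        lines.append(" ".join(col[0] for col in cursor.description))
        for row in cursor:
            lines.append(" ".join(str(value) for value in row))
        lines.append("")  # assumed separator: blank line between tables
    return "\n".join(lines)


# Hypothetical tables standing in for two kinds of monitoring data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cpu (time TEXT, usr REAL, sys REAL)")
conn.execute("CREATE TABLE mem (time TEXT, free_kb INTEGER)")
conn.executemany("INSERT INTO cpu VALUES (?, ?, ?)",
                 [("10:00:01", 12.5, 3.0), ("10:00:02", 14.0, 2.5)])
conn.executemany("INSERT INTO mem VALUES (?, ?)",
                 [("10:00:01", 2048), ("10:00:02", 1996)])

text_db = dump_text_database(conn)
print(text_db)
```

The point is only that several tables end up one after another in a single flat text file.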

Even the output of Unix top on the same OS differs in format between two different toprcs. Still, such output, like the logs generated by any other OS monitoring utility (and by other tools), can be trivially turned into a text database through minor changes, mostly reformatting. So processing any log takes two steps: first convert the log into a text database (simple, though log-specific, formatting: log2db), and then run vxperf2 on the text database.
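As a rough idea of what such a log-specific reformatting step might look like, here is a small Python sketch. The sample log is made up, the function name log2db is borrowed from the text only as a label, and the header_first_field parameter is hypothetical. It drops a banner line, blank lines, and repeated header lines, so that a single header followed by data rows remains. A real log2db step for sar, top, or esxtop would need its own rules.

```python
def log2db(log_text, header_first_field):
    """Keep the first header line and all data rows; drop everything else.

    header_first_field is the first column name of the header line
    (a hypothetical way to recognize headers in this made-up format).
    """
    out = []
    header_seen = False
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == header_first_field:
            if header_seen:
                continue  # skip repeated headers
            header_seen = True
        elif not header_seen:
            continue  # skip banner lines before the first header
        out.append(" ".join(fields))  # normalize whitespace
    return "\n".join(out)


# Made-up log, not the output of any real utility.
sample = """\
Made-up monitor v0.0 (example host)

time   usr   sys
10:00:01  12.5  3.0
10:00:02  14.0  2.5

time   usr   sys
10:00:03  11.0  4.0
"""

text_db = log2db(sample, "time")
print(text_db)
```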

vxperf2 takes a rules file containing six keywords: y-axis, z-axis, plot, only, offset, points. y-axis names the fields of interest (columns of a table) that are to be summarized; z-axis names the field each of whose values has a different meaning (e.g. PID); plot lists the fields that are to be plotted together (against the timeline); only gives the z-axis values for which the previously specified plots are to be drawn; offset is the number of readings from the beginning that are to be ignored; and points is the number of readings, counted from the offset, that are to be considered. If you read the documentation and try the tool, you'll get a fair idea of why these keywords are needed and the flexibility they provide. You will see that vxperf2 can summarize (basic functions: count, max, min, sum, avg), visualize (plot), navigate (skip data points), zoom in (consider a subset of data points), and compare (within and across logs) any specified subset of fields of interest. You can also check out the "examples" directory in the tarball for examples of logs, their equivalent text databases, and rules files.
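The following Python sketch illustrates how the y-axis, z-axis, offset, and points keywords could interact when summarizing. It is not vxperf2's code or its rules-file syntax. The summarize function, the row layout, and the choice to apply offset and points separately to each z-axis value are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize(rows, y_axis, z_axis, offset=0, points=None):
    """Summarize one y-axis field per z-axis value over a window of readings."""
    groups = defaultdict(list)
    for row in rows:
        # One series of readings per distinct z-axis value (e.g. per PID).
        groups[row[z_axis]].append(float(row[y_axis]))

    summary = {}
    for z_value, values in groups.items():
        # Assumption: skip `offset` readings, then keep at most `points`.
        end = None if points is None else offset + points
        window = values[offset:end]
        if not window:
            continue
        summary[z_value] = {
            "count": len(window),
            "max": max(window),
            "min": min(window),
            "sum": sum(window),
            "avg": sum(window) / len(window),
        }
    return summary


# Hypothetical pidstat-like readings: CPU usage of two processes over time.
rows = [
    {"time": 1, "PID": "101", "cpu": 10.0},
    {"time": 1, "PID": "202", "cpu": 50.0},
    {"time": 2, "PID": "101", "cpu": 20.0},
    {"time": 2, "PID": "202", "cpu": 60.0},
    {"time": 3, "PID": "101", "cpu": 30.0},
    {"time": 3, "PID": "202", "cpu": 70.0},
    {"time": 4, "PID": "101", "cpu": 40.0},
    {"time": 4, "PID": "202", "cpu": 80.0},
]

result = summarize(rows, y_axis="cpu", z_axis="PID", offset=1, points=2)
print(result)
```

Here offset=1 skips the first reading of each process and points=2 keeps the next two, which corresponds to the "navigate" and "zoom in" operations described above.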

I've been using this tool heavily for most of my work involving log analysis, and so have a few others in my group. Symantec Corporation gave me permission to release it as open source almost four months ago. The thought of getting it onto CPAN or creating a repository on GitHub has been putting me off since then. I guess this is as good a place as any to start sharing vxperf2 with the outside world. I'm releasing it under the GNU GPL today. The tarball contains no malware, and one 14-word README file that should get you started. Ask me if you have any questions.

My knowledge of Perl is amateurish at best, and this project was one of the main ways in which I learnt some basics, especially two-dimensional data structures in Perl. (vxperf2 uses an array of arrays, an array of hashes, a hash of arrays, and a hash of hashes.) There are bound to be a bunch of bugs, so don't hesitate to holler. On the other hand, development of new features (each keyword, for example) has been driven entirely by my immediate needs, so your wishes may not necessarily be granted immediately (by me).

4 comments:

  1. +1 for open-sourcing it. You should really put this in Git though! You'll probably get a lot more contributions that way.

  2. Github? Sourceforge? CPAN? Google Code? Which one's more suitable? (CPAN, I think.)

  3. Github for its simplicity of use. I don't know how good the community in CPAN is.

  4. Thank you, Sameer. I moved vxperf2 to GitHub.
