BLOP -- Plotting data

Graphs are automatically deleted!

In the following paragraphs many plot commands are described, which plot data from arrays, files, the output of commands, etc. All of them set the autodel flag of the resulting graph to true. Therefore, whenever the frame containing such a graph is cleared, the graph is deleted, and any pointer to it becomes invalid!

Plotting data from a file:

Numerical data from a file (or from the output of a command) can be plotted using the plot or mplot commands. The difference between the two is that plot clears the current frame before it plots anything, while mplot does not (the initial letter 'm' stands for 'multi-'), so it can be used to plot several graphs in a frame. Otherwise they function in exactly the same way.

The first argument of these functions is a string (var). If this string begins with '<<', the rest of the string is interpreted as the data to be plotted. (This is like a here-document in the shells. Note that blop behaves differently in interactive mode [when commands are read from the keyboard] and in non-interactive mode [when blop processes a script file]: in a script file a string can be split across several lines:

int main()
{
  plot("<<
          1 1
          2 4
          3 9").ds(points());
}

whereas in interactive mode this does not work: the whole string is processed as if it had no line breaks, so there will be only a single point, at (1,1)! In interactive mode you can insert line breaks with \n.) If the string ends with '|', the preceding part of the string is interpreted as a command to be executed by /bin/sh, and the data to be plotted is taken from the output of this command. Otherwise the string is interpreted as the name of a file whose content is to be plotted.
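The three ways of interpreting the first argument can be summarized in a small standalone C++ sketch. This is not blop's implementation: the names SourceKind, Source and classify_source are hypothetical, and checking the '<<' prefix before the trailing '|' is an assumption about precedence.

```cpp
#include <string>

// Hypothetical classification of plot()'s first argument (not blop's code).
enum class SourceKind { InlineData, ShellCommand, File };

struct Source {
    SourceKind kind;
    std::string payload; // the data itself, the shell command, or the file name
};

// Applies the rules described above:
//   "<<..."       -> the rest of the string is the data
//   "...|"        -> the part before '|' is a /bin/sh command
//   anything else -> the name of a file
Source classify_source(const std::string &arg) {
    if (arg.rfind("<<", 0) == 0) { // prefix test
        return {SourceKind::InlineData, arg.substr(2)};
    }
    if (!arg.empty() && arg.back() == '|') {
        return {SourceKind::ShellCommand, arg.substr(0, arg.size() - 1)};
    }
    return {SourceKind::File, arg};
}
```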

These functions create graphs from the supplied data and put them into the current frame. Note that this does not trigger an immediate recalculation of axis ranges or anything else: only a reference to the created graph is stored in the frame, and everything else is calculated at the last moment, when the canvas is printed to a terminal.

If no further arguments are given to the plot(...) command, all columns of the datafile are read into the graph. Otherwise you have to provide at least 2 further arguments (which are functions). These determine which columns (or which combinations of columns) of the datafile are read into the graph. The number of these arguments determines the number of columns in the created graph. For example, to plot data with errorbars, the drawstyle requires the graph to have 3 columns (the x value, the y value and the error), so one has to provide 3 arguments after the filename to the plot command. These functions are called for each line of the datafile, with their 1st argument being the value of the first field in the line, their 2nd argument being the value of the second field, etc. The special functions _1, _2, etc., when evaluated with several arguments, return their 1st, 2nd, etc. argument, so they can be used to refer to the 1st, 2nd, etc. column of the datafile. The special function _0 evaluates to the current line number (0-based) of the file. See the documentation of functions for the possibilities and further details (you can use functions written in C/C++ as well, so there are practically no limits).
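To make the role of _0, _1, _2, ... concrete, here is a minimal standalone C++ sketch of reading selected columns line by line. It only mimics the behaviour described above with std::function; Line, Selector, line_number, column and read_columns are hypothetical names, and counting blank lines in the line number (while skipping them as data) is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// One parsed data line: its 0-based line number and its numeric fields.
struct Line {
    std::size_t number;
    std::vector<double> fields;
};

// A column selector computes one output value from a data line
// (a hypothetical stand-in for blop's function objects).
using Selector = std::function<double(const Line &)>;

// Analogue of blop's _0: the 0-based line number.
inline Selector line_number() {
    return [](const Line &l) { return static_cast<double>(l.number); };
}

// Analogue of blop's _1, _2, ...: the k-th field (1-based).
inline Selector column(std::size_t k) {
    return [k](const Line &l) { return l.fields.at(k - 1); };
}

// Reads whitespace-separated numbers line by line and applies every selector
// to each non-empty line; each selector produces one column of the result.
std::vector<std::vector<double>> read_columns(const std::string &text,
                                              const std::vector<Selector> &sel) {
    std::vector<std::vector<double>> rows;
    std::istringstream in(text);
    std::string raw;
    for (std::size_t n = 0; std::getline(in, raw); ++n) {
        std::istringstream fs(raw);
        Line line{n, {}};
        double v;
        while (fs >> v) line.fields.push_back(v);
        if (line.fields.empty()) continue; // skip blank lines
        std::vector<double> row;
        for (const auto &s : sel) row.push_back(s(line));
        rows.push_back(row);
    }
    return rows;
}
```

Roughly, plot("datafile",_0,_2) corresponds to read_columns(text, {line_number(), column(2)}) in this sketch.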

The plot and mplot functions return a reference to the created graph, so further operations (setting the style of the graph) can be called on it:
plot("datafile",_1,_2).drawstyle(lines()).linewidth(1.5*MM);

This is a short example of how to plot several files from a loop, if you have the files 'data_xx.dat', where 'xx' is an index (always two digits):

color c[] = {black, red, green, blue};
int i=0; 
for(var file="data_08.dat"; file(5,6) < 12; file(5,6)++)
{
  mplot(file).ds(points()).pointcolor(c[i++]);
}

Or another way (in this example you must have the files 'data_8.dat', 'data_9.dat', 'data_10.dat', etc).

for(int i = 8; i < 12; ++i)  
{
  mplot(var("data_") & i & ".dat");  // wrap the literal in var so that & concatenates
}

If you want to provide the data to be plotted in the script file, and you want to reuse it, the recommended way is to store it in a variable:

var data = "<<
  1 2 3
  4 5 6
  7 8 9";
mplot(data,_1,_2).ds(lines());
mplot(data,_1,_3).ds(lines());

Plotting data stored in an array

Numerical data stored in an array or in a std::vector<var> can also be plotted with the following functions:

dgraph &plot(int n, double *x, [double *y, double *z, double *w,]
	     const function &f1[=unset],
	     const function &f2=unset,
	     const function &f3=unset,
	     const function &f4=unset);

dgraph &plot(int n, var *x, [var *y, var *z, var *w,]
	     const function &f1[=unset],
	     const function &f2=unset,
	     const function &f3=unset,
	     const function &f4=unset);

dgraph &plot(const vector<var> &x, [const vector<var> &y, const vector<var> &z, const vector<var> &w,]
	     const function &f1[=unset],
	     const function &f2=unset,
	     const function &f3=unset,
	     const function &f4=unset);

The expression

[double *y, double *z, double *w,]

in the above function declarations means that there are versions of this function with only one array (double *x), and also with 2, 3 and 4 arrays. For the version with only a single array, at least the first transformation function (f1) must be provided (it can have several components), because a single column cannot be plotted: at least 2 columns (x/y values) are needed.

All of these functions have an mplot version as well (see above). These functions create a dgraph, store it into the current frame, and return a reference to the created dgraph for further properties to be set.

The arguments f1 ... f4 can be used to carry out a transformation on the provided data values before storing them into the dgraph. The special function _0 evaluates to the running index in the arrays (0-based), and _1, _2 etc evaluate to the current value of the x, y ... etc arrays. See the documentation of functions to learn more about functions. The following code for example creates a dgraph with 2 columns:

array x;
// ... fill this array somehow
plot(x,_0*10,_1); // plot the values of the array versus 10 times the index
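What plot(x,_0*10,_1) stores can be spelled out in plain C++. The following is a minimal sketch, not blop code: transform_array is a hypothetical helper whose two callables receive (index, value), playing the roles of expressions built from _0 (the 0-based index) and _1 (the current array value).

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Builds a two-column table from one array, analogous to plot(x, f1, f2):
// f1 and f2 receive (index, value) and return the x and y entries.
template <class F1, class F2>
std::vector<std::pair<double, double>> transform_array(const std::vector<double> &x,
                                                       F1 f1, F2 f2) {
    std::vector<std::pair<double, double>> table;
    table.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        table.emplace_back(f1(i, x[i]), f2(i, x[i]));
    return table;
}
```

With f1 returning 10.0*i and f2 returning the value, each row is (10 * index, value), which is the "values versus 10 times the index" graph of the example.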

Plotting a self-prepared dgraph

The plot and mplot commands also exist with a single dgraph argument. In this case users can prepare their dgraph in any way and then plot it with these functions. The following points need care in this case:

The dgraph must not go out of scope before the canvas is printed to a terminal. Otherwise the canvas (the frame in it) will hold an invalid pointer to an already destroyed object, causing a segfault:

if(some_condition)
{
  dgraph g;
  g.add(1,1);
  g.add(2,2);
  mplot(g);   // store a pointer to 'g' in the current frame

}  // the variable 'g' is deleted here (C++), current frame
   // stores an invalid pointer
... 
blopeps::print("output.beps");  // ==> segfault
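The underlying issue is ordinary C++ object lifetime. The standalone sketch below uses hypothetical Graph and Frame types (not blop's classes) for a frame that, like blop's, stores only non-owning pointers, and shows the safe pattern: declare the graph in a scope that is still alive when the frame is printed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-ins for dgraph and a frame (not blop's classes).
struct Graph {
    std::vector<std::pair<double, double>> points;
    void add(double x, double y) { points.emplace_back(x, y); }
};

struct Frame {
    std::vector<const Graph *> graphs; // non-owning pointers
    void mplot(const Graph &g) { graphs.push_back(&g); }
    // "Printing" reads every graph through its stored pointer, so every
    // graph must still be alive when this runs.
    std::size_t print() const {
        std::size_t n = 0;
        for (const Graph *g : graphs) n += g->points.size();
        return n;
    }
};

// Safe pattern: 'g' lives in the same scope as the print() call, so the
// pointer stored by mplot() is still valid when the frame is printed.
std::size_t plot_conditionally(bool some_condition) {
    Frame frame;
    Graph g; // declared outside the conditional block
    if (some_condition) {
        g.add(1, 1);
        g.add(2, 2);
        frame.mplot(g);
    }
    return frame.print();
}
```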

Keep in mind that both plot and mplot simply store a pointer to the dgraph in the current frame. That is, the following code (which seems to plot the same graph twice, with two different styles) indeed plots the same graph twice, but with the same drawstyle, because the second mplot command returns a reference to the same dgraph, and the drawstyle call on it overwrites the drawstyle set before:
```
dgraph g;
g.add(1,1);
g.add(2,2);
mplot(g).drawstyle(lines());
mplot(g).drawstyle(points()); // plots g once more, but overwrites g's drawstyle
```
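The aliasing can be reproduced in plain C++. In this minimal sketch, Graph and Frame are hypothetical stand-ins (not blop's classes): mplot stores a pointer and returns the same object, so both frame entries refer to one graph, and the last drawstyle call wins.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins: a graph with a drawstyle, and a frame that
// stores pointers to the graphs plotted into it.
struct Graph {
    std::string style = "default";
    Graph &drawstyle(const std::string &s) { style = s; return *this; }
};

struct Frame {
    std::vector<Graph *> graphs;
    // Like mplot(dgraph&): store a pointer and return the very same object.
    Graph &mplot(Graph &g) { graphs.push_back(&g); return g; }
};
```

After f.mplot(g).drawstyle("lines") followed by f.mplot(g).drawstyle("points"), f.graphs holds two identical pointers, and the single underlying graph has the style "points".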

Plotting a block of a datafile

Blocks in a data file are contiguous regions of data separated by empty lines. To plot only a given block of a file, use the block(int) member function of dgraph:

plot("datafile").block(2);

The first part of this statement creates a dgraph and reads the whole content of "datafile" into it; the second part removes all data from this graph that lies outside of the 2nd block.
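Block selection itself is easy to express in standalone C++. The sketch below is not blop's code: select_block is a hypothetical helper, the 1-based numbering follows the description of block(2) as "the 2nd block", and treating whitespace-only lines as empty and several consecutive empty lines as one separator are assumptions.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the lines of the n-th block (1-based) of 'text'.
// Blocks are runs of non-empty lines separated by empty lines.
std::vector<std::string> select_block(const std::string &text, std::size_t n) {
    std::vector<std::vector<std::string>> blocks;
    bool in_block = false;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        bool empty = line.find_first_not_of(" \t\r") == std::string::npos;
        if (empty) { in_block = false; continue; }
        if (!in_block) { blocks.emplace_back(); in_block = true; }
        blocks.back().push_back(line);
    }
    if (n == 0 || n > blocks.size()) return {};
    return blocks[n - 1];
}
```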

Settings in the datafile

Settings can be stored in a datafile as single lines introduced by a double hash mark (##), followed by a command. More settings are planned for the future; the following are currently available:

##LEGEND:

The rest of the line will be used as the legend of the plot

##XRANGE: ##YRANGE:

Two numbers must follow; they specify the x range (##XRANGE:) or the y range (##YRANGE:).

##TITLES:

The rest of the line will be used to specify the axis titles. For example, with the following line:

##TITLES: "time [ns]"  "distance" "velocity"

if one plots the file containing this line, the axis titles will be automatically set:

plot("file",_1,_3);    // xtitle: "time [ns]", ytitle: "velocity"
plot("file",_1,_3*_3); // xtitle: "time [ns]", ytitle: "velocity*velocity"

The titles specified on this line are transformed to LaTeX using the tolatex function (that is, LaTeX-specific control characters such as $ and _ are escaped).

##LTITLES:

The same as ##TITLES:, but the LaTeX control characters are not escaped, so the text can contain any valid LaTeX input.
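The structure of these setting lines can be illustrated with a small standalone C++ parser. This is a sketch, not blop's reader: parse_setting and parse_titles are hypothetical names, and the trimming rules and the handling of quotes are assumptions.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Splits a setting line such as "##XRANGE: 0 10" into its command
// ("XRANGE") and the rest of the line ("0 10", leading blanks removed).
// Returns nothing for lines that are not settings.
std::optional<std::pair<std::string, std::string>> parse_setting(const std::string &line) {
    if (line.rfind("##", 0) != 0) return std::nullopt;
    std::size_t colon = line.find(':', 2);
    if (colon == std::string::npos) return std::nullopt;
    std::string key = line.substr(2, colon - 2);
    std::string rest = line.substr(colon + 1);
    std::size_t first = rest.find_first_not_of(" \t");
    rest = (first == std::string::npos) ? "" : rest.substr(first);
    return std::make_pair(key, rest);
}

// Splits the argument of ##TITLES: into its double-quoted titles,
// so that titles[k-1] belongs to column k.
std::vector<std::string> parse_titles(const std::string &rest) {
    std::vector<std::string> titles;
    std::size_t pos = 0;
    while ((pos = rest.find('"', pos)) != std::string::npos) {
        std::size_t end = rest.find('"', pos + 1);
        if (end == std::string::npos) break;
        titles.push_back(rest.substr(pos + 1, end - pos - 1));
        pos = end + 1;
    }
    return titles;
}
```

With the ##TITLES: line shown above, plotting _1 and _3 would pick titles[0] ("time [ns]") and titles[2] ("velocity") as axis titles.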

Customized plotting of anything

Let me first show the problem which led to this feature. I had a compiled simulation program which produced a lot of data (values on a 3D mesh). At the end of the simulation I wanted to be able to visualize it: plot 1D or 2D graphs (data along a line or on a plane of the 3D space). One option would be to dump all this data into a file with the following columns: x y z value. Visualizing this data afterwards from a blop script is possible; for example, to show data on an x-y plane, one could use conditional plotting (plot the data of columns 1, 2 and 4, requiring that the 3rd column (z) has a certain value). This is, however, extremely slow: first, blop reads the whole file (unnecessarily) and rejects those lines which do not satisfy the condition; second, there is a lot of double<->string conversion. Another problem is that the coordinates might not lie on an equidistant mesh with 'easy' values. That is, the z values are, for example, 1.423e-32, 0.53689, 1.07378, etc. It is difficult to remember these numbers (and to guess correctly the exact format in which they were written out), and to impose the constraint correctly:

plot_if("datafile",_3==0.53689,_1,_2,_4).ds(cboxes());

So it would be better to plot data directly from memory, from the compiled program. If the data is stored in a 3D array, one can fix the index of the z coordinate and loop over only the necessary values in the x-y plane specified this way. This is possible: one can create a graph and plot it from the compiled program. However, I want to be able to set the scale, the axis titles, colors, pointstyles, etc. interactively. That is, at the end of the simulation I want to run the blop interpreter from the compiled program. But the interpreter knows nothing about the functions and data in the compiled program, so a mechanism is needed that links the compiled program's data/functions to the interpreter. This can be done in the following way:

Create a class in the compiled program, which is derived from plotcmd_interpreter. This class has to define the plottable *run(const var &cmd) member function:
```
class myinterpreter : public plotcmd_interpreter
{
  public:
    plottable *run(const var &cmd);
};
```
This run function should do the real job: depending on its argument (cmd), it should read data from memory (or do whatever else is needed), create a graph (normally a dgraph), and return a pointer to it. It can also return a null pointer.

Set this interpreter to be used:

plotcmd_interpreter::set(new myinterpreter);

Now everything is set up. One can start the interpreter from the compiled program:
```
G__setothermain(0);
G__main(argc, argv);
```
Within this interpreter one can then call the
```
plotcmd(const var &cmd);
mplotcmd(const var &cmd);
```
functions. These do exactly what one expects: they call myinterpreter's run function and plot the graph it has created.
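The structure of this mechanism (an abstract run, a globally installed instance, and plotcmd delegating to it) can be mirrored in standalone C++. In this sketch Plottable, Interpreter, MyInterpreter, plotcmd and the "row k" command syntax are all hypothetical; with blop itself you derive from plotcmd_interpreter and install it with plotcmd_interpreter::set, as described above. Ownership of the returned object is a choice of this sketch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Minimal stand-ins for plottable and plotcmd_interpreter (hypothetical).
struct Plottable {
    std::vector<std::pair<double, double>> points;
};

class Interpreter {
public:
    virtual ~Interpreter() = default;
    // Like run(): build a graph for 'cmd', or return a null pointer.
    virtual Plottable *run(const std::string &cmd) = 0;

    static void set(std::unique_ptr<Interpreter> i) { current() = std::move(i); }
    static Interpreter *get() { return current().get(); }

private:
    static std::unique_ptr<Interpreter> &current() {
        static std::unique_ptr<Interpreter> instance;
        return instance;
    }
};

// The compiled program's side: a 2D mesh kept in memory. The command
// "row k" turns row k into a graph (value versus column index).
class MyInterpreter : public Interpreter {
public:
    explicit MyInterpreter(std::vector<std::vector<double>> mesh) : mesh_(std::move(mesh)) {}

    Plottable *run(const std::string &cmd) override {
        if (cmd.rfind("row ", 0) != 0) return nullptr;
        std::size_t k = std::stoul(cmd.substr(4));
        if (k >= mesh_.size()) return nullptr;
        auto *g = new Plottable; // in this sketch the caller takes ownership
        for (std::size_t i = 0; i < mesh_[k].size(); ++i)
            g->points.emplace_back(static_cast<double>(i), mesh_[k][i]);
        return g;
    }

private:
    std::vector<std::vector<double>> mesh_;
};

// Analogue of plotcmd(cmd): delegate to the installed interpreter.
std::unique_ptr<Plottable> plotcmd(const std::string &cmd) {
    Interpreter *i = Interpreter::get();
    return std::unique_ptr<Plottable>(i ? i->run(cmd) : nullptr);
}
```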

Plotting many files (using a filename pattern)

Several data files can be plotted with a single command using a filename pattern, which is expanded by the shell (using the command echo pattern). The returned object (plottables, in plural) is a 'collection' of graphs: calling style-changing functions on it (for example drawstyle) changes the style of all the graphs.

plottables &plot_many(const var &filename_pattern,
                      const function &f1,
                      const function &f2=unset,
                      const function &f3=unset,
                      const function &f4=unset,
                      const function &f5=unset,
                      const function &f6=unset);
plottables &mplot_many(const var &filename_pattern,
                      const function &f1,
                      const function &f2=unset,
                      const function &f3=unset,
                      const function &f4=unset,
                      const function &f5=unset,
                      const function &f6=unset);

By default (if you do not call any color-changing function), the graphs are displayed in automatically alternating colors. For example, to plot all .dat files with linespoints, in a sequence of colors:

plot_many("*.dat").ds(linespoints());
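The 'collection' behaviour can be sketched in standalone C++: each file gets its own graph with a cycling default color, and a style setter on the collection is forwarded to every member. Graph, Plottables and this plot_many are hypothetical stand-ins; the real function takes a shell pattern, whereas here the already expanded file list is passed in, and the four-color palette is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins (not blop's classes).
struct Graph {
    std::string file;
    std::string style = "default";
    std::string color;
};

// Like 'plottables': a style setter applies to every graph in the collection.
struct Plottables {
    std::vector<Graph> graphs;
    Plottables &ds(const std::string &s) {
        for (auto &g : graphs) g.style = s;
        return *this;
    }
};

// One graph per (already expanded) file name; colors cycle through a palette.
Plottables plot_many(const std::vector<std::string> &files) {
    static const std::vector<std::string> palette = {"black", "red", "green", "blue"};
    Plottables p;
    for (std::size_t i = 0; i < files.size(); ++i)
        p.graphs.push_back({files[i], "default", palette[i % palette.size()]});
    return p;
}
```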

Plotting many graphs from a single file (grouping by values)

Imagine the situation that a data file contains the x,y values in the first two columns, and a parameter in the third column. We would like to plot the x,y values for each value of this parameter separately. This can be done in a complicated loop (selecting those lines from the file, where the 3rd column takes a certain value). But this is cumbersome. There is an easier way to do this in blop:

plot_groups(const var  &filename,
          const function &grouping_value,
          const function &f1 = unset, ..., const function &f6 = unset);

mplot_groups(const var  &filename,
          const function &grouping_value,
          const function &f1 = unset, ..., const function &f6 = unset);

These functions are the solution. The argument grouping_value specifies the function which is called on each data line; its return value identifies the different groups. For example, if grouping_value is _3, then the values of the 3rd column identify the groups. This value is also used as the legend of the graphs.
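The grouping step can be sketched in standalone C++ as splitting rows by the value of a key column. Row, Group and group_by_column are hypothetical names; returning the groups in order of first appearance and comparing keys exactly are assumptions of this sketch, not documented blop behaviour.

```cpp
#include <cstddef>
#include <map>
#include <vector>

using Row = std::vector<double>;

// One group: the key value (usable as legend) and its rows in file order.
struct Group {
    double key;
    std::vector<Row> rows;
};

// Splits rows into groups by the value of column key_col (1-based, like _3).
// Groups are returned in order of first appearance.
std::vector<Group> group_by_column(const std::vector<Row> &rows, std::size_t key_col) {
    std::vector<Group> groups;
    std::map<double, std::size_t> index; // key -> position in 'groups'
    for (const Row &r : rows) {
        double key = r.at(key_col - 1);
        auto it = index.find(key);
        if (it == index.end()) {
            it = index.emplace(key, groups.size()).first;
            groups.push_back({key, {}});
        }
        groups[it->second].rows.push_back(r);
    }
    return groups;
}
```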

To additionally impose a condition on the lines of the datafile, use the following functions:

plot_groups_if(const var  &filename,
             const function &grouping_value,
             const function &condition,
             const function &f1 = unset, ..., 
             const function &f6 = unset);

mplot_groups_if(const var  &filename,
             const function &grouping_value,
             const function &condition,
             const function &f1 = unset, ..., 
             const function &f6 = unset);

The function 'condition' is evaluated on each line, and only those lines are accepted for which this function returns a non-zero value.

If one wants to plot for example the first two columns (y versus x) for different values of the 3rd column separately, but only for a certain set of the values of the 3rd column, one can use the contained_in function as the condition:

plot_groups_if("filename", _3, contained_in(split("1 2 3 4"))(_3), _1, _2);

Note that this example does not run interactively due to some bugs in CINT; a workaround is to write:

function condition = contained_in(split("1 2 3 4"));
condition = condition(_3);
plot_groups_if("filename", _3, condition, _1, _2);

What does all this mean? Let's go step by step:

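The split("1 2 3 4") command produces an array consisting of these 4 numbers.
The argument of the contained_in function should be a function (blop's function), so this array is automatically converted to a function (via its constructor), which has these constants as its return values.
The function returned by contained_in(...), evaluated on _3, serves as the condition: it accepts those lines whose 3rd column holds one of these values.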
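The same idea, a membership predicate composed with a column selector, can be sketched in standalone C++ with std::function. The names contained_in, on_column and filter_rows below are hypothetical analogues of the blop expressions, not blop's API.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

using Row = std::vector<double>;

// Analogue of contained_in(values): true when the argument equals one of the values.
std::function<bool(double)> contained_in(std::vector<double> values) {
    return [values](double v) {
        return std::find(values.begin(), values.end(), v) != values.end();
    };
}

// Analogue of evaluating that predicate on _k: apply it to the k-th field (1-based).
std::function<bool(const Row &)> on_column(std::function<bool(double)> pred, std::size_t k) {
    return [pred, k](const Row &r) { return pred(r.at(k - 1)); };
}

// Keeps only the rows for which the condition holds.
std::vector<Row> filter_rows(const std::vector<Row> &rows,
                             const std::function<bool(const Row &)> &cond) {
    std::vector<Row> kept;
    std::copy_if(rows.begin(), rows.end(), std::back_inserter(kept), cond);
    return kept;
}
```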

Currently the user cannot specify the drawstyle of these graphs: they are determined automatically in a sequential order (changing color and point type). Later versions will allow this in some way.

Ignoring data points from the file

One often wants to ignore certain data points in a datafile. For example, if one plots the output file of a C program, it might contain values like nan or inf, which - in most cases - should be simply ignored. Values to be ignored can be added one-by-one to a global list by calling

ignore::add("value");

After this call, all subsequent plot("filename",...) commands are affected. By default, the ignore list contains the following values: nan, inf, -inf.
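A global ignore list is simple to model in standalone C++. In this sketch, ignore_list, ignore_add and line_is_ignored are hypothetical names; matching whole whitespace-separated fields as exact strings (rather than comparing numerically) is an assumption about how blop compares values.

```cpp
#include <set>
#include <sstream>
#include <string>

// Global ignore list, pre-filled with the defaults listed above.
std::set<std::string> &ignore_list() {
    static std::set<std::string> values = {"nan", "inf", "-inf"};
    return values;
}

// Analogue of ignore::add("value").
void ignore_add(const std::string &v) { ignore_list().insert(v); }

// True if any whitespace-separated field of the line is on the ignore list;
// such a line would be skipped when the file is read.
bool line_is_ignored(const std::string &line) {
    std::istringstream in(line);
    std::string field;
    while (in >> field)
        if (ignore_list().count(field)) return true;
    return false;
}
```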

Plotting data satisfying a condition:

This can be done using the following functions:

plot_if(const var &filename, const function &condition, const function &col1, const function &col2 [,const function &col3, const function &col4]);
mplot_if(const var &filename, const function &condition, const function &col1, const function &col2 [,const function &col3, const function &col4]);

The second argument (a function) is called on each data row of the file (its 1st argument being the first entry in the row, etc.). If this function returns a non-zero value, that data row is plotted.

For many drawstyles the 'continuity' of the input data also matters. For example, the lines style normally breaks the line at empty lines of the data. When plotting data with a condition, the following happens:

At empty lines, commented lines (beginning with #), and lines containing a value to be ignored, the data is broken, as in normal plotting.
At lines not satisfying the condition, whether the data is broken depends on a flag, which can be set via the static function dgraph::falsecondition_break(bool). Its default value is false.

Be careful, however, when setting this flag to true. If, for example, the 2nd column of your data alternates between 0 and some other value from line to line, then the following piece of code (which plots column 3 vs column 1 for those lines where col2 == 0) produces an empty plot:

dgraph::falsecondition_break(true);
plot_if("data",_2==0.0,_1,_3).ds(lines());

Why? Because the values in the 2nd column alternate, every second line is skipped due to a false condition, which produces a breakpoint in the data, and therefore also in the line to be plotted. A line made of isolated single points cannot be drawn: it requires at least 2 consecutive points without an intermediate breakpoint.
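The effect of the flag can be reproduced with a standalone C++ sketch that splits the accepted rows into continuous segments and counts the segments a line style could draw. segments and drawable_segments are hypothetical names; the sketch only models the behaviour described above.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using Row = std::vector<double>;

// Splits the rows accepted by 'cond' into continuous segments. A rejected
// row starts a new segment only if break_on_false is set, mimicking
// dgraph::falsecondition_break(bool).
std::vector<std::vector<Row>> segments(const std::vector<Row> &rows,
                                       const std::function<bool(const Row &)> &cond,
                                       bool break_on_false) {
    std::vector<std::vector<Row>> segs(1);
    for (const Row &r : rows) {
        if (cond(r)) {
            segs.back().push_back(r);
        } else if (break_on_false && !segs.back().empty()) {
            segs.emplace_back(); // breakpoint in the data
        }
    }
    if (segs.back().empty()) segs.pop_back();
    return segs;
}

// A line needs at least 2 consecutive points, so count only such segments.
std::size_t drawable_segments(const std::vector<std::vector<Row>> &segs) {
    std::size_t n = 0;
    for (const auto &s : segs)
        if (s.size() >= 2) ++n;
    return n;
}
```

With rows whose 2nd column alternates 0, 1, 0, 1, 0 and the condition "column 2 == 0", break_on_false = true yields three one-point segments (nothing drawable), while false yields one three-point segment.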

In the example below, the 3rd column of the file is plotted vs. the 2nd column for those rows where the entry in the first column (_1) is less than 2:

plot_if("filename", _1<2, _2, _3).drawstyle(points());

Conditions that cannot be expressed this way can be handled with auxiliary programs, such as awk:

plot("awk 'some_condition {print}' filename |");

Making graphs permanent

As explained above, every plot(...) command clears the previously plotted graphs, and one needs to call mplot(...) in order to plot over existing graphs. Imagine, however, that you have a reference curve which you would like to have on your figure all the time, with other curves plotted on top of it. Then you would always have to plot(...) your reference curve first (this clears all previous graphs), and then mplot(...) your other curve. This is cumbersome.

To make your life easier, blop can make graphs permanent: they are not erased by plot(...) commands (only the non-permanent graphs are). Therefore, plot your reference curve like this:

plot("datafile").permanent(true);

After this, every subsequent plot(...) command erases all other graphs, but keeps this one.
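The interplay of plot, mplot and permanent graphs can be modelled in a few lines of standalone C++. Graph and Frame are hypothetical stand-ins, not blop's classes; the returned reference is only valid until the next plot or mplot call, because the vector may reallocate.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Graph {
    std::string name;
    bool permanent = false;
};

struct Frame {
    std::vector<Graph> graphs;
    // Like mplot: add without clearing anything.
    Graph &mplot(const std::string &name) {
        graphs.push_back(Graph{name});
        return graphs.back();
    }
    // Like plot: first drop every non-permanent graph, then add.
    Graph &plot(const std::string &name) {
        graphs.erase(std::remove_if(graphs.begin(), graphs.end(),
                                    [](const Graph &g) { return !g.permanent; }),
                     graphs.end());
        return mplot(name);
    }
};
```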

Ordering of graphs

Normally, graphs are drawn on top of each other in the order in which they were plotted. You can change this:

plot("datafile").level(10);

Graphs with a higher level are drawn on top of graphs with lower levels. The default level is 10.
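One way to obtain this drawing order is a stable sort by level, sketched below in standalone C++. Graph and drawing_order are hypothetical, and keeping the plotting order among graphs of equal level is an assumption consistent with the "normally" case above.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Graph {
    std::string name;
    int level = 10; // default level, as stated above
};

// Returns the graphs in drawing order: lower levels first, so higher levels
// end up on top. std::stable_sort keeps the plotting order for equal levels.
std::vector<Graph> drawing_order(std::vector<Graph> graphs) {
    std::stable_sort(graphs.begin(), graphs.end(),
                     [](const Graph &a, const Graph &b) { return a.level < b.level; });
    return graphs;
}
```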

Duplicating graphs

The drawstyles in blop are usually written for a single purpose: drawing the graph with lines, or points, or as a histogram, etc., but not combined. To get a combined effect (like plotting data with a histogram style AND with errorbars), one has to plot the same data twice, once with each drawstyle. To make this easier, graphs implement the dup member function: it duplicates the given graph with its current settings and adds the copy to the same frame. It returns a reference to the duplicate, so any further settings affect only the second instance. For example, to plot data with a histogram and errorbars:

plot("datafile",_1,_2,_3).ds(histo()).dup().ds(syerrorbars());

Note that in the plot command you need to specify all columns needed by either of the two drawstyles. The histo drawstyle only needs the first two columns, so it ignores the 3rd one, which is used only by syerrorbars. (Of course, one can omit the column specification; then all columns of the datafile are read and available to the drawstyles.) This of course creates a legend entry for both instances; you may want to set one of them to empty:

plot("datafile",_1,_2,_3).ds(histo()).legend("some data")
    .dup().ds(syerrorbars()).legend("");
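The copy-and-append behaviour of dup can be sketched in standalone C++. Graph and Frame are hypothetical stand-ins; a std::deque is used so that appending keeps references to earlier graphs valid, which lets the call chain continue on the copy just as in the blop example.

```cpp
#include <deque>
#include <string>

struct Frame;

struct Graph {
    Frame *frame = nullptr;
    std::string style;
    std::string legend_text;
    Graph &ds(const std::string &s) { style = s; return *this; }
    Graph &legend(const std::string &l) { legend_text = l; return *this; }
    Graph &dup(); // copies this graph, settings included, into the same frame
};

struct Frame {
    std::deque<Graph> graphs; // push_back on a deque keeps element references valid
    Graph &add(Graph g) {
        g.frame = this;
        graphs.push_back(g);
        return graphs.back();
    }
};

// The copy is appended to the frame, and further calls apply to the copy only.
inline Graph &Graph::dup() { return frame->add(*this); }
```

The chain f.add(Graph{}).ds("histo").legend("some data").dup().ds("syerrorbars").legend("") leaves two graphs: the original with style "histo" and legend "some data", and the copy with style "syerrorbars" and an empty legend.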

Alternatively, you can switch to multilegend mode. In this case, set the legend on the first instance (that is, before the .dup() call), so that the second instance inherits it:

set::multilegend(true);
plot("datafile").ds(histo()).legend("some data").dup().ds(syerrorbars());