Profiler Example (GNU Octave (version 10.1.0))

13.7 Profiler Example

Below, we give a short example of a profiler session. See Profiling for detailed documentation of the profiler functions. Consider the code:

global N A;

N = 300;
A = rand (N, N);

function xt = timesteps (steps, x0, expM)
  global N;

  if (steps == 0)
    xt = NA (N, 0);
  else
    xt = NA (N, steps);
    x1 = expM * x0;
    xt(:, 1) = x1;
    xt(:, 2 : end) = timesteps (steps - 1, x1, expM);
  endif
endfunction

function foo ()
  global N A;

  initial = @(x) sin (x);
  x0 = (initial (linspace (0, 2 * pi, N)))';

  expA = expm (A);
  xt = timesteps (100, x0, expA);
endfunction

function fib = bar (N)
  if (N <= 2)
    fib = 1;
  else
    fib = bar (N - 1) + bar (N - 2);
  endif
endfunction

If we execute the two main functions, we get:

tic; foo; toc;
⇒ Elapsed time is 2.37338 seconds.

tic; bar (20); toc;
⇒ Elapsed time is 2.04952 seconds.

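But this does not tell us much about where the time is spent; for instance, whether the single call to expm is more expensive than the recursive time-stepping itself. To get a more detailed picture, we can use the profiler.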

profile on;
foo;
profile off;

data = profile ("info");
profshow (data, 10);

This prints a table like the following:

   #  Function Attr     Time (s)        Calls
---------------------------------------------
   7      expm             1.034            1
   3  binary *             0.823          117
  41  binary \             0.188            1
  38  binary ^             0.126            2
  43 timesteps    R        0.111          101
  44        NA             0.029          101
  39  binary +             0.024            8
  34      norm             0.011            1
  40  binary -             0.004          101
  33   balance             0.003            1

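The entries are the individual functions that have been executed (only the 10 most important ones), together with some information about each of them. Entries like binary * denote operators, while the other entries are ordinary functions. They include both built-ins like expm and our own routines (for instance timesteps). From this profile, we can immediately deduce that expm uses up the largest proportion of the processing time, even though it is called only once. The second most expensive operation is the matrix-vector product in the routine timesteps. ⁶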
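The "largest proportion" reading of the table above can be made explicit by converting the recorded times into percentages. The following is a minimal sketch, not part of the manual: it profiles a small stand-in workload rather than foo, and it assumes that the TotalTime values in FunctionTable are per-function (self) times, as the table suggests, so that their sum approximates the total run time.

```octave
# Profile a small stand-in workload (foo from the example would work the same way).
profile on;
M = expm (rand (50));
profile off;
data = profile ("info");

# Collect the recorded times and names from the function table.
times = [data.FunctionTable.TotalTime];
names = {data.FunctionTable.FunctionName};

# Share of the total recorded time, as a percentage per entry.
share = 100 * times / sum (times);

# Print the (at most) five most expensive entries, most expensive first.
[~, order] = sort (times, "descend");
for k = order(1 : min (5, numel (order)))
  printf ("%-12s %6.2f%%\n", names{k}, share(k));
endfor
```

The exact entries and percentages depend on the machine and the Octave version, so no expected output is shown.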

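Timing, however, is not the only information available from the profile. The attribute column shows us that timesteps calls itself recursively. This may not be remarkable in this example (since it is clear anyway), but it could be helpful in a more complex setting. As to why there is a binary \ in the output, we can easily shed some light on that too. Note that data is a structure array (Structure Arrays) which contains the field FunctionTable. This stores the raw data for the profile shown. The number in the first column of the table gives the index under which the function can be found there. Looking up data.FunctionTable(41) gives: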

  scalar structure containing the fields:

    FunctionName = binary \
    TotalTime =  0.18765
    NumCalls =  1
    IsRecursive = 0
    Parents =  7
    Children = [](1x0)

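Here we see the information from the table again, but with the additional fields Parents and Children. Both are arrays: Parents contains the indices of the functions that have directly called the function in question (entry 7, expm, in this case), and Children contains the indices of the functions called by it (none here). Hence, the backslash operator has been used internally by expm.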
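Looking up parents one index at a time becomes tedious for larger profiles. As a rough sketch (not part of the manual), the loop below turns every Parents index into a name and prints one "callee <- caller" line per direct call relationship. It relies only on the FunctionName and Parents fields shown above; the stand-in workload and the variable names ftable and names are illustrative choices.

```octave
# Profile a small stand-in workload.
profile on;
M = expm (rand (50));
profile off;
data = profile ("info");

ftable = data.FunctionTable;
names = {ftable.FunctionName};

# For each recorded function, list the functions that called it directly.
# Parents holds indices into FunctionTable; (:).' forces a row vector so
# that the loop body is skipped cleanly when there are no parents.
for k = 1 : numel (ftable)
  for p = ftable(k).Parents(:).'
    printf ("%s <- %s\n", names{k}, names{p});
  endfor
endfor
```

Entries called only from the top level have no parents and therefore print nothing.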

Now let's take a look at bar. For this, we start a fresh profiling session (profile on does this; the old data is removed before the profiler is restarted):

profile on;
bar (20);
profile off;

profshow (profile ("info"));

This gives:

   #            Function Attr     Time (s)        Calls
-------------------------------------------------------
   1                 bar    R        2.091        13529
   2           binary <=             0.062        13529
   3            binary -             0.042        13528
   4            binary +             0.023         6764
   5             profile             0.000            1
   8               false             0.000            1
   6              nargin             0.000            1
   7           binary !=             0.000            1
   9 __profiler_enable__             0.000            1

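Unsurprisingly, bar is also recursive. It has been called 13529 times in the course of recursively calculating the Fibonacci number in a suboptimal way, and most of the time was spent in bar itself.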
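The call count in the table can be checked by hand. Each call to bar with N > 2 makes two further recursive calls, so the number of calls satisfies calls(N) = 1 + calls(N-1) + calls(N-2), with calls(1) = calls(2) = 1. The sketch below (an illustration, not part of the manual) evaluates this recurrence alongside the Fibonacci numbers themselves in a simple loop.

```octave
# bar (1) and bar (2) return immediately: one call each, result 1.
calls = ones (1, 20);
fibs  = ones (1, 20);

for n = 3 : 20
  # One call for bar (n) itself plus all calls made by its two branches.
  calls(n) = 1 + calls(n - 1) + calls(n - 2);
  # The Fibonacci number, computed with a single addition per step.
  fibs(n)  = fibs(n - 1) + fibs(n - 2);
endfor

printf ("bar (20) = %d, reached after %d calls\n", fibs(20), calls(20));
# prints: bar (20) = 6765, reached after 13529 calls
```

The same loop obtains the result in 18 additions, which is why the naive recursion is described as suboptimal.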

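Finally, let's say we want to profile the execution of both foo and bar together. Since we have already collected the run-time data for bar, we can restart the profiler without clearing the existing data and collect the missing statistics for foo. This is done by: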

profile resume;
foo;
profile off;

profshow (profile ("info"), 10);

As you can see in the table below, we now have both profiles mixed together.

   #  Function Attr     Time (s)        Calls
---------------------------------------------
   1       bar    R        2.091        13529
  16      expm             1.122            1
  12  binary *             0.798          117
  46  binary \             0.185            1
  45  binary ^             0.124            2
  48 timesteps    R        0.115          101
   2 binary <=             0.062        13529
   3  binary -             0.045        13629
   4  binary +             0.041         6772
  49        NA             0.036          101
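The difference between profile on (which discards old data) and profile resume (which keeps it) can also be checked programmatically. The following is a small sketch using stand-in workloads (norm and det on random matrices, chosen only for illustration): every function name recorded in the first session should still be present after the resumed session.

```octave
# First session: profile on starts from an empty profile.
profile on;
a = norm (rand (50));
profile off;
info_first = profile ("info");
first = {info_first.FunctionTable.FunctionName};

# Second session: profile resume keeps the data collected so far.
profile resume;
b = det (rand (50));
profile off;
info_combined = profile ("info");
combined = {info_combined.FunctionTable.FunctionName};

# Names seen in the first session remain in the combined profile.
kept = all (ismember (first, combined));
printf ("entries before: %d, after resume: %d, all kept: %d\n",
        numel (first), numel (combined), kept);
```

Replacing profile resume with profile on in the second session would instead leave only the entries from the det workload.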

Footnotes

6.

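We only know that it is the binary multiplication operator, but fortunately this operator appears in only one place in the code, so we know which occurrence takes so much time. If there were multiple places, we would have to use the hierarchical profile to find out exactly where the time is spent, which is not covered in this example.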