Wait the light to fall

beauty-is-best

焉知非鱼

狂热的,贪婪的本性是太少

20 年前,我首次涉足 Perl 编程的工作之一是使用一种纯文本工具,分析其结构并为给定的线宽整齐地设置其格式的工具。 这是一个中等复杂的换行应用程序,我每天都使用它来整理电子邮件通信,软件文档和博客条目。

因此,第 19 届 Perl 每周挑战赛的第二项任务-实现"贪婪的"换行算法-在许多方面对我来说是一个老朋友。

贪婪的换行只是将输入文本中的每个单词都添加到输出的当前行中,除非这样做会导致输出行超出所需的最大行宽,在这种情况下,它会在该点断开行并继续填充第二行,等等。 因此,一个 45 列的贪婪包装的段落如下所示:

  It is a truth universally acknowledged, that
  a single man in possession of a good fortune
  must be in want of a wife. However little
  known the feelings or views of such a man may
  be on his first entering a neighbourhood,
  this truth is so well fixed in the minds of
  the surrounding families, that he is
  considered the rightful property of some one
  or other of their daughters.

生成的文本有些不平衡,并且在右边距处进行了粗糙处理,但在要求的宽度内,并且可读性强。 而且该算法非常简单,可以在一个 Raku 语句中实现:

sub MAIN (:$width = 80) {
    $*IN.slurp.words
        .join(' ')
        .comb(/ . ** {1..$width}  )>  [' ' | $] /)
        .join("\n")
        .say
}

我们采用 STDIN 输入流($*IN),对整个输入进行混搭(.slurp),将其分解为单词(.words)。 我们用每个(.join(' '))之间的单个空格重新连接这些单词,并将文本分成不超过宽度字符(.comb(/.** {1 .. $width}) 的行,假设每一行也以前面是空格或字符串结尾()> [' '| $])的单词边界。 最后,我们用换行符(.join("\n"))将这些行重新连接起来并打印它们(.say)。

这是解决特定挑战的合理单行方案,但我们可以做得更好。

首先,存在一个我们尚未处理的隐患。 也就是说,如果您是一位健康状况堪忧的威尔士前矿工,该怎么办呢?

  Look you, I shall have to be terminating my interdisciplinary
  investigation of consanguineous antidisestablishmentarianism
  in Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch.
  For I've just been electrophotomicrographically diagnosed with
  pseudopneumonoultramicroscopicsilicovolcanoconiosis, isn't it?

重新格式化此输入格式时,我们的单行语句解决方案惨遭失败,因为它无法正确分开城镇或疾病的过长名称。 由于每个字符都有 45 个以上的字符,因此正则表达式必须跳过很长的单词,并根据需要省略尽可能多的前导字符,直到再次找到 45 个结尾字符后跟一个空格。 这样我们得到:

  Look you, I shall have to be terminating my
  interdisciplinary investigation of
  consanguineous antidisestablishmentarianism
  in
  yngyllgogerychwyrndrobwllllantysiliogogogoch.
  For I've just been
  electrophotomicrographically diagnosed with
  neumonoultramicroscopicsilicovolcanoconiosis,
  isn't it?

除了被斩首的单词之外,当算法被迫将诸如 “consanguineous” 和 “electrophotomicrographically” 之类的下划线转移到下一行时,我们还得到了一个荒谬的不平衡右边界。

当然,解决这两个问题并不难。 我们只是给正则表达式一个后备选项:如果由于单词太长而不能在单词边界处断行,我们允许它在内部使该单词断开,前提是该断开距离两端都不太近的单词(例如,$minbreak 中的至少五个字符)。

我们还限制它以不小于指定宽度(即 $minwidth)的 80% 的形式折断常规行,以避免在右边距处出现那些文本裂缝:

sub MAIN (:$width) {
    say greedy-wrap( $*IN.slurp, :$width );
}

sub greedy-wrap( $text,
                :$width    = 80,
                :$minwidth = floor(0.8 * $width),
                :$minbreak = 5,
) {
    $text.words.join(' ')
         .comb(/ . ** {1..$width}  $
               | . ** {$minwidth..$width}  )> ' '
               | . ** {$minbreak..$width}
                      <before \S ** {$minbreak}>
               /)
         .join("\n")
}

在这个版本中,.comb 正则表达式规定,除了最后一行(. ** {$minwidth..$width})之外,我们必须用文字填充至少80%的要求宽度(. ** {1. $width}$),除此之外,我们可以取任何数量的字符,只要我们至少取5个字符(. ** {$minbreak..$width}),并且在下一行的开始处也留下至少5个可见字符(<before \S ** {$minbreak}>)。

这个版本会产生一个更统一的包装:

  Look you, I shall have to be terminating my
  interdisciplinary investigation of consangui
  neous antidisestablishmentarianism in
  Llanfairpwllgwyngyllgogerychwyrndrobwllllanty
  siliogogogoch. For I've just been electrophot
  omicrographically diagnosed with pseudopneumo
  noultramicroscopicsilicovolcanoconiosis,
  isn't it?

除了较长的词现在被毫不客气地砍掉了,甚至连插值的共轭词的礼貌都没有。所以,我们需要在管道中多加一个步骤,在需要的地方加上连字符:

sub greedy-wrap( $text,
                :$width    = 80,
                :$minwidth = floor(0.8 * $width),
                :$minbreak = 5
) {
    $text.words.join(' ')
         .match(/ . ** {1..$width}  $
                | . ** {$minwidth..$width}  )> ' '
                | . ** {$minbreak..$width-1}
                       <broken=before \S ** {$minbreak}>
                /, :global)
         .map({ $^word.<broken> ?? "$^word-" !! $^word })
         .join("\n")
}

在这个版本中,我们使用全局的 .match 而不是 .comb 来将文本分成几行,因为我们需要将长词打碎,只差一个字符的最大宽度(. ** {$minbreak..$width-1}),然后将这些行标记为已打碎(<broken=before \S ** {$minwidth}>),再给这些行加上一个连字符($^word.<broken>??"$^word-"!$^word)。

这就产生了:

  Look you, I shall have to be terminating my
  interdisciplinary investigation of consangui-
  neous antidisestablishmentarianism in
  Llanfairpwllgwyngyllgogerychwyrndrobwllllant-
  ysiliogogogoch. For I've just been electroph-
  otomicrographically diagnosed with pseudopne-
  umonoultramicroscopicsilicovolcanoconiosis,
  isn't it?

你好,TEX

即使我们做了改进,贪婪的换行算法也经常产生丑陋的不平衡段落。例如:

  No one would have believed, in the last years
  of the nineteenth century, that human affairs
  were being watched from the timeless worlds
  of space. No one could have dreamed that we
  were being scrutinised as someone with a
  microscope studies creatures that swarm and
  multiply in a drop of water. And yet, across
  the gulf of space, minds immeasurably
  superior to ours regarded this Earth with
  envious eyes, and slowly, and surely, they
  drew their plans against us...

1981 年,唐纳德-克努斯和迈克尔-普拉斯公布了一种将文本断行的算法,作为 TEX 排版系统的一部分加以实施。该算法考虑了文本中每一个可能插入断行的点,然后找到这些点的子集,产生最均匀的整体结果。

当然,这比贪婪算法的先入为主的方法要复杂得多,成本也高得多。事实上,由于它要考虑从 N 个词中的每一个词开始建立一条线,并运行到 N-M 个后续词中的每一个词,因此,它显然要需要 O(N²) 的空间和时间来计算,而贪婪算法则是节俭的 O(N)。在一个典型的段落上,比如上面的例子,TEX 算法的运行速度要慢60倍左右。

但由于大多数段落都很短(50到100字),N² 的成本通常是可以接受的。所以这里是该方法的一个简单版本,在 Raku 中:

    sub TeX-wrap ($text, :$width = 80, :$minbreak = 5 ) {
        # Extract individual words, hyphenating if necessary...
        my @words = $text.words.map: {
                        my @breaks = .comb: $width-$minbreak;
                        @breaks[0..*-2] »~=» '-';
                        |@breaks;
                    };

        # Compute handy text statistics...
        my @word-len   = @words».chars;
        my $word-count = @words.elems;

        # These track EOL gaps, plus cost and position of breaks...
        my @EOL-gap    = [0 xx $word-count+1] xx $word-count+1;
        my @line-cost  = [0 xx $word-count+1] xx $word-count+1;
        my @total-cost =  0 xx $word-count+1;
        my @break-pos  =  0 xx $word-count+1;

        # Build table of EOL gaps for lines from word i to word j...
        for 1..$word-count -> $i {
            @EOL-gap[$i][$i] = $width - @word-len[$i-1];
            for $i+1 .. $word-count -> $j {
                @EOL-gap[$i][$j]
                    = @EOL-gap[$i][$j-1] - @word-len[$j-1] - 1;
            }
        }

        # Work out the cost of a line built from word i to word j...
        for  1..$word-count -> $i {
        for $i..$word-count -> $j {
            # Overlength lines are infinitely expensive...
            if @EOL-gap[$i][$j] < 0 {
                @line-cost[$i][$j] = Inf;
            }

            # A short final line costs nothing...
            elsif $j == $word-count && @EOL-gap[$i][$j] >= 0 {
                @line-cost[$i][$j] = 0;
            }

            # Cost of other lines is sum-of-squares of EOL gaps...
            else {
                @line-cost[$i][$j] = @EOL-gap[$i][$j]²;
            }
        }
        }

        # Walk through cost table, finding the least-cost path...
        @total-cost[0] = 0;
        for 1..$word-count -> $j {
            @total-cost[$j] = Inf;
            for 1..$j -> $i {
                # Do words i to j (as a line) reduce total cost???
                my $line-ij-cost = @total-cost[$i-1]
                                 + @line-cost[$i][$j];

                if $line-ij-cost < @total-cost[$j] {
                    @total-cost[$j] = $line-ij-cost;
                    @break-pos[$j]  = $i;
                }
            }
        }

        # Extract minimal-cost lines backwards from final line...
        return join "\n", reverse gather loop {
            state $end-word = $word-count;
            my $start-word = @break-pos[$end-word] - 1;
            take @words[$start-word..$end-word-1].join(' ');
            $end-word = $start-word or last;
        }
    }

它比贪婪算法慢,也复杂得多,但是,就像生活中的许多其他方面一样,你得到了你所付出的东西……因为它也能产生更好的行包,比如这些:

  No one would have believed, in the last years
  of the nineteenth century, that human affairs
  were being watched from the timeless worlds
  of space. No one could have dreamed that
  we were being scrutinised as someone with
  a microscope studies creatures that swarm
  and multiply in a drop of water. And yet,
  across the gulf of space, minds immeasurably
  superior to ours regarded this Earth with
  envious eyes, and slowly, and surely, they
  drew their plans against us...

  It is a truth universally acknowledged, that
  a single man in possession of a good fortune
  must be in want of a wife. However little
  known the feelings or views of such a man
  may be on his first entering a neighbourhood,
  this truth is so well fixed in the minds
  of the surrounding families, that he is
  considered the rightful property of some one
  or other of their daughters.

  Look you, I shall have to be terminating
  my interdisciplinary investigation of
  consanguineous antidisestablishmentarianism
  in Llanfairpwllgwyngyllgogerychwyrndrobwlll-
  lantysiliogogogoch. For I've just been
  electrophotomicrographically diagnosed with
  pseudopneumonoultramicroscopicsilicovolc-
  anoconiosis, isn't it?

慢就是顺,顺就是快

你得到了你所支付的东西,但没有理由为这些好处支付过高的费用。Knuth/Plass 算法被广泛使用,因此一直是广泛优化努力的主题。现在已经设计出了在线性时间和空间中运行的版本,尽管内在的复杂性总是要到某个地方去,而且它通常会在代码本身…作为 O(N³) 不可理解。

但并不是所有的优化方案都是脑洞大开的复杂。例如,有一种优雅的O(N * width)算法,它隐含地将文本转换为一个有向图,其中每个节点是一个词,每个边的权重是在该词处断行的成本。然后,通过计算图中最短的路径,可以在线性时间内找到最优的断点。

在 Raku 中,是这样的:

    sub shortest-wrap ($text, :$width = 80, :$minbreak = 5) {
        # Extract and hyphenate individual words (as for TeX)...
        my @words = $text.words.map: {
                        my @breaks = .comb: $width-$minbreak;
                        @breaks[0..*-2] »~=» '-';
                        |@breaks;
                    };
        my $word-count = @words.elems;

        # Compute index positions from start of text to each word...
        my @word-offset = [+] 0, |@words».chars;

        # These track minimum cost, and optimal break positions...
        my @minimum   = flat 0, Inf xx $word-count;
        my @break-pos = 0 xx $word-count+1;

        # Walk through text tracking minimum cost...
        for    0..$word-count -> $i {
        for $i+1..$word-count -> $j {
            # Compute line width for line from word i to word j...
            my $line-ij-width
              = @word-offset[$j] - @word-offset[$i] + $j - $i - 1;

            # No need to track cost for lines wider than maximum...
            last if $line-ij-width > $width;

            # Cost of line increases with square of EOL gap...
            my $cost = @minimum[$i] + ($width - $line-ij-width)²;

            # Track least cost and optimal break position...
            if $cost < @minimum[$j] {
                @minimum[$j]   = $cost;
                @break-pos[$j] = $i;
            }
        }
        }

        # Extract minimal-cost lines backwards (as for TeX)...
        return join "\n", reverse gather loop {
            state $end-word = $word-count;
            my $start-word = @break-pos[$end-word];
            take @words[$start-word..$end-word-1].join(' ');
            $end-word = $start-word or last;
        }
    }

这种方法有时会对断行进行优化,与 TEX 算法略有不同,但总体上总是保持相同的"平衡"外观。例如:

  No one would have believed, in the last
  years of the nineteenth century, that human
  affairs were being watched from the timeless
  worlds of space. No one could have dreamed
  that we were being scrutinised as someone
  with a microscope studies creatures that
  swarm and multiply in a drop of water.
  And yet, across the gulf of space, minds
  immeasurably superior to ours regarded this
  Earth with envious eyes, and slowly, and
  surely, they drew their plans against us...

这两种"最佳拟合"算法的主要区别在于,最短路径方法试图平衡它所建立的所有线路,包括最后一条线路,所以它倾向于产生一个"正方形"包装,一般线路较短,但最后一条线路较长。

它的运行速度也比 TEX 方法快五倍(但仍比贪婪算法慢十倍)。

惩罚寡妇和孤儿

到目前为止,我们所研究的三种方法都存在一个微妙的问题:它们各自只针对一件事进行优化,而 Greedy wrapping 则是针对最大行宽进行优化,而 TEX wrapping 和 shortest-path wrapping 都是针对最大行平衡(即最小的粗糙度)进行优化。Greedy wrapping 优化了最大的行宽,而 TEX wrapping 和最短路径 wrapping 都优化了最大的行平衡(即最小的粗糙度)。

但是,尽管这些特性都是可取的,我们可能还希望在我们的包装文本中看到其他的排版特性。因为有许多其他的方式可以让一段文字变得丑陋。

  Now is the winter of our discontent made
  glorious summer by this sun of York; and
  all the clouds that lour'd upon our
  house in the deep bosom of the ocean
  buried. Now are our brows bound with
  victorious wreaths; our bruised arms
  hung up for monuments; our stern
  alarums changed to merry meetings, our
  dreadful marches to delightful
  measures.

除了行文的不均匀性令人不安外,这种包袱也有轻微的刺激性,因为它反复在语法上不合适的地方断行,使单个词(如 “and”, “buried”, “our” 和 “measures”))在视觉上与其短语的其他部分隔离开来。

孤立的词在行尾称为寡妇,在行首称为孤儿。通过换行将其与适当的上下文隔离开来,它们会使生成的代码看起来很笨拙,而且格式化很差,特别是当(像这里一样)一个鳏夫也构成了一个段落的最后一行。

通常情况下,可以通过提前或延后一个字的断行来避免产生鳏夫和孤儿。

  Now is the winter of our discontent made
  glorious summer by this sun of York; and all
  the clouds that lour'd upon our house in
  the deep bosom of the ocean buried. Now are
  our brows bound with victorious wreaths;
  our bruised arms hung up for monuments;
  our stern alarums changed to merry meetings,
  our dreadful marches to delightful measures.

…但为了达到这种效果,我们的换行算法不仅要意识到它所创建的线条的宽度和平衡,还要意识到文本的内容,以及它选择在哪里断行的美学后果。在实际应用中,这意味着它需要一个更复杂的成本函数来优化。

贪婪算法试图最小化的成本函数只是每行末尾的间隙长度之和:

sub cost (@lines, $width) {
    sum ($width «-« @lines».chars)
}

相比之下,TEX 和最短路径算法试图通过最小化平方和来减少线端间隙长度的变化:

sub cost (@lines, $width) {
    sum ($width «-« @lines».chars)»²
}

但我们可以通过实现和应用更复杂的成本函数,轻松最小化一系列包装行的其他属性。例如,让我们重新设计贪婪算法(我们最快的替代方案),以改善其整体的行平衡,同时减少它在包装文本中留下的寡妇和孤儿的数量。

我们要使用的成本函数是这样的:

    sub cost (@lines, $width) {
          ($width «-« @lines.head(*-1)».chars)»³».abs.sum
        * @lines³
        * (1 + 10 * ( @lines.grep(ORPHANS)
                    + @lines.grep(WIDOWS)
                    )
          )³;
    }

它计算的给定行数的成本是通过量化,然后乘以包装段落的三个理想特征得出的。

统一的包装行,测量为每行的行末间隙的立方之和,除了最后一行:

($width «-« @lines.head(*-1)».chars)»³».abs.sum

所产生的段落的紧凑程度,用总行数的立方来衡量: @lines³

寡妇和孤儿的数量,以找到的孤立词总数的十倍的立方来衡量:

(1 + 10 * ( @lines.grep(ORPHANS) + @lines.grep(WIDOWS) ) )³

与理想线条的零成本相比,成本函数使用立方体而不是平方体,以更快地提高引入多个不需要的特征所产生的惩罚。应用于寡妇和孤儿的十因子反映了对它们的特别强大的美学反对(调整这个数字以适应你个人的排版热情水平)。

孤儿和寡妇的检测方法如下:

sub ORPHANS {/ ^^  \S+  <[.!?,;:]>  [\s | $$] /}
sub WIDOWS  {/ <[.!?,;:]>  \s+  \S+  $$       /}

孤字是指行首的一个单字(^^\S+),后面是任何短语结束的标点符号(<[。!?,;:]>),后面是空格或行尾([\s | $$])。寡妇是紧跟在标点符号后面的一个单字(<[.!?,;:]> \s+ \S+),也是在行尾($$)。

有了这个更复杂的成本函数,我们现在可以同时优化结构特性和美学特性。我们还可以扩展这个函数来惩罚其他不需要的人工制品,比如在介绍性介词后断裂的短语,分割的不定式,或者在行末悬空的文章:

    sub ESTRANGED { / \s [for|with|by|from|as|to|a|the] $$/ }

    sub cost (@lines, $width) {
          ($width «-« @lines.head(*-1)».chars)»³».abs.sum
        * @lines³
        * (1 + 10 * ( @lines.grep(ORPHANS)
                    + @lines.grep(WIDOWS)
                    + @lines.grep(ESTRANGED)
                    )
          )³;
    }

为了使用这样一个复杂的成本函数来优化线路包络,我们需要一种方法来生成备选包络…然后我们可以评估、比较并从中选择。但是,贪婪的包装方法(事实上,TEX 算法和最短路径技术也是如此)总是只能生成一个包装。我们怎样才能得到更多呢?

一个简单快捷的方法是使用贪婪的方法来生成那些额外的包裹,但是要改变它包裹的宽度。例如,如果我们将同一文本包装到45列,然后再包装到相继缩短的宽度,就像这样:

    for 45...40 -> $width {
        my $wrapping = greedy-wrap($text, :$width);
        my $cost     = cost($wrapping.lines, $width);

        say "[$width columns --> cost: $cost]";
        say "$wrapping\n";
    }

…我们得到:

  [45 columns --> cost: 40768]
  Far back in the mists of ancient time, in the
  great and glorious days of the former
  Galactic Empire, life was wild, rich and
  largely tax free.

  [44 columns --> cost: 10051712]
  Far back in the mists of ancient time, in
  the great and glorious days of the former
  Galactic Empire, life was wild, rich and
  largely tax free.

  [43 columns --> cost: 3662912]
  Far back in the mists of ancient time, in
  the great and glorious days of the former
  Galactic Empire, life was wild, rich and
  largely tax free.

  [42 columns --> cost: 851840]
  Far back in the mists of ancient time, in
  the great and glorious days of the former
  Galactic Empire, life was wild, rich and
  largely tax free.

  [41 columns --> cost: 85184]
  Far back in the mists of ancient time, in
  the great and glorious days of the former
  Galactic Empire, life was wild, rich and
  largely tax free.

  [40 columns --> cost: 2752]
  Far back in the mists of ancient time,
  in the great and glorious days of the
  former Galactic Empire, life was wild,
  rich and largely tax free.

40列包装显然能产生最平衡、最少的孤本或寡本,这一点反映在其最小的成本价值上。当然,我们不再利用整个可用的宽度,但为了大幅增加视觉吸引力,减少10%的行长似乎是可以接受的代价。

更有趣的是,以这种方式产生的40列替代方案也比更复杂的 TEX 算法所产生的包装更好看(不幸的是,TEX 算法把第一行末尾的 “in"给忽略了)。

  Far back in the mists of ancient time, in
  the great and glorious days of the former
  Galactic Empire, life was wild, rich and
  largely tax free.

迭代的贪婪方案也比最短路径的方法要好,这种方法把 “时间"寡头化,把 “生命"孤儿化,把行数包装得比要求的45列少了整整20%:

  Far back in the mists of ancient
  time, in the great and glorious days
  of the former Galactic Empire, life
  was wild, rich and largely tax free.

此外,尽管现在技术上是 O(N²)–因为O(N) greedy-wrap 函数现在必须被调用 N/10 次–迭代 greedy 技术仍然比 TEX 算法快 25%,比最短路径方法快近 75%。

但我们还可以做得更好。请注意,当我们将封装宽度从45减少到40时,较窄的边距只是有时改变了产生的封装(在这种情况下,只在45、44和40列)。因此,我们实际上做了两倍于严格意义上必要的工作来寻找最佳宽度。

事实证明,如果前一个包装中最长的行的宽度等于或短于下一个候选宽度,那么尝试下一个候选宽度总是白费力气…因为它必然再次产生完全相同的包装。

所以,我们可以通过跟踪每个包装的实际宽度,只在比这个宽度短的情况下尝试后续的候选宽度来改进我们的搜索循环。而且,如果我们在搜索的过程中还能跟踪到目前为止最好的包装(即成本最低的包装),那么我们就会有一个完整的迭代贪婪包装算法:

    sub iterative-wrap ($text, :$width = 80) {
        # Track the best wrapping we find...
        my $best-wrapping;

        # Allow any width down to 90% of that specified...
        for $width...floor(0.9 * $width) -> $next-width {
            # Only try widths that can produce new wrappings...
            state $prev-max-width = Inf;
            next if $next-width > $prev-max-width;

            # Build the wrapping and evaluate it...
            my $wrapping = greedy-wrap($text, :width($next-width));
            my $cost     = cost($wrapping.lines, $next-width);

            # Keep the wrapping only if it's the best so far...
            state $lowest-cost = Inf;
            if $cost < $lowest-cost {
                $best-wrapping = $wrapping;
                $lowest-cost   = $cost;
            }

            # Try one character narrower next time...
            $prev-max-width = $wrapping.lines».chars.max - 1;
        }

        # Send back the prettiest one we found...
        return $best-wrapping;
    }

通过对跳过非生产性宽度的优化,现在这个方案比 TEX 算法快了2.5倍,比最短路径方法快了25%。

作为最后一步,我们可以用更干净、更短、更 “原生"的 Raku 风格重写上面的代码,这可能也会使它更容易维护:

    sub iterative-wrap ($text, :$width = 80) {
        # Return the least-cost candidate wrapping...
        return min :by{.cost}, gather loop {
            # Start at specified width; stop at 90% thereof...
            state $next-width = $width;
            last if $next-width < floor(0.9 * $width);

            # Create and evaluate another candidate...
            my $wrapping = greedy-wrap($text, :width($next-width));
            my $cost     = cost($wrapping.lines, $next-width);

            # Gather it, annotating it with its score...
            role Cost { has $.cost }
            take $wrapping but Cost($cost);

            # Try one character narrower next time...
            $next-width = $wrapping.lines».chars.max - 1;
        }
    }

在这个版本中,我们在一个无条件的循环中生成每一个候选的包装,从指定的宽度开始(state $next-width = $width),到该宽度的90%结束(last if $next-width < floor(0.9 * $width))。

我们贪婪地创建每一个包装,并与之前完全一样地评估它,但随后我们只是简单地积累包装,用自己的成本来注释它(take $wrapping but Cost($cost))。

Cost 角色给我们提供了一个简单的方法,在不弄乱字符串本身的情况下,将成本信息添加到包含包装的字符串中。角色是一个方法和属性的集合,可以作为一个组件添加到现有的类中。其他语言也有类似的构造,但把它们称为"接口"或"特质"或"协议扩展"或"混合”。

在这种情况下,我们只是通过使用 infix but 操作符将额外的成本跟踪功能添加到包装字符串中…它将左操作数转化为一种从左操作数的 Str 类派生出来的新对象,但是(咳咳!)具有由作为右操作数的角色指定的额外行为。

因此,我们的 gather 循环收集了一连串的 wrapping 字符串,每个字符串现在都有一个额外的 .cost 方法来报告它的成本,然后我们可以应用内置的 min 函数来选择并返回循环产生的最佳 wrapping(return min :by{.cost} gather for {...})

我们新的迭代-wrap子程序的代码比原来的贪婪-wrap实现要长7倍,慢7倍。但它产生的结果也至少漂亮了七倍。而这是一个非常值得的权衡。

Damian