Split a string in Ruby into lines of a maximum length, taking newlines and spaces into account

Asked by to94eoyn 5 months ago in Ruby

I have this long text:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vestibulum id augue id mattis. Praesent congue nisi quam, ac gravida enim viverra non. Vestibulum id interdum sapien, vitae volutpat nisi. Praesent id euismod ipsum. Vestibulum et pulvinar urna, id venenatis ante. Cras commodo eget ligula sit amet mattis. Maecenas congue turpis urna, a bibendum sem consequat nec.

I want to split it into lines of at most 50 characters:

Lorem ipsum dolor sit amet, consectetur adipiscing 
elit. Duis vestibulum id augue id mattis. Praesent
congue nisi quam, ac gravida enim viverra non. 
Vestibulum id interdum sapien, vitae volutpat nisi. 
Praesent id euismod ipsum. Vestibulum et pulvinar 
urna, id venenatis ante. Cras commodo eget ligula 
sit amet mattis. Maecenas congue turpis urna, a 
bibendum sem consequat nec.


I don't want to split words, so I do NOT want:

Lorem ipsum dolor sit amet, consectetur adipiscing
 elit. Duis vestibulum id augue id mattis. Praesen
t congue nisi quam, ac gravida enim viverra non. V
estibulum id interdum sapien, vitae volutpat nisi.
 Praesent id euismod ipsum. Vestibulum et pulvinar
 urna, id venenatis ante. Cras commodo eget ligula
 sit amet mattis. Maecenas congue turpis urna, a b
ibendum sem consequat nec.


I've tried String#index, String#split, etc.
No Rails, please.
I don't understand why greediness doesn't work in my best attempt:

str = "Lorem ipsum [etc.]"
p str
       .scan(/[^ \n]{1,50}/)
       .join("\n")


However:

str = "(again)and again)and again)"

p str.scan(/\(.+\)/).first
# => "(again)and again)and again)"

p str.scan(/\(.+?\)/).first
# => "(again)"

# My hope:
p str.scan(/\(.{1,15}\)/).first
# => "(again)and again)"


Thanks a lot for your help!
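Greediness is not what fails in the `[^ \n]{1,50}` attempt: the class `[^ \n]` cannot match a space, so no match can ever run past the end of a word, and the `{1,50}` bound only matters for words longer than 50 characters. A minimal sketch of the alternative, assuming single-line input and no word longer than `max_length` (a longer word would lose its beginning): let `.` match spaces too, and require the match to end on a non-whitespace character followed by whitespace or the end of the string. The greedy bound then takes as many whole words as fit and backtracks to the last word end. `wrap_lines` is a hypothetical name.

```ruby
text = "Lorem ipsum dolor"

# [^ \n] cannot match a space, so every match ends at the end of a word.
p text.scan(/[^ \n]{1,50}/) # => ["Lorem", "ipsum", "dolor"]

# Hypothetical helper: lines of at most max_length chars, broken only between words.
def wrap_lines(str, max_length)
  # \S          the line starts on a non-whitespace char
  # .{0,n}      greedily take up to max_length - 1 more chars, spaces included
  # (?<=\S)     back off until the last char taken is non-whitespace...
  # (?=\s|\z)   ...and the next char is whitespace or the end of the string
  str.scan(/\S.{0,#{max_length - 1}}(?<=\S)(?=\s|\z)/)
end

p wrap_lines(text, 11) # => ["Lorem ipsum", "dolor"]
```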

Answer #1 by bq9c1y66

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis vestibulum id augue id mattis. Praesent congue nisi quam, ac gravida enim viverra non. Vestibulum id interdum sapien, vitae volutpat nisi. Praesent id euismod ipsum. Vestibulum et pulvinar urna, id venenatis ante. Cras commodo eget ligula sit amet mattis. Maecenas congue turpis urna, a bibendum sem consequat nec."

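def split_string(string, max_length)
  # /m lets . match newlines as well. \b also matches between a word and the
  # following space, so a match can begin or end with a space, and that space
  # counts toward max_length.
  string.scan(/\b.{1,#{max_length}}\b/m).join("\n")
end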

puts split_string(text, 50)
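One detail of `split_string`: `\b` matches between the last letter of a line and the space after it, so every match after the first starts with that space. A minimal sketch that trims each line; `split_string_stripped` is a hypothetical name, and since the leading space is still counted while matching, a line may occasionally hold one character fewer than it could.

```ruby
def split_string(string, max_length)
  string.scan(/\b.{1,#{max_length}}\b/m).join("\n")
end

# Hypothetical variant: same matching, but trims the whitespace that each
# match picks up at a word boundary.
def split_string_stripped(string, max_length)
  string.scan(/\b.{1,#{max_length}}\b/m).map(&:strip).join("\n")
end

p split_string("aaa bbb ccc", 7)          # => "aaa bbb\n ccc"
p split_string_stripped("aaa bbb ccc", 7) # => "aaa bbb\nccc"
```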


Answer #2 by 5cnsuln7

Benchmark of the solutions below (n = 100,000 iterations each, MAX_LEN = 70):

                                       user     system      total        real
Smart Reg (by Cary Swoveland):     1.531920   0.000824   1.532744 (  1.534340)
Simple Reg (by Phil)         :     0.744567   0.000159   0.744726 (  0.745223)
split_string (by Rajagopalan):     0.587626   0.000095   0.587721 (  0.588453)
With index (by Phil)         :     0.696723   0.000051   0.696774 (  0.697468)

Script:

require 'benchmark'
MAX_LEN = 70
str = "Exercitation incididunt tempor cupidatat aliquip eu sed ut dolore dolor incididunt amet deserunt culpa dolor cupidatat aliquip do adipisicing ut. Exercitation incididunt tempor cupidatat aliquip eu sed ut dolore dolor incididunt amet deserunt culpa dolor cupidatat aliquip do adipisicing ut. Exercitation incididunt tempor cupidatat aliquip eu sed ut dolore dolor incididunt amet deserunt culpa dolor cupidatat aliquip do adipisicing ut."
# split_string (by Rajagopalan)
def split_string(string, max_length)
  string
   .scan(/\b.{1,#{max_length}}\b/m)
   .join("\n")
end
# puts split_string(str, MAX_LEN)
# With index (by Phil)
def with_indexes(str, max_length)
  segs = []
  o = 0
  start_o = 0
  last_o  = nil
  while o = str.index(' ', o + 1)
    # puts "o = #{o}"
    if o - start_o > max_length
      segs << str[start_o...last_o]
      start_o = last_o + 1
    end
    last_o = o
  end
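  # NOTE: this last segment is never checked against max_length, so the final
  # line can come out longer than the limit.
  segs << str[start_o..-1]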
  segs.join("\n")
end
# puts with_indexes(str, MAX_LEN)
# Simple Reg (by Phil)
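def with_simple_reg(str, max_length)
  # NOTE: the trailing (?:$| ) consumes the space after each line, so the next
  # match cannot start at the following word and scan skips that word.
  str.scan(/(?:^| )(.{1,#{max_length}})(?:$| )/.freeze)
    .join("\n")
end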
# puts with_simple_reg(str, MAX_LEN)
# Smart Reg (by Cary Swoveland)
def with_smart_reg(str, max_length)
  rgx = /
    \S         # match a non-whitespace char
    .{0,#{max_length}}    # match up to max_length chars
    (?<=\S)    # positive lookbehind asserts previous char is a non-whitespace
    \S         # match a non-whitespace char
    \K         # reset start of match and discard all consumed chars
    (?:\z|\s+) # match the end of the string or >= 1 whitespace chars
  /x           # invoke free-spacing regex definition mode
  # NOTE: \S + .{0,max_length} + \S lets a line reach max_length + 2 chars.
  str.gsub(rgx) { |m| m.empty? ? "" : "\n" }
end
# puts with_smart_reg(str, MAX_LEN)
n = 100000
Benchmark.bm(32) do |x|
  x.report("Smart Reg (by Cary Swoveland): ") { n.times do ; with_smart_reg(str, MAX_LEN)  end }
  x.report("Simple Reg (by Phil)         : ") { n.times do ; with_simple_reg(str, MAX_LEN) end }
  x.report("split_string (by Rajagopalan): ") { n.times do ; split_string(str, MAX_LEN)    end }
  x.report("With index (by Phil)         : ") { n.times do ; with_indexes(str, MAX_LEN)    end }
end
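Timings only compare like with like if every function returns a correct wrap. A minimal sanity check, assuming "correct" means no line longer than `max_length` and all words kept in order (`valid_wrap?` and `with_lookahead_reg` are hypothetical names). On the simple-regex version it shows a word going missing: the trailing `(?:$| )` consumes the space after each line, so the next match cannot start at the following word, and `scan` jumps ahead to the space after it. Turning that group into a lookahead keeps the space for the next match.

```ruby
# Hypothetical checker: no line exceeds max_length and no word is lost or reordered.
def valid_wrap?(original, wrapped, max_length)
  wrapped.split("\n").all? { |line| line.length <= max_length } &&
    wrapped.split == original.split
end

# Same pattern as with_simple_reg in the script above.
def with_simple_reg(str, max_length)
  str.scan(/(?:^| )(.{1,#{max_length}})(?:$| )/).join("\n")
end

# Lookahead variant: the space after a line is checked but not consumed.
def with_lookahead_reg(str, max_length)
  str.scan(/(?:^| )(.{1,#{max_length}})(?=$| )/).join("\n")
end

sample = "aaa bbb ccc ddd"
p with_simple_reg(sample, 7)    # => "aaa bbb\nddd" ("ccc" is skipped)
p with_lookahead_reg(sample, 7) # => "aaa bbb\nccc ddd"
p valid_wrap?(sample, with_simple_reg(sample, 7), 7)    # => false
p valid_wrap?(sample, with_lookahead_reg(sample, 7), 7) # => true
```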
Answer #3 by wecizke3

If str holds the string,

rgx = /
  \S         # match a non-whitespace char
  .{0,48}    # match up to 48 chars
  \S         # match a non-whitespace
  \K         # reset start of match and discard all consumed chars
  (?:\z|\s+) # match the end of the string or >= 1 whitespace chars
/x           # invoke free-spacing regex definition mode
s = str.gsub(rgx) { |m| m.empty? ? "" : "\n" }
puts s
# Lorem ipsum dolor sit amet, consectetur adipiscing
# elit. Duis vestibulum id augue id mattis. Praesent
# congue nisi quam, ac gravida enim viverra non.
# Vestibulum id interdum sapien, vitae volutpat
# nisi. Praesent id euismod ipsum. Vestibulum et
# pulvinar urna, id venenatis ante. Cras commodo
# eget ligula sit amet mattis. Maecenas congue
# turpis urna, a bibendum sem consequat nec.

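This assumes the last line contains at least two non-whitespace characters (for example, a letter before the final period). If the last line could consist of a single non-whitespace character, replace .{0,48}\S with .{0,49}(?<=\S).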
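The regex above is written for a 50-character limit. To take the limit as a parameter, the bound on `.` has to be `max_length - 1` once the leading `\S` is counted. A minimal sketch, assuming `max_length >= 1` and using the single-character-safe `(?<=\S)` form from the note above (`wrap` is a hypothetical name). A word longer than `max_length` is left whole on its own line rather than cut.

```ruby
def wrap(str, max_length)
  rgx = /
    \S                       # first char of a line: non-whitespace
    .{0,#{max_length - 1}}   # greedily take up to max_length - 1 more chars
    (?<=\S)                  # back off until the last char taken is non-whitespace
    \K                       # keep only what follows in the match
    (?:\z|\s+)               # whitespace after the line, or the end of the string
  /x
  # Replace each whitespace run with a newline; the empty match at \z stays empty.
  str.gsub(rgx) { |m| m.empty? ? "" : "\n" }
end

puts wrap("aaa bbb ccc", 7)
# aaa bbb
# ccc
puts wrap("a b", 1)
# a
# b
```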

Answer #4 by yvt65v4c

My solution:

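text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. […]"
# (?=$| ) checks for the space after a line without consuming it, so the next
# match can begin at that space; with (?:$| ) the space is consumed and the
# word right after each line break is skipped.
text.scan(/(?:^| )(.{1,50})(?=$| )/).join("\n")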

