Question

Right now I check whether a sentence contains a specific word by splitting the sentence into an array and then calling include? to see if it contains the word. Something like:

"This is my awesome sentence.".split(" ").include?('awesome')

But I would like to know the fastest way to do this with a phrase. For example, I might want to see whether the sentence "This is my awesome sentence." contains the phrase "my awesome sentence". I am scraping sentences and comparing a very large number of phrases, so speed matters.

the Tin Man · Answer 1 · January 14, 2011

Here are some variations:

require 'benchmark'

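# Note: inside the parentheses these newline-separated literals are separate expressions,
# so only the last one (repeated 10 times) is assigned to lorem; hence the warnings below.
lorem = ('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut' # !> unused literal ignored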
        'enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in' # !> unused literal ignored
        'reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident,' # !> unused literal ignored
        'sunt in culpa qui officia deserunt mollit anim id est laborum.' * 10) << ' foo'


lorem.split.include?('foo') # => true
lorem['foo']                # => "foo"
lorem.include?('foo')       # => true
lorem[/foo/]                # => "foo"
lorem[/fo{2}/]              # => "foo"
lorem[/foo$/]               # => "foo"
lorem[/fo{2}$/]             # => "foo"
lorem[/fo{2}\Z/]            # => "foo"
/foo/.match(lorem)[-1]      # => "foo"
/foo$/.match(lorem)[-1]     # => "foo"
/foo/ =~ lorem              # => 621

n = 500_000

puts RUBY_VERSION
puts "n=#{ n }"
Benchmark.bm(25) do |x|
  x.report("array search:")             { n.times { lorem.split.include?('foo') } }
  x.report("literal search:")           { n.times { lorem['foo']                } }
  x.report("string include?:")          { n.times { lorem.include?('foo')       } }
  x.report("regex:")                    { n.times { lorem[/foo/]                } }
  x.report("wildcard regex:")           { n.times { lorem[/fo{2}/]              } }
  x.report("anchored regex:")           { n.times { lorem[/foo$/]               } }
  x.report("anchored wildcard regex:")  { n.times { lorem[/fo{2}$/]             } }
  x.report("anchored wildcard regex2:") { n.times { lorem[/fo{2}\Z/]            } }
  x.report("/regex/.match")             { n.times { /foo/.match(lorem)[-1]      } }
  x.report("/regex$/.match")            { n.times { /foo$/.match(lorem)[-1]     } }
  x.report("/regex/ =~")                { n.times { /foo/ =~ lorem              } }
  x.report("/regex$/ =~")               { n.times { /foo$/ =~ lorem             } }
  x.report("/regex\Z/ =~")              { n.times { /foo\Z/ =~ lorem            } }
end

And the results for Ruby 1.9.3:

1.9.3
n=500000
                                user     system      total        real
array search:              12.960000   0.010000  12.970000 ( 12.978311)
literal search:             0.800000   0.000000   0.800000 (  0.807110)
string include?:            0.760000   0.000000   0.760000 (  0.758918)
regex:                      0.660000   0.000000   0.660000 (  0.657608)
wildcard regex:             0.660000   0.000000   0.660000 (  0.660296)
anchored regex:             0.660000   0.000000   0.660000 (  0.664025)
anchored wildcard regex:    0.660000   0.000000   0.660000 (  0.664897)
anchored wildcard regex2:   0.320000   0.000000   0.320000 (  0.328876)
/regex/.match               1.430000   0.000000   1.430000 (  1.424602)
/regex$/.match              1.430000   0.000000   1.430000 (  1.434538)
/regex/ =~                  0.530000   0.000000   0.530000 (  0.538128)
/regex$/ =~                 0.540000   0.000000   0.540000 (  0.536318)
/regexZ/ =~                 0.210000   0.000000   0.210000 (  0.214547)

And for Ruby 1.8.7:

1.8.7
n=500000
                               user     system      total        real
array search:             21.250000   0.000000  21.250000 ( 21.296039)
literal search:            0.660000   0.000000   0.660000 (  0.660102)
string include?:           0.610000   0.000000   0.610000 (  0.612433)
regex:                     0.950000   0.000000   0.950000 (  0.946308)
wildcard regex:            2.840000   0.000000   2.840000 (  2.850198)
anchored regex:            0.950000   0.000000   0.950000 (  0.951270)
anchored wildcard regex:   2.870000   0.010000   2.880000 (  2.874209)
anchored wildcard regex2:  2.870000   0.000000   2.870000 (  2.868291)
/regex/.match              1.470000   0.000000   1.470000 (  1.479383)
/regex$/.match             1.480000   0.000000   1.480000 (  1.498106)
/regex/ =~                 0.680000   0.000000   0.680000 (  0.677444)
/regex$/ =~                0.700000   0.000000   0.700000 (  0.704486)
/regexZ/ =~                0.700000   0.000000   0.700000 (  0.701943)

So, from the 1.9.3 results, searching for a fixed string like 'foobar'['foo'] is slower than using a regex 'foobar'[/foo/], which is slower than the equivalent 'foobar' =~ /foo/. On 1.8.7 the order differs: the fixed-string search is faster than the regex.

The OP's original solution suffers badly because it traverses the data twice: once to split the string into individual words, and a second time to iterate over the array looking for the actual target word. Its performance will degrade as the string size increases.
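
To make the two passes concrete, here is a minimal sketch. The helper names are made up for illustration and are not part of Ruby. It also shows a second problem with the split approach: a multi-word phrase is never a single array element, so it can never be found that way.

```ruby
# Hypothetical helper: the split-then-search approach from the question.
def word_via_split?(sentence, word)
  # split builds an array of words (first pass); include? scans it (second pass).
  sentence.split(" ").include?(word)
end

# Hypothetical helper: a single scan of the original string.
def phrase_via_include?(sentence, phrase)
  sentence.include?(phrase)
end

sentence = "This is my awesome sentence."

puts word_via_split?(sentence, "awesome")                  # true
puts word_via_split?(sentence, "my awesome sentence")      # false: a phrase is never one array element
puts phrase_via_include?(sentence, "my awesome sentence")  # true
```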

EDIT: One thing I find interesting about Ruby's performance is that an anchored regex is slightly slower than an unanchored one. In Perl, the opposite was true when I first ran this sort of benchmark several years ago.

Here's an updated version using Fruity. The various expressions return different results. Any of them can be used if you only want to see whether the target string exists. If you want to know whether the value is at the end of the string, as these tests check, or to get the location of the target, then some are definitely faster than others, so pick accordingly.

require 'fruity'

TARGET_STR = (' ' * 100) + ' foo'

TARGET_STR['foo']            # => "foo"
TARGET_STR[/foo/]            # => "foo"
TARGET_STR[/fo{2}/]          # => "foo"
TARGET_STR[/foo$/]           # => "foo"
TARGET_STR[/fo{2}$/]         # => "foo"
TARGET_STR[/fo{2}\Z/]        # => "foo"
TARGET_STR[/fo{2}\z/]        # => "foo"
TARGET_STR[/foo\Z/]          # => "foo"
TARGET_STR[/foo\z/]          # => "foo"
/foo/.match(TARGET_STR)[-1]  # => "foo"
/foo$/.match(TARGET_STR)[-1] # => "foo"
/foo/ =~ TARGET_STR          # => 101
/foo$/ =~ TARGET_STR         # => 101
/foo\Z/ =~ TARGET_STR        # => 101
TARGET_STR.include?('foo')   # => true
TARGET_STR.index('foo')      # => 101
TARGET_STR.rindex('foo')     # => 101


puts RUBY_VERSION
puts "TARGET_STR.length = #{ TARGET_STR.length }"

puts
puts 'compare fixed string vs. unanchored regex'
compare do 
  fixed_str        { TARGET_STR['foo'] }
  unanchored_regex { TARGET_STR[/foo/] }
end

puts
puts 'compare /foo/ to /fo{2}/'
compare do
  unanchored_regex  { TARGET_STR[/foo/]   }
  unanchored_regex2 { TARGET_STR[/fo{2}/] }
end

puts
puts 'compare unanchored vs. anchored regex'
compare do 
  unanchored_regex      { TARGET_STR[/foo/]    }
  anchored_regex_dollar { TARGET_STR[/foo$/]   }
  anchored_regex_Z      { TARGET_STR[/foo\Z/] }
  anchored_regex_z      { TARGET_STR[/foo\z/] }
end

puts
puts 'compare /foo/, match and =~'
compare do
  unanchored_regex    { TARGET_STR[/foo/]           }
  unanchored_match    { /foo/.match(TARGET_STR)[-1] }
  unanchored_eq_match { /foo/ =~ TARGET_STR         }
end

puts
puts 'compare fixed, unanchored, Z, include?, index and rindex'
compare do
  fixed_str        { TARGET_STR['foo']          }
  unanchored_regex { TARGET_STR[/foo/]          }
  anchored_regex_Z { TARGET_STR[/foo\Z/]        }
  include_eh       { TARGET_STR.include?('foo') }
  _index           { TARGET_STR.index('foo')    }
  _rindex          { TARGET_STR.rindex('foo')   }
end

Which results in:

# >> 2.2.3
# >> TARGET_STR.length = 104
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 2x ± 0.1
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 8192 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 19.999999999999996% ± 10.0%
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 8192 times. Test will take about 1 second.
# >> unanchored_eq_match is faster than unanchored_regex by 2x ± 0.1 (results differ: 101 vs foo)
# >> unanchored_regex is faster than unanchored_match by 3x ± 0.1
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 32768 times. Test will take about 3 seconds.
# >> _rindex is similar to include_eh (results differ: 101 vs true)
# >> include_eh is faster than _index by 10.000000000000009% ± 10.0% (results differ: true vs 101)
# >> _index is faster than fixed_str by 19.999999999999996% ± 10.0% (results differ: 101 vs foo)
# >> fixed_str is faster than anchored_regex_Z by 39.99999999999999% ± 10.0%
# >> anchored_regex_Z is similar to unanchored_regex

Changing the size of the string reveals some useful information.

Changing it to 1,000 characters:

# >> 2.2.3
# >> TARGET_STR.length = 1004
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 4096 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 50.0% ± 10.0%
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 2048 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 1 second.
# >> anchored_regex_z is faster than anchored_regex_Z by 10.000000000000009% ± 10.0%
# >> anchored_regex_Z is faster than unanchored_regex by 3x ± 0.1
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 4096 times. Test will take about 1 second.
# >> unanchored_eq_match is similar to unanchored_regex (results differ: 1001 vs foo)
# >> unanchored_regex is faster than unanchored_match by 2x ± 0.1
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 32768 times. Test will take about 4 seconds.
# >> _rindex is faster than anchored_regex_Z by 2x ± 1.0 (results differ: 1001 vs foo)
# >> anchored_regex_Z is faster than include_eh by 2x ± 0.1 (results differ: foo vs true)
# >> include_eh is faster than fixed_str by 10.000000000000009% ± 10.0% (results differ: true vs foo)
# >> fixed_str is similar to _index (results differ: foo vs 1001)
# >> _index is similar to unanchored_regex (results differ: 1001 vs foo)

Bumping it to 10,000:

# >> 2.2.3
# >> TARGET_STR.length = 10004
# >> 
# >> compare fixed string vs. unanchored regex
# >> Running each test 512 times. Test will take about 1 second.
# >> fixed_str is faster than unanchored_regex by 39.99999999999999% ± 10.0%
# >> 
# >> compare /foo/ to /fo{2}/
# >> Running each test 256 times. Test will take about 1 second.
# >> unanchored_regex2 is similar to unanchored_regex
# >> 
# >> compare unanchored vs. anchored regex
# >> Running each test 8192 times. Test will take about 3 seconds.
# >> anchored_regex_z is similar to anchored_regex_Z
# >> anchored_regex_Z is faster than unanchored_regex by 21x ± 1.0
# >> unanchored_regex is similar to anchored_regex_dollar
# >> 
# >> compare /foo/, match and =~
# >> Running each test 256 times. Test will take about 1 second.
# >> unanchored_eq_match is similar to unanchored_regex (results differ: 10001 vs foo)
# >> unanchored_regex is faster than unanchored_match by 10.000000000000009% ± 10.0%
# >> 
# >> compare fixed, unanchored, Z, include?, index and rindex
# >> Running each test 32768 times. Test will take about 18 seconds.
# >> _rindex is faster than anchored_regex_Z by 2x ± 0.1 (results differ: 10001 vs foo)
# >> anchored_regex_Z is faster than include_eh by 15x ± 1.0 (results differ: foo vs true)
# >> include_eh is similar to _index (results differ: true vs 10001)
# >> _index is similar to fixed_str (results differ: 10001 vs foo)
# >> fixed_str is faster than unanchored_regex by 39.99999999999999% ± 10.0%
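
As a rough sketch of what "pick accordingly" could look like in practice, here is one helper per question you might ask of the string. The helper names are my own and not from any library, and which call is actually fastest depends on the Ruby version and string length, so measure on your own data.

```ruby
# Hypothetical helpers, one per question being asked of the string.

# Does the phrase occur anywhere in the text?
def contains_phrase?(text, phrase)
  text.include?(phrase)
end

# Where does the phrase first occur? Returns an Integer offset or nil.
def phrase_position(text, phrase)
  text.index(phrase)
end

# Is the phrase at the very end of the text?
# \z anchors at the end of the string; Regexp.escape keeps the phrase literal.
def ends_with_phrase?(text, phrase)
  !!(text =~ /#{Regexp.escape(phrase)}\z/)
end

target = (' ' * 100) + ' foo'

puts contains_phrase?(target, 'foo')   # true
puts phrase_position(target, 'foo')    # 101
puts ends_with_phrase?(target, 'foo')  # true
```

In a hot loop you would build the anchored regex once rather than on every call. String#end_with? is another way to test the end of a string, though it was not part of the benchmarks above.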

greggreg · Answer 2 · January 14, 2011

You can easily check whether a string contains another string using square brackets, like so:

irb(main):084:0> "This is my awesome sentence."["my awesome sentence"]
=> "my awesome sentence"
irb(main):085:0> "This is my awesome sentence."["cookies for breakfast?"]
=> nil

This returns the substring if it is found, or nil if it is not. It should be very fast.
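
Because the result is either a string (truthy) or nil (falsy), it can be used directly as a condition. A small sketch for checking a list of phrases; phrases_found is a made-up helper name:

```ruby
# Hypothetical helper: returns the phrases from the list that occur in the sentence.
def phrases_found(sentence, phrases)
  # sentence[phrase] is the matched substring (truthy) or nil (falsy).
  phrases.select { |phrase| sentence[phrase] }
end

sentence = "This is my awesome sentence."
phrases  = ["my awesome sentence", "cookies for breakfast?"]

p phrases_found(sentence, phrases)  # ["my awesome sentence"]
```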

Phrogz · Answer 3 · January 14, 2011

Here's a non-answer answer, showing the benchmark for the code from @TheTinMan on Ruby 1.9.2 on OS X. Note the difference in relative performance, specifically the improvements in the 2nd and 3rd tests.

                               user     system      total        real
array search:              7.960000   0.000000   7.960000 (  7.962338)
literal search:            0.450000   0.010000   0.460000 (  0.445905)
string include?:           0.400000   0.000000   0.400000 (  0.400932)
regex:                     0.510000   0.000000   0.510000 (  0.512635)
wildcard regex:            0.520000   0.000000   0.520000 (  0.514800)
anchored regex:            0.510000   0.000000   0.510000 (  0.513328)
anchored wildcard regex:   0.520000   0.000000   0.520000 (  0.517759)
/regex/.match              0.940000   0.000000   0.940000 (  0.943471)
/regex$/.match             0.940000   0.000000   0.940000 (  0.936782)
/regex/ =~                 0.440000   0.000000   0.440000 (  0.446921)
/regex$/ =~                0.450000   0.000000   0.450000 (  0.447904)

I ran this with Benchmark.bmbm, but the results do not differ between the rehearsal round and the actual timings shown above.
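
For anyone who has not used it, here is a minimal sketch of a Benchmark.bmbm run. The wrapper method, sentence and iteration count are placeholders of mine, not the setup that produced the numbers above.

```ruby
require 'benchmark'

# Hypothetical wrapper; the sentence and iteration count are placeholders.
def rehearsed_timings(sentence, iterations)
  # bmbm runs every report twice: a rehearsal pass, then the measured pass.
  # It returns an array of Benchmark::Tms objects, one per report.
  Benchmark.bmbm(16) do |x|
    x.report("literal search:")  { iterations.times { sentence['awesome'] } }
    x.report("string include?:") { iterations.times { sentence.include?('awesome') } }
    x.report("/regex/ =~")       { iterations.times { /awesome/ =~ sentence } }
  end
end

timings = rehearsed_timings("This is my awesome sentence.", 10_000)
puts timings.length  # 3, one entry per report
```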

user535617 · Answer 4 · January 14, 2011

If you're not familiar with regular expressions, I believe they can solve your problem here:

http://www.regular-expressions.info/ruby.html

Basically, you create a regular expression object that looks for "awesome" (most likely case-insensitively), and then you can do

/regex/.match(string)

This returns the match data. If you want the index at which the match starts, you can do this:

match = "This is my awesome sentence." =~ /awesome/
puts match   # Prints 11, the index of the first letter of the match (the first "a" in "awesome")
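
As a rough sketch of the case-insensitive variant mentioned above (find_phrase is a made-up helper name, not part of Ruby):

```ruby
# Hypothetical helper: case-insensitive search for a literal phrase.
def find_phrase(sentence, phrase)
  # Regexp.escape treats the phrase as literal text; IGNORECASE ignores case.
  pattern = Regexp.new(Regexp.escape(phrase), Regexp::IGNORECASE)
  pattern.match(sentence)  # MatchData on success, nil otherwise
end

match = find_phrase("This is my AWESOME sentence.", "awesome")
puts match[0]        # AWESOME
puts match.begin(0)  # 11

p find_phrase("This is my awesome sentence.", "cookies")  # nil
```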

I'd read the article for more detail, though, as it explains this better than I do. If you don't want to dig into it that deeply and just want to start using regular expressions, I'd recommend this:

http://www.ruby-doc.org/core/classes/Regexp.html

Finding if a sentence contains a specific phrase in Ruby
