Объяснение в комментарии. Я положил его туда, потому что он интерпретируется как жирный шрифт или что-то в этом роде, и это портит сообщение.
# I need to capture text that is
# enclosed in tags that are both <b> and
# <i>, but if there is more than one
# text enclosed in <i> in the same <b>
# block, then I only want the text
# enclosed in the first <i> tag, For
# example, for the following line:
#
# <b> <i> Important text here </i>
# irrelevant text everywhere else <i>
# irrelevant text here </i> </b> <b>
# <i> Also Important </i> not important
# <i> not important </i> </b>
#
# I want to retrieve only:
# - Important text here
# - Also Important
#
# I also must not retrieve text inside an
# <h2> block. I have been trying to
# delete the block with nodes.delete(nodes. search('h2')),
# but it doesn't actually delete the h2 block
require "rubygems"
require "nokogiri"
html = <<EOT
<b><i> Important text here </i> more text <i> not important text here </i> </b>
<b> <i> Also Important </i> more text <i> not important </i> </b>
<h2><b> <i> I don't want this text either</i></b></h2>
EOT
doc = Nokogiri::HTML(html)
nodes = doc.search('b i')
nodes.each { |e| puts e }
# Expected output:
# Important text here
# Also Important