Extract lines between question and answer - PullRequest
0 votes
/ February 18, 2019
Question No. 01 
Which of the following has more fire resisting characteristics? 
(A) Marble 
(B) Lime stone 
(C) Compact sand stone 
(D) Granite 
Answer: Option C 

Question No. 02 
The rocks which are formed due to cooling of magma at a considerable depth from earth's surface are called 
(A) Plutonic rocks 
(B) Hypabyssal rocks 
(C) Volcanic rocks 
(D) Igneous rocks 
Answer: Option A 

Question No. 03 
Plywood has the advantage of 
(A) Greater tensile strength in longer direction 
(B) Greater tensile strength in shorter direction 
(C) Same tensile strength in all directions 
(D) None of the above 
Answer: Option C 

I tried to extract each question, i.e. the text between "Question No. \d+" and "Answer: Option", as a list:

import re

with open('Building materials.txt', 'r') as lines:
    for line in lines:
        if re.search(r'Question No\. (\d+)', line):
            print(line.split())

Desired output:

['Which of the following has more fire resisting characteristics?\n(A) Marble \n(B) Lime stone \n(C) Compact sand stone \n(D) Granite','The rocks which are formed due to cooling of magma at a considerable depth from earth's surface are called \n(A) Plutonic rocks \n(B) Hypabyssal rocks \n(C) Volcanic rocks \n(D) Igneous rocks']

Answers [ 3 ]

0 votes
/ February 18, 2019
"""
This answer works if your schema is always the same, meaning:
Question number
Question
Answer 1
Answer 2
Answer N
...
Correct answer.

It does not depend on how many answer options there are.
"""

if __name__ == '__main__':
    #   Open the text file and read it as a list of lines.
    with open('file.txt', 'r') as f:
        lines = f.readlines()

    #   Split the text into blocks. Blocks are separated by a blank
    #   line, i.e. a double '\n', so join all the lines back together
    #   and split again on that token. strip() drops a trailing newline,
    #   which would otherwise add an empty last item to the final block.
    lines = ''.join(lines).strip().split('\n\n')

    #   Use the index to replace each item in place.
    for index in range(len(lines)):
        #   lines[index].split('\n')[1:-1] splits the block on its inner
        #   '\n' and drops the first item (the header) and the last item
        #   (the answer). Rejoin with the '\n' that split removed.
        lines[index] = '\n'.join( lines[index].split('\n')[1:-1] )

    #   What remains is what you asked for.
    for line in lines:
        print(type(line))
        print(line, end='\n\n')
    # <class 'str'>
    # Which of the following has more fire resisting characteristics? 
    # (A) Marble 
    # (B) Lime stone 
    # (C) Compact sand stone 
    # (D) Granite 

    # <class 'str'>
    # The rocks which are formed due to cooling of magma at a considerable depth from earth's surface are called 
    # (A) Plutonic rocks 
    # (B) Hypabyssal rocks 
    # (C) Volcanic rocks 
    # (D) Igneous rocks 

    # <class 'str'>
    # Plywood has the advantage of 
    # (A) Greater tensile strength in longer direction 
    # (B) Greater tensile strength in shorter direction 
    # (C) Same tensile strength in all directions 
    # (D) None of the above
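
The positional slice [1:-1] above relies on truly empty separator lines and on the header and answer being the first and last line of each block. As a hedged variation (not part of the original answer), the sketch below splits on blank lines that may contain spaces and drops lines by their prefix instead of their position. The helper name extract_questions and the in-memory SAMPLE are assumptions made for illustration.

```python
import re

# Two blocks from the question. The separator line holds a single space on
# purpose (an assumption) to show that whitespace-only lines are tolerated.
SAMPLE = (
    "Question No. 01 \n"
    "Which of the following has more fire resisting characteristics? \n"
    "(A) Marble \n"
    "(B) Lime stone \n"
    "(C) Compact sand stone \n"
    "(D) Granite \n"
    "Answer: Option C \n"
    " \n"
    "Question No. 02 \n"
    "The rocks which are formed due to cooling of magma at a considerable "
    "depth from earth's surface are called \n"
    "(A) Plutonic rocks \n"
    "(B) Hypabyssal rocks \n"
    "(C) Volcanic rocks \n"
    "(D) Igneous rocks \n"
    "Answer: Option A \n"
)


def extract_questions(text):
    """Return the question text plus options of every block (hypothetical helper)."""
    # Split on a line break, optional spaces or tabs, and another line break.
    blocks = re.split(r"\n[ \t]*\n", text.strip())
    result = []
    for block in blocks:
        # Keep every line that is neither the header nor the answer.
        kept = [
            line.rstrip()
            for line in block.splitlines()
            if not line.startswith(("Question No.", "Answer:"))
        ]
        result.append("\n".join(kept))
    return result


questions = extract_questions(SAMPLE)
for question in questions:
    print(question, end="\n\n")
```

Filtering by prefix keeps working when a block has more or fewer options, at the cost of assuming that no option line itself starts with "Question No." or "Answer:".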

If you have a strict schema, i.e. the same schema as shown above AND there are always exactly 4 options, you can do ...

if __name__ == '__main__':
    #   Open the text file and read it as a list of lines.
    with open('file.txt', 'r') as f:
        lines = f.readlines()

    #   Create an empty list to store our result.
    my_lines = []
    for index in range(1, len(lines), 8):
        #   Since we know exactly where each line is, we jump from block
        #   to block (8 lines each: header, question, 4 options, answer,
        #   blank line), with index pointing at the question line.
        #   The number of lines needed is always the same, so keep a
        #   fixed slice of 5 lines and join them.
        my_lines.append( ''.join(lines[index : index+5]) )

    for line in my_lines:
        print(line)
    # Which of the following has more fire resisting characteristics? 
    # (A) Marble 
    # (B) Lime stone 
    # (C) Compact sand stone 
    # (D) Granite 

    # The rocks which are formed due to cooling of magma at a considerable depth from earth's surface are called 
    # (A) Plutonic rocks 
    # (B) Hypabyssal rocks 
    # (C) Volcanic rocks 
    # (D) Igneous rocks 

    # Plywood has the advantage of 
    # (A) Greater tensile strength in longer direction 
    # (B) Greater tensile strength in shorter direction 
    # (C) Same tensile strength in all directions 
    # (D) None of the above
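
The fixed-stride version silently returns shifted text as soon as one block has a missing or extra line. A minimal sketch of the same slicing with a layout check added is shown below; it is an assumption-laden illustration rather than the answer's own code, and the names extract_fixed, HEADER and SAMPLE are hypothetical.

```python
import re

HEADER = re.compile(r"Question No\. \d+")

# Two blocks copied from the question, separated by one empty line.
SAMPLE = (
    "Question No. 01 \n"
    "Which of the following has more fire resisting characteristics? \n"
    "(A) Marble \n"
    "(B) Lime stone \n"
    "(C) Compact sand stone \n"
    "(D) Granite \n"
    "Answer: Option C \n"
    "\n"
    "Question No. 02 \n"
    "The rocks which are formed due to cooling of magma at a considerable "
    "depth from earth's surface are called \n"
    "(A) Plutonic rocks \n"
    "(B) Hypabyssal rocks \n"
    "(C) Volcanic rocks \n"
    "(D) Igneous rocks \n"
    "Answer: Option A \n"
)


def extract_fixed(lines, block_size=8):
    """Slice question + 4 options out of each 8-line block, checking the layout."""
    result = []
    for start in range(0, len(lines), block_size):
        block = lines[start : start + block_size]
        # Expected layout: header, question, 4 options, answer, (blank line).
        # The last block may lack the blank line, so 7 lines are enough.
        if (
            len(block) < 7
            or not HEADER.match(block[0])
            or not block[6].startswith("Answer:")
        ):
            raise ValueError(f"unexpected layout in block starting at line {start + 1}")
        result.append("".join(block[1:6]))
    return result


lines = SAMPLE.splitlines(keepends=True)
my_lines = extract_fixed(lines)
for item in my_lines:
    print(item)
```

Raising an error on the first malformed block is one possible design choice; skipping the block or falling back to a blank-line split would be equally reasonable depending on how clean the input file is.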
0 votes
/ February 18, 2019

You can use

^Question[^\d\r\n]+
(?P<nr>\d+)\s+
(?P<block>[\s\S]+?)(?=^Answer|\Z)

with the verbose and multiline flags; see the demo on regex101.com.


In Python:
import re
rx = re.compile(r'''
    ^Question[^\d\r\n]+
    (?P<nr>\d+)\s+
    (?P<block>[\s\S]+?)(?=^Answer|\Z)''', re.M | re.X)

for m in rx.finditer(your_data_as_string_here):
    print(m.group('nr'), m.group('block'))
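
To make the pattern easier to try out, here is a self-contained sketch that applies it, with the ^Answer lookahead, to two blocks from the question and collects the result in the list format the asker wanted. The names QUESTION_RX and SAMPLE are assumptions for illustration, and the comments inside the pattern are an interpretation of each part.

```python
import re

QUESTION_RX = re.compile(r'''
    ^Question[^\d\r\n]+      # "Question", then everything up to the number
    (?P<nr>\d+)\s+           # the question number and the following line break
    (?P<block>[\s\S]+?)      # question text and options, matched lazily
    (?=^Answer|\Z)           # stop before the next "Answer" line or at the end
''', re.M | re.X)

SAMPLE = (
    "Question No. 01 \n"
    "Which of the following has more fire resisting characteristics? \n"
    "(A) Marble \n"
    "(B) Lime stone \n"
    "(C) Compact sand stone \n"
    "(D) Granite \n"
    "Answer: Option C \n"
    "\n"
    "Question No. 02 \n"
    "The rocks which are formed due to cooling of magma at a considerable "
    "depth from earth's surface are called \n"
    "(A) Plutonic rocks \n"
    "(B) Hypabyssal rocks \n"
    "(C) Volcanic rocks \n"
    "(D) Igneous rocks \n"
    "Answer: Option A \n"
)

numbers = []
questions = []
for m in QUESTION_RX.finditer(SAMPLE):
    numbers.append(m.group('nr'))
    # strip() removes the line break left in front of the "Answer" line.
    questions.append(m.group('block').strip())

print(questions)
```

Because the block is matched lazily up to the next line starting with "Answer", this does not depend on blank lines between blocks or on the number of options.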
0 votes
/ February 18, 2019

This reads the file line by line and stores the lines in a list.

with open(fname) as f:
    content = f.readlines()

If you want to get rid of the line break, you can simply drop the last character of each line (this assumes every line, including the last one, ends with a newline).

for i in range(len(content)):
    content[i] = content[i][:-1]
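
As an alternative sketch, str.splitlines() returns the lines without their line breaks in one step and does not cut a character from a last line that has no trailing newline. The helper name read_lines and the temporary sample file are assumptions for illustration.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file without their line breaks (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


# Demo on a temporary file whose last line has no trailing newline.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Question No. 01 \n(A) Marble")
    demo = read_lines(path)

print(demo)
```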