Reading a line that contains tabs
1 vote
/ July 9, 2019

I fetched a file from https://www.clres.com/db/parses/oec/abaft.parse using urllib3. Each line contains tab-separated fields and ends with \r\n. In Python 2.7 I used StringIO, but it is not available in Python 3.7.

I tried to use io instead, since the StringIO module was removed.

http = urllib3.PoolManager(timeout=10.0)
r = http.urlopen('GET', url, preload_content=False)
remote_file = r.data
memory_file = remote_file.decode('utf-8')
prep_sents = get_sentences(memory_file)
def get_sentence(memory_file):
    sentence = []
    for line in memory_file:
        if not re.match(r'\s*\r?\n', line):

I expect to get a whole line, but instead I get only the first token of the line.

1\tWith\twith\t_\tIN\t_\t0\tROOT\t_\t_\t_\t_\t_\t_\r\n

Answers [ 2 ]

1 vote
/ July 9, 2019

memory_file already holds the data downloaded from the server, decoded to a str. To break it up, use splitlines() to get the lines and split() to get the fields of each line:

import urllib3

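def get_sentences(memory_file):
    sentences = []
    # splitlines() breaks the text at \n and \r\n and drops the line endings
    for line in memory_file.splitlines():
        if not line:
            continue  # skip empty lines
        # split() with no arguments splits on any run of whitespace, tabs included
        sentences.append(line.split())
    return sentences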

url = 'https://www.clres.com/db/parses/oec/abaft.parse'
http = urllib3.PoolManager(timeout=10.0)
r = http.urlopen('GET', url, preload_content=False)
remote_file = r.data
memory_file = remote_file.decode('utf-8')

prep_sents = get_sentences(memory_file)

for line in prep_sents:
    print(''.join('{: ^13}'.format(w) for w in line))

Output:

  1          With         with           _           IN            _            0          ROOT           _            _            _            _            _            _      
  2          this         this           _           DT            _            3           det           _            _            _            _            _            _      
  3        security     security         _           NN            _            1          pcomp          _            _            _            _            _            _      
  4           he           he            _           PRP           _            5          subj           _            _            _            _            _            _      
  5           had         have           _           VBD           _            3          rcmod          _            _            _            _            _            _      
  6       established   establish        _           VBN           _            5           vch           _            _            _            _            _            _      
  7           as           as            _           IN            _            6          prep           _            _            _            _            _            _      
  8           his          his           _          PRP$           _            9          poss           _            _            _            _            _            _      
  9          right        right          _           NN            _            7          pcomp          _            _            _            _            _            _      
 10            a            a            _           DT            _           11           det           _            _            _            _            _            _      
 11         caboose      caboose         _           NN            _            6          dobj           _            _            _            _            _            _      
 12          abaft        abaft          _           IN            _            1          prep           _            _            _            _            _            _      
 13           the          the           _           DT            _           14           det           _            _            _            _            _            _      
 14         funnel       funnel          _           NN            _           12          pcomp          _            _            _            _            _            _      
 15           in           in            _           IN            _           14          prep           _            _            _            _            _            _      
 16           the          the           _           DT            _           17           det           _            _            _            _            _            _      
 17        midships     midships         _           NNS           _           15          pcomp          _            _            _            _            _            _      
 18         Bofors       bofors          _           NNP           _           19           nn            _            _            _            _            _            _      
 19        gunshield    gunshield        _           NN            _           14          appos          _            _            _            _            _            _      
 20          where        where          _           WRB           _           19         relmod          _            _            _            _            _            _      
 21           the          the           _           DT            _           22           det           _            _            _            _            _            _      
 22           gun          gun           _           NN            _           23          subj           _            _            _            _            _            _      
 23           had         have           _           VBD           _           20          whcmp          _            _            _            _            _            _      
 24          been          be            _           VBN           _           23           vch           _            _            _            _            _            _      
 25         removed      remove          _           VBN           _           24           vch           _            _            _            _            _            _      
 26            .            .            _            .            _            1          punct          _            _            _            _            _            _      
  1        Dropping       drop           _           VBG           _           14          advcl          _            _            _            _            _            _      
  2          down         down           _           RP            _            1           prt           _            _            _            _            _            _      
  3          abaft        abaft          _           IN            _            1          prep           _            _            _            _            _            _      
  4           the          the           _           DT            _            5           det           _            _            _            _            _            _      
  5         bridge       bridge          _           NN            _            3          pcomp          _            _            _            _            _            _      
  6            ,            ,            _            ,            _           14          punct          _            _            _            _            _            _      
  7           the          the           _           DT            _            9           det           _            _            _            _            _            _      
  8          first        first          _           JJ            _            9          amod           _            _            _            _            _            _      
  9          thing        thing          _           NN            _           14          subj           _            _            _            _            _            _      
 10           to           to            _           TO            _           11         infmark         _            _            _            _            _            _      
 11          come         come           _           VB            _            9         infmod          _            _            _            _            _            _      
 12          into         into           _           IN            _           11          prep           _            _            _            _            _            _      
 13          view         view           _           NN            _           12          pcomp          _            _            _            _            _            _      
 14           was          be            _           VBD           _            0          ROOT           _            _            _            _            _            _      
 15           the          the           _           DT            _           16           det           _            _            _            _            _            _      
 16         funnel       funnel          _           NN            _           14          arg1           _            _            _            _            _            _      
 17            .            .            _            .            _           14          punct          _            _            _            _            _            _      
  1          When         when           _           WRB           _           21          whadv          _            _            _            _            _            _      
  2            a            a            _           DT            _            3           det           _            _            _            _            _            _      
  3        mainsail     mainsail         _           NN            _            4          subj           _            _            _            _            _            _      
  4           was          be            _           VBD           _            1          whcmp          _            _            _            _            _            _      
  5           set          set           _           VBN           _            4           vch           _            _            _            _            _            _      
  6           up           up            _           RP            _            5           prt           _            _            _            _            _            _      
  7           in           in            _           IN            _            5          prep           _            _            _            _            _            _      
  8           the          the           _           DT            _           10           det           _            _            _            _            _            _      
  9         correct      correct         _           JJ            _           10          amod           _            _            _            _            _            _      
 10          place        place          _           NN            _            7          pcomp          _            _            _            _            _            _      
 11          abaft        abaft          _           IN            _            5          prep           _            _            _            _            _            _      
 12           the          the           _           DT            _           13           det           _            _            _            _            _            _      
 13          genoa        genoa          _           NN            _           11          pcomp          _            _            _            _            _            _      
 14            ,            ,            _            ,            _           21          punct          _            _            _            _            _            _      
 15           the          the           _           DT            _           16           det           _            _            _            _            _            _      
 16         strain       strain          _           NN            _           21          subj           _            _            _            _            _            _      
 17           on           on            _           IN            _           16          prep           _            _            _            _            _            _      
 18           the          the           _           DT            _           20           det           _            _            _            _            _            _      
 19        headsail     headsail         _           NN            _           20           nn            _            _            _            _            _            _      
 20          sheet        sheet          _           NN            _           17          pcomp          _            _            _            _            _            _      
 21           was          be            _           VBD           _            0          ROOT           _            _            _            _            _            _      
 22        observed      observe         _           VBN           _           21           vch           _            _            _            _            _            _      
 23           to           to            _           TO            _           24         infmark         _            _            _            _            _            _      
 24          rise         rise           _           VB            _           22          xcomp          _            _            _            _            _            _      
 25      considerably considerably       _           RB            _           24         advmod          _            _            _            _            _            _      
 26            .            .            _            .            _           21          punct          _            _            _            _            _            _      
  1           The          the           _           DT            _            2           det           _            _            _            _            _            _      
  2        carpenter    carpenter        _           NN            _            3          subj           _            _            _            _            _            _      
  3           had         have           _           VBD           _            0          ROOT           _            _            _            _            _            _      
  4         turned        turn           _           VBN           _            3           vch           _            _            _            _            _            _      
  5           the          the           _           DT            _            6           det           _            _            _            _            _            _      
  6         capstan      capstan         _           NN            _            4          dobj           _            _            _            _            _            _      
  7          just         just           _           RB            _            8         advmod          _            _            _            _            _            _      
  8          abaft        abaft          _           IN            _            4          prep           _            _            _            _            _            _      
  9           the          the           _           DT            _           10           det           _            _            _            _            _            _      
 10        mainmast     mainmast         _           NN            _            8          pcomp          _            _            _            _            _            _      
 11          into         into           _           IN            _           10          prep           _            _            _            _            _            _      
 12            a            a            _           DT            _           15           det           _            _            _            _            _            _      
 13        perfectly    perfectly        _           RB            _           14         advmod          _            _            _            _            _            _      
 14       acceptable   acceptable        _           JJ            _           15          amod           _            _            _            _            _            _      
 15          desk         desk           _           NN            _           11          pcomp          _            _            _            _            _            _      
 16            .            .            _            .            _            3          punct          _            _            _            _            _            _      
  1           The          the           _           DT            _            2           det           _            _            _            _            _            _      
  2          first        first          _           JJ            _           11          subj           _            _            _            _            _            _      
  3           of           of            _           IN            _            2          prep           _            _            _            _            _            _      
  4           two          two           _           CD            _            5           num           _            _            _            _            _            _      
  5         hatches       hatch          _           NNS           _            3          pcomp          _            _            _            _            _            _      
  6           to           to            _           TO            _            5          prep           _            _            _            _            _            _      
  7           the          the           _           DT            _           10           det           _            _            _            _            _            _      
  8         control      control         _           NN            _            9           nn            _            _            _            _            _            _      
  9          room         room           _           NN            _           10           nn            _            _            _            _            _            _      
 10         section      section         _           NN            _            6          pcomp          _            _            _            _            _            _      
 11           is           be            _           VBZ           _            0          ROOT           _            _            _            _            _            _      
 12       immediately  immediately       _           RB            _           11         advmod          _            _            _            _            _            _      
 13          abaft        abaft          _           IN            _           11          arg1           _            _            _            _            _            _      
 14           the          the           _           DT            _           15           det           _            _            _            _            _            _      
 15          sail         sail           _           NN            _           13          pcomp          _            _            _            _            _            _      
 16            ,            ,            _            ,            _           11          punct          _            _            _            _            _            _      
 17          being         be            _           VBG           _           11          advcl          _            _            _            _            _            _      
 18           the          the           _           DT            _           20           det           _            _            _            _            _            _      
 19          main         main           _           JJ            _           20          amod           _            _            _            _            _            _      
 20         access       access          _           NN            _           17          arg1           _            _            _            _            _            _      
 21          into         into           _           IN            _           20          prep           _            _            _            _            _            _      
 22           the          the           _           DT            _           23           det           _            _            _            _            _            _      
 23          boat         boat           _           NN            _           21          pcomp          _            _            _            _            _            _      
 24            .            .            _            .            _           11          punct          _            _            _            _            _            _      
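
One caveat: split() with no arguments splits on any whitespace and silently drops empty fields, and the loop above also discards the empty lines. If the columns must keep their positions and the sentences need to stay separate, something along these lines may help. This is only a minimal sketch: the name parse_sentences and the sample text are made up for illustration, and it assumes that an empty line marks a sentence boundary in the file.

```python
def parse_sentences(text):
    """Group tab-separated token rows into sentences (assumed to be split by blank lines)."""
    sentences, current = [], []
    for line in text.splitlines():          # handles both \n and \r\n endings
        if not line.strip():                # blank line: assumed sentence boundary
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(line.split('\t'))    # split on tabs only, keeping empty fields
    if current:                             # flush the last sentence
        sentences.append(current)
    return sentences


# Hypothetical sample in the same shape as the rows shown in the question
sample = (
    "1\tWith\twith\t_\tIN\t_\t0\tROOT\t_\t_\t_\t_\t_\t_\r\n"
    "2\tthis\tthis\t_\tDT\t_\t3\tdet\t_\t_\t_\t_\t_\t_\r\n"
    "\r\n"
    "1\tDropping\tdrop\t_\tVBG\t_\t14\tadvcl\t_\t_\t_\t_\t_\t_\r\n"
)
parsed = parse_sentences(sample)
print(len(parsed))          # number of sentences
print(parsed[0][0][:3])     # first three fields of the first token row
```

With the real data, parse_sentences(memory_file) would return a list of sentences, each a list of rows, each row a list of fields.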

1 vote
/ July 9, 2019

StringIO is available in Python 3.7; it now lives in the io module:

from io import StringIO

memory_file is a string, so iterating over it yields single characters. To get each line, you need to split it:

for line in memory_file.split('\n'):
    print(line)
...
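
If you would rather keep the Python 2.7 style of looping over a file-like object, wrapping the string in io.StringIO should also work. The sketch below is only an illustration on a made-up sample string (text stands in for memory_file). It first shows why the original loop saw a single character, then iterates over StringIO so that each item is a full line, and reuses the blank-line regex from the question.

```python
import re
from io import StringIO

# Hypothetical sample standing in for memory_file
text = "1\tWith\twith\t_\tIN\r\n\r\n2\tthis\tthis\t_\tDT\r\n"

# Iterating over a str yields one character at a time, not one line
first_item = next(iter(text))
print(repr(first_item))

# Iterating over StringIO yields whole lines, line endings included,
# so the blank-line check from the question works unchanged
lines = [line for line in StringIO(text) if not re.match(r'\s*\r?\n', line)]

# Strip the line ending, then split each line on tabs
fields = [line.rstrip('\r\n').split('\t') for line in lines]
print(fields[0])
```

Compared with split('\n'), this keeps the \r\n on each line until you strip it yourself, which matters if the regex is meant to see the line ending.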