PDF text to Excel in a formatted structure
January 14, 2020

I have the following code, which reads the text from the PDF excerpt shown below:

import os
from tika import parser

mpath = r'C:\Users\XXXXX\Desktop\XXXX'
# Full paths of the PDF files in the folder, sorted by file name
onlyfiles = sorted(f for f in os.listdir(mpath) if f.endswith('.pdf'))
onlyfiles = [os.path.join(mpath, name) for name in onlyfiles]
for s in onlyfiles:
    # Tika returns a dict; 'content' holds the extracted text (may be None)
    fileReader = parser.from_file(s)
    rawList = (fileReader['content'] or '').splitlines()
    # Drop empty lines
    rawList_1 = list(filter(None, rawList))
    print(rawList_1)

[Image: excerpt from the PDF]

I need this data in a structured format like the following:

[Image: the desired structured layout]
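The image with the exact target layout did not survive, so as an assumption: one row per property, with columns named after the labels Tika extracts for each listing ('Address', 'Building/Park Name', 'SF Avail', ...). Writing such rows fits the standard `csv` module, and Excel opens the result directly. `COLUMNS` and `write_rows` are hypothetical names for this sketch, and the extra columns ('City', 'Parking', 'Expenses', 'Utilities', 'Landlord Rep') are assumptions, not taken from the target image.

```python
import csv

# Hypothetical column list: the per-listing labels Tika extracts, plus a few
# assumed extra columns ('City', 'Parking', 'Expenses', 'Utilities', 'Landlord Rep').
COLUMNS = [
    'Address', 'Building/Park Name', 'City', 'SF Avail', 'For Sale ($/SF)',
    'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins',
    'Docks', 'Levelators', 'Crane', 'Sprinkler',
    'Parking', 'Expenses', 'Utilities', 'Landlord Rep',
]


def write_rows(records, fh, columns=COLUMNS):
    """Write one CSV row per property dict.

    Keys missing from a record become empty cells; keys that are not in
    `columns` are ignored.
    """
    writer = csv.DictWriter(fh, fieldnames=columns, restval='',
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(records)
```

With the property dicts in a list `records`, opening the output with `open('listings.csv', 'w', newline='', encoding='utf-8-sig')` and calling `write_rows(records, fh)` produces the file; the `utf-8-sig` encoding writes a BOM so Excel decodes characters such as `±` and `’` correctly. If a real .xlsx file is required, `pandas.DataFrame(records, columns=COLUMNS).to_excel('listings.xlsx', index=False)` does the same, provided pandas and openpyxl are installed.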

The output I get from the code above (one list per PDF file) looks like this:

['4100 S Frontage Rd', 'Interstate Commerce Park', 'Interstate Commerce Park', 'Building 310', 'Lakeland, FL 33815', '100,000 SF', 'For Sale - Active', 'Parking: -', 'Expenses: -', '$5.00-$6.00/nnn', '1', '100,000 SF', '-', '30\'0"', '-', 'Power: -', 'Rail Line: -', '30 ext', '-', '-', 'ESFR', 'Colliers International Tampa Bay Florida / Jan Boltres, CCIM (813) 871-8505 /  Michelle Senner (813) 221-2290 -- 100,000 SF (100,000 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: Gas, Sewer, Water', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'E 1st 100,000 $5.00-$6.00/nnn 09/2020 Negotiable - -Industrial/Direct Colliers International Tampa Bay Florida 100,000  N', 'Building Notes', '100,000 SF for sale pre-construction.  All permits in place.', '2855 Interstate Dr', 'Lakeland Interstate Business', 'Park', 'Lakeland, FL 33805', '56,760 SF', 'For Sale at $6,500,000', '($114.52/SF) - Active', 'Parking: 149 free Surface Spaces are available;  Ratio of', '2.63/1,000 SF', 'Expenses: 2017 Tax @ $0.59/sf', '$7.50-$12.50/nnn', '1', '56,760 SF', '4.87  AC', '26\'0"', '13', 'Power: 800-1600a 3p', 'Rail Line: None', '10 ext', '-', '-', 'Wet', 'Redstone Commercial / Robert Alter (813) 254-6200 X207 -- 30,109 SF (9,800-30,109 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: Gas - Natural, Lighting, Sewer - City, Water - City', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 30,109 $7.50-$12.50/nnn Vacant 5-10 yrs 10 2Flex/Direct Redstone Commercial 9,800-30,109', 'Copyrighted report licensed to CBRE - 1083404.', '11/19/2019', 'Page 1', 
'Building Notes', '( P ) 813.254.6200 ( C ) 404.307.1320', '1501 WEST CLEVELAND STREET, SUITE 200', 'TAMPA, FLORIDA 33606', 'REDSTONECOMMERCIAL.COM PROPERTY HIGHLIGHTS', '•57,000 SF - divisible', '•Institutional-quality construction in brand new condition', '•Exceeds Florida Building Commission’s Code Plus designation; fortified above of Class 3 standards, in all structural respects', '•Extremely flexible design to accommodate multiple uses', '•Redundant power capability via separate troughs; ample 480 3-phase service capacity available', '•26’ clear height', '•Typical 40’x40’ bay sizes', '•10 Dock doors and 13 grade-level doors (1 oversized)', '•Isolated rear truck court area designed for added security', '•Abundant parking with potential for up to 4.5/1000 SF', '1940 Longleaf Blvd', 'Build To Suit', 'Lake Wales, FL 33859', '25,000 SF', 'For Sale at $1,600,000', '($50.00/SF) - Active', 'Parking: -', 'Expenses: 2018 Tax @ $0.21/sf', 'Withheld', '-', '32,000 SF', '1.87  AC', '25\'0"', '-', 'Power: -', 'Rail Line: -', '-', '-', '-', 'Yes', 'Commerce Park Realty LLC /  W.Robert W. Richard (508) 892-1000 -- 25,000 SF (12,500-25,000 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: -', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 25,000 Withheld Vacant Negotiable - -Industrial/New Commerce Park Realty LLC 12,500-25,000', 'Building Notes', "Building for lease is on lot #12, 25,000’ available sub-divisible to 12,500. 1 dock per 12,500'", 'Copyrighted report licensed to CBRE - 1083404.', '11/19/2019', 'Page 2', '4152 S Pipkin Rd', 'Parkway Corporate', 'Lakeland, FL 33811', '125,392 SF', 'For Sale at $2,950,000', '($23.53/SF) - Active', 'Parking: -', 'Expenses: -', 'Withheld', '1', '125,392 SF', '8.22  AC', '-', '-', 'Power: -', 'Rail Line: -', '-', '-', '-', 'Yes', 'Buckner Commercial Properties /  A.David A. Buckner (863) 686-7770 -- 125,392 SF (56,192-69,200 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: -', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 69,200 Withheld TBD Negotiable - -Industrial/New Buckner Commercial Properties 69,200  N', 'P 1st 56,192 Withheld TBD Negotiable - -Industrial/New Buckner Commercial Properties 56,192  N', '3595 Recker Hwy', 'Building 1 & 4', 'Winter Haven, FL 33880', '12,900 SF', 'For Sale at $850,000 as part of', 'a portfolio of 3 properties -', 'Active', 'Parking: 10 free Surface Spaces are available;  Ratio of', '0.77/1,000 SF', 'Expenses: 2011 Tax @ $0.09/sf', '$5.65/nnn', '1', '12,900 SF', '0.42  AC', '16\'0"-20\'0"', '9 - 10\'0"w x 12\'0"h', 'Power: -', 'Rail Line: -', 'None', '-', 'None', '-', 'NAI Realvest / Daniel Blackford (407) 427-3432 -- 12,900 SF (12,900 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: -', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'E 1st 12,900 $5.65/nnn 30 Days Negotiable - -Building 1 &', '4', 'Industrial/Direct NAI Realvest 12,900  N', 'Copyrighted report licensed to CBRE - 1083404.', '11/19/2019', 'Page 3', 'Building Notes', '19,920± SF in four buildings on 1.28± acres: Building 1 - 6,400± SF (143’ x 40’); attached to bldg 4; Building 2 - 2,520± SF (24’ x 105’); Building 3 - 4,500± SF (90’ x 54’); Building 4 - 6,500± SF (50’ x 130’); attached', 'to bldg 1.  Office space in building 4.  Clear height: 24’; outside storage.  Grade-level and dock-high overhead doors.  Utilities: Progress Energy & Polk County.  Convenient to US Hwy 92, US Hwy 17, Polk Pkwy', '(SR 570), US Hwy 98 and US Hwy 27.', 'Copyrighted report licensed to CBRE - 1083404.', '11/19/2019', 'Page 4', '\tMultiple Industrial Bldgs per Page w/Photo & Spaces', '\t4100 S Frontage Rd, Lakeland, FL 33815', '\t2855 Interstate Dr, Lakeland, FL 33805', '\t1940 Longleaf Blvd, Lake Wales, FL 33859', '\t4152 S Pipkin Rd, Lakeland, FL 33811', '\t3595 Recker Hwy, Winter Haven, FL 33880']
['4100 S Frontage Rd', 'Interstate Commerce Park', 'Interstate Commerce Park', 'Building 310', 'Lakeland, FL 33815', '100,000 SF', 'For Sale - Active', 'Parking: -', 'Expenses: -', '$5.00-$6.00/nnn', '1', '100,000 SF', '-', '30\'0"', '-', 'Power: -', 'Rail Line: -', '30 ext', '-', '-', 'ESFR', 'Colliers International Tampa Bay Florida / Jan Boltres, CCIM (813) 871-8505 /  Michelle Senner (813) 221-2290 -- 100,000 SF (100,000 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: Gas, Sewer, Water', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'E 1st 100,000 $5.00-$6.00/nnn 09/2020 Negotiable - -Industrial/Direct Colliers International Tampa Bay Florida 100,000  N', 'Building Notes', '100,000 SF for sale pre-construction.  All permits in place.', '2855 Interstate Dr', 'Lakeland Interstate Business', 'Park', 'Lakeland, FL 33805', '56,760 SF', 'For Sale at $6,500,000', '($114.52/SF) - Active', 'Parking: 149 free Surface Spaces are available;  Ratio of', '2.63/1,000 SF', 'Expenses: 2017 Tax @ $0.59/sf', '$7.50-$12.50/nnn', '1', '56,760 SF', '4.87  AC', '26\'0"', '13', 'Power: 800-1600a 3p', 'Rail Line: None', '10 ext', '-', '-', 'Wet', 'Redstone Commercial / Robert Alter (813) 254-6200 X207 -- 30,109 SF (9,800-30,109 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: Gas - Natural, Lighting, Sewer - City, Water - City', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 30,109 $7.50-$12.50/nnn Vacant 5-10 yrs 10 2Flex/Direct Redstone Commercial 9,800-30,109', 'Copyrighted report licensed to CBRE - 1083404.', '11/26/2019', 'Page 1', 
'Building Notes', '( P ) 813.254.6200 ( C ) 404.307.1320', '1501 WEST CLEVELAND STREET, SUITE 200', 'TAMPA, FLORIDA 33606', 'REDSTONECOMMERCIAL.COM PROPERTY HIGHLIGHTS', '•57,000 SF - divisible', '•Institutional-quality construction in brand new condition', '•Exceeds Florida Building Commission’s Code Plus designation; fortified above of Class 3 standards, in all structural respects', '•Extremely flexible design to accommodate multiple uses', '•Redundant power capability via separate troughs; ample 480 3-phase service capacity available', '•26’ clear height', '•Typical 40’x40’ bay sizes', '•10 Dock doors and 13 grade-level doors (1 oversized)', '•Isolated rear truck court area designed for added security', '•Abundant parking with potential for up to 4.5/1000 SF', '4116 Logistics Pky', "Florida's Gateway", 'Building E', 'Winter Haven, FL 33880', '407,400 SF', 'For Sale - Active', 'Parking: 215 Surface Spaces are available;  Ratio of 0.52/1,000', 'SF', 'Expenses: -', 'Withheld', '1', '407,400 SF', '932.82  AC', '36\'0"', '4 - 12\'0"w x 14\'0"h', 'Power: 3000a/277-480v 3p', 'Rail Line: CSX', '40 ext', '-', '-', 'ESFR', 'CBRE /  J.David J. Murphy (407) 404-5020 /  Monica P. Wonus (407) 404-5042 -- 407,400 SF (407,400 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: Lighting', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 407,400 Withheld 30 Days Negotiable - -Industrial/Direct CBRE 407,400  N', 'Copyrighted report licensed to CBRE - 1083404.', '11/26/2019', 'Page 2', '1940 Longleaf Blvd', 'Build To Suit', 'Lake Wales, FL 33859', '25,000 SF', 'For Sale at $1,600,000', '($50.00/SF) - Active', 'Parking: -', 'Expenses: 2018 Tax @ $0.21/sf', 'Withheld', '-', '32,000 SF', '1.87  AC', '25\'0"', '-', 'Power: -', 'Rail Line: -', '3 ext', '-', '-', 'Yes', 'Commerce Park Realty LLC /  W.Robert W. Richard (508) 892-1000 -- 25,000 SF (12,500-25,000 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: -', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 25,000 Withheld Vacant Negotiable - -Industrial/New Commerce Park Realty LLC 12,500-25,000', 'Building Notes', "Building for lease is on lot #12, 25,000’ available sub-divisible to 12,500. 1 dock per 12,500'", '4152 S Pipkin Rd', 'Parkway Corporate', 'Lakeland, FL 33811', '125,392 SF', 'For Sale at $2,950,000', '($23.53/SF) - Active', 'Parking: -', 'Expenses: -', 'Withheld', '1', '125,392 SF', '8.22  AC', '-', '-', 'Power: -', 'Rail Line: -', '-', '-', '-', 'Yes', 'Buckner Commercial Properties /  A.David A. Buckner (863) 686-7770 -- 125,392 SF (56,192-69,200 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: -', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'P 1st 69,200 Withheld TBD Negotiable - -Industrial/New Buckner Commercial Properties 69,200  N', 'P 1st 56,192 Withheld TBD Negotiable - -Industrial/New Buckner Commercial Properties 56,192  N', 'Copyrighted report licensed to CBRE - 1083404.', '11/26/2019', 'Page 3', '3595 Recker Hwy', 'Building 1 & 4', 'Winter Haven, FL 33880', '12,900 SF', 'For Sale at $850,000 as part of', 'a portfolio of 3 properties -', 'Active', 'Parking: 10 free Surface Spaces are available;  Ratio of', '0.77/1,000 SF', 'Expenses: 2011 Tax @ $0.09/sf', '$5.65/nnn', '1', '12,900 SF', '0.42  AC', '16\'0"-20\'0"', '9 - 10\'0"w x 12\'0"h', 'Power: -', 'Rail Line: -', 'None', '-', 'None', '-', 'NAI Realvest / Daniel Blackford (407) 427-3432 -- 12,900 SF (12,900 SF)Landlord Rep:', 'Address', 'Building/Park Name', 'SF Avail', 'For Sale ($/SF)', 'Rent/SF/Yr', 'Stories', 'RBA', 'Land', 'Ceiling Height', 'Drive Ins', 'Docks', 'Levelators', 'Crane', 'Sprinkler', 'Utilities: -', 'Leasing CompanyFloor Unit Use/Type Bldg Cntg Rent/SF/YR Occupancy Term Docks Drive-InsSF Avail/Divide?', 'E 1st 12,900 $5.65/nnn 30 Days Negotiable - -Building 1 &', '4', 'Industrial/Direct NAI Realvest 12,900  N', 'Building Notes', '19,920± SF in four buildings on 1.28± acres: Building 1 - 6,400± SF (143’ x 40’); attached to bldg 4; Building 2 - 2,520± SF (24’ x 105’); Building 3 - 4,500± SF (90’ x 54’); Building 4 - 6,500± SF (50’ x 130’); attached', 'to bldg 1.  Office space in building 4.  Clear height: 24’; outside storage.  Grade-level and dock-high overhead doors.  Utilities: Progress Energy & Polk County.  Convenient to US Hwy 92, US Hwy 17, Polk Pkwy', '(SR 570), US Hwy 98 and US Hwy 27.', 'Copyrighted report licensed to CBRE - 1083404.', '11/26/2019', 'Page 4', '\tMultiple Industrial Bldgs per Page w/Photo & Spaces', '\t4100 S Frontage Rd, Lakeland, FL 33815', '\t2855 Interstate Dr, Lakeland, FL 33805', '\t4116 Logistics Pky, Winter Haven, FL 33880', '\t1940 Longleaf Blvd, Lake Wales, FL 33859', '\t4152 S Pipkin Rd, Lakeland, FL 33811', '\t3595 Recker Hwy, Winter Haven, FL 33880']
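A first step toward one row per property is cutting this flat list into one group of lines per property. A minimal sketch, assuming (from these two samples only) that each report ends with a tab-indented index of "street, City, ST ZIP" entries and that each property's lines begin with a line equal to its street address; `split_listings` is a hypothetical helper:

```python
import re

# Index entries at the end of the text look like
# '\t4100 S Frontage Rd, Lakeland, FL 33815' (street, then "City, ST ZIP").
INDEX_RE = re.compile(r'^\t(.+?), (.+, [A-Z]{2} \d{5})$')


def split_listings(lines):
    """Split the flat list of lines into {street: lines of that property}.

    Assumes the report ends with a tab-indented address index and that
    each property's block starts with a line equal to its street.
    """
    streets = [m.group(1) for m in map(INDEX_RE.match, lines) if m]
    # The body is everything before the first tab-indented index line
    body_end = next((i for i, line in enumerate(lines) if line.startswith('\t')),
                    len(lines))
    body = lines[:body_end]
    # First position of each indexed street in the body, in document order
    starts = sorted(body.index(s) for s in streets if s in body)
    blocks = {}
    # Each block runs from its street line up to the next street line
    for start, end in zip(starts, starts[1:] + [len(body)]):
        blocks[body[start]] = body[start:end]
    return blocks
```

Splitting on the address index rather than on the 'Page N' markers keeps a property's 'Building Notes' together even when they run onto the next page, as with 2855 Interstate Dr.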

How can I modify the code above to get the desired result? Thanks!
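Within one property's lines, the values appear in the same order as the label lines that follow them ('Address', 'Building/Park Name', 'SF Avail', ...), with 'Parking:', 'Expenses:', 'Power:' and 'Rail Line:' acting as fixed anchors. So one option is to pick fields by their position relative to those anchors. A sketch under that assumption, checked only against the sample output above; `parse_listing` and its dict keys are hypothetical:

```python
import re

# A "City, ST 12345" line ends the Building/Park Name lines (assumption
# based on the sample output).
CITY_RE = re.compile(r'^.+, [A-Z]{2} \d{5}$')


def _find(block, test):
    """Index of the first line for which test(line) is truthy."""
    for i, line in enumerate(block):
        if test(line):
            return i
    raise ValueError('expected line not found in block')


def _starts(prefix):
    return lambda line: line.startswith(prefix)


def _after_colon(text):
    """'Parking: 149 free ...' -> '149 free ...'"""
    return text.split(':', 1)[1].strip()


def parse_listing(block):
    """Turn one property's lines into a dict of column -> value."""
    city = _find(block, CITY_RE.match)
    parking = _find(block, _starts('Parking:'))
    expenses = _find(block, _starts('Expenses:'))
    power = _find(block, _starts('Power:'))
    rail = _find(block, _starts('Rail Line:'))
    # The six values just before 'Power:' are, in label order:
    # Rent/SF/Yr, Stories, RBA, Land, Ceiling Height, Drive Ins
    rent, stories, rba, land, ceiling, drive_ins = block[power - 6:power]
    # The four values after 'Rail Line:' are Docks, Levelators, Crane, Sprinkler
    docks, levelators, crane, sprinkler = block[rail + 1:rail + 5]
    utilities = next((l for l in block if l.startswith('Utilities:')), 'Utilities:')
    rep = next((l for l in block if l.endswith('Landlord Rep:')), '')
    return {
        'Address': block[0],
        # Park names can repeat on consecutive lines; keep each one once
        'Building/Park Name': ' '.join(dict.fromkeys(block[1:city])),
        'City': block[city],
        'SF Avail': block[city + 1],
        # The sale text may wrap over several lines before 'Parking:'
        'For Sale ($/SF)': ' '.join(block[city + 2:parking]),
        'Parking': _after_colon(' '.join(block[parking:expenses])),
        'Expenses': _after_colon(block[expenses]),
        'Rent/SF/Yr': rent, 'Stories': stories, 'RBA': rba, 'Land': land,
        'Ceiling Height': ceiling, 'Drive Ins': drive_ins,
        'Docks': docks, 'Levelators': levelators, 'Crane': crane,
        'Sprinkler': sprinkler,
        'Utilities': _after_colon(utilities),
        'Landlord Rep': rep[:-len('Landlord Rep:')].strip(),
    }
```

Feed it one property's lines at a time, from the street address up to the next street address. Positional parsing like this is brittle: if a different report template adds or drops a field between the anchors, the values shift silently, so it is worth spot-checking a few rows against the PDF.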

...