How to count word frequency in a dictionary? - PullRequest
0 votes
/ May 14, 2018

I have a dictionary, as shown below:

[{'mississippi': 1, 'worth': 1, 'reading': 1}, {'commonplace': 1, 'river': 1, 'contrary': 1, 'ways': 1, 'remarkable': 1}, {'considering': 1, 'missouri': 1, 'main': 1, 'branch': 1, 'longest': 1, 'river': 1, 'world--four': 1}, {'seems': 1, 'safe': 1, 'crookedest': 1, 'river': 1, 'part': 1, 'journey': 1, 'uses': 1, 'cover': 1, 'ground': 1, 'crow': 1, 'fly': 1, 'six': 1, 'seventy-five': 1}, {'discharges': 1, 'water': 1, 'st': 1}, {'lawrence': 1, 'twenty-five': 1, 'rhine': 1, 'thirty-eight': 1, 'thames': 1}, {'river': 1, 'vast': 1, 'drainage-basin:': 1, 'draws': 1, 'water': 1, 'supply': 1, 'twenty-eight': 1, 'states': 1, 'territories': 1, 'delaware': 1, 'atlantic': 1, 'seaboard': 1, 'country': 1, 'idaho': 1, 'pacific': 1, 'slope--a': 1, 'spread': 1, 'forty-five': 1, 'degrees': 1, 'longitude': 1}, {'mississippi': 1, 'receives': 1, 'carries': 1, 'gulf': 1, 'water': 1, 'fifty-four': 1, 'subordinate': 1, 'rivers': 1, 'navigable': 1, 'steamboats': 1, 'hundreds': 1, 'flats': 1, 'keels': 1}, {'area': 1, 'drainage-basin': 1, 'combined': 1, 'areas': 1, 'england': 1, 'wales': 1, 'scotland': 1, 'ireland': 1, 'france': 1, 'spain': 1, 'portugal': 1, 'germany': 1, 'austria': 1, 'italy': 1, 'turkey': 1, 'almost': 1, 'wide': 1, 'region': 1, 'fertile': 1, 'mississippi': 1, 'valley': 1, 'proper': 1, 'exceptionally': 1}]

I want to turn it into the output shown below, so that I can compute a similarity score between two target words:

river 4
    ground: 1
    journey: 1
    longitude: 1
    main: 1
    world--four: 1
    contrary: 1
    cover: 1
    delaware: 1
    remarkable: 1
    vast: 1
    forty-five: 1
    crookedest: 1
    territories: 1
    spread: 1
    country: 1
    longest: 1
    fly: 1
    atlantic: 1
    crow: 1
    supply: 1
    seems: 1
    idaho: 1
    seaboard: 1
    states: 1
    ways: 1
    degrees: 1
    part: 1
    twenty-eight: 1
    pacific: 1
    branch: 1
    water: 1
    considering: 1
    six: 1
    safe: 1
    commonplace: 1
    draws: 1
    drainage-basin: 1
    uses: 1
    seventy-five: 1
    slope--a: 1
    missouri: 1
mississippi 3
    area: 1
    steamboats: 1
    germany: 1
    reading: 1
    france: 1
    proper: 1
    fifty-four: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    carries: 1
    combined: 1
    flats: 1
    receives: 1
    england: 1
    italy: 1
    scotland: 1
    wales: 1
    almost: 1
    navigable: 1
    austria: 1
    region: 1
    wide: 1
    spain: 1
    subordinate: 1
    drainage-basin: 1
    hundreds: 1
    keels: 1
    portugal: 1
    water: 1
    gulf: 1
    ireland: 1
    rivers: 1
    valley: 1
    fertile: 1
    worth: 1
water 3
    steamboats: 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    fifty-four: 1
    pacific: 1
    vast: 1
    subordinate: 1
    carries: 1
    keels: 1
    flats: 1
    supply: 1
    receives: 1
    atlantic: 1
    forty-five: 1
    river: 1
    rivers: 1
    idaho: 1
    mississippi: 1
    seaboard: 1
    navigable: 1
    discharges: 1
    degrees: 1
    twenty-eight: 1
    drainage-basin: 1
    hundreds: 1
    st: 1
    gulf: 1
    draws: 1
    delaware: 1
    territories: 1
    slope--a: 1
drainage-basin 2
    area: 1
    spread: 1
    country: 1
    states: 1
    mississippi: 1
    longitude: 1
    france: 1
    proper: 1
    vast: 1
    turkey: 1
    forty-five: 1
    areas: 1
    combined: 1
    germany: 1
    exceptionally: 1
    valley: 1
    supply: 1
    fertile: 1
    atlantic: 1
    italy: 1
    river: 1
    idaho: 1
    wales: 1
    almost: 1
    seaboard: 1
    spain: 1
    austria: 1
    region: 1
    degrees: 1
    twenty-eight: 1
    wide: 1
    england: 1
    portugal: 1
    water: 1
    ireland: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    scotland: 1
    slope--a: 1
area 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
journey 1
    ground: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
seems 1
    ground: 1
    journey: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
states 1
    spread: 1
    country: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
slope--a 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
remarkable 1
    contrary: 1
    river: 1
    commonplace: 1
    ways: 1
vast 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    pacific: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
forty-five 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    pacific: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
crookedest 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
carries 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
germany 1
    area: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
longest 1
    main: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
    considering: 1
flats 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    rivers: 1
    receives: 1
supply 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
receives 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
crow 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
scotland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    spain: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
country 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
thames 1
    thirty-eight: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
england 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
navigable 1
    mississippi: 1
    steamboats: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
austria 1
    area: 1
    germany: 1
    mississippi: 1
    france: 1
    proper: 1
    region: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    exceptionally: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
rhine 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    twenty-five: 1
part 1
    ground: 1
    journey: 1
    seems: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
twenty-eight 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
branch 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    missouri: 1
    considering: 1
hundreds 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
st 1
    water: 1
    discharges: 1
considering 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
six 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    fly: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
gulf 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    flats: 1
    rivers: 1
    receives: 1
ireland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    valley: 1
safe 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
commonplace 1
    contrary: 1
    river: 1
    remarkable: 1
    ways: 1
draws 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    supply: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
delaware 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
thirty-eight 1
    thames: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
longitude 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
world--four 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
lawrence 1
    thirty-eight: 1
    thames: 1
    rhine: 1
    twenty-five: 1
ground 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
steamboats 1
    mississippi: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
spread 1
    seaboard: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
idaho 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
reading 1
    mississippi: 1
    worth: 1
almost 1
    area: 1
    germany: 1
    austria: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    mississippi: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
contrary 1
    river: 1
    remarkable: 1
    commonplace: 1
    ways: 1
cover 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
france 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
spain 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
pacific 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
turkey 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
fifty-four 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    hundreds: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
subordinate 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
territories 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    supply: 1
    atlantic: 1
    slope--a: 1
    river: 1
    country: 1
combined 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
exceptionally 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
region 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
twenty-five 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    rhine: 1
rivers 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    receives: 1
fly 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
atlantic 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    river: 1
    supply: 1
    twenty-eight: 1
    idaho: 1
italy 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
main 1
    world--four: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
areas 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
seaboard 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
fertile 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
ways 1
    contrary: 1
    river: 1
    remarkable: 1
    commonplace: 1
discharges 1
    water: 1
    st: 1
degrees 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
wide 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
proper 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
keels 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    water: 1
    fifty-four: 1
    hundreds: 1
    subordinate: 1
    carries: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
portugal 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    ireland: 1
    valley: 1
worth 1
    mississippi: 1
    reading: 1
uses 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    fly: 1
    seventy-five: 1
    river: 1
seventy-five 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    river: 1
    fly: 1
valley 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
missouri 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    world--four: 1
    considering: 1
wales 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1

The first line is the target word and its frequency across the whole dictionary. Below it are the related words and their frequency in the same sentence as the target word. For example, in the first dictionary the profile for "mississippi" will contain "worth" and "reading", each with an in-sentence frequency of 1, while the frequency of "mississippi" across the whole dictionary is 3. I also want the target words sorted by frequency in descending order. Can anyone help?

Answers [ 2 ]

0 votes
/ May 15, 2018

It is not clear from either your desired output or your code what exactly you are trying to achieve, but if it is just counting words in individual sentences, the strategy should be:

  1. Read common.txt into a set for fast lookup.
  2. Read sample.txt and split it on . to get individual sentences.
  3. Clear out all non-word characters (you will have to define them, or use the regex \b to capture word boundaries) and replace them with whitespace.
  4. Split on whitespace and count the words that are not present in the set from Step 1.

So:

import collections

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\""  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

sentences_counter = []  # a list to hold a word count for each sentence
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    # read the whole file to include linebreaks and split on `.` to get individual sentences
    sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
    for sentence in sentences:  # iterate over each sentence
        sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
        word_counter = collections.defaultdict(int)  # a string:int default dict for counting
        for word in sentence.split():  # split the sentence and iterate over the words
            if word.lower() not in common_words:  # count only words not in the common.txt
                word_counter[word.lower()] += 1
        sentences_counter.append(word_counter)  # add the current sentence word count

NOTE: In Python 2.x, use string.maketrans() instead of str.maketrans().

This leaves you with sentences_counter, which holds a word count for each of the sentences in sample.txt, where the key is the actual word and its associated value is the number of times it occurs. You can print the result as:

for i, v in enumerate(sentences_counter):
    print("Sentence #{}:".format(i+1))
    print("\n".join("\t{}: {}".format(w, c) for w, c in v.items()))

Which will produce (for your sample data):

Sentence #1:
    area: 1
    drainage-basin: 1
    great: 1
    combined: 1
    areas: 1
    england: 1
    wales: 1
    wide: 1
    region: 1
    fertile: 1
Sentence #2:
    mississippi: 1
    valley: 1
    proper: 1
    exceptionally: 1

Keep in mind that the (English) language is more complex than this. For example, "A cat wiggles its tail when it's angry, so stay away from it." will be counted very differently depending on how you treat the apostrophe. Also, a dot does not necessarily mark the end of a sentence. You should look into NLP if you want to do serious linguistic analysis.
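To make the apostrophe point concrete, here is a minimal sketch that tokenizes one sentence in two ways. The two regular expressions are illustrative assumptions of mine, not part of the code in this answer; the first one mimics what happens when ' is treated as a separator, as in the translation table earlier.

```python
import re

sentence = "A cat wiggles its tail when it's angry, so stay away from it."

# Option 1: treat the apostrophe as a separator, so "it's" becomes "it" and "s"
split_on_apostrophe = re.findall(r"[a-z]+", sentence.lower())

# Option 2: keep apostrophes that sit inside a word, so "it's" stays one token
keep_apostrophe = re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())

print(split_on_apostrophe)
print(keep_apostrophe)
```

With the first option "it" is counted twice and a stray "s" token appears; with the second, "it" and "it's" are separate words. Neither is right in general, which is why a proper tokenizer is worth considering for anything serious.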

UPDATE: Although I do not see the point of repeating the data for every word (the counts will not change within a sentence), if you want to print each word and nest all the other counts under it, you can simply add an inner loop when printing:

for i, v in enumerate(sentences_counter):
    print("Sentence #{}:".format(i+1))
    for word, count in v.items():
        print("\t{} {}".format(word, count))
        print("\n".join("\t\t{}: {}".format(w, c) for w, c in v.items() if w != word))

Which will give you:

Sentence #1:
    area 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    drainage-basin 1
        area: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    great 1
        area: 1
        drainage-basin: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    combined 1
        area: 1
        drainage-basin: 1
        great: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    areas 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    england 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        wales: 1
        wide: 1
        region: 1
        fertile: 1
    wales 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wide: 1
        region: 1
        fertile: 1
    wide 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        region: 1
        fertile: 1
    region 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        fertile: 1
    fertile 1
        area: 1
        drainage-basin: 1
        great: 1
        combined: 1
        areas: 1
        england: 1
        wales: 1
        wide: 1
        region: 1
Sentence #2:
    mississippi 1
        valley: 1
        proper: 1
        exceptionally: 1
    valley 1
        mississippi: 1
        proper: 1
        exceptionally: 1
    proper 1
        mississippi: 1
        valley: 1
        exceptionally: 1
    exceptionally 1
        mississippi: 1
        valley: 1
        proper: 1

Feel free to remove the sentence number printing and drop one level of tab indentation to get something closer to the desired output in your question. You can also build a tree-like dictionary instead of printing everything to STDOUT, if that is more what you are after.
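One possible shape for such a tree-like structure is sketched below. The helper name build_profiles and the tiny sample list are hypothetical, made up for illustration. The sketch also assumes that co-occurrence counts should be summed across sentences, which may differ from the per-sentence counts shown in the question.

```python
from collections import Counter, defaultdict

def build_profiles(sentences_counter):
    """Return (totals, related) built from a list of per-sentence word counts."""
    totals = Counter()              # word -> frequency across all sentences
    related = defaultdict(Counter)  # word -> counts of words sharing a sentence with it
    for sentence in sentences_counter:
        totals.update(sentence)     # add this sentence's counts to the global totals
        for word in sentence:
            for other, count in sentence.items():
                if other != word:   # a word is not related to itself
                    related[word][other] += count
    return totals, related

# Hypothetical input in the same shape as sentences_counter
sample = [
    {"mississippi": 1, "worth": 1, "reading": 1},
    {"river": 1, "remarkable": 1},
    {"mississippi": 1, "river": 1, "water": 1},
    {"river": 1, "water": 1},
]

totals, related = build_profiles(sample)
for word, total in totals.most_common():  # most frequent target words first
    print("{} {}".format(word, total))
    for other, count in related[word].items():
        print("    {}: {}".format(other, count))
```

Counter.most_common() returns the words in descending order of total frequency, which is the ordering the question asks for.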

UPDATE 2: If you prefer, you do not have to use a set for common_words. In this case it is pretty much interchangeable with a list, so you can use a list comprehension instead of a set comprehension (i.e. replace the curly brackets with square brackets). However, looking a word up in a list is an O(n) operation, whereas a set lookup is O(1) on average, so a set is preferable here. Not to mention the added benefit of automatic de-duplication if common.txt contains repeated words.
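A small sketch of that difference, using a made-up list of lines in place of the real common.txt (the contents are an assumption for illustration only):

```python
lines = ["the\n", "The\n", "a\n", "of\n", "a\n"]  # hypothetical contents of common.txt

common_list = [l.strip().lower() for l in lines]  # list comprehension keeps duplicates
common_set = {l.strip().lower() for l in lines}   # set comprehension de-duplicates

print(len(common_list), len(common_set))
# Both membership tests give the same answer; only the lookup cost differs
print("river" not in common_list, "river" not in common_set)
```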

As for collections.defaultdict(), it is there just to save us some coding and checking by automatically initializing the dictionary entry for a key whenever it is requested. Without it you would have to do that manually:

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\""  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

sentences_counter = []  # a list to hold a word count for each sentence
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    # read the whole file to include linebreaks and split on `.` to get individual sentences
    sentences = [s for s in f.read().split(".") if s.strip()]  # ignore empty sentences
    for sentence in sentences:  # iterate over each sentence
        sentence = sentence.translate(trans_table)  # replace the interpunction with spaces
        word_counter = {}  # initialize a word counting dictionary
        for word in sentence.split():  # split the sentence and iterate over the words
            word = word.lower()  # turn the word to lowercase
            if word not in common_words:  # count only words not in the common.txt
                word_counter[word] = word_counter.get(word, 0) + 1  # increase the last count
        sentences_counter.append(word_counter)  # add the current sentence word count

UPDATE 3: If you just want a raw word count across all sentences, as your latest question update seems to suggest, you do not even need to consider the sentences themselves. Just add the dot to the interpunction list, read the file line by line, split on whitespace and count the words as before:

import collections

with open("common.txt", "r") as f:  # open the `common.txt` for reading
    common_words = {l.strip().lower() for l in f}  # read each line and add it to a set

interpunction = ";,'\"."  # define word separating characters and create a translation table
trans_table = str.maketrans(interpunction, " " * len(interpunction))

word_counter = collections.defaultdict(int)  # a string:int default dict for counting
with open("sample.txt", "r") as f:  # open the `sample.txt` for reading
    for line in f:  # read the file line by line
        for word in line.translate(trans_table).split():  # remove interpunction and split
            if word.lower() not in common_words:  # count only words not in the common.txt
                word_counter[word.lower()] += 1  # increase the count

print("\n".join("{}: {}".format(w, c) for w, c in word_counter.items()))  # print the counts

0 votes
/ May 14, 2018

I hope the code below works the way you need:

# Read both files and lowercase everything so the comparison is case-insensitive
with open('sample.txt', 'r') as file:
    data = file.read().lower()
with open('common.txt', 'r') as file_1:
    c_data = file_1.read().lower().split()

# Treat commas, semicolons and newlines as word separators
for char in ',;\n':
    data = data.replace(char, ' ')

counts = {}
for i in data.split():
    if i not in c_data:  # skip the words listed in common.txt
        counts[i] = counts.get(i, 0) + 1  # this line counts the appearances

print(counts)