Parsing an XML file into a pandas DataFrame - PullRequest
0 votes
/ March 29, 2020

I have hundreds of XML files like the example below. I want to extract village_name and Detail and store them in a pandas DataFrame. I am trying to parse the XML with ElementTree, but I can't extract this data.

<?xml version="1.0" encoding="UTF-8"?>
<Report  xsi:schemaLocation="NutrientStatusFarmerWise http://10.248.218.50/ReportServer?%2FSoilHealthManagement%2FNutrientStatusFarmerWise&amp;rs%3ACommand=Render&amp;rs%3AFormat=XML&amp;rs%3ASessionID=1uc05ob0bluyfe55v3pjltmi&amp;rc%3ASchema=True" Name="NutrientStatusFarmerWise" textbox1=" Nutrient Status - FarmerWise" Textbox38="Ghazipur" Textbox37="Uttar Pradesh" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="NutrientStatusFarmerWise">
    <table1  Textbox48="Sr.No." textbox2="Sample No." textbox3="Farmer Name" Textbox4="Land Area" textbox5="Khasra No./ Dag No." textbox6="Survey No." textbox10="Longitude" textbox11="Latitude" textbox16="pH" Textbox12="EC" Textbox17="OC" Textbox19="N" Textbox21="P" Textbox23="K" Textbox25="S" Textbox27="Zn" Textbox29="Fe" Textbox31="Cu" Textbox33="Mn" Textbox35="B">
        <sub_district_name_english_Collection>
            <sub_district_name_english  sub_district_name_english1="Sub District/Mandal: Kasimabad">
                <village_name_Collection>
                    <village_name  Textbox49="1" sub_district_name_english="Village: BARHATA">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP6146509/2017-18/72711093" farmername="इसराइल" DagNo="5" SurveyNo="1" longitude="83.4281000000" Latitude="25.9001000000" pH="7.50000 MAl " EC="0.61800 N   " OC="0.33000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.42000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="2" sample_no="UP6146509/2017-18/72711315" farmername="इसराइल" DagNo="5" SurveyNo="1" longitude="83.4218000000" Latitude="25.9001000000" pH="7.80000 MAl " EC="0.68000 N   " OC="0.25000 L " N="76.50000 VL" P="18.00000 L " K="89.60000 L " S="7.30000 D" Zn="0.50000 D" Fe="1.06000 D" Cu="0.80000 S" Mn="6.11000 S" B="0.77000 S"/>
                            <Detail  Textbox50="3" sample_no="UP6146509/2017-18/73629456" farmername="इसराइल " LandArea="0.22" DagNo="5" SurveyNo="1" longitude="83.1800000000" Latitude="25.9001000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="4.54000 S" B="0.02000 D"/>
                            <Detail  Textbox50="4" sample_no="UP6146509/2017-18/73041814" farmername="इसराइल" LandArea="0.21" DagNo="5" SurveyNo="1" longitude="83.4228130000" Latitude="25.4428129700" pH="7.80000 MAl " EC="0.87500 N   " OC="0.33000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="4.54000 S" B="0.02000 D"/>
                            <Detail  Textbox50="5" sample_no="UP6146509/2017-18/73042119" farmername="इसराइल" LandArea="0.21" DagNo="5" SurveyNo="1" longitude="83.4228138100" Latitude="25.4428129800" pH="7.50000 MAl " EC="0.61800 N   " OC="0.33000 L " N="87.15000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.28000 D" Fe="11.62000 S" Cu="0.29000 S" Mn="4.54000 S" B="0.02000 D"/>
                        </Detail_Collection>
                    </village_name>
                    <village_name  Textbox49="2" sub_district_name_english="Village: CHAK SIMA">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP6146513/2017-18/71662420" farmername="अभिमन्यु" DagNo="5" SurveyNo="1" longitude="83.1500000000" Latitude="25.3600000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="4.54000 S" B="0.27500 D"/>
                        </Detail_Collection>
                    </village_name>
                    <village_name  Textbox49="3" sub_district_name_english="Village: CHAKAB TALIYA">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP6146515/2017-18/71767236" farmername="गणेशदत्त" LandArea="1.16" DagNo="1" SurveyNo="1" longitude="83.3966470000" Latitude="25.4239670000" pH="7.50000 MAl " EC="0.84000 N   " OC="0.37000 L " N="90.00000 VL" P="9.00000 VL" K="112.00000 L " S="9.00000 D" Zn="0.42000 D" Fe="6.75000 S" Cu="1.17000 S" Mn="3.89000 S" B="0.27500 D"/>
                            <Detail  Textbox50="2" sample_no="UP6146515/2017-18/71767591" farmername="गणेशदत्त" LandArea="1.16" DagNo="1" SurveyNo="1" longitude="83.3966470000" Latitude="25.4239670000" pH="7.80000 MAl " EC="0.87500 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="3" sample_no="UP6146515/2017-18/71806573" farmername="गणेशदत्त" DagNo="1" SurveyNo="1" longitude="83.6630000000" Latitude="25.7108330000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="7.80000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.27500 D"/>
                            <Detail  Textbox50="4" sample_no="UP6146515/2017-18/71807058" farmername="रामजी" DagNo="1" SurveyNo="1" longitude="83.9626000000" Latitude="25.9001000000" pH="7.30000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="9.00000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="5" sample_no="UP6146515/2017-18/71766655" farmername="गणेशदत्त" LandArea="1.16" DagNo="1" SurveyNo="1" longitude="83.3947000000" Latitude="25.4239670000" pH="7.80000 MAl " EC="0.87500 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="4.54000 S" B="0.27500 D"/>
                        </Detail_Collection>
                    </village_name>
                    <village_name  Textbox49="4" sub_district_name_english="Village: CHAKMIR MOHAMMAD">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP6146504/2017-18/72474629" farmername="नगीना" DagNo="6" SurveyNo="1" longitude="83.7800000000" Latitude="25.9000000000" pH="7.50000 MAl " EC="0.61800 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="1.69000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="2" sample_no="UP6146504/2017-18/72475435" farmername="नगीना" DagNo="6" SurveyNo="1" longitude="83.4545600000" Latitude="25.9001000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="1.69000 S" Mn="4.54000 S" B="0.27500 D"/>
                            <Detail  Textbox50="3" sample_no="UP6146504/2017-18/72474105" farmername="नगीना " DagNo="6" SurveyNo="1" longitude="83.7800000000" Latitude="25.9000000000" pH="7.50000 MAl " EC="0.61800 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.42000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="4.54000 S" B="0.27500 D"/>
                        </Detail_Collection>
                    </village_name>
                    <village_name  Textbox49="5" sub_district_name_english="Village: KHIDIRPUR">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP6146510/2017-18/71633882" farmername="गिरजा" DagNo="5" SurveyNo="1" longitude="83.5000000000" Latitude="25.5000000000" pH="7.80000 MAl " EC="0.84000 N   " OC="0.33000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="1.69000 S" Mn="4.54000 S" B="0.27500 D"/>
                            <Detail  Textbox50="2" sample_no="UP6146510/2017-18/71635014" farmername="गिरजा" DagNo="5" SurveyNo="1" longitude="83.5000000000" Latitude="25.5000000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="9.00000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.27500 D"/>
                            <Detail  Textbox50="3" sample_no="UP6146510/2017-18/71637973" farmername="गिरजा" DagNo="5" SurveyNo="1" longitude="83.2500000000" Latitude="25.3600000000" pH="7.50000 MAl " EC="0.61800 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.42000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="4.54000 S" B="0.02000 D"/>
                            <Detail  Textbox50="4" sample_no="UP6146510/2017-18/71638461" farmername="गिरजा" DagNo="5" SurveyNo="1" longitude="83.1500000000" Latitude="25.3600000000" pH="7.80000 MAl " EC="0.87500 N   " OC="0.37000 L " N="87.15000 VL" P="18.00000 L " K="123.20000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="5" sample_no="UP6146510/2017-18/71636837" farmername="गिरजा" DagNo="5" SurveyNo="1" longitude="83.2500000000" Latitude="25.3600000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="123.20000 L " S="4.20000 D" Zn="0.42000 D" Fe="6.75000 S" Cu="1.69000 S" Mn="3.89000 S" B="0.02000 D"/>
                        </Detail_Collection>
                    </village_name>
                    <village_name  Textbox49="6" sub_district_name_english="Village: MOHIUDDINPUR">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP6146500/2017-18/71661642" farmername="रामशंकर" LandArea="0.24" DagNo="2" SurveyNo="1" longitude="83.1700000000" Latitude="25.5000000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.33000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="9.00000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="1.17000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="2" sample_no="UP6146500/2017-18/72720259" farmername="रमाशंकर" DagNo="2" SurveyNo="1" longitude="83.4220000000" Latitude="25.9001000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="4.54000 S" B="0.02000 D"/>
                            <Detail  Textbox50="3" sample_no="UP6146500/2018-19/74758221" farmername="कपिलदेव " DagNo="16S" SurveyNo="1" longitude="83.6296270000" Latitude="25.8175170000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="1.17000 S" Mn="3.89000 S" B="0.27500 D"/>
                            <Detail  Textbox50="4" sample_no="UP6146500/2018-19/74761627" farmername="कपिलदेव " DagNo="16S" SurveyNo="1" longitude="83.6296270000" Latitude="25.8175170000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="1.17000 S" Mn="4.54000 S" B="0.02000 D"/>
                            <Detail  Textbox50="5" sample_no="UP6146500/2018-19/75627041" farmername="कपिलदेव " DagNo="16 S" SurveyNo="1" longitude="83.6296270000" Latitude="25.8175170000" pH="7.80000 MAl " EC="0.87500 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="10.80000 S" Zn="0.28000 D" Fe="6.75000 S" Cu="0.16000 D" Mn="4.54000 S" B="0.02000 D"/>
                            <Detail  Textbox50="6" sample_no="UP6146500/2018-19/75634699" farmername="कपिलदेव" LandArea="0.70" DagNo="16" SurveyNo="1" longitude="83.1800000000" Latitude="25.8988890000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="90.00000 VL" P="4.50000 VL" K="123.20000 L " S="10.80000 S" Zn="0.28000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="7" sample_no="UP6146500/2018-19/75635010" farmername="कपिलदेव" LandArea="0.70" DagNo="16" SurveyNo="1" longitude="83.1800000000" Latitude="25.9001000000" pH="7.80000 MAl " EC="0.87500 N   " OC="0.37000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.27500 D"/>
                            <Detail  Textbox50="8" sample_no="UP6146500/2018-19/74757041" farmername="कपिलदेव " DagNo="16S" SurveyNo="1" longitude="83.6296270000" Latitude="25.8175170000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="9.00000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="0.29000 S" Mn="3.56000 S" B="0.02000 D"/>
                            <Detail  Textbox50="9" sample_no="UP6146500/2018-19/74757365" farmername="कपिलदेव " DagNo="16S" SurveyNo="1" longitude="83.6296270000" Latitude="25.8175170000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.33000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="7.80000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="0.29000 S" Mn="4.54000 S" B="0.27500 D"/>
                            <Detail  Textbox50="10" sample_no="UP6146500/2018-19/74773024" farmername="कपिलदेव" DagNo="16" SurveyNo="1" longitude="83.6200000000" Latitude="25.8097220000" pH="7.10000 MAl " EC="0.87500 N   " OC="0.33000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="1.69000 S" Mn="3.89000 S" B="0.27500 D"/>
                            <Detail  Textbox50="11" sample_no="UP6146500/2018-19/74773331" farmername="कपिलदेव" DagNo="16" SurveyNo="1" longitude="83.2500000000" Latitude="25.3600000000" pH="7.30000 MAl " EC="0.61800 N   " OC="0.40000 L " N="87.15000 VL" P="9.00000 VL" K="112.00000 L " S="8.40000 D" Zn="0.42000 D" Fe="11.62000 S" Cu="1.69000 S" Mn="4.54000 S" B="0.27500 D"/>
                            <Detail  Textbox50="12" sample_no="UP6146500/2018-19/74786744" farmername="कपिलदेव" DagNo="16" SurveyNo="1" longitude="83.2500000000" Latitude="25.3600000000" pH="7.50000 MAl " EC="0.61800 N   " OC="0.33000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.42000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="4.54000 S" B="0.02000 D"/>
                        </Detail_Collection>
                    </village_name>
                    <village_name  Textbox49="7" sub_district_name_english="Village: SIURDIH">
                        <Detail_Collection>
                            <Detail  Textbox50="1" sample_no="UP205627/2017-18/66736985" farmername="विश्वनाथ" DagNo="227" SurveyNo="32" longitude="83.1875000000" Latitude="25.6672230000" pH="7.40000 MAl " EC="0.55000 N   " OC="0.40000 L " N="16.52000 VL" P="18.25000 L " K="17.65000 VL" S="19.00000 S" Zn="0.22000 D" Fe="16.52000 S" Cu="1.88000 S" Mn="14.20000 S" B="0.21000 D"/>
                            <Detail  Textbox50="2" sample_no="UP6146511/2017-18/71637587" farmername="रविंद्र कुमार" DagNo="1" SurveyNo="1" longitude="83.1800000000" Latitude="25.9001000000" pH="7.80000 MAl " EC="0.61800 N   " OC="0.37000 L " N="87.15000 VL" P="9.00000 VL" K="123.20000 L " S="7.80000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.27000 S" Mn="3.89000 S" B="0.02000 D"/>
                            <Detail  Textbox50="3" sample_no="UP6146511/2017-18/71638369" farmername="रविंद्र कुमार" DagNo="1" SurveyNo="1" longitude="83.1800000000" Latitude="25.9001000000" pH="7.50000 MAl " EC="0.87500 N   " OC="0.33000 L " N="87.15000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="4.54000 S" B="0.27500 D"/>
                            <Detail  Textbox50="4" sample_no="UP6146511/2017-18/71772329" farmername="रविंद्र कुमार" DagNo="1" SurveyNo="1" longitude="83.1800000000" Latitude="25.9001000000" pH="7.30000 MAl " EC="0.61800 N   " OC="0.39000 L " N="74.25000 VL" P="9.00000 VL" K="123.20000 L " S="9.00000 D" Zn="0.28000 D" Fe="6.75000 S" Cu="0.29000 S" Mn="3.89000 S" B="0.27500 D"/>
                        </Detail_Collection>
                    </village_name>
                </village_name_Collection>
            </sub_district_name_english>
        </sub_district_name_english_Collection>
    </table1>
</Report>

My code

import xml.etree.ElementTree as ET
tree = ET.parse("NutrientStatus.xml")

# Get the root element
root = tree.getroot()

root.findall('village_name')

root.find('Village_name_collection')

root.findall('Detail_Collection')

None of these work. I would really appreciate any help with this.
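A likely reason the calls above return nothing: the root element declares a default namespace (xmlns="NutrientStatusFarmerWise"), so ElementTree stores every tag as {NutrientStatusFarmerWise}village_name and a bare name such as 'village_name' never matches. On top of that, find() and findall() with a plain tag name only look at the direct children of the element they are called on. A minimal sketch under those assumptions, using only the standard library (the prefix "r" and the function name detail_rows are illustrative choices, not part of the original):

```python
import xml.etree.ElementTree as ET

# The root declares xmlns="NutrientStatusFarmerWise", so every element lives in
# that namespace; map an arbitrary prefix ("r") to it for use in paths.
NS = {"r": "NutrientStatusFarmerWise"}


def detail_rows(root):
    """Return one dict per <Detail>, tagged with the name of its village."""
    rows = []
    # ".//" searches all descendants; a bare tag name only checks direct children
    for village in root.findall(".//r:village_name", NS):
        # The village label is stored as an attribute: "Village: BARHATA" -> "BARHATA"
        label = village.get("sub_district_name_english", "")
        name = label.replace("Village:", "").strip()
        for detail in village.findall(".//r:Detail", NS):
            # Attributes are not in the default namespace, so keys stay plain
            rows.append({"Village": name, **detail.attrib})
    return rows


# Usage with the file name from the question:
# root = ET.parse("NutrientStatus.xml").getroot()
# rows = detail_rows(root)
```

On Python 3.8 and newer, the wildcard form root.findall(".//{*}Detail") also matches a tag regardless of its namespace, which avoids spelling out the namespace URI.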

Link to the original file: https://drive.google.com/file/d/1XVo1cPIuYb5kQEo5gCi1OQLCXjDGiP4v/view?usp=sharing

The output should look like this:

Village  sample_no                   farmername  longitude  Latitude  pH   EC     OC
BARHATA  UP6146509/2017-18/72711093  इसराइल      83.428100  25.90010  7.5  0.618  0.33

I need all the values of each Detail tag, i.e. all of its attributes.
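In the XML the nutrient attributes hold a number followed by a rating code (for example pH="7.50000 MAl " or EC="0.61800 N   "), while the desired output shows only the numbers (7.5, 0.618, 0.33). A minimal sketch that splits each such column into a float column and a separate *_rating column, assuming the DataFrame already has one column per Detail attribute. The column list is taken from the attributes in the sample; the function name split_readings and the *_rating suffix are hypothetical, and the rating codes (MAl, N, L, VL, D, S) are not explained in the question, so they are kept as opaque strings:

```python
import pandas as pd

# Detail attributes whose values look like "<number> <rating>", e.g. "7.50000 MAl "
NUTRIENTS = ["pH", "EC", "OC", "N", "P", "K", "S", "Zn", "Fe", "Cu", "Mn", "B"]


def split_readings(df, columns=NUTRIENTS):
    """Return a copy where each "<value> <rating>" column becomes a float
    column plus a "<name>_rating" string column."""
    out = df.copy()
    for col in columns:
        if col not in out.columns:
            continue  # not every report has to contain every attribute
        # Group 0: the number; group 1: the rating code, padding spaces dropped
        parts = out[col].str.extract(r"^\s*([-+]?\d*\.?\d+)\s*(\S*)\s*$")
        out[col] = pd.to_numeric(parts[0])
        out[col + "_rating"] = parts[1]
    return out
```

Keeping the rating in its own column preserves the information instead of discarding it when converting the value to a number.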

1 Answer

0 votes
/ March 30, 2020

OK, I solved it with BeautifulSoup.

Here is my code:

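import pandas as pd
from bs4 import BeautifulSoup  # the "xml" parser requires lxml to be installed

with open("NutrientStatus.xml", encoding="utf-8") as fp:
    soup = BeautifulSoup(fp, "xml")

rows = []
for village in soup.find_all("village_name"):
    # "Village: BARHATA" -> "BARHATA"
    name = village["sub_district_name_english"].replace("Village:", "").strip()
    for dt in village.find_all("Detail"):
        # Keep attribute names as keys so columns line up even when an
        # optional attribute such as LandArea is missing
        rows.append({"Village": name, **dt.attrs})

# Build the frame once (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(rows)
df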
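The question mentions hundreds of such files. One way to cover all of them, sketched here with the standard-library ElementTree instead of BeautifulSoup (so lxml is not needed), is to collect the rows from every file and build a single DataFrame at the end. The glob pattern, the extra "file" column, and the function names parse_report and load_reports are hypothetical; the {*} namespace wildcard assumes Python 3.8 or newer:

```python
import glob
import os
import xml.etree.ElementTree as ET

import pandas as pd


def parse_report(path):
    """Return one dict per <Detail> in a single report, with village and source file."""
    root = ET.parse(path).getroot()
    rows = []
    # "{*}" matches the tag in any namespace, so the report's default
    # namespace (xmlns="NutrientStatusFarmerWise") need not be spelled out
    for village in root.iterfind(".//{*}village_name"):
        label = village.get("sub_district_name_english", "")
        name = label.replace("Village:", "").strip()
        for detail in village.iterfind(".//{*}Detail"):
            rows.append({"file": os.path.basename(path), "Village": name, **detail.attrib})
    return rows


def load_reports(pattern="*.xml"):
    """Parse every file matching the glob pattern into one DataFrame.

    Returns an empty DataFrame when no file matches."""
    rows = []
    for path in sorted(glob.glob(pattern)):
        rows.extend(parse_report(path))
    # Collecting dicts and building the frame once avoids re-copying it per row
    return pd.DataFrame(rows)


# Usage (the folder name is hypothetical):
# df = load_reports("reports/*.xml")
```

The "file" column keeps track of which report each row came from, which helps when the same village appears in several files.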