I'm new to Spark and have a question about reading CSV files.
I'm trying to read 2 CSV files into 2 separate DataFrames and `take` 5 rows from each. However, I only see the output of the last DataFrame action. Am I missing something? Why wasn't the result of the first DataFrame action printed?
# Read first CSV
file_location1 = "/FileStore/tables/airports.csv"
file_type1 = "csv"
# CSV options
infer_schema1 = "true"
first_row_is_header1 = "false"
delimiter1 = ","
# Load File 1
df1 = spark.read.format(file_type1) \
    .option("inferSchema", infer_schema1) \
    .option("header", first_row_is_header1) \
    .option("sep", delimiter1) \
    .load(file_location1)
df1.take(5)
# Read second CSV
file_location2 = "/FileStore/tables/Report.csv"
file_type2 = "csv"
# CSV options
infer_schema2 = "true"
first_row_is_header2 = "true"
delimiter2 = ","
# Load File 2
df2 = spark.read.format(file_type2) \
    .option("inferSchema", infer_schema2) \
    .option("header", first_row_is_header2) \
    .option("sep", delimiter2) \
    .load(file_location2)
df2.take(5)
Output: only the output of the second DataFrame is shown (https://i.stack.imgur.com/bO7GO.png)
df1:pyspark.sql.dataframe.DataFrame
_c0:integer
_c1:string
_c2:string
_c3:string
_c4:string
_c5:string
_c6:double
_c7:double
_c8:integer
_c9:string
_c10:string
_c11:string
_c12:string
_c13:string
df2:pyspark.sql.dataframe.DataFrame = [Parcel(s): string, Building Name: string ... 100 more fields]
Out[1]: [Row(Parcel(s)='0022/012', Building Name='580 NORTH POINT ST', Building Address='580 NORTH POINT ST', Postal Code=94133, Full.Address='POINT (-122.416746 37.806186)', Floor Area=24022, Property Type='Commercial', Property Type - Self Selected='Hotel', PIM Link='http://propertymap.sfplanning.org/?&search=0022/012', Year Built=1900, Energy Audit Due Date=datetime.datetime(2013, 4, 1, 0, 0), Energy Audit Status='Did Not Comply', Benchmark 2018 Status='Violation - Did Not Report', 2018 Reason for Exemption=None, Benchmark 2017 Status='Violation - Did Not Report', 2017 Reason for Exemption=None, Benchmark 2016 Status='Violation - Did Not Report', 2016 Reason for Exemption=None, Benchmark 2015 Status='Violation - Did Not Report', 2015 Reason for Exemption=None, Benchmark 2014 Status='Violation - Did Not Report', 2014 Reason for Exemption=None, Benchmark 2013 Status='Violation - Did Not Report', 2013 Reason for Exemption=None, Benchmark 2012 Status='Violation - Did Not Report', 2012 Reason for Exemption=None, Benchmark 2011 Status='Exempt', 2011 Reason for Exemption='SqFt Not Subject This Year', Benchmark 2010 Status='Exempt', 2010 Reason for Exemption='SqFt Not Subject This Year', 2018 ENERGY STAR Score=None, 2018 Site EUI (kBtu/ft2)=None, 2018 Source EUI (kBtu/ft2)=None, 2018 Percent Better than National Median Site EUI=None, 2018 Percent Better than National Median Source EUI=None, 2018 Total GHG Emissions (Metric Tons CO2e)=None, 2018 Total GHG Emissions Intensity (kgCO2e/ft2)=None, 2018 Weather Normalized Site EUI (kBtu/ft2)=None, 2018 Weather Normalized
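A likely explanation, assuming the Databricks notebook follows the same rule as IPython's default `last_expr` mode (the `Out[1]:` prefix suggests it does): a cell executes every statement, but only the value of its final bare expression is echoed. `df1.take(5)` still runs; its result is simply discarded. The sketch below mimics that rule with the standard `ast` module. `run_cell` is a hypothetical helper for illustration, not part of Spark or Databricks, and plain lists stand in for the DataFrames.

```python
import ast


def run_cell(source, namespace=None):
    """Run `source` like a notebook cell and return what would be echoed.

    Every statement is executed, but only the value of the final statement
    is returned, and only when that statement is a bare expression. This is
    an assumed model of IPython's default ``ast_node_interactivity = "last_expr"``.
    """
    namespace = {} if namespace is None else namespace
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # All statements but the last run via exec(); bare expressions among
        # them are evaluated and their values thrown away.
        head = ast.Module(body=tree.body[:-1], type_ignores=[])
        tail = ast.Expression(body=tree.body[-1].value)
        exec(compile(head, "<cell>", "exec"), namespace)
        # Only the last expression's value survives, like Out[n].
        return eval(compile(tail, "<cell>", "eval"), namespace)
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


# Stand-in for the question's cell: plain lists replace the DataFrames.
cell = """
rows1 = [1, 2, 3, 4, 5, 6]
rows1[:5]          # like df1.take(5): computed, then discarded
rows2 = [7, 8, 9, 10, 11, 12]
rows2[:5]          # like df2.take(5): the only value echoed
"""
print(run_cell(cell))  # [7, 8, 9, 10, 11]

# Printing the first result explicitly makes it visible as well.
fixed = "print([1, 2, 3, 4, 5, 6][:5])\n[7, 8, 9, 10, 11, 12][:5]"
print(run_cell(fixed))  # prints [1, 2, 3, 4, 5], then [7, 8, 9, 10, 11]
```

If that is the cause, wrapping the first call as `print(df1.take(5))`, using `df1.show(5)`, or putting each read in its own cell should make both results visible.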