@QuickSilver -- Please find below full stacktrace
******************
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/05/28 19:53:10 INFO SparkContext: Running Spark version 2.4.5
20/05/28 19:53:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/05/28 19:53:12 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2422)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:293)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$5(SparkSession.scala:935)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
at dataFrameBasics.WorkingWithAvroFile$.main(WorkingWithAvroFile.scala:9)
at dataFrameBasics.WorkingWithAvroFile.main(WorkingWithAvroFile.scala)
20/05/28 19:53:12 INFO SparkContext: Submitted application: Working with Avro File
20/05/28 19:53:12 INFO SecurityManager: Changing view acls to: santosh
20/05/28 19:53:12 INFO SecurityManager: Changing modify acls to: santosh
20/05/28 19:53:12 INFO SecurityManager: Changing view acls groups to:
20/05/28 19:53:12 INFO SecurityManager: Changing modify acls groups to:
20/05/28 19:53:12 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(santosh); groups with view permissions: Set(); users with modify permissions: Set(santosh); groups with modify permissions: Set()
20/05/28 19:53:16 INFO Utils: Successfully started service 'sparkDriver' on port 50965.
20/05/28 19:53:16 INFO SparkEnv: Registering MapOutputTracker
20/05/28 19:53:16 INFO SparkEnv: Registering BlockManagerMaster
20/05/28 19:53:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/05/28 19:53:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/05/28 19:53:16 INFO DiskBlockManager: Created local directory at C:\Users\santosh\AppData\Local\Temp\blockmgr-0efb8730-052b-474d-ad33-669ad1b5d5ed
20/05/28 19:53:16 INFO MemoryStore: MemoryStore started with capacity 351.3 MB
20/05/28 19:53:16 INFO SparkEnv: Registering OutputCommitCoordinator
20/05/28 19:53:17 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/05/28 19:53:17 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://Lenovo-PC:4040
20/05/28 19:53:18 INFO Executor: Starting executor ID driver on host localhost
20/05/28 19:53:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50976.
20/05/28 19:53:18 INFO NettyBlockTransferService: Server created on Lenovo-PC:50976
20/05/28 19:53:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/05/28 19:53:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, Lenovo-PC, 50976, None)
20/05/28 19:53:18 INFO BlockManagerMasterEndpoint: Registering block manager Lenovo-PC:50976 with 351.3 MB RAM, BlockManagerId(driver, Lenovo-PC, 50976, None)
20/05/28 19:53:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, Lenovo-PC, 50976, None)
20/05/28 19:53:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, Lenovo-PC, 50976, None)
20/05/28 19:53:19 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Users/santosh/Scala/Workspace/SparkCourseAsMavenProject/spark-warehouse').
20/05/28 19:53:19 INFO SharedState: Warehouse path is 'file:/C:/Users/santosh/Scala/Workspace/SparkCourseAsMavenProject/spark-warehouse'.
20/05/28 19:53:23 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/05/28 19:53:24 INFO InMemoryFileIndex: It took 232 ms to list leaf files for 1 paths.
**root
|-- registration_dttm: string (nullable = true)
|-- id: long (nullable = true)
|-- first_name: string (nullable = true)
|-- last_name: string (nullable = true)
|-- email: string (nullable = true)
|-- gender: string (nullable = true)
|-- ip_address: string (nullable = true)
|-- cc: long (nullable = true)
|-- country: string (nullable = true)
|-- birthdate: string (nullable = true)
|-- salary: double (nullable = true)
|-- title: string (nullable = true)
|-- comments: string (nullable = true)**
20/05/28 19:53:31 INFO FileSourceStrategy: Pruning directories with:
20/05/28 19:53:31 INFO FileSourceStrategy: Post-Scan Filters:
20/05/28 19:53:31 INFO FileSourceStrategy: Output Data Schema: struct<registration_dttm: string, id: bigint, first_name: string, last_name: string, email: string ... 11 more fields>
20/05/28 19:53:31 INFO FileSourceScanExec: Pushed Filters:
20/05/28 19:53:32 INFO CodeGenerator: Code generated in 790.65082 ms
20/05/28 19:53:35 INFO CodeGenerator: Code generated in 188.480854 ms
20/05/28 19:53:36 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 220.3 KB, free 351.1 MB)
20/05/28 19:53:42 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 20.6 KB, free 351.1 MB)
20/05/28 19:53:42 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on Lenovo-PC:50976 (size: 20.6 KB, free: 351.3 MB)
20/05/28 19:53:42 INFO SparkContext: Created broadcast 0 from show at WorkingWithAvroFile.scala:24
20/05/28 19:53:42 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4287865 bytes, open cost is considered as scanning 4194304 bytes.
20/05/28 19:53:42 INFO SparkContext: Starting job: show at WorkingWithAvroFile.scala:24
20/05/28 19:53:42 INFO DAGScheduler: Got job 0 (show at WorkingWithAvroFile.scala:24) with 1 output partitions
20/05/28 19:53:42 INFO DAGScheduler: Final stage: ResultStage 0 (show at WorkingWithAvroFile.scala:24)
20/05/28 19:53:42 INFO DAGScheduler: Parents of final stage: List()
20/05/28 19:53:42 INFO DAGScheduler: Missing parents: List()
20/05/28 19:53:42 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at show at WorkingWithAvroFile.scala:24), which has no missing parents
20/05/28 19:53:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 14.1 KB, free 351.1 MB)
20/05/28 19:53:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 6.0 KB, free 351.0 MB)
20/05/28 19:53:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on Lenovo-PC:50976 (size: 6.0 KB, free: 351.3 MB)
20/05/28 19:53:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1163
20/05/28 19:53:43 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3] at show at WorkingWithAvroFile.scala:24) (first 15 tasks are for partitions Vector(0))
20/05/28 19:53:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/05/28 19:53:43 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7732 bytes)
20/05/28 19:53:43 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/05/28 19:53:44 INFO FileScanRDD: Reading File path: file:///C:/Spark_Files/userdata1.avro, range: 0-93561, partition values: [empty row]
20/05/28 19:53:44 INFO CodeGenerator: Code generated in 85.990927 ms
20/05/28 19:53:46 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 3182 bytes result sent to driver
20/05/28 19:53:46 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 2830 ms on localhost (executor driver) (1/1)
20/05/28 19:53:46 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
20/05/28 19:53:46 INFO DAGScheduler: ResultStage 0 (show at WorkingWithAvroFile.scala:24) finished in 3.362 s
20/05/28 19:53:46 INFO DAGScheduler: Job 0 finished: show at WorkingWithAvroFile.scala:24, took 4.340480 s
**+--------------------+---+----------+---------+--------------------+------+--------------+----------------+--------------------+----------+---------+--------------------+--------+
| registration_dttm| id|first_name|last_name| email|gender| ip_address| cc| country| birthdate| salary| title|comments|
+--------------------+---+----------+---------+--------------------+------+--------------+----------------+--------------------+----------+---------+--------------------+--------+
|2016-02-03T07:55:29Z| 1| Amanda| Jordan| ajordan0@com.com|Female| 1.197.201.2|6759521864920116| Indonesia| 3/8/1971| 49756.53| Internal Auditor| 1E+02|
|2016-02-03T17:04:03Z| 2| Albert| Freeman| afreeman1@is.gd| Male|218.111.175.34| null| Canada| 1/16/1968|150280.17| Accountant IV| |
|2016-02-03T01:09:31Z| 3| Evelyn| Morgan|emorgan2@altervis...|Female| 7.161.136.94|6767119071901597| Russia| 2/1/1960|144972.51| Structural Engineer| |
|2016-02-03T12:36:21Z| 4| Denise| Riley| driley3@gmpg.org|Female| 140.35.109.83|3576031598965625| China| 4/8/1997| 90263.05|Senior Cost Accou...| |
|2016-02-03T05:05:31Z| 5| Carlos| Burns|cburns4@miitbeian...| |169.113.235.40|5602256255204850| South Africa| | null| | |
|2016-02-03T07:22:34Z| 6| Kathryn| White| kwhite5@google.com|Female|195.131.81.179|3583136326049310| Indonesia| 2/25/1983| 69227.11| Account Executive| |
|2016-02-03T08:33:08Z| 7| Samuel| Holmes|sholmes6@foxnews.com| Male|232.234.81.197|3582641366974690| Portugal|12/18/1987| 14247.62|Senior Financial ...| |
|2016-02-03T06:47:06Z| 8| Harry| Howell| hhowell7@eepurl.com| Male| 91.235.51.73| null|Bosnia and Herzeg...| 3/1/1962|186469.43| Web Developer IV| |
|2016-02-03T03:52:53Z| 9| Jose| Foster| jfoster8@yelp.com| Male| 132.31.53.61| null| South Korea| 3/27/1992|231067.84|Software Test Eng...| 1E+02|
|2016-02-03T18:29:47Z| 10| Emily| Stewart|estewart9@opensou...|Female|143.28.251.245|3574254110301671| Nigeria| 1/28/1997| 27234.28| Health Coach IV| |
+--------------------+---+----------+---------+--------------------+------+--------------+----------------+--------------------+----------+---------+--------------------+--------+
only showing top 10 rows**
20/05/28 19:53:47 INFO FileSourceStrategy: Pruning directories with:
20/05/28 19:53:47 INFO FileSourceStrategy: Post-Scan Filters:
20/05/28 19:53:47 INFO FileSourceStrategy: Output Data Schema: struct<>
20/05/28 19:53:47 INFO FileSourceScanExec: Pushed Filters:
20/05/28 19:53:48 INFO CodeGenerator: Code generated in 50.675225 ms
20/05/28 19:53:48 INFO CodeGenerator: Code generated in 45.623914 ms
20/05/28 19:53:48 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 220.3 KB, free 350.8 MB)
20/05/28 19:53:48 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 20.6 KB, free 350.8 MB)
20/05/28 19:53:48 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on Lenovo-PC:50976 (size: 20.6 KB, free: 351.3 MB)
20/05/28 19:53:48 INFO SparkContext: Created broadcast 2 from count at WorkingWithAvroFile.scala:25
20/05/28 19:53:48 INFO FileSourceScanExec: Planning scan with bin packing, max size: 4287865 bytes, open cost is considered as scanning 4194304 bytes.
20/05/28 19:53:48 INFO SparkContext: Starting job: count at WorkingWithAvroFile.scala:25
20/05/28 19:53:48 INFO DAGScheduler: Registering RDD 6 (count at WorkingWithAvroFile.scala:25) as input to shuffle 0
20/05/28 19:53:48 INFO DAGScheduler: Got job 1 (count at WorkingWithAvroFile.scala:25) with 1 output partitions
20/05/28 19:53:48 INFO DAGScheduler: Final stage: ResultStage 2 (count at WorkingWithAvroFile.scala:25)
20/05/28 19:53:48 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
20/05/28 19:53:48 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 1)
20/05/28 19:53:48 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[6] at count at WorkingWithAvroFile.scala:25), which has no missing parents
20/05/28 19:53:48 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 10.9 KB, free 350.8 MB)
20/05/28 19:53:48 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 5.5 KB, free 350.8 MB)
20/05/28 19:53:48 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on Lenovo-PC:50976 (size: 5.5 KB, free: 351.2 MB)
20/05/28 19:53:48 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1163
20/05/28 19:53:48 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[6] at count at WorkingWithAvroFile.scala:25) (first 15 tasks are for partitions Vector(0))
20/05/28 19:53:48 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
20/05/28 19:53:48 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, executor driver, partition 0, PROCESS_LOCAL, 7721 bytes)
20/05/28 19:53:48 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 27
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 17
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 14
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 20
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 22
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 28
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 12
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 13
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 30
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 21
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 9
20/05/28 19:53:49 INFO FileScanRDD: Reading File path: file:///C:/Spark_Files/userdata1.avro, range: 0-93561, partition values: [empty row]
20/05/28 19:53:49 INFO CodeGenerator: Code generated in 33.699831 ms
20/05/28 19:53:49 INFO BlockManagerInfo: Removed broadcast_1_piece0 on Lenovo-PC:50976 in memory (size: 6.0 KB, free: 351.3 MB)
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 19
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 7
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 23
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 16
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 11
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 15
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 26
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 8
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 29
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 10
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 25
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 18
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 5
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 6
20/05/28 19:53:49 INFO ContextCleaner: Cleaned accumulator 24
20/05/28 19:53:49 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1638 bytes result sent to driver
20/05/28 19:53:49 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 618 ms on localhost (executor driver) (1/1)
20/05/28 19:53:49 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
20/05/28 19:53:49 INFO DAGScheduler: ShuffleMapStage 1 (count at WorkingWithAvroFile.scala:25) finished in 0.778 s
20/05/28 19:53:49 INFO DAGScheduler: looking for newly runnable stages
20/05/28 19:53:49 INFO DAGScheduler: running: Set()
20/05/28 19:53:49 INFO DAGScheduler: waiting: Set(ResultStage 2)
20/05/28 19:53:49 INFO DAGScheduler: failed: Set()
20/05/28 19:53:49 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[9] at count at WorkingWithAvroFile.scala:25), which has no missing parents
20/05/28 19:53:49 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 8.6 KB, free 350.8 MB)
20/05/28 19:53:49 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.4 KB, free 350.8 MB)
20/05/28 19:53:49 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on Lenovo-PC:50976 (size: 4.4 KB, free: 351.3 MB)
20/05/28 19:53:49 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1163
20/05/28 19:53:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[9] at count at WorkingWithAvroFile.scala:25) (first 15 tasks are for partitions Vector(0))
20/05/28 19:53:49 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
20/05/28 19:53:49 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, executor driver, partition 0, ANY, 7246 bytes)
20/05/28 19:53:49 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
20/05/28 19:53:49 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks including 1 local blocks and 0 remote blocks
20/05/28 19:53:49 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 36 ms
20/05/28 19:53:50 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 1749 bytes result sent to driver
20/05/28 19:53:50 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 397 ms on localhost (executor driver) (1/1)
20/05/28 19:53:50 INFO DAGScheduler: ResultStage 2 (count at WorkingWithAvroFile.scala:25) finished in 0.477 s
20/05/28 19:53:50 INFO DAGScheduler: Job 1 finished: count at WorkingWithAvroFile.scala:25, took 1.384763 s
20/05/28 19:53:50 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
**Count:1000**