Question

У меня есть датафрейм, который содержит последовательность строк.Я хочу перебирать строки по очереди без изменения порядка.

Я попытался описать код ниже.

 scala> val df = Seq(
 |     (0,"Load","employeeview", "employee.empdetails", null ),
 |     (1,"Query","employeecountview",null,"select count(*) from employeeview"),
 |     (2,"store", "employeecountview",null,null)
 |   ).toDF("id", "Operation","ViewName","DiectoryName","Query")
df: org.apache.spark.sql.DataFrame = [id: int, Operation: string ... 3 more fields]

scala> df.show()
+---+---------+-----------------+-------------------+--------------------+
| id|Operation|         ViewName|       DiectoryName|               Query|
+---+---------+-----------------+-------------------+--------------------+
|  0|     Load|     employeeview|employee.empdetails|                null|
|  1|    Query|employeecountview|               null|select count(*) f...|
|  2|    store|employeecountview|               null|                null|
+---+---------+-----------------+-------------------+--------------------+

scala> val dfcount = df.count().toInt
dfcount: Int = 3

scala> for( a <- 0 to dfcount-1){
              // first Iteration I want  id =0   Operation="Load" ViewName="employeeview" DiectoryName="employee.empdetails" Query= null
                // second iteration I want  id=1  Operation="Query" ViewName="employeecountview"  DiectoryName="null" Query= "select count(*) from employeeview"
               // Third Iteration I want   id= 2  Operation= "store" ViewName="employeecountview"  DiectoryName="null"  Query= "null"
          //ignore below sample code 
         //  val Operation = get(Operation(i))                   
        //       if (Operation=="Load"){
                           // based on operation type i am calling appropriate function  and passing entire row as a parameter 
        //       } else if(Operation= "Query"){      
        //                
        //       } else if(Operation= "store"){ 

        //       }

      }

примечание: порядок обработки не должен изменяться.(Здесь Unique Identification - это ID, поэтому мы должны выполнить строку 0,1,2 и т. Д.)

Заранее спасибо.

Achilleus · Answer 1 · 23 февраля 2019

Это мое решение с использованием наборов данных.Это дало бы безопасность типов и более чистый код.Но пришлось бы измерять производительность. Это не должно сильно варьироваться.

case class EmployeeOperations(id: Int, operation: String, viewName: String,DiectoryName: String, query: String)
 val data = Seq(
    EmployeeOperations(0, "Load", "employeeview", "employee.empdetails", ""),
    EmployeeOperations(1, "Query", "employeecountview", "", "select count(*) from employeeview"),
    EmployeeOperations(2, "store", "employeecountview", "", "")
  )
  val ds: Dataset[EmployeeOperations] = spark.createDataset(data)(Encoders.product[EmployeeOperations])
  printOperation(ds).show

  def printOperation(ds: Dataset[EmployeeOperations])={
    ds.map(x => x.operation match {
      case "Query" => println("matching Query"); "Query"
      case "Load" => println("matching Load"); "Load"
      case "store" => println("matching store"); "store"
      case _ => println("Found something else") ;"Nothing"
    }
    )
  }

Я вернул здесь только строку для тестирования.Вы можете вернуть любой примитивный тип.Это вернуло бы:

scala> printOperation(ds).show
matching Load
matching Query
matching store
+-----+
|value|
+-----+
| Load|
|Query|
|store|
+-----+

stack0114106 · Answer 2 · 22 февраля 2019

Проверьте это:

scala> val df = Seq(
     |     (0,"Load","employeeview", "employee.empdetails", null ),
     |     (1,"Query","employeecountview",null,"select count(*) from employeeview"),
     |     (2,"store", "employeecountview",null,null)
     |   ).toDF("id", "Operation","ViewName","DiectoryName","Query")
df: org.apache.spark.sql.DataFrame = [id: int, Operation: string ... 3 more fields]

scala> df.show()
+---+---------+-----------------+-------------------+--------------------+
| id|Operation|         ViewName|       DiectoryName|               Query|
+---+---------+-----------------+-------------------+--------------------+
|  0|     Load|     employeeview|employee.empdetails|                null|
|  1|    Query|employeecountview|               null|select count(*) f...|
|  2|    store|employeecountview|               null|                null|
+---+---------+-----------------+-------------------+--------------------+


scala> val dfcount = df.count().toInt
dfcount: Int = 3

scala> :paste
// Entering paste mode (ctrl-D to finish)

for( a <- 0 to dfcount-1){
val operation = df.filter(s"id=${a}").select("Operation").as[String].first

operation match {

case "Query" => println("matching Query") // or call a function here for Query()
case "Load" => println("matching Load") // or call a function here for Load()
case "store" => println("matching store") //
case x => println("matched " + x )

}

}

// Exiting paste mode, now interpreting.

matching Load
matching Query
matching store

scala>

Как читать кадр данных построчно, не меняя порядок?в Spark Scala

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 2 ]

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Как читать кадр данных построчно, не меняя порядок?в Spark Scala

Пожалуйста, войдите или зарегистрируйтесь чтобы ответить на этот вопрос.

Ответы [ 2 ]

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Пожалуйста, войдите или зарегистрируйтесь что бы добавить комментарий.

Нет похожих вопросов