
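Parquet is a columnar format supported by many other data processing systems. Spark SQL can both read and write Parquet files, and these files automatically preserve the schema of the original data.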

Loading data

    import org.apache.spark.rdd.RDD

    // sqlContext from the previous example is used in this example.
    // createSchemaRDD is used to implicitly convert an RDD to a SchemaRDD.
    import sqlContext.createSchemaRDD

    val people: RDD[Person] = ... // An RDD of case class objects, from the previous example.

    // The RDD is implicitly converted to a SchemaRDD by createSchemaRDD, allowing it to be stored using Parquet.
    people.saveAsParquetFile("people.parquet")

    // Read in the Parquet file created above. Parquet files are self-describing, so the schema is preserved.
    // The result of loading a Parquet file is also a SchemaRDD.
    val parquetFile = sqlContext.parquetFile("people.parquet")

    // Parquet files can also be registered as tables and then used in SQL statements.
    parquetFile.registerTempTable("parquetFile")
    val teenagers = sqlContext.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
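
The listing above uses the SchemaRDD API of Spark 1.0 to 1.2. In Spark 2.x and later, the same write, read, and query round trip is usually expressed with SparkSession and Dataset. What follows is only a minimal sketch of that variant, not part of the original tutorial. It assumes a local master, a hypothetical Person case class standing in for the one from the previous example, and three made-up sample rows; the ParquetSketch object and its teenagerNames helper are likewise names chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical record type; the tutorial's own Person class is defined in an earlier example.
case class Person(name: String, age: Int)

object ParquetSketch {

  // Writes sample people to `path` as Parquet, reads them back, and returns
  // one formatted line per person aged 13 to 19.
  def teenagerNames(spark: SparkSession, path: String): Array[String] = {
    import spark.implicits._

    // Illustrative sample rows (an assumption, not data from the tutorial).
    val people = Seq(Person("Michael", 29), Person("Andy", 30), Person("Justin", 19)).toDS()

    // The schema (name: string, age: int) is stored in the Parquet files themselves.
    people.write.mode("overwrite").parquet(path)

    // Reading needs no schema definition because Parquet is self-describing.
    val parquetFile = spark.read.parquet(path)

    // Register the loaded data as a temporary view so it can be queried with SQL.
    parquetFile.createOrReplaceTempView("parquetFile")
    val teenagers = spark.sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")

    teenagers.map(t => "Name: " + t.getString(0)).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetSketch")
      .master("local[*]") // assumption: a local run for experimentation
      .getOrCreate()
    try {
      teenagerNames(spark, "people.parquet").foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

With the sample rows above, only Justin falls within the 13 to 19 range, so the program prints a single line, "Name: Justin". The overwrite mode is used so that repeated runs do not fail on an existing people.parquet directory.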
