Spark Read and Write to CSV

Read

val logs = spark.read.csv("/tmp/sousuo")

or, for tab-separated input:

val logs = spark.read.option("delimiter", "\t").csv("/tmp/sousuo")

or, with the delimiter given explicitly (a comma is already the default):

val logs = spark.read.option("delimiter", ",").csv("/tmp/sousuo")
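The reads above treat every row as data and every column as a string. A minimal sketch of the same call with a header row and schema inference follows. It assumes Spark 2.x or later; the object name CsvReadSketch, the helper readLogs, and the sample file contents are made up for illustration and are not part of the original post.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object CsvReadSketch {
  // Hypothetical helper: reads tab-separated files under `path`,
  // using the first line as column names and inferring column types.
  def readLogs(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("delimiter", "\t")     // same option as in the post
      .option("header", "true")      // first line holds column names
      .option("inferSchema", "true") // extra pass over the data to guess types
      .csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-read-sketch")
      .master("local[*]") // assumption: local run for the demo
      .getOrCreate()

    // Made-up sample data written to a temporary directory.
    val dir = Files.createTempDirectory("sousuo")
    Files.write(dir.resolve("part.tsv"),
      "query\tcount\nspark\t3\ncsv\t5\n".getBytes("UTF-8"))

    val logs = readLogs(spark, dir.toString)
    logs.printSchema()
    logs.show()

    spark.stop()
  }
}
```

Schema inference costs an additional scan of the input, so for large files it may be preferable to pass an explicit schema instead.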

Write

df
  .repartition(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("mydata.csv")

or use coalesce, which reduces the partition count without a full shuffle:

df
  .coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("mydata.csv")
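Note that in both variants "mydata.csv" ends up as a directory holding a single part-* file, not as a plain file. On Spark 2.0 and later the CSV source is built in, so the external spark-csv package is not needed. The sketch below assumes Spark 2.x or later and a local filesystem; the object name CsvWriteSketch, the helper writeSingleCsv, and the sample rows are hypothetical.

```scala
import java.io.File
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object CsvWriteSketch {
  // Hypothetical helper: writes `df` as one CSV part file under `outDir`
  // and returns the part files found there (local filesystem only).
  def writeSingleCsv(df: DataFrame, outDir: String): Seq[File] = {
    df.coalesce(1)               // one partition, hence one part file
      .write
      .mode(SaveMode.Overwrite)  // replace the directory if it already exists
      .option("header", "true")
      .csv(outDir)               // built-in CSV writer

    new File(outDir).listFiles()
      .filter(f => f.getName.startsWith("part-") && f.getName.endsWith(".csv"))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("csv-write-sketch")
      .master("local[*]") // assumption: local run for the demo
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows.
    val df = Seq(("spark", 3), ("csv", 5)).toDF("query", "count")
    val out = Files.createTempDirectory("demo").resolve("mydata.csv").toString

    // Prints the path of the single part file inside the output directory.
    writeSingleCsv(df, out).foreach(f => println(f.getPath))

    spark.stop()
  }
}
```

Collapsing to one partition pushes all rows through a single task, so this approach is only reasonable when the result fits comfortably on one executor.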

Reference: https://stackoverflow.com/questions/31674530/write-single-csv-file-using-spark-csv
