- Suppose a column name contains a tab character.
- Get the schema of the data frame.
- print(df.schema)
- StructType(StructField( _a,LongType,true) ...
- Notice that there is a tab character before "_a".
- Save the printed schema to a variable as a string.
- val schema = "StructType(StructField( _a,LongType,true) ..."
- Add quotes around each column name, so that the printed schema can be pasted back as Scala code.
- println(schema.replaceAll("StructField\\(([\\s_a-zA-Z0-9]+),", "StructField\\(\"$1\","))
- Put all the StructFields inside a Seq, and rename the column whose name contains the tab character.
- StructType(Seq(StructField("bad_a",LongType,true) ...
- Create a new data frame using the new schema.
- import org.apache.spark.sql.types._
- val df2 = spark.createDataFrame(df.rdd, StructType(Seq(StructField("bad_a",LongType,true) ... )))
- Use the new column name to select it.
- df2.select("*").where("bad_a is not null").show(false)
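The quoting and renaming steps above can be tried on their own, without a Spark session, since they only manipulate the printed schema as a string. The following is a minimal sketch: the sample schema string and its second column `b` are made up for illustration (the real schema is elided above), and the rename is done with a plain string replace instead of by hand.

```scala
// Hypothetical printed schema: "\t_a" is the column whose name starts with a tab,
// and "b" is an invented second column used only for this example.
val schema = "StructType(StructField(\t_a,LongType,true), StructField(b,StringType,true))"

// Same regex as in the steps above: wrap every column name in double quotes.
// "\\(" in the replacement is an escaped literal "(", and "$1" is the captured name.
val quoted = schema.replaceAll(
  "StructField\\(([\\s_a-zA-Z0-9]+),",
  "StructField\\(\"$1\","
)

// Rename the tab-prefixed column to "bad_a" (done by hand in the steps above).
val renamed = quoted.replace("\"\t_a\"", "\"bad_a\"")

println(renamed)
```

The printed string still has to be wrapped in a Seq and pasted into spark.createDataFrame, as in the steps above.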
Wednesday, August 22, 2018
Select column which has special character in its name in spark