
How To Read A Fixed Character Length Format File In Spark

The data looks like this (the second row is truncated in the question): [Row(_c0='ACW00011604 17.1167 -61.7833 10.1 ST JOHNS COOLIDGE FLD '), Row(_c0='ACW00011647 17.1333 -61.7833 19.2 ST JOHNS

Solution 1:

You need to read the file as plain lines of text. A fixed-width file has no delimiter, so any delimiter-based reader will split it incorrectly.

df = spark.read.text("hdfs:////data/stn") 

Then parse the fixed-width fields out of the single value column:

# Column.substr(startPos, length): the second argument is a length,
# not an end position, so each field's width is (end - start + 1).
df = df.select(
    df.value.substr(1, 11).alias('id'),
    df.value.substr(13, 8).alias('LATITUDE'),
    df.value.substr(22, 9).alias('LONGITUDE'),
    df.value.substr(32, 6).alias('c3'),
    df.value.substr(39, 2).alias('c4'),
    df.value.substr(42, 30).alias('c5'),
    df.value.substr(73, 3).alias('c6'),
    df.value.substr(77, 3).alias('c7'),
    df.value.substr(81, 5).alias('c8'))
df.show(3)
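The offsets above can be sanity-checked locally without Spark, since Spark's 1-based substr(start, length) maps directly to Python's line[start-1:start-1+length] slice. This is a minimal sketch assuming the sample data follows the GHCN-Daily stations layout it appears to use; the sample line is reconstructed by hand because the question's whitespace was collapsed.

```python
def substr(line, start, length):
    """Mimic pyspark Column.substr: 1-based start position, explicit length."""
    return line[start - 1:start - 1 + length]

# Hand-reconstructed fixed-width row (assumed layout: id 1-11, lat 13-20,
# lon 22-30, elevation 32-37, state 39-40, name 42-71).
line = ("ACW00011604" + " " + " 17.1167" + " " + " -61.7833" + " "
        + "  10.1" + " " + "  " + " " + "ST JOHNS COOLIDGE FLD".ljust(30))

print(substr(line, 1, 11))           # id        -> ACW00011604
print(substr(line, 13, 8).strip())   # LATITUDE  -> 17.1167
print(substr(line, 22, 9).strip())   # LONGITUDE -> -61.7833
print(substr(line, 32, 6).strip())   # elevation -> 10.1
print(substr(line, 42, 30).strip())  # name      -> ST JOHNS COOLIDGE FLD
```

Note the .strip() calls: in Spark you would do the same with pyspark.sql.functions.trim before casting the numeric columns.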
