I have multiple files stored in an HDFS location, like this:
/user/project/202005/part-01798
/user/project/202005/part-01799
There are 2000 such part files. Each file contains records in the format {'Name':'abc','Age':28,'Marks':[20,25,30]}, {'Name':...}
and so on. I have 2 questions:
1) How can I check whether these are multiple independent files or multiple partitions of the same dataset?
2) How can I read them into a single DataFrame using PySpark? Here is what I tried:
df = spark.read.csv('/user/project/202005/', header=True, inferSchema=True)
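On question 1: the `part-NNNNN` naming is the standard convention for output partitions written by a distributed job (MapReduce/Spark), so these 2000 files are almost certainly partitions of one logical dataset, and reading the directory as a whole is the intended usage. On question 2: the records are JSON-like, not CSV, so `spark.read.csv` will not parse them; `spark.read.json('/user/project/202005/')` would be the natural reader *if* the records were valid JSON. However, the sample shown uses single quotes, which strict JSON parsers reject. The following is a minimal pure-Python sketch (the sample record is copied from the question; no Spark cluster is assumed) showing that such a line parses as a Python literal rather than as JSON:

```python
import ast
import json

# A record in the same single-quoted shape as shown in the part files.
line = "{'Name':'abc','Age':28,'Marks':[20,25,30]}"

# json.loads rejects single-quoted keys/strings...
try:
    json.loads(line)
    valid_json = True
except json.JSONDecodeError:
    valid_json = False

# ...but ast.literal_eval safely parses it as a Python dict literal.
record = ast.literal_eval(line)

print(valid_json)          # the sample is not valid JSON
print(record["Name"], record["Age"], record["Marks"])
```

If the actual files turn out to use double quotes (i.e. valid JSON, one record per line), then `spark.read.json('/user/project/202005/')` should load all 2000 part files into one DataFrame directly; otherwise a pre-parsing step like the one above (e.g. via `sc.textFile` plus a map with `ast.literal_eval`) would be needed first.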