To change the file separator, update the "Value Separator" field in the ConvertExcelToCSVProcessor NiFi processor.

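As a rough illustration outside NiFi, Python's `csv` module shows the same idea: with `|` as the value separator, a value such as `34,45,67,344` needs no quoting at all. This is only an analogy to the processor's "Value Separator" property, not NiFi code, and the sample row is made up.

```python
import csv
import io

# A sample row whose second value contains commas (the value from the question).
row = ["id-1", "34,45,67,344", "17.90"]

buf = io.StringIO()
# Pipe as the value separator; QUOTE_MINIMAL only quotes a field when it needs it.
writer = csv.writer(buf, delimiter="|", quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
writer.writerow(row)
pipe_line = buf.getvalue()
print(pipe_line, end="")  # id-1|34,45,67,344|17.90

# Reading it back with the same separator yields the original three values.
parsed = next(csv.reader(io.StringIO(pipe_line), delimiter="|"))
print(parsed)
```

Because the commas are no longer special, downstream readers only need to be configured with the same separator.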
Another option is to escape the commas. To achieve that, you need to experiment with the "Quote Character" and "Escape Character" properties.
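The escaping approach can be sketched with Python's `csv` module as well. Treat this as an analogy to the NiFi "Quote Character" / "Escape Character" properties, assuming a backslash as the escape character: with quoting disabled, every comma inside a value is escaped instead of the whole value being quoted.

```python
import csv
import io

row = ["id-1", "34,45,67,344"]

buf = io.StringIO()
# QUOTE_NONE: never wrap values in quotes; prefix special characters with the escapechar.
writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n")
writer.writerow(row)
escaped_line = buf.getvalue()
print(escaped_line, end="")  # id-1,34\,45\,67\,344

# A reader configured with the same escape character restores the original value.
parsed = next(csv.reader(io.StringIO(escaped_line), quoting=csv.QUOTE_NONE, escapechar="\\"))
print(parsed)  # ['id-1', '34,45,67,344']
```

The reader must use the same escape character as the writer; otherwise the backslashes end up in the data.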

To keep values as they appear in the Excel file, experiment with the "Format Cell Values" property.
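A rough Python analogy for what "Format Cell Values" changes: Excel stores numbers as binary doubles, and the cell's number format only controls how they are displayed. Writing the raw value gives the stored number, while applying the display format gives the text you see in Excel. The `displayed_text` helper below is hypothetical and only mimics two simple formats; it is not how the NiFi processor is implemented, and the sample values are illustrative.

```python
def raw_text(value: float) -> str:
    """Text of the stored number itself (roughly what you get without cell formatting)."""
    return repr(value)


def displayed_text(value: float, number_format: str) -> str:
    """Hypothetical helper: a simplified stand-in for applying an Excel number format.

    Only two illustrative formats are handled: "0.00" (two decimals)
    and "0.0E+00" (scientific notation with one decimal).
    """
    if number_format == "0.00":
        return f"{value:.2f}"
    if number_format == "0.0E+00":
        return f"{value:.1E}"
    raise ValueError(f"unsupported format: {number_format}")


stored = 17.900000001  # what the cell actually contains
print(raw_text(stored))                           # 17.900000001
print(displayed_text(stored, "0.00"))             # 17.90
print(displayed_text(270000000000.0, "0.0E+00"))  # 2.7E+11
print(raw_text(270000000000.0))                   # 270000000000.0
```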

Since NiFi does not have a processor that supports converting .XLS (the older Excel format) to .CSV, I wrote a Python script to perform the conversion and call it from ExecuteStreamCommand.
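By default, ExecuteStreamCommand writes the incoming flowfile content to the command's standard input and uses what the command writes to standard output as the new flowfile content. A minimal way to check a script against that contract outside NiFi is to pipe bytes through it with `subprocess`. The inline filter below is a hypothetical stand-in for the real conversion script (it just upper-cases its input).

```python
import subprocess
import sys

# Hypothetical stand-in for the conversion script: read raw bytes from stdin,
# write transformed bytes to stdout (the same contract the .xls-to-.csv script follows).
FILTER_SCRIPT = (
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stdout.buffer.write(data.upper())\n"
)


def run_like_execute_stream_command(content: bytes) -> bytes:
    """Pipe content into the filter's stdin and return its stdout; raise on a non-zero exit."""
    result = subprocess.run(
        [sys.executable, "-c", FILTER_SCRIPT],
        input=content,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout


print(run_like_execute_stream_command(b"a,b\r\n1,2\r\n"))  # b'A,B\r\n1,2\r\n'
```

In NiFi itself, "Command Path" would point at the Python interpreter and "Command Arguments" would carry the path to the real script.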

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-poi-nar/1.10.0/org.apache.nifi.processors.poi.ConvertExcelToCSVProcessor/

While converting Excel rows, the Python script also performs cleanup on each row, such as adding escape characters and removing any \n, so that the resulting CSV won't fail in the ValidateRecord or ConvertRecord processor.
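The per-cell cleanup described above can be isolated into a small function, which makes it easy to test before wiring it into NiFi. This is a sketch: the name `clean_cell` is hypothetical, and it escapes embedded double quotes by doubling them, which is the usual CSV convention.

```python
import math


def clean_cell(value) -> str:
    """Turn one Excel cell value into a double-quoted CSV field.

    - None / NaN (how pandas reports empty cells) become empty strings
    - non-ASCII characters are dropped
    - embedded newlines are removed so a record never spans lines
    - embedded double quotes are escaped by doubling them
    """
    if value is None or (isinstance(value, float) and math.isnan(value)):
        text = ""
    else:
        text = str(value)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.replace("\r", "").replace("\n", "")
    return '"' + text.replace('"', '""') + '"'


row = ["Line one\nline two", float("nan"), 'say "hi"', 42, "café"]
print(",".join(clean_cell(v) for v in row))
# "Line oneline two","","say ""hi""","42","caf"
```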

Give it a try (it may need some tweaking) and let us know whether it's useful in your case!

ExecuteStreamCommand Processor Configuration

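I am using the ExcelToCsv NiFi processor to convert xlsx files to csv. I want to convert a batch of xlsx files containing data in various formats to csv files. Once the files are converted to csv, some of the data is changed (see the examples below).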

By the way, I used the following property values in the ExcelToCsv processor.

Reference: ExcelToCsv NiFi processor link.

  • CSV Format: Custom
  • Value Separator: Comma
  • Quote Character: Double Quotes
  • 17.90 ==> 17.900000001
  • 270E+11 ==> 270000000000
  • 34,45,67,344 ==> 344567344 (in this third case, no quotes were added)
Can anyone tell us why I am getting wrong results in the csv output file?


import os
import sys

import pandas as pd
import xlrd

# ExecuteStreamCommand pipes the flowfile content (the .xls bytes) to stdin,
# so read it as raw bytes rather than text.
wb = xlrd.open_workbook(file_contents=sys.stdin.buffer.read(),
                        logfile=open(os.devnull, 'w'))
# read_excel consumes the first sheet row as the header (header=0 by default)
# and keeps every column (no index_col, so the first column is not dropped).
excel_file_df = pd.read_excel(wb, sheet_name='Sheet1', engine='xlrd')


def quote(value):
    """Wrap a value in double quotes, escaping embedded quotes by doubling them."""
    return '"' + str(value).replace('"', '""') + '"'


csv_data_rows = []

# Header row: quote every column name (str() also handles numeric headers).
csv_data_rows.append(','.join(quote(field) for field in excel_file_df.columns))

# The header was already consumed by read_excel, so every row here is data.
for _, row in excel_file_df.iterrows():
    temp_data_list = []
    for item in row:
        if pd.isna(item):
            item = ''  # empty cells come back as NaN; write them as empty values
        elif isinstance(item, str):
            # Drop non-ASCII characters (part of the cleanup step).
            item = item.encode('ascii', 'ignore').decode('ascii')
        # Remove embedded newlines so a record never spans multiple lines.
        item = str(item).replace('\n', '')
        temp_data_list.append(quote(item))
    csv_data_rows.append(','.join(temp_data_list))

# Emit CSV with CRLF line endings; stdout becomes the new flowfile content.
for line in csv_data_rows:
    sys.stdout.write("%s\r\n" % line)
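As an alternative to assembling each line by hand, Python's standard `csv` module can do the quoting and quote escaping itself. A sketch, assuming the rows have already been cleaned into lists of values (`rows_to_csv` is a hypothetical helper name); `QUOTE_ALL` and the `\r\n` line terminator mirror what the script above writes.

```python
import csv
import io


def rows_to_csv(header, rows) -> str:
    """Write a header and data rows as CSV text with every field quoted."""
    buf = io.StringIO()
    # QUOTE_ALL wraps every field in double quotes and doubles any embedded quotes;
    # lineterminator matches the "\r\n" used by the hand-built version.
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


text = rows_to_csv(["id", "amount"], [["a", "34,45,67,344"], ["b", 'x"y']])
print(repr(text))
# '"id","amount"\r\n"a","34,45,67,344"\r\n"b","x""y"\r\n'
```

Returning a string keeps the helper easy to test; the script would then write that text to standard output.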

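A comma (",") is used as the separator, so you cannot have 34,45,67,344 as a single unquoted value in your csv file. If you still want to keep the commas, you can change the file separator from a comma to another character, such as a pipe ("|").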
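A quick Python check of this point: a CSV reader splits an unquoted `34,45,67,344` into separate values, while the same text wrapped in the quote character stays a single value. The sample lines are made up for illustration.

```python
import csv
import io

unquoted = "id-1,34,45,67,344\n"
quoted = 'id-1,"34,45,67,344"\n'

# The comma is the value separator, so the unquoted line yields five fields.
unquoted_fields = next(csv.reader(io.StringIO(unquoted)))
# Inside double quotes the commas are data, so the value survives as one field.
quoted_fields = next(csv.reader(io.StringIO(quoted)))

print(unquoted_fields)  # ['id-1', '34', '45', '67', '344']
print(quoted_fields)    # ['id-1', '34,45,67,344']
```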

© www.soinside.com 2019 - 2024. All rights reserved.