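Intermittent problem with Netezza external tables.
An external table fails when reading a file generated by the system itself (that is, the file was written by an external table rather than obtained from another source). However, when we load the same file into another table with the nzload utility, the file loads without any problem. The issue is inconsistent and most of the time cannot be reproduced.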
CREATE EXTERNAL TABLE SP_PORTFOLIO_EXT_DATA_6128_140
(
CLIENT_ID INTEGER,
CONFIG_ID INTEGER,
SCENARIO_ID INTEGER,
PORTFOLIO_ID INTEGER,
PORTFOLIO_NAME CHARACTER VARYING(200),
CUSTOM13 CHARACTER VARYING(600),
CUSTOM12 CHARACTER VARYING(500),
CUSTOM11 CHARACTER VARYING(500),
CUSTOM10 CHARACTER VARYING(500),
CUSTOM9 CHARACTER VARYING(500),
CUSTOM8 CHARACTER VARYING(500),
CUSTOM7 CHARACTER VARYING(500),
CUSTOM6 CHARACTER VARYING(2000),
CUSTOM3 CHARACTER VARYING(500),
CUSTOM2 CHARACTER VARYING(3000),
CUSTOM1 CHARACTER VARYING(500),
CREATIVE CHARACTER VARYING(512),
PLACEMENT CHARACTER VARYING(5000),
IMPRESSIONS NUMERIC(38,0),
CLICKS NUMERIC(38,0),
CONVERSIONS INTEGER,
TRUE_CONVERSIONS NUMERIC(38,6),
OPTMETRIC NUMERIC(38,6),
LASTAD_OPTMETRIC NUMERIC(38,6),
CURRSPEND NUMERIC(38,6)
)
USING
(
DATAOBJECT('/san5/Netezza/CAR/CAR_ZEUS/SPBU/test/SP_PORTFOLIO_EXT_DATA_6128_140.csv')
DELIMITER 254
ESCAPECHAR '/'
TIMESTYLE '24HOUR'
LOGDIR '/tmp'
Y2BASE 2000
ENCODING 'internal'
);
Command completed successfully
select COUNT(*) from SP_PORTFOLIO_EXT_DATA_6128_140;
ERROR [HY000] ERROR: External Table : count of bad input rows reached maxerrors limit
NZLOAD approach
CREATE TABLE TEST_LOAD
(
CLIENT_ID INTEGER,
CONFIG_ID INTEGER,
SCENARIO_ID INTEGER,
PORTFOLIO_ID INTEGER,
PORTFOLIO_NAME CHARACTER VARYING(200),
CUSTOM13 CHARACTER VARYING(600),
CUSTOM12 CHARACTER VARYING(500),
CUSTOM11 CHARACTER VARYING(500),
CUSTOM10 CHARACTER VARYING(500),
CUSTOM9 CHARACTER VARYING(500),
CUSTOM8 CHARACTER VARYING(500),
CUSTOM7 CHARACTER VARYING(500),
CUSTOM6 CHARACTER VARYING(2000),
CUSTOM3 CHARACTER VARYING(500),
CUSTOM2 CHARACTER VARYING(3000),
CUSTOM1 CHARACTER VARYING(500),
CREATIVE CHARACTER VARYING(512),
PLACEMENT CHARACTER VARYING(5000),
IMPRESSIONS NUMERIC(38,0),
CLICKS NUMERIC(38,0),
CONVERSIONS INTEGER,
TRUE_CONVERSIONS NUMERIC(38,6),
OPTMETRIC NUMERIC(38,6),
LASTAD_OPTMETRIC NUMERIC(38,6),
CURRSPEND NUMERIC(38,6)
)
DISTRIBUTE ON RANDOM;
# Loading data from the same file using Nzload
nzload -host 10.200.29.30 -u xxxxx -pw xxxxx -db SPBU_REPORT_DB_TEST -t test_load -delim 254 -ctrlChars -df /san5/Netezza/CAR/CAR_ZEUS/SPBU/test/SP_PORTFOLIO_EXT_DATA_6128_140.csv
Load session of table 'TEST_LOAD' completed successfully
[ja.prod@inet11026 ~]$ cat /san5/Netezza/CAR/CAR_ZEUS/SPBU/test/SP_PORTFOLIO_EXT_DATA_6128_140.csv|wc -l
191322
select count(*) from test_load;
191322
Adding the nzlog
File Buffer Size (MB): 8 Load Replay Region (MB): 0
Encoding: INTERNAL Max errors: 1
Skip records: 0 Max rows: 0
FillRecord: No Truncate String: No
Escape Char: '/' Accept Control Chars: No
Allow CR in string: No Ignore Zero: No
Quoted data: NO Require Quotes: No
BoolStyle: 1_0 Decimal Delimiter: '.'
Disable NFC: No
Date Style: YMD Date Delim: '-'
Y2Base: 2000
Time Style: 24HOUR Time Delim: ':'
Time extra zeros: No
Found bad records
bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic, "text consumed"[last char examined]
----------------------------------------------------------------------------------------------------------------------------
1: 25(184) [21, INT4] expected field delimiter or end of record, "0"[.]
Statistics
number of records read: 25
number of bad records: 1
-------------------------------------------------
number of records loaded: 0
Elapsed Time (sec): 0.0
-----------------------------------------------------------------------------
Load completed at: 08-Oct-15 09:59:04 EDT
.nzbad data containing the bad row (a pipe symbol stands in for the actual delimiter for readability):
140|1305|6128||NULL|SEO|SEO|test.com/vehicledetail/detail/632888199/overview|SEO|SEO|SEO|SEO Brand|SEO Brand|best Tracking|Google(Seo)|SEO|Impression Tracker|Unknown|0|1|0|0.000000|0.000000|0.000000|0.000000
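From the nzlog we can tell that the load failed on row 25. Specifically, while trying to load column 21, it encountered a value that is not an integer.
The log shows that it read a 0 followed by a period. So the data probably contains something like 0.0 or 0.1234, which cannot be loaded as an integer.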
bad #: input row #(byte offset to last char examined) [field #, declaration] diagnostic, "text consumed"[last char examined]
----------------------------------------------------------------------------------------------------------------------------
1: 25(184) [21, INT4] expected field delimiter or end of record, "0"[.]
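To look at the raw row the log points to, something like the following could be used. It is only a sketch: the function name `show_field` and the sample file are made up for illustration, and the split is done on every delimiter byte (254) without applying any escape-character rules, so it shows the row as a loader without ESCAPECHAR would see it.

```shell
# Print the number of fields in a given row, then the value of one field.
# usage: show_field FILE ROW FIELD
show_field() {
    LC_ALL=C awk -v row="$2" -v col="$3" '
        BEGIN { FS = sprintf("%c", 254) }      # delimiter byte 254, as in DELIMITER 254
        NR == row { print NF; print $col; exit }' "$1"
}

# Hypothetical three-row sample file using byte 254 (octal 376) as the delimiter.
sample=$(mktemp)
printf 'a\376b\376c\n'  > "$sample"
printf 'x\376y\376z\n' >> "$sample"
printf 'p\376q\376r\n' >> "$sample"

show_field "$sample" 2 3    # field count of row 2, then its third field
rm -f "$sample"
```

Against the real file, the call would be `show_field /san5/Netezza/CAR/CAR_ZEUS/SPBU/test/SP_PORTFOLIO_EXT_DATA_6128_140.csv 25 21`.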
Using the .nzbad data you provided (with '|' substituted for the actual delimiter for readability):
140|1305|6128||NULL|SEO|SEO|test.com/vehicledetail/detail/632888199/overview|SEO|SEO|SEO|SEO Brand|SEO Brand|best Tracking|Google(Seo)|SEO|Impression Tracker|Unknown|0|1|0|0.000000|0.000000|0.000000|0.000000
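One thing I notice is that you have a varchar field containing '/'. One difference between your external table and the nzload approach is that the external table specifies an escape character of '/', while nzload does not.
You will find that your data 'test.com/vehicledetail/detail/632888199/overview' is loaded as 'test.comvehicledetaildetail632888199overview', because the '/' characters are removed unless they are themselves escaped (e.g. '//').
If a '/' appears in the data immediately before a column delimiter, it signals that the delimiter is part of the data, so column 22 of the data is taken to be column 21 of the table, which matches what we see here.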
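The effect on that value can be mimicked in the shell. This is only an approximation of the behaviour described here, not Netezza's actual parser, and `strip_escapes` is a name invented for the example: every '/' is dropped and the character after it is kept.

```shell
# Drop each escape character '/' and keep the character it protects,
# so a doubled '//' collapses to a single literal '/'.
strip_escapes() {
    printf '%s\n' "$1" | sed 's#/\(.\)#\1#g'
}

strip_escapes 'test.com/vehicledetail/detail/632888199/overview'
# prints: test.comvehicledetaildetail632888199overview
```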
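A quick way to check a file for this situation is to list the rows in which a '/' sits directly in front of the delimiter byte. The sketch below assumes the delimiter is byte 254 as in the table definition; the function name and the sample rows are hypothetical.

```shell
# Print the line numbers of rows where '/' directly precedes the delimiter byte 254.
escaped_delim_rows() {
    LC_ALL=C awk '
        BEGIN { d = sprintf("%c", 254) }       # delimiter byte, as in DELIMITER 254
        index($0, "/" d) { print NR }' "$1"
}

# Hypothetical sample: only row 2 has '/' right before a delimiter.
sample=$(mktemp)
printf 'a\376b\376c\n'    > "$sample"
printf 'x/\376y\376z\n'  >> "$sample"
printf 'p/q\376r\376s\n' >> "$sample"

escaped_delim_rows "$sample"    # prints: 2
rm -f "$sample"
```

A row ending a field with an already escaped slash ('//' followed by the delimiter) would also be reported, so any hits still need to be checked by eye.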
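@ScottMcG, as you said, I compared the nzlog files produced by nzload and by the external table and found that the escape character was the only difference. So I commented out that part and tried again, and everything worked.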
CREATE EXTERNAL TABLE SP_PORTFOLIO_EXT_DATA_6128_140
(
CLIENT_ID INTEGER,
CONFIG_ID INTEGER,
SCENARIO_ID INTEGER,
PORTFOLIO_ID INTEGER,
PORTFOLIO_NAME CHARACTER VARYING(200),
CUSTOM13 CHARACTER VARYING(600),
CUSTOM12 CHARACTER VARYING(500),
CUSTOM11 CHARACTER VARYING(500),
CUSTOM10 CHARACTER VARYING(500),
CUSTOM9 CHARACTER VARYING(500),
CUSTOM8 CHARACTER VARYING(500),
CUSTOM7 CHARACTER VARYING(500),
CUSTOM6 CHARACTER VARYING(2000),
CUSTOM3 CHARACTER VARYING(500),
CUSTOM2 CHARACTER VARYING(3000),
CUSTOM1 CHARACTER VARYING(500),
CREATIVE CHARACTER VARYING(512),
PLACEMENT CHARACTER VARYING(5000),
IMPRESSIONS NUMERIC(38,0),
CLICKS NUMERIC(38,0),
CONVERSIONS INTEGER,
TRUE_CONVERSIONS NUMERIC(38,6),
OPTMETRIC NUMERIC(38,6),
LASTAD_OPTMETRIC NUMERIC(38,6),
CURRSPEND NUMERIC(38,6)
)
USING
(
DATAOBJECT('/san5/Netezza/CAR/CAR_ZEUS/SPBU/test/SP_PORTFOLIO_EXT_DATA_6128_140.csv')
DELIMITER 254
TIMESTYLE '24HOUR'
LOGDIR '/tmp'
Y2BASE 2000
ENCODING 'internal'
);
select COUNT(*) from SP_PORTFOLIO_EXT_DATA_6128_140;
191322
The data types have to be changed as follows: replace CHARACTER VARYING with VARCHAR / NVARCHAR.
CREATE TABLE TEST_LOAD
(
CLIENT_ID INTEGER,
CONFIG_ID INTEGER,
SCENARIO_ID INTEGER,
PORTFOLIO_ID INTEGER,
PORTFOLIO_NAME VARCHAR(200),
CUSTOM13 VARCHAR(600),
CUSTOM12 VARCHAR(500),
CUSTOM11 VARCHAR(500),
CUSTOM10 VARCHAR(500),
CUSTOM9 VARCHAR(500),
CUSTOM8 VARCHAR(500),
CUSTOM7 VARCHAR(500),
CUSTOM6 VARCHAR(2000),
CUSTOM3 VARCHAR(500),
CUSTOM2 VARCHAR(3000),
CUSTOM1 VARCHAR(500),
CREATIVE VARCHAR(512),
PLACEMENT VARCHAR(5000),
IMPRESSIONS NUMERIC(38,0),
CLICKS NUMERIC(38,0),
CONVERSIONS INTEGER,
TRUE_CONVERSIONS NUMERIC(38,6),
OPTMETRIC NUMERIC(38,6),
LASTAD_OPTMETRIC NUMERIC(38,6),
CURRSPEND NUMERIC(38,6)
)
DISTRIBUTE ON RANDOM;