SQOOP - Import failed: Can not create a Path from a null string

Question  Votes: 0  Answers: 2

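I am using Sqoop incremental imports to load a table from SQL Server into an HBase table, but null values in the SQL table are not imported into HBase. I know that HBase does not store null values, so a field that contains null simply does not appear in HBase. My concern is that when most records have null in a particular column, the whole column is skipped, even for the records where that field does have a value. The SQL table structure is as follows: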

    CREATE TABLE [dbo].[user_test](
        [user_id] [nvarchar](20) NOT NULL,
        [user_name] [nvarchar](100) NULL,
        [password] [varchar](128) NULL,
        [created_date] [datetime2](7) NULL,
        [modified_date] [datetime2](7) NULL,
        [last_login_date] [datetime2](7) NULL,
        [email_id] [nvarchar](100) NULL,
        [security_question_id] [int] NULL,
        [answered_count] [int] NULL,
        [skip_count] [int] NULL,
        [role_id] [smallint] NULL,
        [use_yn] [char](1) NULL,
        [first_login] [char](1) NULL,
        [score] [int] NULL,
        [secret_answer] [nvarchar](100) NULL,
        PRIMARY KEY CLUSTERED
        (
            [user_id] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
            ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

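In the table above, email_id is null in most records. But even for the records where email_id has a value, that value is not imported into the HBase table. The Sqoop command does successfully fetch the newly appended records from SQL. The Sqoop command is as follows: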

    sqoop import \
        --connect "jdbc:sqlserver://107.108.32.198:1433;database=ETL_interim_DB;" \
        --username "hadoop" --password "Semco123" \
        --query "SELECT CAST(user_id AS Integer) as user_id,user_name,password,modified_date,
            last_login_date,email_id,security_question_id,answered_count,skip_count,
            role_id,use_yn,first_login,score,secret_answer
            from ETL_interim_DB.dbo.user_test WHERE \$CONDITIONS" \
        --hbase-table test2 \
        --column-family cf \
        --hbase-row-key user_id \
        --split-by user_id -m 1 \
        --incremental append \
        --check-column user_id \
        --last-value 10

However, the following error is shown:

Note: Recompile with -Xlint:deprecation for details.
0    [main] ERROR org.apache.sqoop.tool.ImportTool  - Imported Failed: Can not create a Path from a null string

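Can anyone suggest how to import all the values present in SQL Server into HBase, and explain what happens to null values in SQL when they are imported into an HBase table?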

sql hadoop hbase sqoop
2 Answers
1
vote

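The COALESCE function let me import the null fields from SQL into HBase by supplying a default value for each of them. Here is the sqoop command that does this: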

    sqoop import \
        --connect "jdbc:sqlserver://107.108.32.198:1433;database=ETL_interim_DB;" \
        --username "hadoop" --password "Semco123" \
        --query "SELECT CAST(user_id AS Integer) as user_id, \
            COALESCE(user_name,'xyz') as user_name, \
            COALESCE(password,'123') as password, \
            COALESCE(created_date,'9999-12-31 00:00:00.0000000') as created_date, \
            COALESCE(modified_date,'9999-12-31 00:00:00.0000000') as modified_date, \
            COALESCE(last_login_date,'9999-12-31 00:00:00.0000000') as lastlogin, \
            COALESCE(email_id,'0') as email_id, \
            COALESCE(security_question_id,-1) as security_question_id, \
            COALESCE(answered_count,-1) as answered_count, \
            COALESCE(skip_count,-1) as skip_count, \
            COALESCE(secret_answer,'0') as secret_answer, \
            COALESCE(role_id,0) as role_id, \
            COALESCE(use_yn,'0') as use_yn, \
            COALESCE(first_login,'0') as firstlogin, \
            COALESCE(score,-1) as score from ETL_interim_DB.dbo.ms_user_detail_test WHERE \$CONDITIONS" \
        --hbase-table test2 \
        --column-family cf \
        --hbase-row-key user_id \
        --split-by user_id -m 1 \
        --incremental append \
        --check-column user_id \
        --last-value 10
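
The select list above repeats one pattern per column, so it is easy to drop a comma or mistype an alias. As a minimal sketch, the list could be generated from `column=default` pairs with a small shell function. The function name `build_coalesce_list` and the argument format are assumptions made for illustration and are not part of Sqoop; the defaults are taken from the command above, and the table name is the one from the question.

```shell
# Hypothetical helper: builds "COALESCE(col, default) AS col" entries
# from arguments of the form column=default (default already a SQL literal).
# The body runs in a subshell so its variables do not leak to the caller.
build_coalesce_list() (
    list=""
    for pair in "$@"; do
        col="${pair%%=*}"    # text before the first '='
        def="${pair#*=}"     # text after the first '='
        list="${list}${list:+, }COALESCE(${col}, ${def}) AS ${col}"
    done
    printf '%s\n' "$list"
)

select_list="$(build_coalesce_list \
    "user_name='xyz'" \
    "email_id='0'" \
    "score=-1")"

# Print the query text that would be passed to sqoop's --query option.
printf 'SELECT CAST(user_id AS Integer) AS user_id, %s FROM ETL_interim_DB.dbo.user_test WHERE $CONDITIONS\n' "$select_list"
```

Printing the query first makes it possible to check the generated SQL before handing it to `sqoop import`.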

0
votes

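You can try the following workaround. Because HBase does not store null values in a column, you can update the NULL values (empty cells) in the SQL database so that they hold some value, such as '0' or the text 'Null'. The query is below.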

UPDATE [Table Name] SET [Column Name]='Null' WHERE [Column Name] IS NULL;

Or, to keep new NULL values from being stored (once the existing ones have been updated):

ALTER TABLE [Table Name] ADD CONSTRAINT [Constraint Name] DEFAULT '' FOR [Column Name];
ALTER TABLE [Table Name] ALTER COLUMN [Column Name] VARCHAR(50) NOT NULL;

Then try importing from SQL into HBase. Hope this helps!
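
For a table with several nullable columns, the UPDATE above has to be repeated once per column, with a replacement value that matches the column's type (the text 'Null' would not fit an int or datetime2 column). A rough sketch that prints one statement per column follows; the function name `print_null_updates` and the `column=value` argument format are hypothetical, and the printed statements would still need to be reviewed and run with a SQL client of your choice.

```shell
# Hypothetical helper: prints one UPDATE per column=value argument.
# $1 is the table name; each value must already be a valid SQL literal.
# The body runs in a subshell so its variables do not leak to the caller.
print_null_updates() (
    table="$1"
    shift
    for pair in "$@"; do
        col="${pair%%=*}"    # text before the first '='
        val="${pair#*=}"     # text after the first '='
        printf 'UPDATE %s SET [%s] = %s WHERE [%s] IS NULL;\n' "$table" "$col" "$val" "$col"
    done
)

print_null_updates "[dbo].[user_test]" \
    "email_id='0'" \
    "score=-1" \
    "last_login_date='9999-12-31 00:00:00.0000000'"
```

Note that this changes the source data itself, whereas the COALESCE approach only changes what the import query returns.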
