Creating a MySQL table and loading data

Question description  Votes: 0  Answers: 1

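I am running into a problem.

I have this data. It is stored in gene_ontology.txt, extracted from http://gala.bx.psu.edu/downloads/hg15/genes/. In this data, the second column contains several strings that include "-", ",", and spaces between words and numbers.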

Sample data:

GO:0000001,mitochondrion inheritance
GO:0000002,mitochondrial genome maintenance
GO:0000003,reproduction
GO:0000005,ribosomal-chaperone activity
GO:0000006,high affinity zinc uptake transmembrane transporter......

I want to insert this data into a table that I created as follows:

mysql> create table annotation
    -> (GOid VARCHAR(255) NOT NULL,
    -> FUNCTION TEXT NOT NULL,
   -> PRIMARY KEY goid (goid));

When loading:

LOAD DATA LOCAL INFILE 'gene_ontology.txt' INTO TABLE annotation FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';

Query OK, 26077 rows affected, 1639 warnings (0.89 sec)
Records: 26083  Deleted: 0  Skipped: 6  Warnings: 1639  (<- LOADING GAVE ME WARNING AND SKIPPED RESULTS)

The result is:

mysql> SELECT GOid FROM annotation LIMIT 6;

This displays correctly:

+-------------+
| GOid        |
+-------------+
|  GO:0000001 |
|  GO:0000002 |
|  GO:0000003 |
|  GO:0000005 |
|  GO:0000006 |
+-------------+

[enter image description here][1]

But the problem appears when I select both columns at the same time:

mysql> SELECT GOid, FUNCTION FROM annotation LIMIT 10;

+-------------+----------------------------------------------------------------+
| GOid        | FUNCTION                                                        |
+-------------+----------------------------------------------------------------+
                                     |nce
                              |enome maintenance
                                                  |
                                  | activity
  |GO:0000006 | high affinity zinc uptake transmembrane transporter activity
      |000007 | low-affinity zinc ion transmembrane transporter activity
                                               |
|  GO:0000009 | alpha-1                                                        |
                     |hexaprenyltranstransferase activity
                                       |
+-------------+----------------------------------------------------------------+

Or only the second one:

 +----------------------------------------------------------------+
 | FUNCTION                                                       |
 +----------------------------------------------------------------+
                                      |
                               |nce
                                                  |
                                  |
 |igh affinity zinc uptake transmembrane transporter activity
 |ffinity zinc ion transmembrane transporter activity
                                                   |
 | alpha-1                                                        |
                     |stransferase activity
                                           |
 +----------------------------------------------------------------+

[enter image description here][2]

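I don't know what is going on. To fix the problem, I tried changing the type of "FUNCTION" to LONGTEXT and to BLOB. I made these changes because I thought the problem was the type of the second column (FUNCTION). But it did not work.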

  [1]: https://i.stack.imgur.com/FdlKw.png
  [2]: https://i.stack.imgur.com/GC6F1.png
mysql unix warnings bioinformatics informatica
1 Answer
0 votes
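The problem seems to be the comma "," inside the text of the second column. Try loading the complete file, as fixed width, into a single column of a table TableA. Then use substr() to put the first field into column1 of a second table, TableB, and the second field into column2. This may get the data loaded, and you can then analyze the data further to improve the loading process.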
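
The same idea (treat each line as one string, then cut it at the first comma only) can also be sketched outside MySQL as a pre-processing step. The Python below is only an illustration under stated assumptions: the function name `split_go_lines` is hypothetical, the third sample row is illustrative rather than copied from the file, and the handling of a trailing "\r" assumes the file may have Windows-style line endings, which the question does not confirm.

```python
from typing import Iterable, Iterator, Tuple


def split_go_lines(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (go_id, description) pairs, cutting each line at the first comma only."""
    for raw in lines:
        # Assumption: a line may end in "\n" or "\r\n"; remove either ending.
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # partition() splits at the first comma, so later commas stay in the description.
        go_id, sep, description = line.partition(",")
        if not sep:
            continue  # no comma at all: not a two-field record
        yield go_id.strip(), description.strip()


sample = [
    "GO:0000001,mitochondrion inheritance\r\n",
    "GO:0000005,ribosomal-chaperone activity\n",
    # Illustrative row with a comma inside the description (not copied from the file).
    "GO:0000009,alpha-1,6-mannosyltransferase activity\r\n",
]

rows = list(split_go_lines(sample))
for go_id, description in rows:
    # Tab-separated output, so commas in the description are no longer delimiters.
    print(f"{go_id}\t{description}")
```

Writing the pairs out tab-separated is one possible design choice: tab is the default field terminator of LOAD DATA, so such a file could be loaded without a FIELDS TERMINATED BY clause, provided the descriptions themselves contain no tabs.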