JavaScript - Read Parquet data (with Snappy compression) from an AWS S3 bucket

Question

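In Node.js, I am trying to read a Parquet file (compression='snappy'), but without success.

I am using the https://github.com/ironSource/parquetjs npm module to open a local file and read it, but reader.cursor() throws a cryptic "not yet implemented" error. It does not matter which compression (plain, rle, or snappy) was used to create the input file; the same error is thrown.

Here is my code: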

const parquet = require('parquetjs');

const readParquet = async (fileKey) => {

  const filePath = 'parquet-test-file.plain'; // 'snappy';

  console.log('----- reading file : ', filePath);
  let reader = await parquet.ParquetReader.openFile(filePath);
  console.log('---- ParquetReader initialized....');

  // create a new cursor
  let cursor = reader.getCursor();

  // read all records from the file and print them
  if (cursor) {
    console.log('---- cursor initialized....');

    let record = await cursor.next() ; // this line throws exception
    while (record) {
      console.log(record);
      record = await cursor.next();
    }
  }

  await reader.close();
  console.log('----- done with reading parquet file....');

  return;
};

The call that reads the file:

let dt = readParquet(fileKeys.dataFileKey);
dt
  .then((value) => console.log('--------SUCCESS', value))
  .catch((error) => {
    console.log('-------FAILURE ', error); // Random error
    console.log(error.stack);
  })

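More information:

1. I generated my Parquet files in Python using pyarrow.parquet.
2. I used 'SNAPPY' compression when writing the files.
3. I can read these files in Python without any problem.
4. My schema is not fixed (it is unknown) each time I write a Parquet file, and I do not create a schema when writing.
5. error.stack prints undefined in the console.
6. console.log('-------FAILURE ', error); prints "not yet implemented".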
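Points 5 and 6 are consistent with the promise being rejected with a plain string rather than an Error object, since only Error instances carry a stack. Below is a minimal sketch of a catch-side helper that copes with both cases. The name describeError is hypothetical and is not part of parquetjs, and the rejection is simulated here rather than produced by the library.

```javascript
// Hypothetical helper: turn any thrown or rejected value into a printable report.
// Strings and other non-Error values have no .stack, so fall back to String().
function describeError(err) {
  if (err instanceof Error) {
    return { message: err.message, stack: err.stack };
  }
  return { message: String(err), stack: null };
}

// Simulate a promise that rejects with a bare string, as the output above suggests.
Promise.reject('not yet implemented')
  .catch((err) => console.log(describeError(err)));
```

Using a helper like this in the .catch handler makes it obvious whether a real Error or a bare value was thrown.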

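I would like to know whether anyone has run into a similar problem and has ideas or solutions to share. BTW, my Parquet files are stored in an AWS S3 location (unlike in this test code). I still need to find a way to read Parquet files from an S3 bucket.

Any help, suggestions, or code samples would be highly appreciated.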

Answer 1

Use var AWS = require('aws-sdk'); to fetch the data from S3.

Then use node-parquet to read the Parquet file into a variable.

// CommonJS require, so the snippet runs in plain Node.js without TypeScript
const np = require('node-parquet');

// Read from a file:
var reader = new np.ParquetReader(`file.parquet`);
var parquet_info = reader.info();
var parquet_rows = reader.rows();
reader.close();
parquet_rows = parquet_rows + "\n";
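
The two steps above can be combined roughly as follows. This is only a sketch, assuming aws-sdk v2 (where getObject(params).promise() resolves with the object bytes in Body) and a reader that needs a local file path. The helper name downloadToTempFile is hypothetical, and the bucket and key in the usage comment are placeholders.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: download one S3 object and save it to a temp file.
// `s3` is expected to behave like an aws-sdk v2 S3 client, i.e.
// s3.getObject({ Bucket, Key }).promise() resolves to an object with a Body buffer.
async function downloadToTempFile(s3, bucket, key) {
  const { Body } = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  // Create a fresh temp directory so concurrent downloads do not collide.
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'parquet-'));
  const filePath = path.join(dir, path.basename(key));
  await fs.promises.writeFile(filePath, Body);
  return filePath;
}

// Usage sketch inside an async function (bucket and key are placeholders):
//   const AWS = require('aws-sdk');
//   const np = require('node-parquet');
//   const filePath = await downloadToTempFile(new AWS.S3(), 'my-bucket', 'data/file.parquet');
//   const reader = new np.ParquetReader(filePath);
//   const rows = reader.rows();
//   reader.close();
```

Writing to a temp file keeps the reading code the same as in the local-file case. Because the client is passed in as an argument, the helper can also be exercised with a stub object in place of a real S3 client.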
