The correct way to read files from a directory with Python 2.6 in a bash shell

Question

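I am trying to read in files for text processing.

The idea is to run the map-reduce code I am writing on my virtual machine against a pseudo-distributed Hadoop file system. The environment is Ubuntu Linux, and I have Python 2.6 installed. I need to read the files through sys.stdin and write to sys.stdout so that the mapper's output is passed on to the reducer.

Here is my test code for the mapper: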

#!/usr/bin/env python

import sys
import string
import glob
import os

files = glob.glob(sys.stdin)
for file in files:
    with open(file) as infile:
        txt = infile.read()
        txt = txt.split()
    print(txt)

I am not sure how glob works with sys.stdin, and I get the error shown below.

After testing with a pipe:

[training@localhost data]$ cat test | ./mapper.py

I get:

cat: test: Is a directory
Traceback (most recent call last):
  File "./mapper.py", line 8, in <module>
    files = glob.glob(sys.stdin)
  File "/usr/lib64/python2.6/glob.py", line 16, in glob
    return list(iglob(pathname))
  File "/usr/lib64/python2.6/glob.py", line 24, in iglob
    if not has_magic(pathname):
  File "/usr/lib64/python2.6/glob.py", line 78, in has_magic
    return magic_check.search(s) is not None
TypeError: expected string or buffer

For now, I am just trying to read three small .txt files in one directory.

Thanks!

Answer 1

I am still not entirely sure what output you expect (a list or plain text), but the following works:

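#!/usr/bin/env python

import sys, glob

# Read the directory name from stdin and strip the trailing newline.
directory = sys.stdin.read().rstrip('\r\n')
# Match every entry directly inside that directory.
paths = glob.glob(directory + '/*')
for path in paths:
    with open(path) as infile:
        # Split the file contents into a list of whitespace-separated words.
        txt = infile.read().split()
    print(txt)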

Then run:

echo "test" | ./mapper.py

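My suggestion is to supply the directory name as a command-line argument rather than through stdin as above. Let me know if you want to adjust the output format. Hope this helps.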
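The command-line variant suggested above might look like the following minimal sketch. The function name read_words, the "*.txt" default pattern, and the fallback to the current directory when no argument is given are assumptions made for illustration; they are not part of the original answer.

```python
#!/usr/bin/env python
import glob
import os
import sys


def read_words(directory, pattern="*.txt"):
    """Return (path, words) pairs for the files in directory matching pattern.

    read_words and the "*.txt" default are hypothetical choices for this sketch.
    """
    results = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        if not os.path.isfile(path):
            continue  # skip subdirectories that happen to match the pattern
        with open(path) as infile:
            # split() with no argument splits on any run of whitespace
            results.append((path, infile.read().split()))
    return results


if __name__ == "__main__":
    # Take the directory from the first argument; fall back to "." (assumption).
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    for path, words in read_words(directory):
        print(words)
```

It would then be invoked as ./mapper.py test, with no pipe involved.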

Answer 2

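files = os.listdir(path)

Use this to list all the entries in the directory, then loop over them. Note that os.listdir returns bare names, so join each one with path before opening it.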
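As a rough sketch of that loop: the helper name list_files is hypothetical, and filtering out subdirectories with os.path.isfile is an added assumption rather than something the answer specifies.

```python
import os


def list_files(path):
    """Return the full paths of the regular files directly inside path.

    list_files is a hypothetical helper name used only for this sketch.
    """
    full_paths = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)  # os.listdir yields bare names only
        if os.path.isfile(full):         # leave out subdirectories
            full_paths.append(full)
    return full_paths


if __name__ == "__main__":
    # Print each file found in the current directory.
    for filename in list_files("."):
        print(filename)
```

Each returned path can be passed straight to open(), since it already includes the directory.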
