Using the pandas.read_csv docstring in a function I wrote

Problem description · Votes: 4 · Answers: 1

I want to write a function with the following signature:

def split_csv(file, sep=";", output_path=".", nrows=None, chunksize=None, low_memory=True, usecols=None):

As you can see, I am reusing several of the parameters found in pd.read_csv. What I would like to know (or do) is how to forward the documentation of those parameters from read_csv into my own function, without copy/pasting their docstrings.

Edit: as far as I know, there is no out-of-the-box solution for this, so perhaps building one is in order. What I have in mind:

some_new_fancy_library.get_doc(for_function=pandas.read_csv, for_parameters=['sep', 'nrows']) would output:

{'sep': 'doc as found in the docstring', 'nrows' : 'doc as found in the docstring', ...}

Then it would just be a matter of inserting the dictionary's values into my own function's docstring.
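For illustration, stitching that dictionary into my function's docstring might then look something like this (some_new_fancy_library and get_doc are the imagined names from above, not an existing package):

import pandas

# some_new_fancy_library does not exist yet; this is the imagined API
param_docs = some_new_fancy_library.get_doc(for_function=pandas.read_csv, for_parameters=['sep', 'nrows'])

# prepend my own summary, then append the forwarded parameter docs
split_csv.__doc__ = ('Split a CSV file into pieces.\n\nParameters\n----------\n'
                     + '\n'.join(name + ' : ' + doc for name, doc in param_docs.items()))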

干杯

python pandas documentation docstring
1 Answer

3 votes

You can parse the docstring with a regex and match the captured parameter names against your function's arguments:

import re

import pandas as pd

# the question's split_csv must already be defined at this point

pat = re.compile(r'([\w_+]+ :)')    # capturing group for parameter names such as "sep :"

# split the read_csv docstring into alternating [text, "name :", text, ...] pieces
splitted = pat.split(pd.read_csv.__doc__)

# compare the parsed pieces against your function's arguments and keep only the matching docstrings
docstrings = '\n'.join([''.join(splitted[i: i+2]) for i, s in enumerate(splitted) if s.rstrip(" :") in split_csv.__code__.co_varnames])

split_csv.__doc__ = docstrings

help(split_csv)

# Help on function split_csv in module __main__:
# 
# split_csv(file, sep=';', output_path='.', nrows=None, chunksize=None, low_memory=True, usecols=None)
#   sep : str, default ','
#       Delimiter to use. If sep is None, the C engine cannot automatically detect
#       the separator, but the Python parsing engine can, meaning the latter will
#       be used and automatically detect the separator by Python's builtin sniffer
#       tool, ``csv.Sniffer``. In addition, separators longer than 1 character and
#       different from ``'\s+'`` will be interpreted as regular expressions and
#       will also force the use of the Python parsing engine. Note that regex
#       delimiters are prone to ignoring quoted data. Regex example: ``'\r\t'``
#   
#   usecols : list-like or callable, default None
#       Return a subset of the columns. If list-like, all elements must either
#       be positional (i.e. integer indices into the document columns) or strings
#       that correspond to column names provided either by the user in `names` or
#       inferred from the document header row(s). For example, a valid list-like
#       `usecols` parameter would be [0, 1, 2] or ['foo', 'bar', 'baz']. Element
#       order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.
#       To instantiate a DataFrame from ``data`` with element order preserved use
#       ``pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']]`` for columns
#       in ``['foo', 'bar']`` order or
#       ``pd.read_csv(data, usecols=['foo', 'bar'])[['bar', 'foo']]``
#       for ``['bar', 'foo']`` order.
#   
#       If callable, the callable function will be evaluated against the column
#       names, returning names where the callable function evaluates to True. An
#       example of a valid callable argument would be ``lambda x: x.upper() in
#       ['AAA', 'BBB', 'DDD']``. Using this parameter results in much faster
#       parsing time and lower memory usage.
#   
#   nrows : int, default None
#       Number of rows of file to read. Useful for reading pieces of large files
#   
#   chunksize : int, default None
#       Return TextFileReader object for iteration.
#       See the `IO Tools docs
#       <http://pandas.pydata.org/pandas-docs/stable/io.html#io-chunking>`_
#       for more information on ``iterator`` and ``chunksize``.
#   
#   low_memory : boolean, default True
#       Internally process the file in chunks, resulting in lower memory use
#       while parsing, but possibly mixed type inference.  To ensure no mixed
#       types either set False, or specify the type with the `dtype` parameter.
#       Note that the entire file is read into a single DataFrame regardless,
#       use the `chunksize` or `iterator` parameter to return the data in chunks.
#       (Only valid with C parser)

But of course, this relies on your function using exactly the same parameter names as the copied one. And as you can see, you will need to add docstrings for the unmatched parameters yourself (e.g. file and output_path).
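For those, a minimal sketch is to prepend hand-written entries in the same style (the wording of these two entries is my own, not taken from pandas):

# hand-written docs for the parameters that read_csv does not have
manual_docs = (
    "file : str\n"
    "    Path of the CSV file to split.\n"
    "output_path : str, default '.'\n"
    "    Directory the output pieces are written to.\n"
)

split_csv.__doc__ = manual_docs + docstrings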
