Data analysis: mapping and correcting typing errors in a column

Question

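I'm new here, so I apologize if my question is poorly asked.

I have been reviewing the data for a project in Jupyter, and I want to map some of the data to company-specific categories.

In the end I used one big chain of if statements, but as expected, the problem is that I can't process each individual cell of the column with it. Is there a better way? The code I was given did not work in the first place, so I tried to improve it with my limited knowledge of Python.

So I want to take a value from the SicCodes column, compare it against the mapping, and get the category name as output. As a first approach I thought I could use the if chain and improve it later. But in practice I cannot pass the DataFrame column into my small to_code_range function, so I tried to do it with a for loop, so far without success.

Does anyone have a good idea for how to improve this? Thanks in advance.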

mappings = [
    (1000, 9990, 'Agriculture'),
    (10000, 14990, 'Mining'),
    (15000, 17990, 'Construction'),
    (18000, 19990, 'not used'),
    (20000, 39990, 'Manufacturing'),
    (40000, 49990, 'Utility Services'),
    (50000, 51990, 'Wholesale Trade'),
    (52000, 59990, 'Retail Trade'),
    (60000, 69200, 'Financials'),
    (70000, 90040, 'Services'),
    (91000, 97290, 'Public Administration'),
    (98000, 99990, 'Nonclassifiable'),
]

"""errors = set()
def to_code_range(i): 
    if type(i) != int: 
        print("Pas un int")
    if i=="None Supplied": 
        return np.nan
    code = int(i)
    for code_from, code_to, name in mappings: 
        if (code<=code_to)&(code>=code_from): 
            return name
        errors.add(code)
        return np.nan"""

def to_code_range(valeur): 
    if type(valeur) != int: print("Pas un int")
    code = int(valeur)
    if (code<1000): return np.nan
    if (code>=1000)&(code<=9990): return "Agriculture"
    if (code>=10000)&(code<=14990): return "Mining" 
    if (code>=10000)&(code<=14990): return "Mining"
    if (code>=15000)&(code<=17990): return "Construction"
    if (code>=18000)&(code<=19990): return "not used"
    if (code>=20000)&(code<=39990): return "Manufacturing"
    if (code>=40000)&(code<=49990): return "Utility Services"
    if (code>=50000)&(code<=51990): return "Wholesale Trade"
    if (code>=52000)&(code<=59990): return "Retail Trade"
    if (code>=60000)&(code<=69200): return "Financials"
    if (code>=70000)&(code<=90040): return "Services"
    if (code>=91000)&(code<=97290): return "Public Administration"
    if (code>=98000)&(code<=99990): return "Nonclassifiable"
    else :return np.nan
        
#report['SICCode.SicText_1'] = to_code_range(report["SicCodes"])
for i in report['SicCodes']: report['SICCode.SicText_1'][i] = to_code_range(i)

I tried both the if chain and the for loop, but my output has errors.

Answer 1

I would do it like this:

import pandas as pd
import numpy as np

mappings = [
    (1000, 9990, 'Agriculture'),
    (10000, 14990, 'Mining'),
    (15000, 17990, 'Construction'),
    (18000, 19990, 'not used'),
    (20000, 39990, 'Manufacturing'),
    (40000, 49990, 'Utility Services'),
    (50000, 51990, 'Wholesale Trade'),
    (52000, 59990, 'Retail Trade'),
    (60000, 69200, 'Financials'),
    (70000, 90040, 'Services'),
    (91000, 97290, 'Public Administration'),
    (98000, 99990, 'Nonclassifiable'),
]

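def to_code_range(valeur):
    # Accept Python ints and NumPy integer scalars; anything else (for
    # example the string "None Supplied") is reported and mapped to NaN.
    if not isinstance(valeur, (int, np.integer)):
        print("Pas un int")
        return np.nan
    # Return the name of the first range that contains the code.
    for code_from, code_to, name in mappings:
        if code_from <= valeur <= code_to:
            return name
    # The code falls outside every known range.
    return np.nan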

# Assuming 'report' is a DataFrame with a column 'SicCodes'
report = pd.DataFrame({
    'SicCodes': [1000, 15000, 20000, 40000, 50000, 60000, 70000, 91000, 98000]
})

report['SICCode.SicText_1'] = report['SicCodes'].apply(to_code_range)

print(report)

Output from the interpreter:

   SicCodes      SICCode.SicText_1
0      1000            Agriculture
1     15000           Construction
2     20000          Manufacturing
3     40000       Utility Services
4     50000        Wholesale Trade
5     60000             Financials
6     70000               Services
7     91000  Public Administration
8     98000        Nonclassifiable
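
The question's commented-out code checks for the string "None Supplied", so the real column may well hold strings alongside numbers. In that case the function above would map every string to NaN, including numeric strings such as "1000". Below is a minimal sketch of one way to handle that, assuming the codes can be parsed with pd.to_numeric. The sample values and the names codes and labels are made up for illustration and are not from the original data.

```python
import numpy as np
import pandas as pd

mappings = [
    (1000, 9990, 'Agriculture'),
    (10000, 14990, 'Mining'),
    (15000, 17990, 'Construction'),
    (18000, 19990, 'not used'),
    (20000, 39990, 'Manufacturing'),
    (40000, 49990, 'Utility Services'),
    (50000, 51990, 'Wholesale Trade'),
    (52000, 59990, 'Retail Trade'),
    (60000, 69200, 'Financials'),
    (70000, 90040, 'Services'),
    (91000, 97290, 'Public Administration'),
    (98000, 99990, 'Nonclassifiable'),
]

# Hypothetical sample data: numeric strings, ints, and a text placeholder.
report = pd.DataFrame({
    'SicCodes': ['1000', 15000, 'None Supplied', 500, 69200, '99999']
})

# Convert everything to numbers; values that cannot be parsed become NaN.
codes = pd.to_numeric(report['SicCodes'], errors='coerce')

# Start with every row unmapped, then fill in one range at a time.
labels = pd.Series(np.nan, index=report.index, dtype=object)
for code_from, code_to, name in mappings:
    # Series.between includes both bounds by default.
    labels[codes.between(code_from, code_to)] = name

report['SICCode.SicText_1'] = labels
print(report)
```

Rows whose code is missing, unparseable, or outside every range stay NaN, which makes them easy to list afterwards with report[report['SICCode.SicText_1'].isna()].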
