Sorting and ranking subsets of complex data


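I have a large, complex GIS data file about road accidents in "cities" within "counties". Rows represent roads. Columns provide "City", "County", and "Total accidents in the city". So a city contains several roads (each carrying the same, repeated accident total), and a county contains several cities. For each "County", I now want to rank the cities by number of accidents, so that within each "County" the city with the most accidents is ranked "1" and cities with fewer accidents are ranked "2" and higher. This rank value should be written back into the original data file.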

My initial approach was: 1. Sort the data by "County_ID" and "Accidents" (descending). 2. Then, for each row, compute:

if ('County' in row n+1 == 'County' in row n) AND ('Accidents' in row n+1 == 'Accidents' in row n):
    return value: rank of row n  ## keep the same rank for cities within the 'County'

else if ('County' in row n+1 == 'County' in row n) AND ('Accidents' in row n+1 < 'Accidents' in row n):
    return value: rank of row n + 1  ## increase the rank value within the 'County'

else if ('County' in row n+1 != 'County' in row n):
    return value: 1  ## new 'County', i.e. start ranking from 1

else:
    return 0  ## error
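For reference, the sort-then-compare logic above can be written as a plain Python loop. This is a minimal sketch, assuming the rows have already been read into (county, city, accidents) tuples, one per street; the function name rank_within_county and the sample rows are hypothetical.

```python
def rank_within_county(rows):
    """Dense-rank cities by accident count within each county (1 = most accidents).

    rows: iterable of (county, city, accidents) tuples, one per street, so the
    same city can appear several times. Returns {(county, city): rank}.
    """
    # Remove duplicate street rows, then sort by county and accidents (descending)
    ordered = sorted(set(rows), key=lambda r: (r[0], -r[2]))
    ranks = {}
    prev_county, prev_accidents, rank = None, None, 0
    for county, city, accidents in ordered:
        if county != prev_county:
            rank = 1  # new county: start ranking from 1
        elif accidents < prev_accidents:
            rank += 1  # fewer accidents than the previous city: next rank
        # same county and same accident count: keep the current rank
        ranks[(county, city)] = rank
        prev_county, prev_accidents = county, accidents
    return ranks


# Hypothetical sample data: one tuple per street
rows = [
    ("a", "A", 1), ("a", "A", 1), ("a", "B", 2), ("a", "C", 5),
    ("b", "D", 5), ("b", "E", 5), ("b", "F", 6), ("b", "G", 8),
]
ranks = rank_within_county(rows)
print(ranks)
```

Cities with equal accident counts in the same county share a rank, and the next lower count gets the next integer (a "dense" ranking).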

However, I don't know how to code this properly; maybe this approach isn't suitable either. Would a loop solve the problem?

Any suggestions?

python subset arcgis ranking
2 Answers

I suggest using the Python Pandas module.

Dummy data

Create data with county, accidents, and city columns.

Actual data would be loaded with pandas read_csv.
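As a minimal sketch of that loading step, assuming the attribute table has been exported to a CSV file with the columns county, accidents, and city (the CSV text below is made up, and io.StringIO stands in for a real file):

```python
import io

import pandas as pd

# Hypothetical CSV export of the attribute table; the column names are assumptions
csv_text = """county,accidents,city
a,1,A
a,2,B
a,5,C
b,5,D
"""

# With a real file you would pass its path instead, e.g. pd.read_csv("accidents.csv")
# ("accidents.csv" is a hypothetical name)
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```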

import pandas as pd
df = pd.DataFrame([
    ['a', 1, 'A'],
    ['a', 2, 'B'],
    ['a', 5, 'C'],
    ['b', 5, 'D'],
    ['b', 5, 'E'],
    ['b', 6, 'F'],
    ['b', 8, 'G'],
    ['c', 2, 'H'],
    ['c', 2, 'I'],
    ['c', 7, 'J'],
    ['c', 7, 'K']
], columns = ['county', 'accidents', 'city'])

Resulting DataFrame

df:

  county  accidents city
0       a          1    A
1       a          2    B
2       a          5    C
3       b          5    D
4       b          5    E
5       b          6    F
6       b          8    G
7       c          2    H
8       c          2    I
9       c          7    J
10      c          7    K

Group the data rows by county, then rank the accidents within each group.

Ranking code

# ascending=False causes the cities with the most accidents to be ranked 1
df["rank"] = df.groupby("county")["accidents"].rank(method="dense", ascending=False)

Result

df:

  county  accidents city  rank
0       a          1    A   3.0
1       a          2    B   2.0
2       a          5    C   1.0
3       b          5    D   3.0
4       b          5    E   3.0
5       b          6    F   2.0
6       b          8    G   1.0
7       c          2    H   2.0
8       c          2    I   2.0
9       c          7    J   1.0
10      c          7    K   1.0
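The "dense" method matters when cities tie. A small comparison on county c from the dummy data, using pandas' method="min" only for contrast; .astype(int) turns the float ranks into integers, which may be convenient if the target field is an integer type.

```python
import pandas as pd

# County c from the dummy data: two pairs of tied cities
df = pd.DataFrame({
    "county": ["c", "c", "c", "c"],
    "accidents": [2, 2, 7, 7],
    "city": ["H", "I", "J", "K"],
})

grouped = df.groupby("county")["accidents"]
# "dense": tied cities share a rank and the next rank follows without a gap
df["dense"] = grouped.rank(method="dense", ascending=False).astype(int)
# "min": tied cities share the lowest rank, and the following ranks are skipped
df["min"] = grouped.rank(method="min", ascending=False).astype(int)
print(df)
```

With "dense", H and I get rank 2; with "min" they get rank 3, because J and K already occupy positions 1 and 2.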


I think @DarrylG's approach is correct, but it doesn't take into account that the environment is ArcGIS.

Since you tagged the question with Python, I came up with a workflow that uses Pandas. There are other ways to do this with ArcGIS tools and/or the Field Calculator.

import arcpy  # needed if you run this script outside ArcGIS
import pandas as pd

# change this to your actual shapefile; you might have to include a path
filename = "road_accidents"
sFields = ['County', 'City', 'SumOfAccidents']  # consider these to be your columns

# read everything in your file into a Pandas DataFrame with a SearchCursor
with arcpy.da.SearchCursor(filename, sFields) as sCursor:
    df = pd.DataFrame(data=[row for row in sCursor], columns=sFields)
df = df.drop_duplicates()  # each row represents a street, so remove the duplicate city rows
# use DarrylG's code to calculate the rank (ascending=False: most accidents -> rank 1)
df['Rank'] = df.groupby('County')['SumOfAccidents'].rank(method='dense', ascending=False)
# set a MultiIndex, since the same city name might occur in different counties
df = df.set_index(['County', 'City'])
dct = df.to_dict()  # {'SumOfAccidents': {...}, 'Rank': {(County, City): rank}}

# add a field to your shapefile
arcpy.AddField_management(filename, 'Rank', 'SHORT')

# now we can update the shapefile
uFields = ['County', 'City', 'Rank']
with arcpy.da.UpdateCursor(filename, uFields) as uCursor:  # open an UpdateCursor on the file
    for row in uCursor:  # for each row (street)
        # get the county/city combination
        County_City = (row[uFields.index('County')], row[uFields.index('City')])
        if County_City in dct['Rank']:  # see if it is in the dictionary (it should be)
            # give it the value from the dictionary, as an integer for the SHORT field
            row[uFields.index('Rank')] = int(dct['Rank'][County_City])
        else:
            # otherwise flag it with a placeholder value
            row[uFields.index('Rank')] = 999
        uCursor.updateRow(row)  # update the row
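The lookup logic can be tried without arcpy. This is a minimal sketch, assuming plain Python lists of tuples stand in for the rows that the SearchCursor and UpdateCursor would yield (the sample values and the names street_rows, rank_lookup, and updated are hypothetical); it uses ascending=False so that the city with the most accidents gets rank 1.

```python
import pandas as pd

# Hypothetical stand-in for the SearchCursor rows: one (County, City, SumOfAccidents)
# tuple per street, so a city's total repeats on each of its streets
street_rows = [
    ("a", "A", 1), ("a", "A", 1),  # two streets in city A
    ("a", "B", 2),
    ("b", "D", 5), ("b", "E", 5), ("b", "F", 6),
]
fields = ["County", "City", "SumOfAccidents"]

df = pd.DataFrame(street_rows, columns=fields).drop_duplicates()
df["Rank"] = (
    df.groupby("County")["SumOfAccidents"]
    .rank(method="dense", ascending=False)
    .astype(int)
)
# Map (County, City) -> Rank; this is the dictionary the update loop looks up
rank_lookup = df.set_index(["County", "City"])["Rank"].to_dict()

# Stand-in for the UpdateCursor pass: attach a rank to every street row (999 if missing)
updated = [
    (county, city, accidents, rank_lookup.get((county, city), 999))
    for county, city, accidents in street_rows
]
print(updated)
```

Because the "dense" method gives equal values the same rank, ranking the street-level rows without drop_duplicates would produce the same city ranks; dropping duplicates mainly shrinks the table before ranking.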

You can run this code in the ArcGIS Pro Python console, or in a Jupyter notebook. Hope this helps!
