Incorrect labels when comparing versions

Question

I have a dataframe that looks like this:

api_spec_id   commit_date             info_version          label
500                2020-07-22           1.1 
500                2020-11-09           1.1 
500                2020-11-16           1.1 
500                2020-11-16           1.1 
500                2020-11-23           1.1 
500                2021-02-01           1.1     
138641             2020-06-25          0.1.0                 major
138641             2020-06-25          0.1.0    
138641             2020-06-27          0.1.0    
138641             2020-06-27          0.1.9                 patch
138641             2020-06-27          0.1.10                patch
138641             2020-06-27          0.1.11                patch
138641             2020-06-27          0.1.13                patch
138641             2020-06-27          0.1.14                patch
138641             2020-06-27          0.1.15                patch
138641             2020-06-28          0.2.0                 minor.patch
138641             2020-06-30          0.2.1                 patch
138641             2020-07-01          0.3.0                 minor.patch
138641             2020-07-08          0.4.0                 minor
138641             2020-07-11          0.5.0                 minor
138641             2020-07-12          0.6.0                 minor

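I am trying to compare the versions in consecutive rows and label each change. The problem is that, for some api_spec_id values, the row with the first commit_date also gets a label, as you can see for id 138641. It should be empty, because there is no previous row to compare it with.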

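Here is the code. I suspect the problem comes from the sem step, because it extracts the versions for all rows at once, which may cause trouble in diff. Then again, it works for some ids, which is strange, and I have not been able to debug it.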

pat = r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(?:\.(?P<micro>\d+))?'

sem = new['info_version'].str.extract(pat).fillna(0).astype(int)

diff = sem.diff().fillna(0).ne(0)

new['label'] = diff.dot(sem.columns + '.').str.rstrip('.')

The second part of the code handles all versions that are not valid semantic versions and have to be parsed with the Version class.

attrs = ['major', 'minor', 'micro', 'pre', 'post', 'dev', 'local']
def extract_version(ver):
    ver = Version(ver)  
    return pd.Series({attr: getattr(ver, attr) for attr in attrs}, dtype=str)


sem = new['info_version'].agg(extract_version).fillna('').rename(columns={'micro': 'patch'})
diff = sem.ne(sem.shift().fillna(sem.iloc[0]))
new['label'] = diff.dot(sem.columns + '.').str.rstrip('.')

Is there another way to compute the differences? Any suggestions or ideas would be much appreciated.

Answer 1

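Following on from one of your previous questions, I suggest grouping by the api_spec_id column when processing the versions. Without grouping, the first row of one id is compared with the last row of the previous id: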

api_spec_id       commit_date   info_version    label
500                2021-02-01            1.1     
138641             2020-06-25          0.1.0    major  # <- without groupby

If you use groupby, the output will be:

api_spec_id       commit_date   info_version    label
500                2021-02-01            1.1     
138641             2020-06-25          0.1.0           # <- with groupby

So you should use:

pat = r'(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(?:\.(?P<micro>\d+))?'

sem = new['info_version'].str.extract(pat).fillna(0).astype(int)

# diff = sem.diff().fillna(0).ne(0)
diff = sem.groupby(new['api_spec_id']).diff().fillna(0).ne(0)

new['label'] = diff.dot(sem.columns + '.').str.rstrip('.')
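
As a small self-contained check of the difference, here is a sketch on a hypothetical six-row miniature of the data (the rows and the label helper are made up for illustration, and every version has three components so the pattern matches). It runs the same extraction twice, once with a global diff and once with a per-id diff:

```python
import pandas as pd

# Hypothetical miniature of the question's data: two ids, rows in commit order.
new = pd.DataFrame({
    "api_spec_id": [500, 500, 138641, 138641, 138641, 138641],
    "info_version": ["1.1.0", "1.1.0", "0.1.0", "0.1.0", "0.1.9", "0.2.0"],
})

pat = r"(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)"
sem = new["info_version"].str.extract(pat).fillna(0).astype(int)


def label(diff):
    # Illustrative helper: join the names of the changed columns, e.g. "minor.patch".
    return diff.dot(diff.columns + ".").str.rstrip(".")


# Global diff: row 2 (first row of id 138641) is compared with row 1 (id 500).
global_labels = label(sem.diff().fillna(0).ne(0))

# Per-id diff: the first row of every id has nothing to compare with.
grouped_labels = label(sem.groupby(new["api_spec_id"]).diff().fillna(0).ne(0))

print(global_labels.tolist())   # ['', '', 'major', '', 'patch', 'minor.patch']
print(grouped_labels.tolist())  # ['', '', '', '', 'patch', 'minor.patch']
```

Only the third entry differs: the spurious major label on the first row of id 138641 disappears once the diff is computed per group.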

For your second piece of code, it is exactly the same problem:

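import pandas as pd
from packaging.version import Version  # assumption: Version is packaging's class

attrs = ['major', 'minor', 'micro', 'pre', 'post', 'dev', 'local']
def extract_version(ver):
    # Parse the version string and return its components as strings
    ver = Version(ver)
    return pd.Series({attr: getattr(ver, attr) for attr in attrs}, dtype=str)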


sem = new['info_version'].agg(extract_version).fillna('').rename(columns={'micro': 'patch'})
# diff = sem.ne(sem.shift().fillna(sem.iloc[0]))
diff = (sem.groupby(new['api_spec_id'], group_keys=False)
           .apply(lambda x: x.ne(x.shift().fillna(x.iloc[0]))))

new['label'] = diff.dot(sem.columns + '.').str.rstrip('.')
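
If you would rather avoid groupby.apply, one possible alternative is groupby.shift: take the previous row within each id, and where there is none, fill in the row itself so that it compares equal. The sketch below is only an illustration; it uses a hypothetical, hand-written component table in place of the sem built by extract_version, and key stands in for new['api_spec_id']:

```python
import pandas as pd

# Hypothetical, already-parsed components (all strings), standing in for `sem`.
key = pd.Series([500, 500, 138641, 138641, 138641], name="api_spec_id")
sem = pd.DataFrame({
    "major": ["1", "1", "0", "0", "0"],
    "minor": ["1", "1", "1", "1", "2"],
    "patch": ["0", "0", "0", "9", "0"],
})

# Previous row within the same id; the first row of each id is all NaN,
# so it is filled with the row itself and therefore shows no change.
prev = sem.groupby(key).shift().fillna(sem)
diff = sem.ne(prev)

labels = diff.dot(sem.columns + ".").str.rstrip(".")
print(labels.tolist())  # ['', '', '', 'patch', 'minor.patch']
```

This relies on sem containing no missing values of its own, which holds in your code because of the fillna('') call.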

In both cases, the output is now:

>>> new
    api_spec_id commit_date info_version        label
0           500  2020-07-22          1.1             
1           500  2020-11-09          1.1             
2           500  2020-11-16          1.1             
3           500  2020-11-16          1.1             
4           500  2020-11-23          1.1             
5           500  2021-02-01          1.1             
6        138641  2020-06-25        0.1.0             
7        138641  2020-06-25        0.1.0             
8        138641  2020-06-27        0.1.0             
9        138641  2020-06-27        0.1.9        patch
10       138641  2020-06-27       0.1.10        patch
11       138641  2020-06-27       0.1.11        patch
12       138641  2020-06-27       0.1.13        patch
13       138641  2020-06-27       0.1.14        patch
14       138641  2020-06-27       0.1.15        patch
15       138641  2020-06-28        0.2.0  minor.patch
16       138641  2020-06-30        0.2.1        patch
17       138641  2020-07-01        0.3.0  minor.patch
18       138641  2020-07-08        0.4.0        minor
19       138641  2020-07-11        0.5.0        minor
20       138641  2020-07-12        0.6.0        minor
