Referencing a second dataset in SPSS?

Question

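I have a table containing IDs and rows of data collected at different timepoints. Some IDs do not have every timepoint, either because the timepoint has not happened yet or because the person was not eligible to continue. I need to create new rows for each ID's missing timepoints and fill them with error codes.

I have a second data table containing the IDs and the error codes I want to use.

My goal: append new rows to my first table and fill them with the error codes specified in the second table.

How can I use syntax to avoid creating the new rows manually, and fill in the information based on the second table in SPSS?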

For example:

Table A

ID	timepoint	value1	value2	value3
1	1	10	12	13
1	2	30	22	20
1	3	11	11	45
2	1	81	10	10
2	2	20	32	21
3	1	11	15	12

Table B: why missing?

ID	timepoint	whyMissing
2	3	-20
3	2	-15
3	3	-15

Desired Table C:

ID	timepoint	value1	value2	value3
1	1	10	12	13
1	2	30	22	20
1	3	11	11	45
2	1	81	10	10
2	2	20	32	21
2	3	-20	-20	-20
3	1	11	15	12
3	2	-15	-15	-15
3	3	-15	-15	-15

SORT CASES BY SubjectID AssessmentPeriod.

* Match TableCBase with TableB to identify missing combinations.
MATCH FILES /FILE=*
  /TABLE=TableB
  /BY ID timepoint.

* Create new rows in TableCBase for missing combinations with specific error codes.
IF $CASENUM = 0.
  DO REPEAT #i = value1 value2 value3 / #r = whyMissing whyMissing whyMissing.
    COMPUTE #i = #r.
  END REPEAT.
  DATASET NAME TableBError.
END IF.

* Combine the modified TableCBase with TableA.
MATCH FILES /FILE=*
  /TABLE=TableCBase
  /BY ID timepoint.

* Sort the final table.
SORT CASES BY ID timepoint.

I have tried this syntax, but I can't seem to get it to work correctly within my existing syntax.

If I could do this in Python, this is what I would use:

# Update the data columns with missing data codes
data_columns = [f'Value{i}' for i in range(1, 5)]  # Adjust based on the actual number of columns

# Get all unique subject IDs and assessment periods from Table A
all_subjects = table_a['subjectID'].unique()
all_periods = table_a['AssessmentPeriod'].unique()

# Create a master dataframe with all combinations of subject IDs and assessment periods
master_df = pd.DataFrame([(subject, period) for subject in all_subjects for period in all_periods],
                         columns=['subjectID', 'AssessmentPeriod'])

# Merge with Table B to get missing data codes
table_c_base = pd.merge(master_df, table_b, on=['subjectID', 'AssessmentPeriod'], how='left', suffixes=('_B', ''))

# Merge with Table A to get corresponding data values
table_c_base = pd.merge(table_c_base, table_a, on=['subjectID', 'AssessmentPeriod'], how='left', suffixes=('_B', '_A'))

#fill NA
table_c_base['Value1']=table_c_base['Value1'].fillna(table_c_base['whyMissing'])
table_c_base['Value2']=table_c_base['Value2'].fillna(table_c_base['whyMissing'])
table_c_base['Value3']=table_c_base['Value3'].fillna(table_c_base['whyMissing'])


table_c_base= table_c_base.dropna(axis=1, how='all')

Answer 1

Based on your example, a better strategy is to use add files rather than match files, like this:

dataset activate TableCBase.
add files /file=* /file=TableB.
do repeat vr=var1 to var3.
   if not missing(whyMissing) vr=whyMissing.
end repeat.

This syntax first appends the additional data as new rows, then copies the error code into the empty variables.
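Since the question mentions Python, the same append-then-fill logic can be sketched in plain Python to check it against the example tables. This is only an illustration under assumptions: the column names `ID`, `timepoint`, `value1` to `value3` and `whyMissing` are taken from the example, the function name `append_and_fill` is hypothetical, and the final sort and the removal of `whyMissing` are extra steps that the SPSS syntax above does not perform.

```python
# Table A from the example: one dict per (ID, timepoint) row.
table_a = [
    {"ID": 1, "timepoint": 1, "value1": 10, "value2": 12, "value3": 13},
    {"ID": 1, "timepoint": 2, "value1": 30, "value2": 22, "value3": 20},
    {"ID": 1, "timepoint": 3, "value1": 11, "value2": 11, "value3": 45},
    {"ID": 2, "timepoint": 1, "value1": 81, "value2": 10, "value3": 10},
    {"ID": 2, "timepoint": 2, "value1": 20, "value2": 32, "value3": 21},
    {"ID": 3, "timepoint": 1, "value1": 11, "value2": 15, "value3": 12},
]

# Table B from the example: the missing rows and their error codes.
table_b = [
    {"ID": 2, "timepoint": 3, "whyMissing": -20},
    {"ID": 3, "timepoint": 2, "whyMissing": -15},
    {"ID": 3, "timepoint": 3, "whyMissing": -15},
]

VALUE_COLUMNS = ["value1", "value2", "value3"]


def append_and_fill(rows_a, rows_b, value_columns=VALUE_COLUMNS):
    """Hypothetical helper: stack both tables, then copy error codes."""
    all_columns = ["ID", "timepoint", *value_columns, "whyMissing"]

    # Step 1 (the ADD FILES idea): stack the rows of both tables.
    # A column that a row does not have is set to None.
    stacked = [
        {col: row.get(col) for col in all_columns}
        for row in rows_a + rows_b
    ]

    # Step 2 (the DO REPEAT idea): wherever whyMissing is present,
    # copy it into every value column.
    for row in stacked:
        if row["whyMissing"] is not None:
            for col in value_columns:
                row[col] = row["whyMissing"]

    # Extra steps, not part of the SPSS syntax above: sort by key
    # and drop the helper column so the result matches Table C.
    stacked.sort(key=lambda r: (r["ID"], r["timepoint"]))
    return [
        {k: v for k, v in row.items() if k != "whyMissing"}
        for row in stacked
    ]


table_c = append_and_fill(table_a, table_b)
for row in table_c:
    print(row)
```

In SPSS itself, the variable list in the do repeat line would need to match the real variable names in the active dataset, and a sort by ID and timepoint after add files would put the appended rows in the order shown in Table C.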
