[在Java中使用简单的嵌套循环实现内部联接

Question

我在一次采访中被问到如何在Java中使用嵌套的for循环实现内部联接。我在Internet上找到有关Hash Join的信息https://rosettacode.org/wiki/Hash_join，但在Internet上找不到任何解释内部联接的简单嵌套循环实现的信息。我尝试实现代码，但如代码注释中所述，卡在了几个地方。

/**
 * 
 * @param R
 * @param index1 Join column for table R.
 * @param S
 * @param index2 Join column for table S.
 * @return
 */
public String[][] innerJoin(String[][] R, int index1, String[][] S, int index2) {
    // How to define the result array. What should be it's size?? Is the below code correct.
    String[][] result = new String[R.length + S.length][R[0].length + S[0].length];

    // loop through both the tables to find out when the join column have common values.
    // output those common values.
    for (int i = 0; i < R.length; i++) {
        for (int j = 0; j < S.length; j++) {
            if (R[i][index1] == S[j][index2]) {
                // How to combine both tables here ???
            }
        }
    }

    return result;
}

）

Answer 1

您已经在问题代码中正确识别了3个重要问题：

您如何计算结果表的大小？
您如何找到匹配项？
找到匹配项后，如何将其添加到结果表中？

计算结果的简单方法是将匹配项存储在其他位置，然后在返回匹配项之前计算找到的匹配项数。从这个意义上讲，最好使用ArrayList<String[]>而不是String[][]，因为您可以追加到ArrayLists，但不能更改数组的大小。

使用双循环查找匹配项确实是非常低效的O(nm)，但是，嘿，如果那是他们想要的，那肯定可以做到。首先对索引进行排序然后再进行处理会容易得多（O(n log n + m log m + n log m)，具有O（n + m）的额外内存）；或构建哈希表并使用它们（O(n + m + n) = O(n + m)）。

选择要返回的内容取决于列所代表的内容以及是否存在重复项。例如，您可以决定以下格式：

作为第一列，index1的内容
第一张表中的所有列（index1除外）
第二个表中的所有列（index2除外）。

请注意，格式的选择有些随意；您可能已经将index1留在了它的位置，然后从表2的列中省略了它。在任何情况下，有了前面的答案，您将得到：

public String[][] innerJoin(String[][] R, int index1, String[][] S, int index2) {
    // temporary storage for matches
    ArrayList<String[]> matches = new ArrayList<>();

    // loop through both the tables to find out when the join column have common values.
    // output those common values.
    for (int i = 0; i < R.length; i++) {
        for (int j = 0; j < S.length; j++) {
            if (R[i][index1] == S[j][index2]) {
                matches.add(combine(R[i], S[j], index1, index2));
            }
        }
    }

    // convert matches to expected output array
    return matches.toArray(new String[matches.size()][]);
}

private String[] combine(String[] one, String[] two, int index1, int index2) {
    String[] r = new String[one.length + two.length - 1];
    int pos = 0;
    r[pos ++] = one[index1];
    for (int i=0; i<one.length; i++) if (i != index1) r[pos ++] = one[i];
    for (int i=0; i<two.length; i++) if (i != index2) r[pos ++] = two[i];
    return r;
}

Answer 2

我将尝试给您一些提示：

结果数组的长度不是R和S表的长度之和。根据表的内容，最多可以为R.length * S.length。
结果数组中“列”的数量确实为R[0].length + S[0].length（只要数组是“真实”表且每个“行”中没有可变数量的“列”。
在循环中（在if块中，您应该
- 在结果数组的当前“输出”行（从0开始）中，将前R[0].length列（0..rl-1）设置为R[i][0] ... R[i][rl - 1]列的内容
- 然后，将rl ... R[0].length + S[0].length - 1列（rl ... rl + sl-1）设置为S[j][0] ... S[j][sl - 1]列的内容
- 在结果数组中为当前“输出”行增加一个计数器

最后，这只是一些数组偏移量算法;-）

[在Java中使用简单的嵌套循环实现内部联接

问题描述投票：0回答：2

2个回答

最新问题

[在Java中使用简单的嵌套循环实现内部联接

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2