使用PHP和MySQL结表进行分类,包括和排除类

问题描述 投票:2回答:1

我试图分析使用手动分配的类别鸣叫。一切都存储在MySQL数据库。我可以添加和删除微博,类别,以及它们之间的关系没有任何问题。

包括使用逻辑作品类别或预期。如果我想找到归类为“委内瑞拉”或“马杜罗,”鸣叫我在名为$include$include_logic设置为"or"阵列发送这些两届。任何一类下分为鸣叫返回。大!

当我尝试使用AND逻辑(即,在全体归类鸣叫包括术语,例如,既委内瑞拉和马杜罗)或当我尝试不含类的问题开始。

下面的代码:

function filter_tweets($db, $user_id, $from_utc, $to_utc, $include = null, $include_logic = null, $exclude = null) {

    $include_sql = '';
    if (isset($include)) {
        $include_sql = 'AND (';
        $logic_op = '';
        foreach ($include as $cat) {
            $include_sql .= "{$logic_op}cats.name = '$cat' ";
            $logic_op = ($include_logic != 'and') ? 'OR ' : 'AND '; # AND doesn't work here
        }
        $include_sql .= ')';
    }
    $exclude_sql = ''; # Nothing I've tried with this works.

    $sql = "
        SELECT DISTINCT tweets.id FROM tweets
            LEFT OUTER JOIN tweets_cats ON tweets.id = tweets_cats.tweet_id
            LEFT OUTER JOIN cats ON tweets_cats.cat_id = cats.id 
        WHERE tweets.user_id = $user_id
            AND created_at
                BETWEEN '{$from_utc->format('Y-m-d H:i:s')}'
                    AND '{$to_utc->format('Y-m-d H:i:s')}'
            $include_sql
            $exclude_sql
        ORDER BY tweets.created_at ASC;";

    return db_fetch_all($db, $sql);   
}

其中db_fetch_all()

function db_fetch_all($con, $sql) {

    if ($result = mysqli_query($con, $sql)) {
        $rows = mysqli_fetch_all($result);
        mysqli_free_result($result);
        return $rows;
    }
    die("Failed: " . mysqli_error($con)); 
}

tweets_catstweetscats表之间的联接表。

在连接和接线表阅读后,我明白了为什么我的代码不提到的两种情况下工作。它只能看一个鸣叫和相应类别在一个时间。所以要求它省略归类为“X”鸣叫是没有实际意义,因为当遇到同样的鸣叫和归类为“Y”它不会忽略它。

我不明白的是如何修改代码,以便它的工作。我还没有发现人试图做类似的事情的任何实例。也许我不是寻找合适的条款。我会很感激,如果有人可以指向我一个很好的资源用在MySQL结表相似,我如何使用它们的工作。


Edit: Here's working SQL created by the function using the example mentioned above including "Venezuela" OR "Maduro" on the VP twitter account with the date range set to tweets so far this month (EST converted to UTC).
SELECT DISTINCT tweets.id FROM tweets
    LEFT OUTER JOIN tweets_cats ON tweets.id = tweets_cats.tweet_id
    LEFT OUTER JOIN cats ON tweets_cats.cat_id = cats.id
WHERE tweets.user_id = 818910970567344128
    AND created_at BETWEEN '2019-02-01 05:00:00' AND '2019-03-01 05:00:00'
    AND (cats.name = 'Venezuela' OR cats.name = 'Maduro' )
ORDER BY tweets.created_at ASC;


Update: Here's working SQL that adheres to AND logic for included categories. Many thanks to @Strawberry for the suggestion!
SELECT tweets.id FROM tweets
    LEFT OUTER JOIN tweets_cats ON tweets.id = tweets_cats.tweet_id
    LEFT OUTER JOIN cats ON tweets_cats.cat_id = cats.id 
WHERE tweets.user_id = 818910970567344128
    AND created_at BETWEEN '2019-02-01 05:00:00' AND '2019-03-01 05:00:00'
    AND cats.name IN ('Venezuela', 'Maduro')
GROUP BY tweets.id
HAVING COUNT(*) = 2
ORDER BY tweets.created_at ASC;

这是一个有点超出我的理解SQL,虽然。我很高兴它的工作原理。我只希望我的理解如何。


Update 2: Here's working SQL that excludes categories. I realized that the AND/OR logic that applies to included categories also applies to excluded ones. This example uses OR logic. The syntax is essentially Q1 NOT IN (Q2), where Q2 is what's excluded, which is basically the same query used for inclusion.
SELECT id FROM tweets
WHERE user_id = 818910970567344128
    AND created_at BETWEEN '2019-02-01 05:00:00' AND '2019-03-01 05:00:00'
    AND id NOT IN (
        SELECT tweets.id FROM tweets
            LEFT OUTER JOIN tweets_cats ON tweets.id = tweets_cats.tweet_id
            LEFT OUTER JOIN cats ON tweets_cats.cat_id = cats.id 
        WHERE tweets.user_id = 818910970567344128
            AND created_at BETWEEN '2019-02-01 05:00:00' AND '2019-03-01 05:00:00'
            AND cats.name IN ('Venezuela','Maduro')
    )
ORDER BY created_at ASC;


Update 3: Here's the working code.
function filter_tweets($db, $user_id, $from_utc, $to_utc,
                       $include = null, $include_logic = null,
                       $exclude = null, $exclude_logic = null) {

    if (isset($exclude)) {
        $exclude_sql = "
              AND tweets.id NOT IN (\n"
            . include_tweets($user_id, $from_utc, $to_utc, $exclude, $exclude_logic)
            . "\n)";
    } else {
        $exclude_sql = '';
    }
    if (isset($include)) {
        $sql = include_tweets($user_id, $from_utc, $to_utc, $include, $include_logic, $exclude_sql);
    } else {
        $sql = "
            SELECT id FROM tweets
            WHERE user_id = $user_id
              AND created_at
                BETWEEN '{$from_utc->format('Y-m-d H:i:s')}'
                    AND '{$to_utc  ->format('Y-m-d H:i:s')}'
              $exclude_sql";
    }
    $sql .= "\nORDER BY tweets.created_at ASC;";

    return db_fetch_all($db, $sql);   
}

其依赖于用于生成SQL此附加功能:

function include_tweets($user_id, $from_utc, $to_utc, $include, $logic, $exclude_sql = '') {

    $group_sql = '';
    $include_sql = 'AND cats.name IN (';
    $comma = '';
    foreach ($include as $cat) {
        $include_sql .= "$comma'$cat'";
        $comma = ',';
    }
    $include_sql .= ')';
    if ($logic == 'and')
        $group_sql = 'GROUP BY tweets.id HAVING COUNT(*) = ' . count($include);
    return "
        SELECT tweets.id FROM tweets
          LEFT OUTER JOIN tweets_cats ON tweets.id = tweets_cats.tweet_id
          LEFT OUTER JOIN cats ON tweets_cats.cat_id = cats.id 
        WHERE tweets.user_id = $user_id
          AND created_at
            BETWEEN '{$from_utc->format('Y-m-d H:i:s')}'
                AND '{$to_utc  ->format('Y-m-d H:i:s')}'
          $include_sql
        $group_sql
        $exclude_sql";
}
php mysql join mysqli junction-table
1个回答
0
投票

要做到这一点的方法之一是加入你tweets表与联接表多次,例如像这样:

SELECT tweets.*
FROM tweets
  JOIN tweet_cats AS tweet_cats_foo
    ON tweet_cats_foo.tweet_id = tweets.id
  JOIN tweet_cats AS tweet_cats_bar
    ON tweet_cats_bar.tweet_id = tweets.id
WHERE
  tweet_cats_foo.name = 'foo' AND tweet_cats_bar.name = 'bar'

或者等价地,像这样的:

SELECT tweets.*
FROM tweets
  JOIN tweet_cats AS tweet_cats_foo
    ON tweet_cats_foo.tweet_id = tweets.id
    AND tweet_cats_foo.name = 'foo'
  JOIN tweet_cats AS tweet_cats_bar
    ON tweet_cats_bar.tweet_id = tweets.id
    AND tweet_cats_bar.name = 'bar'

需要注意的是,为了简单起见,我假定上面,你的结表直接包含的类别名称。如果你坚持要用数字类别ID,但按名称搜索类别,我建议创建加入类别和接线表使用数字类别ID,并使用该视图,而不是在您的查询的实际结台在一起的视图。这样可以节省您不必包括一大堆的查询不必要的样板代码只是为了找到数字类别ID。

排除查询,你可以使用一个LEFT JOIN并确认是否有匹配的记录在接线表中存在(在这种情况下,所有从该表中的列会NULL),就像这样:

SELECT tweets.*
FROM tweets
  LEFT JOIN tweet_cats AS tweet_cats_foo
    ON tweet_cats_foo.tweet_id = tweets.id
    AND tweet_cats_foo.name = 'foo'
WHERE
  tweet_cats_foo.tweet_id IS NULL  -- could use any non-null column here

(使用这种方法,你需要包括tweet_cats_foo.name = 'foo'条款,而不是LEFT JOIN子句中的WHERE条件。)

当然,你也可以结合这些。例如,要查找类别foo鸣叫,但不是在bar,你可以这样做:

SELECT tweets.*
FROM tweets
  JOIN tweet_cats AS tweet_cats_foo
    ON tweet_cats_foo.tweet_id = tweets.id
    AND tweet_cats_foo.name = 'foo'
  LEFT JOIN tweet_cats AS tweet_cats_bar
    ON tweet_cats_bar.tweet_id = tweets.id
    AND tweet_cats_bar.name = 'bar'
WHERE
  tweet_cats_bar.tweet_id IS NULL

或者,再次等价:

SELECT tweets.*
FROM tweets
  LEFT JOIN tweet_cats AS tweet_cats_foo
    ON tweet_cats_foo.tweet_id = tweets.id
    AND tweet_cats_foo.name = 'foo'
  LEFT JOIN tweet_cats AS tweet_cats_bar
    ON tweet_cats_bar.tweet_id = tweets.id
    AND tweet_cats_bar.name = 'bar'
WHERE
  tweet_cats_foo.tweet_id IS NOT NULL
  AND tweet_cats_bar.tweet_id IS NULL

PS。找到类别交叉,as suggested by Strawberry in the comments above另一种方式,就是做一个连接对结合表,组结果由鸣叫ID,并使用HAVING条款计算有多少符合条件的分类,发现每个鸣叫:

SELECT tweets.*
FROM tweets
  JOIN tweet_cats ON tweet_cats.tweet_id = tweets.id
WHERE
   tweet_cats.name IN ('foo', 'bar')
GROUP BY tweets.id
HAVING COUNT(DISTINCT tweet_cats.name) = 2

该方法也可以推广通过使用第二处理排除(左)加入,例如像这样:

SELECT tweets.*
FROM tweets
  JOIN tweet_cats AS tweet_cats_wanted
    ON tweet_cats_wanted.tweet_id = tweets.id
    AND tweet_cats_wanted.name IN ('foo', 'bar')
  LEFT JOIN tweet_cats AS tweet_cats_unwanted
    ON tweet_cats_unwanted.tweet_id = tweets.id
    AND tweet_cats_unwanted.name IN ('baz', 'blorgh', 'xyzzy')
WHERE
  tweet_cats_unwanted.tweet_id IS NULL
GROUP BY tweets.id
HAVING COUNT(DISTINCT tweet_cats_wanted.name) = 2

我没有基准这两种方法,看看哪一个更有效,而且我强烈建议在决定哪一个去与之前这样做。原则上,我希望在多连接方法更容易为数据库引擎优化,因为它清楚地映射到连接的交集,而对于GROUP BY ... HAVING方法天真的数据库可能最终浪费了大量的首先努力找出所有匹配的任何一类,而事后才应用HAVING子句来过滤掉一切,但那些符合所有类别的tweet。一个简单的测试案例,这可能是几个非常大的类别,有一个非常小的一个,这是我所期望的使用多连接方法更有效率的交集。不过,当然,每个人都应该进行测试,而不是仅仅依靠直觉这样的事情。

© www.soinside.com 2019 - 2024. All rights reserved.