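I have an index in which some data is duplicated: all fields are identical except latitude, longitude and id (the id field is not a real ID, it is just generated with row_number() OVER () AS id).
For example: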
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 6 | 919 | 15,243 | 0.825162 | 0.692828 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 8 | 918 | 8,154 | 0.825162 | 0.692828 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 10 | 920 | 17,283,288 | 0.958915 | 1.282215 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 12 | 924 | 12,208 | 0.973336 | 0.658237 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 14 | 923 | 21,365 | 0.973336 | 0.658237 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 16 | 922 | 20,359 | 0.973336 | 0.658237 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 18 | 921 | 19,346 | 0.973336 | 0.658237 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
Now I want to group the data by vacancy_id:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy group by vacancy_id;
+------+------------+---------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+---------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 2 | 916 | 17,283,288 | 0.973178 | 0.743566 |
| 3 | 915 | 17,288 | 0.973178 | 0.743566 |
| 4 | 914 | 30,482 | 0.973178 | 0.743566 |
| 5 | 919 | 15,243 | 0.825153 | 0.692837 |
| 7 | 918 | 8,154 | 0.825153 | 0.692837 |
| 9 | 920 | 17,283,288 | 0.958914 | 1.282161 |
| 11 | 924 | 12,208 | 0.97333 | 0.658246 |
| 13 | 923 | 21,365 | 0.97333 | 0.658246 |
| 15 | 922 | 20,359 | 0.97333 | 0.658246 |
| 17 | 921 | 19,346 | 0.97333 | 0.658246 |
| 19 | 926 | 12,17,208,292 | 0.88396 | 2.389868 |
| 20 | 925 | 12,208 | 0.88396 | 2.389868 |
| 21 | 961 | 4,105 | 0.959217 | 1.280721 |
| 23 | 960 | 8,155 | 0.959217 | 1.280721 |
| 25 | 959 | 12,208 | 0.959217 | 1.280721 |
| 27 | 928 | 1,60 | 0.963734 | 1.070297 |
| 29 | 927 | 32,513 | 0.963734 | 1.070297 |
| 31 | 929 | 6,140 | 0.786553 | 0.678649 |
| 33 | 932 | 1,40,46 | 0.824627 | 0.694182 |
+------+------------+---------------+----------+-----------+
20 rows in set (0.00 sec)
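To make the GROUP BY step concrete, here is a small Python sketch (an emulation, not Sphinx itself) that collapses the 20 sample rows from the first listing to one row per vacancy_id. It assumes the representative row is simply the first one seen for each vacancy; Sphinx actually picks it according to its within-group sort order, which here happens to yield the same ids as the first 13 rows of the grouped output. Latitude and longitude are left out because they do not affect grouping.

```python
# Sample rows from the jobVacancy listing: (id, vacancy_id, prof_area_ids).
# Latitude/longitude are omitted because they do not affect grouping.
ROWS = [
    (1, 917, (11, 199, 202)),
    (2, 916, (17, 283, 288)),
    (3, 915, (17, 288)),
    (4, 914, (30, 482)),
    (5, 919, (15, 243)),
    (6, 919, (15, 243)),
    (7, 918, (8, 154)),
    (8, 918, (8, 154)),
    (9, 920, (17, 283, 288)),
    (10, 920, (17, 283, 288)),
    (11, 924, (12, 208)),
    (12, 924, (12, 208)),
    (13, 923, (21, 365)),
    (14, 923, (21, 365)),
    (15, 922, (20, 359)),
    (16, 922, (20, 359)),
    (17, 921, (19, 346)),
    (18, 921, (19, 346)),
    (19, 926, (12, 17, 208, 292)),
    (20, 925, (12, 208)),
]


def group_by_vacancy(rows):
    """Keep one representative row per vacancy_id (the first one seen)."""
    groups = {}
    for row in rows:
        # setdefault stores the row only if this vacancy_id is not there yet
        groups.setdefault(row[1], row)
    return list(groups.values())


grouped = group_by_vacancy(ROWS)
print(len(ROWS), len(grouped))      # 20 13
print([row[0] for row in grouped])  # [1, 2, 3, 4, 5, 7, 9, 11, 13, 15, 17, 19, 20]
```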
The result is great! But the problems start when I want to get the grouped data together with facets:
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+
| id | vacancy_id | prof_area_ids | latitude | longitude |
+------+------------+-----------------+----------+-----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 |
+------+------------+-----------------+----------+-----------+
5 rows in set (0.00 sec)

+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
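The facet result is incorrect: the count for prof_area_ids = 199 should actually be 5, not 12. So how do I make the facet on this multi-valued field respect the grouping?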
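The numbers are consistent with the FACET counting every matching index row rather than every vacancy group: each duplicated copy of a vacancy adds 1 to the count of each of its prof_area_ids values. The Python sketch below emulates that assumed behavior (it is not Sphinx code) on the 20 sample rows from the first listing. It filters on prof_area_ids = 208, because 199 appears only once in that sample.

```python
from collections import Counter

# Sample rows from the jobVacancy listing: (id, vacancy_id, prof_area_ids).
ROWS = [
    (1, 917, (11, 199, 202)),
    (2, 916, (17, 283, 288)),
    (3, 915, (17, 288)),
    (4, 914, (30, 482)),
    (5, 919, (15, 243)),
    (6, 919, (15, 243)),
    (7, 918, (8, 154)),
    (8, 918, (8, 154)),
    (9, 920, (17, 283, 288)),
    (10, 920, (17, 283, 288)),
    (11, 924, (12, 208)),
    (12, 924, (12, 208)),
    (13, 923, (21, 365)),
    (14, 923, (21, 365)),
    (15, 922, (20, 359)),
    (16, 922, (20, 359)),
    (17, 921, (19, 346)),
    (18, 921, (19, 346)),
    (19, 926, (12, 17, 208, 292)),
    (20, 925, (12, 208)),
]


def facet_count_rows(rows, area):
    """Emulate WHERE prof_area_ids=<area> ... FACET prof_area_ids with COUNT(*).

    Every matching row adds 1 to each value of its multi-valued attribute,
    so duplicated rows of the same vacancy are counted more than once.
    """
    counts = Counter()
    for _row_id, _vacancy_id, areas in rows:
        if area in areas:
            counts.update(areas)
    return counts


print(dict(sorted(facet_count_rows(ROWS, 208).items())))
# {12: 4, 17: 1, 208: 4, 292: 1}
```

Only three vacancies (924, 925 and 926) carry 208, yet the count is 4 because vacancy 924 is stored twice; the 12 instead of 5 for 199 is the same effect.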
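I found http://sphinxsearch.com/blog/2013/06/21/faceted-search-with-sphinx/, but it only says "If you have MVA facets, you need to use the GROUPBY() function, which returns the actual value that the grouping was done by." and nothing more. Here is my attempt: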
mysql> select id,vacancy_id,prof_area_ids,latitude,longitude,GROUPBY() as selected,COUNT(*) from jobVacancy where prof_area_ids=199 group by vacancy_id facet prof_area_ids;
+------+------------+-----------------+----------+-----------+----------+----------+
| id | vacancy_id | prof_area_ids | latitude | longitude | selected | count(*) |
+------+------------+-----------------+----------+-----------+----------+----------+
| 1 | 917 | 11,199,202 | 0.973178 | 0.743566 | 917 | 1 |
| 191 | 1004 | 11,196,199 | 0.925335 | 2.768874 | 1004 | 2 |
| 313 | 1072 | 1,11,60,197,199 | 0.963968 | 1.070624 | 1072 | 3 |
| 318 | 1136 | 11,196,199 | 0.96071 | 1.448998 | 1136 | 3 |
| 374 | 1097 | 11,199 | 0.785255 | 0.678504 | 1097 | 3 |
+------+------------+-----------------+----------+-----------+----------+----------+
5 rows in set (0.00 sec)

+---------------+----------+
| prof_area_ids | count(*) |
+---------------+----------+
| 202 | 1 |
| 199 | 12 |
| 11 | 12 |
| 196 | 5 |
| 197 | 3 |
| 60 | 3 |
| 1 | 3 |
+---------------+----------+
7 rows in set (0.02 sec)
The facet result is still wrong.
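The main result set of the GROUPBY() query already shows where the 12 comes from: its count(*) column gives the number of index rows in each vacancy group. A quick Python check of that arithmetic, using only the numbers printed in that result set:

```python
# (selected = GROUPBY() value) -> count(*), copied from the main result set
group_counts = {917: 1, 1004: 2, 1072: 3, 1136: 3, 1097: 3}

matched_rows = sum(group_counts.values())  # index rows matching prof_area_ids=199
vacancies = len(group_counts)              # distinct vacancy_id groups

print(matched_rows, vacancies)  # 12 5
```

The facet for 199 reports the first number, while the expected answer is the second.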
It seems that what you effectively want on the FACET is COUNT(DISTINCT vacancy_id) rather than the default COUNT(*), and it turns out that can be written like this:
... FACET prof_area_ids,COUNT(DISTINCT vacancy_id) AS vacancies BY prof_area_ids
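For comparison with the default COUNT(*), here is a Python emulation (not Sphinx code) of what COUNT(DISTINCT vacancy_id) per prof_area_ids value computes, run on the 20 sample rows from the question and assuming the filter prof_area_ids = 208. Each vacancy is counted once per value, no matter how many duplicate rows it has in the index.

```python
from collections import defaultdict

# Sample rows from the jobVacancy listing: (id, vacancy_id, prof_area_ids).
ROWS = [
    (1, 917, (11, 199, 202)),
    (2, 916, (17, 283, 288)),
    (3, 915, (17, 288)),
    (4, 914, (30, 482)),
    (5, 919, (15, 243)),
    (6, 919, (15, 243)),
    (7, 918, (8, 154)),
    (8, 918, (8, 154)),
    (9, 920, (17, 283, 288)),
    (10, 920, (17, 283, 288)),
    (11, 924, (12, 208)),
    (12, 924, (12, 208)),
    (13, 923, (21, 365)),
    (14, 923, (21, 365)),
    (15, 922, (20, 359)),
    (16, 922, (20, 359)),
    (17, 921, (19, 346)),
    (18, 921, (19, 346)),
    (19, 926, (12, 17, 208, 292)),
    (20, 925, (12, 208)),
]


def facet_count_distinct(rows, area):
    """Emulate FACET prof_area_ids, COUNT(DISTINCT vacancy_id) for rows matching <area>."""
    vacancies = defaultdict(set)  # prof_area_id -> set of vacancy_ids seen with it
    for _row_id, vacancy_id, areas in rows:
        if area in areas:
            for value in areas:
                vacancies[value].add(vacancy_id)
    return {value: len(ids) for value, ids in sorted(vacancies.items())}


print(facet_count_distinct(ROWS, 208))  # {12: 3, 17: 1, 208: 3, 292: 1}
```

Here 208 gets 3 (vacancies 924, 925 and 926), which is the per-vacancy number the question expects; on the real index the same logic would give 5 for 199.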