Commit adfe2fb9 authored by Yuanle Song

v0.9.0 support fuzzy pinyin.

- added FuzzyFlag property.
- updated zero-pinyin.el to support zero-pinyin-fuzzy-flag variable.
parent 56662390
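The FuzzyFlag values this commit introduces (0, 1, 2, combinable with bitwise OR) can be illustrated with a minimal, standalone C sketch. The constant names mirror the ones the commit adds to zero-pinyin-service.h, but they are written here as an enum so the snippet needs no GLib; `fuzzy_enabled()` and `fuzzy_all()` are hypothetical helpers that are not part of the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as documented for the FuzzyFlag D-Bus property.
 * Assumption: an enum stands in for the commit's static const guint. */
enum {
    FUZZY_FLAG_NONE       = 0,      /* 0b0  no fuzzy matching */
    FUZZY_FLAG_ZCS_ZHCHSH = 1 << 0, /* 0b1  z<->zh, c<->ch, s<->sh */
    FUZZY_FLAG_L_N        = 1 << 1  /* 0b10 l<->n */
};

/* Hypothetical helper: true if `bit` is set in `flag`. */
static bool fuzzy_enabled(unsigned int flag, unsigned int bit)
{
    return (flag & bit) != 0;
}

/* Hypothetical helper: both supported flags combined with bitwise OR. */
static unsigned int fuzzy_all(void)
{
    unsigned int all = FUZZY_FLAG_ZCS_ZHCHSH | FUZZY_FLAG_L_N;
    assert(all == 3u); /* the value used as (setq zero-pinyin-fuzzy-flag 3) */
    return all;
}
```

Because every flag occupies its own bit, a future flag would take the value 4 and combine with the existing ones without changing their meaning.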
......@@ -3,6 +3,15 @@
"http://www.freedesktop.org/standards/dbus/1.0/introspect.dtd" >
<node>
<interface name="com.emacsos.zero.ZeroPinyinService1.ZeroPinyinServiceInterface">
<!--
FuzzyFlag: fuzzy matching flag for GetCandidates method.
0 0b0 no fuzzy matching
1 0b1 enable fuzzy match on z<->zh, c<->ch, s<->sh
2 0b10 enable fuzzy match on l<->n
flag numbers can be combined with bit or.
-->
<property name="FuzzyFlag" type="u" access="readwrite"></property>
<!--
GetCandidates:
......
......@@ -41,11 +41,11 @@ on_handle_get_candidates(ZeroPinyinService *object,
GVariantBuilder *candidates_pinyin_indices = NULL;
/* test data */
/* get_candidates_test (preedit_str, fetch_size, candidates_builder, matched_lengths_builder); */
/* get_candidates_test(preedit_str, fetch_size, candidates_builder, matched_lengths_builder); */
candidates_builder = g_variant_builder_new(G_VARIANT_TYPE("as"));
matched_lengths_builder = g_variant_builder_new(G_VARIANT_TYPE("au"));
candidates_pinyin_indices = g_variant_builder_new(G_VARIANT_TYPE("aa(ii)"));
get_candidates(appdata->db, preedit_str, fetch_size, candidates_builder, matched_lengths_builder, candidates_pinyin_indices);
get_candidates(appdata->db, preedit_str, fetch_size, zero_pinyin_service_get_fuzzy_flag(object), candidates_builder, matched_lengths_builder, candidates_pinyin_indices);
result = g_variant_new("(asauaa(ii))", candidates_builder, matched_lengths_builder, candidates_pinyin_indices);
g_assert_nonnull(result);
......
# -*- mode: conf -*-
project('zero-pinyin-service', ['c', 'cpp'],
version: '0.8.0',
version: '0.9.0',
license: 'GPL',
default_options: [
'warning_level=2',
......@@ -44,7 +44,7 @@ shared_dep = [glib, gio, uuid, sqlite3]
gen_inc = include_directories('.')
gdbus_codegen = find_program('gdbus-codegen')
zero_pinyin_generated = custom_target('zero-panel-generated',
zero_pinyin_generated = custom_target('zero-pinyin-generated',
input: 'com.emacsos.zero.ZeroPinyinService1.ZeroPinyinServiceInterface.xml',
output: ['zero-pinyin-service-generated.h', 'zero-pinyin-service-generated.c'],
command: [gdbus_codegen, '--generate-c-code', 'zero-pinyin-service-generated', '@[email protected]'])
......
* COMMENT -*- mode: org -*-
#+Date: 2019-04-05
Time-stamp: <2019-10-22>
Time-stamp: <2019-10-23>
#+STARTUP: content
* notes :entry:
** 2019-08-31 ibus-pinyin userdb inference notice.
......@@ -52,6 +52,7 @@ shared between zero-pinyin and ibus-pinyin.
* later :entry:
* current :entry:
**
** 2019-10-23 bug: type "zhey" doesn't show 这样 candidate.
** 2019-10-22 handle a an o en etc differently. only match exactly the character.
no fuzzy matching or incomplete pinyin matching.
......@@ -97,6 +98,99 @@ see ~/c/gtk-im-module/, it uses myastyle-pre-commit-check in git pre-commit
- make the method work. use gobject property maybe.
- set default flags to my flags. reflect this in UI/config file.
* done :entry:
** 2019-10-22 support fuzzy pinyin.
- 0 no fuzzy
- 1 (0b1)
z <-> zh
c <-> ch
s <-> sh
- 2 (0b10)
l <-> n
- I won't implement other flags at this time.
- default is no fuzzy.
- test case:
ru chi 如此
ci di 此地
e nuo 婀娜
e luo
ci yu 词语
song 宋
shong
- implementation
- WONTFIX add a property in appdata.
use a guint flag.
only support 0b0 and 0b10 for now.
Not necessary.
- DONE add dbus interface to allow change the property at run time.
property is added in dbus interface directly.
app just read the property there. there is no network round trip.
data is stored in the dbus service server side. not in dbus daemon.
- DONE add flags in zero-el.
this flag is NOT buffer local.
(setq zero-pinyin-fuzzy-flag 3)
it will set fuzzy mode when zero-pinyin-reset or zero-pinyin-init.
(zero-pinyin-service-set-fuzzy-flag 3)
it works.
- problems
- can I make this change backward compatible?
TODO what will happen when zero asks dbus to set FuzzyFlag property, if
running an old zero-pinyin-service?
- property generated code is okay.
- how to access that property in zero-pinyin-service.c?
guint zero_pinyin_service_get_fuzzy_flag (ZeroPinyinService *object);
this is not the right way.
because flag doesn't change much.
should query once at start up, then listen to change event and update
in-ram data. do not query dbus every time FuzzyFlag is needed.
WONTFIX add event handler to update appdata->fuzzy_flag when property is
changed. // generated code handle set event and update g_object data in
RAM automatically.
signal org.freedesktop.DBus.Properties.PropertiesChanged
see example in
https://developer.gnome.org/gio/2.26/GDBusProxy.html
- how to set the property in zero-el?
call some dbus built-in service.
(zero-pinyin-service-quit)
run zero-pinyin-service in console
(dbus-set-property :session "com.emacsos.zero.ZeroPinyinService1"
"/com/emacsos/zero/ZeroPinyinService1"
"com.emacsos.zero.ZeroPinyinService1.ZeroPinyinServiceInterface"
"FuzzyFlag" 3)
(dbus-set-property :session "com.emacsos.zero.ZeroPinyinService1"
"/com/emacsos/zero/ZeroPinyinService1"
"com.emacsos.zero.ZeroPinyinService1.ZeroPinyinServiceInterface"
"FuzzyFlag" 0)
I think generated code will handle set/get call.
maybe just always get the value from
zero_pinyin_service_get_fuzzy_flag(ZeroPinyinService *object)
https://developer.gnome.org/gio/2.60/gdbus-codegen.html
Server-side usage
just use g_object_get() in server side. it will not talk to dbus.
property change should be handled by generated code.
- set property from emacs and read property in zero-pinyin-service.c
works. but the fuzzy SQL logic is not working.
when set to 3, shong can't produce 宋.
enabled debug log.
I see the problem.
shong is not a valid py if fuzzy flag is not used during parsing.
use correct parsing flag from FuzzyFlag when parsing.
- now it works.
** 2019-08-31 choose maindb like my patched libpyzy.
- here is patched libpyzy maindb logic:
files.push_back (m_user_data_dir + "/main.db");
......
#ifndef _PINYIN_ID_H_
#define _PINYIN_ID_H_
#ifdef __cplusplus
extern "C"
{
#endif
/* data from ../Types.h, copied here so it can be used in C code. */
#define PINYIN_ID_VOID (-1)
#define PINYIN_ID_ZERO (0)
#define PINYIN_ID_B (1)
#define PINYIN_ID_C (2)
#define PINYIN_ID_CH (3)
#define PINYIN_ID_D (4)
#define PINYIN_ID_F (5)
#define PINYIN_ID_G (6)
#define PINYIN_ID_H (7)
#define PINYIN_ID_J (8)
#define PINYIN_ID_K (9)
#define PINYIN_ID_L (10)
#define PINYIN_ID_M (11)
#define PINYIN_ID_N (12)
#define PINYIN_ID_P (13)
#define PINYIN_ID_Q (14)
#define PINYIN_ID_R (15)
#define PINYIN_ID_S (16)
#define PINYIN_ID_SH (17)
#define PINYIN_ID_T (18)
#define PINYIN_ID_W (19)
#define PINYIN_ID_X (20)
#define PINYIN_ID_Y (21)
#define PINYIN_ID_Z (22)
#define PINYIN_ID_ZH (23)
#define PINYIN_ID_A (24)
#define PINYIN_ID_AI (25)
#define PINYIN_ID_AN (26)
#define PINYIN_ID_ANG (27)
#define PINYIN_ID_AO (28)
#define PINYIN_ID_E (29)
#define PINYIN_ID_EI (30)
#define PINYIN_ID_EN (31)
#define PINYIN_ID_ENG (32)
#define PINYIN_ID_ER (33)
#define PINYIN_ID_I (34)
#define PINYIN_ID_IA (35)
#define PINYIN_ID_IAN (36)
#define PINYIN_ID_IANG (37)
#define PINYIN_ID_IAO (38)
#define PINYIN_ID_IE (39)
#define PINYIN_ID_IN (40)
#define PINYIN_ID_ING (41)
#define PINYIN_ID_IONG (42)
#define PINYIN_ID_IU (43)
#define PINYIN_ID_O (44)
#define PINYIN_ID_ONG (45)
#define PINYIN_ID_OU (46)
#define PINYIN_ID_U (47)
#define PINYIN_ID_UA (48)
#define PINYIN_ID_UAI (49)
#define PINYIN_ID_UAN (50)
#define PINYIN_ID_UANG (51)
#define PINYIN_ID_UE (52)
#define PINYIN_ID_VE PINYIN_ID_UE
#define PINYIN_ID_UI (53)
#define PINYIN_ID_UN (54)
#define PINYIN_ID_UO (55)
#define PINYIN_ID_V (56)
#define PINYIN_ID_NG PINYIN_ID_VOID
#ifdef __cplusplus
}
#endif
#endif /* _PINYIN_ID_H_ */
#include "zero-pinyin-service.h"
#include "parse-pinyin.h"
#include "../sqlite3_util.h"
#include "pinyin-id.h"
void
get_candidates_test(const char *preedit_str,
......@@ -32,41 +33,89 @@ get_candidates_test(const char *preedit_str,
}
}
/**
* get pinyin's fuzzy pair.
* for example, zh for z.
*/
gint
get_fuzzy_pair(gint pinyin_id)
{
switch (pinyin_id) {
case PINYIN_ID_Z: return PINYIN_ID_ZH;
case PINYIN_ID_ZH: return PINYIN_ID_Z;
case PINYIN_ID_C: return PINYIN_ID_CH;
case PINYIN_ID_CH: return PINYIN_ID_C;
case PINYIN_ID_S: return PINYIN_ID_SH;
case PINYIN_ID_SH: return PINYIN_ID_S;
case PINYIN_ID_L: return PINYIN_ID_N;
case PINYIN_ID_N: return PINYIN_ID_L;
default:
g_assert_not_reached();
return pinyin_id;
}
}
/**
* build where clause for build_sql_for_n_pinyin().
*
* @pylist: the pinyin list.
* @fuzzy_flag: see dbus interface FuzzyFlag property.
* @n: number of Pinyin to use in pylist.
*
* returns: where_clause, caller should g_free() result after use.
*/
static char *
build_where_clause(GList *pylist,
const guint fuzzy_flag,
const guint n)
{
GString *s = NULL;
GList *iter = pylist;
gboolean first_condition_done = FALSE;
Pinyin *thispy = NULL;
s = g_string_new(NULL);
/* allow append "AND something" without checking */
g_string_append_printf(s, "1=1 ");
for (guint i = 0; i < n; ++i) {
g_assert_nonnull(iter);
thispy = (Pinyin *) iter->data;
/* do not allow omit shengmu. always do strict match */
if (G_LIKELY(first_condition_done)) {
switch (thispy->shengmu_i) {
case PINYIN_ID_Z:
case PINYIN_ID_C:
case PINYIN_ID_S:
case PINYIN_ID_ZH:
case PINYIN_ID_CH:
case PINYIN_ID_SH:
if (fuzzy_flag & FUZZY_FLAG_ZCS_ZHCHSH) {
g_string_append_printf(
s,
"AND (s%u=%d OR s%u=%d) ",
i, thispy->shengmu_i,
i, get_fuzzy_pair(thispy->shengmu_i));
} else {
goto NO_FUZZY;
}
break;
case PINYIN_ID_L:
case PINYIN_ID_N:
if (fuzzy_flag & FUZZY_FLAG_L_N) {
g_string_append_printf(
s,
"AND (s%u=%d OR s%u=%d) ",
i, thispy->shengmu_i,
i, get_fuzzy_pair(thispy->shengmu_i));
} else {
goto NO_FUZZY;
}
break;
default:
NO_FUZZY:
g_string_append_printf(s, "AND s%u=%d ", i, thispy->shengmu_i);
} else {
g_string_append_printf(s, "s%u=%d ", i, thispy->shengmu_i);
first_condition_done = TRUE;
}
/* allow omit yunmu, if 0 don't match on it */
if (thispy->yunmu_i) {
if (G_LIKELY(first_condition_done)) {
g_string_append_printf(s, "AND y%u=%d ", i, thispy->yunmu_i);
} else {
g_string_append_printf(s, "y%u=%d ", i, thispy->yunmu_i);
first_condition_done = TRUE;
}
g_string_append_printf(s, "AND y%u=%d ", i, thispy->yunmu_i);
}
iter = iter->next;
}
......@@ -103,6 +152,7 @@ build_s_y_fields(const guint n)
*/
static char *
build_sql_for_n_pinyin(GList *pylist,
const guint fuzzy_flag,
const guint n,
const guint limit)
{
......@@ -119,7 +169,7 @@ build_sql_for_n_pinyin(GList *pylist,
g_string_append_printf(sql, s_y_fields);
g_string_append_printf(
sql, "FROM maindb.py_phrase_%u WHERE ", n - 1);
where_clause = build_where_clause(pylist, n);
where_clause = build_where_clause(pylist, fuzzy_flag, n);
g_assert_nonnull(where_clause);
g_debug("where_clause=%s", where_clause);
sql = g_string_append(sql, where_clause);
......@@ -182,6 +232,7 @@ get_matched_py_length(const char *preedit_str,
* @preedit_str: the pinyin preedit str. can contain '. This is needed to
* calculate matched_py_length.
* @pylist: the pinyin list.
* @fuzzy_flag: see dbus interface FuzzyFlag property.
* @group_size: the fixed word length. use this many pinyin from pinyin list.
* @limit: fetch this many result is enough for user. more is not a problem though.
* @candidates: the result candidate list. caller should free this after use.
......@@ -192,6 +243,7 @@ static guint
get_candidates_for_n_pinyin(sqlite3 *db,
const char *preedit_str,
GList *pylist,
const guint fuzzy_flag,
const guint group_size,
const guint limit,
GList **candidates)
......@@ -207,7 +259,8 @@ get_candidates_for_n_pinyin(sqlite3 *db,
gint r = 0;
/* build SQL and run SQL query */
char *sql = NULL;
sql = build_sql_for_n_pinyin(pylist, group_size, MAX(limit, DEFAULT_LIMIT));
sql = build_sql_for_n_pinyin(pylist, fuzzy_flag,
group_size, MAX(limit, DEFAULT_LIMIT));
g_debug("build_sql_for_n_pinyin result SQL:\n\n%s\n", sql);
guint matched_py_length = get_matched_py_length(preedit_str, pylist, group_size);
......@@ -296,10 +349,31 @@ add_candidate_to_builders(Candidate *c,
g_free(c->py_indices);
}
/**
* convert zero FuzzyFlag to libpyzy flag.
* I don't use libpyzy flag directly because it is overly complex.
*/
guint
to_pyzy_flag(const guint fuzzy_flag)
{
guint result = 0;
if (fuzzy_flag & FUZZY_FLAG_ZCS_ZHCHSH) {
result = result |
PINYIN_FUZZY_Z_ZH | PINYIN_FUZZY_ZH_Z |
PINYIN_FUZZY_C_CH | PINYIN_FUZZY_CH_C |
PINYIN_FUZZY_S_SH | PINYIN_FUZZY_SH_S;
}
if (fuzzy_flag & FUZZY_FLAG_L_N) {
result = result | PINYIN_FUZZY_L_N | PINYIN_FUZZY_N_L;
}
return result;
}
void
get_candidates(sqlite3 *db,
const char *preedit_str,
const guint fetch_size,
const guint fuzzy_flag,
GVariantBuilder *candidates_builder,
GVariantBuilder *matched_lengths_builder,
GVariantBuilder *candidates_pinyin_indices)
......@@ -310,7 +384,8 @@ get_candidates(sqlite3 *db,
}
GList *pylist = NULL;
guint pylist_len = 0;
pylist = parse_pinyin(preedit_str, 15, PINYIN_FLAGS_NONE);
g_debug("fuzzy_flag=%u", fuzzy_flag);
pylist = parse_pinyin(preedit_str, 15, to_pyzy_flag(fuzzy_flag));
pylist_len = g_list_length(pylist);
guint group_size = pylist_len;
......@@ -319,7 +394,7 @@ get_candidates(sqlite3 *db,
GList *candidates = NULL;
while (fetched_size < fetch_size && group_size > 0) {
g_info("phrase length=%u", group_size);
r = get_candidates_for_n_pinyin(db, preedit_str, pylist, group_size, fetch_size - fetched_size, &candidates);
r = get_candidates_for_n_pinyin(db, preedit_str, pylist, fuzzy_flag, group_size, fetch_size - fetched_size, &candidates);
if (candidates) {
GList *iter = g_list_first(candidates);
Candidate *c = NULL;
......
......@@ -11,6 +11,11 @@ G_BEGIN_DECLS
#define ZERO_PINYIN_OBJECT_PATH "/com/emacsos/zero/ZeroPinyinService1"
#define ZERO_PINYIN_INTERFACE_NAME "com.emacsos.zero.ZeroPinyinService1.ZeroPinyinServiceInterface"
static const guint FUZZY_FLAG_NONE = 0;
static const guint FUZZY_FLAG_ZCS_ZHCHSH = 1;
static const guint FUZZY_FLAG_L_N = 2;
/* Note: next flag should be 4, always use a new bit for flag. */
typedef struct {
gint shengmu_i;
gint yunmu_i;
......@@ -45,6 +50,7 @@ void get_candidates_test(const char *preedit_str,
void get_candidates(sqlite3 *db,
const char *preedit_str,
const guint fetch_size,
const guint fuzzy_flag,
GVariantBuilder *candidates_builder,
GVariantBuilder *matched_lengths_builder,
GVariantBuilder *candidates_pinyin_indices);
......
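The shengmu condition that build_where_clause() emits for one syllable can be approximated without GLib or SQLite. The sketch below is an illustration under stated assumptions, not the commit's code: the IDs are copied from pinyin-id.h, while `fuzzy_partner()` and `shengmu_condition()` are hypothetical names. Unlike get_fuzzy_pair(), `fuzzy_partner()` takes the flag itself and returns -1 (the same value as PINYIN_ID_VOID) when no fuzzy pair applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* IDs copied from pinyin-id.h in this commit. */
#define PINYIN_ID_C  2
#define PINYIN_ID_CH 3
#define PINYIN_ID_L  10
#define PINYIN_ID_N  12
#define PINYIN_ID_S  16
#define PINYIN_ID_SH 17
#define PINYIN_ID_Z  22
#define PINYIN_ID_ZH 23

#define FUZZY_FLAG_ZCS_ZHCHSH 1u
#define FUZZY_FLAG_L_N        2u

/* Hypothetical: partner shengmu id if `flag` enables one, else -1. */
static int fuzzy_partner(int shengmu, unsigned int flag)
{
    if (flag & FUZZY_FLAG_ZCS_ZHCHSH) {
        switch (shengmu) {
        case PINYIN_ID_Z:  return PINYIN_ID_ZH;
        case PINYIN_ID_ZH: return PINYIN_ID_Z;
        case PINYIN_ID_C:  return PINYIN_ID_CH;
        case PINYIN_ID_CH: return PINYIN_ID_C;
        case PINYIN_ID_S:  return PINYIN_ID_SH;
        case PINYIN_ID_SH: return PINYIN_ID_S;
        default: break;
        }
    }
    if (flag & FUZZY_FLAG_L_N) {
        if (shengmu == PINYIN_ID_L) return PINYIN_ID_N;
        if (shengmu == PINYIN_ID_N) return PINYIN_ID_L;
    }
    return -1;
}

/* Hypothetical: write the condition for syllable index i into buf.
 * Returns what snprintf() returns. */
static int shengmu_condition(char *buf, size_t size, unsigned int i,
                             int shengmu, unsigned int flag)
{
    assert(buf != NULL);
    int partner = fuzzy_partner(shengmu, flag);
    if (partner >= 0) {
        /* fuzzy match: accept either member of the pair */
        return snprintf(buf, size, "AND (s%u=%d OR s%u=%d) ",
                        i, shengmu, i, partner);
    }
    /* strict match */
    return snprintf(buf, size, "AND s%u=%d ", i, shengmu);
}
```

Each fragment starts with "AND", which is safe because the commit seeds the clause with "1=1 " before the loop. With flag 3, a first syllable whose shengmu is S produces a condition that also accepts SH, which is what lets the test case "shong" reach 宋 once parsing uses the same flag.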