wangfenjin/simple

Simple tokenizer

simple is a sqlite3 fts5 extension that supports Chinese and pinyin. It fully implements solution 4 from the article "A solution to the polyphonic character problem in WeChat mobile full-text search", giving simple and efficient search over both Chinese text and pinyin.

Introduction to the implementation: https://www.wangfenjin.com/posts/simple-tokenizer/

On top of this, more precise phrase matching is also supported via cppjieba; see the article at https://www.wangfenjin.com/posts/simple-jieba-tokenizer/

Usage

Using from code

Using from the command line

First, confirm that the sqlite version you are using supports the fts5 extension. To check, run:

select fts5(?1);
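
The same check can be scripted. The following is a minimal sketch using Python's standard sqlite3 module; the helper name has_fts5 is made up for illustration, and it assumes that a build without fts5 rejects the statement at prepare time with an OperationalError.

```python
import sqlite3


def has_fts5(conn: sqlite3.Connection) -> bool:
    """Return True if `SELECT fts5(?1)` is accepted by this SQLite build."""
    try:
        # Same probe as the SQL above; ?1 is bound to NULL.
        conn.execute("SELECT fts5(?1)", (None,)).fetchall()
    except sqlite3.OperationalError:
        # Typically "no such function: fts5" when the extension is missing.
        return False
    return True
```

Calling has_fts5(sqlite3.connect(":memory:")) reports on the SQLite library that Python itself is linked against, which may differ from the sqlite3 command-line binary on the same machine.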

After that it is ready to use. For concrete examples, see example.sql and cpp.

$ ./sqlite3
SQLite version 3.32.3 2020-06-18 14:00:33
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .load libsimple
sqlite> CREATE VIRTUAL TABLE t1 USING fts5(text, tokenize = 'simple');
sqlite> INSERT INTO t1 VALUES ('中华人民共和国国歌');
sqlite> select simple_highlight(t1, 0, '[', ']') as text from t1 where text match simple_query('中华国歌');
[中华]人民共和[国国歌]
sqlite> select simple_highlight(t1, 0, '[', ']') as text from t1 where text match jieba_query('中华国歌');
[中华]人民共和国[国歌]
sqlite> select simple_highlight(t1, 0, '[', ']') as text from t1 where text match simple_query('中华人民共和国');
[中华人民共和国国]歌
sqlite> select simple_highlight(t1, 0, '[', ']') as text from t1 where text match jieba_query('中华人民共和国');
[中华人民共和国]国歌
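
The session above could also be driven from application code. The sketch below uses Python's standard sqlite3 module and mirrors the same statements; the function name highlight_with_simple and the extension_path argument are illustrative assumptions, and it presumes a Python build that permits loading SQLite extensions.

```python
import sqlite3


def highlight_with_simple(extension_path: str, text: str, query: str) -> list:
    """Index one row with the 'simple' tokenizer and return highlighted matches."""
    conn = sqlite3.connect(":memory:")
    # Extension loading is disabled by default in Python's sqlite3 module.
    conn.enable_load_extension(True)
    conn.load_extension(extension_path)  # counterpart of `.load libsimple`
    conn.enable_load_extension(False)

    conn.execute("CREATE VIRTUAL TABLE t1 USING fts5(text, tokenize = 'simple')")
    conn.execute("INSERT INTO t1 VALUES (?)", (text,))
    rows = conn.execute(
        "SELECT simple_highlight(t1, 0, '[', ']') FROM t1 "
        "WHERE text MATCH simple_query(?)",
        (query,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

For example, highlight_with_simple("./libsimple", "中华人民共和国国歌", "中华国歌") would run the first query of the session; "./libsimple" is a placeholder and should point at wherever your build placed the library.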

Features

  1. The simple tokenizer supports tokenizing Chinese and pinyin, and a switch controls whether pinyin support is enabled.
  2. simple_query() automatically assembles the match query, so users do not need to learn the fts5 query syntax.
  3. simple_highlight() highlights consecutive matched tokens. It is similar to sqlite's built-in highlight, but it groups consecutively matched tokens into a single highlighted span, which is generally what users want.
  4. simple_highlight_pos() returns the positions of the matched tokens, leaving it to the user to decide how to use them.
  5. simple_snippet() extracts the matching fragment. It is similar to sqlite's built-in snippet, with the same enhancement of grouping consecutively matched tokens together.
  6. jieba_query() applies jieba word segmentation to the query, giving more precise matching without changing the index. The jieba feature can be turned off with -DSIMPLE_WITH_JIEBA=OFF (#35).
  7. jieba_dict() specifies the dict directory. It only needs to be called once, and it must be called before jieba_query().
  8. pinyin_dict() accepts the path of a custom pinyin.txt file. On success the pinyin mapping is switched immediately; if the file format is invalid, an error is returned and the current mapping is left unchanged.
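
The grouping behaviour described for simple_highlight() can be illustrated with a small sketch. This is not the extension's C++ code: the function highlight_merged is hypothetical, and it works on character offsets for readability, whereas fts5 itself reports token offsets in bytes.

```python
def highlight_merged(text, spans, open_tag="[", close_tag="]"):
    """Wrap matched spans in tags, merging spans that touch or overlap.

    `spans` is a list of (start, end) character offsets, end exclusive.
    """
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Adjacent or overlapping match: extend the current group.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    out, cursor = [], 0
    for start, end in merged:
        out.append(text[cursor:start])  # unmatched text before the group
        out.append(open_tag + text[start:end] + close_tag)
        cursor = end
    out.append(text[cursor:])  # unmatched tail
    return "".join(out)
```

With single-character matches at offsets 0, 1, 6, 7 and 8, highlight_merged("abcdefghi", [(0, 1), (1, 2), (6, 7), (7, 8), (8, 9)]) returns "[ab]cdef[ghi]" rather than wrapping each character separately.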

Custom pinyin.txt

By default, the contrib/pinyin.txt bundled into the .so is used. To use your own pinyin table, call the following before querying:

select pinyin_dict('/path/to/pinyin.txt');

Each line of pinyin.txt uses the same format as the default file, for example:

U+3007: ling,yuan,xing
U+3007: ling,yuan,xing # trailing comments are also supported (a space is required before the #)
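
To sanity-check a custom file before handing it to pinyin_dict(), a line parser along these lines may help. It is a sketch of the format as described here, not the extension's own validation; the name parse_pinyin_line is hypothetical, and skipping blank or comment-only lines is an assumption.

```python
def parse_pinyin_line(line):
    """Parse 'U+3007: ling,yuan,xing # comment' into (char, [readings]).

    Returns None for blank lines and comment-only lines (an assumption).
    """
    # Drop a trailing comment; it must be preceded by a space.
    body = line.split(" #", 1)[0].strip()
    if not body or body.startswith("#"):
        return None
    code, sep, readings = body.partition(":")
    if not sep or not code.startswith("U+"):
        raise ValueError(f"malformed line: {line!r}")
    char = chr(int(code[2:], 16))  # hex code point -> character
    return char, [r.strip() for r in readings.split(",") if r.strip()]
```

Running this over every line of the file and checking that no ValueError is raised gives a quick pre-flight test, although the extension may apply stricter or looser rules than this sketch.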

Notes:

  • It is recommended to call pinyin_dict() once before building the index and before querying.
  • If you replace the pinyin mapping, the pinyin tokens in an existing index are not rebuilt automatically; you need to rebuild the index according to your own business strategy.
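
One possible way to rebuild is fts5's built-in 'rebuild' command, which re-tokenizes the stored rows. The sketch below assumes the table keeps its own content (it is not a contentless table) and that the new mapping was already loaded with pinyin_dict(); the helper name rebuild_fts_index is made up for illustration.

```python
import sqlite3


def rebuild_fts_index(conn: sqlite3.Connection, table: str) -> None:
    """Re-tokenize every stored row of an fts5 table with the current tokenizer."""
    # fts5 special command: INSERT INTO tbl(tbl) VALUES('rebuild').
    # A table name cannot be bound as a parameter, so quote it as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    conn.execute(f"INSERT INTO {quoted}({quoted}) VALUES('rebuild')")
    conn.commit()
```

For large tables a rebuild can take a while, so whether to run it immediately, in the background, or by recreating the table is the kind of business decision the note above refers to.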

Development

Building

Build with a compiler that supports C++14 or later. Running ./build-and-run from the repository root compiles all required files and runs the tests. Build output is placed in the output directory.

You can also run cmake manually:

mkdir build; cd build
cmake ..
make -j 12
make install

Building for iOS is supported:

./build-ios.sh

Building for ohos is supported:

Download the SDK for your target platform from the official release channel.

tar -zxvf ohos-sdk.tar.gz
cd $OHOS_SDK/Linux
for i in *.zip;do unzip ${i};done

Start the build:

./build-ohos.sh

Code

  • src/entry: entry file; registers the sqlite tokenizer and functions
  • src/simple_tokenizer: the tokenizer implementation
  • src/simple_highlight: the highlight function, adapted from the built-in highlight function so that adjacent matched words are highlighted as one continuous span
  • src/pinyin: conversion from Chinese to pinyin, and splitting of pinyin queries

TODO

  • Add CI/CD
  • Add usage examples; see cpp and python3
  • Make some parameters configurable, such as the path to the pinyin file (the file is already bundled into the .so)
  • Reduce dependencies and shrink the size of the .so
  • Publish performance data: loading the extension takes under 2 ms; the first use of the pinyin feature has to load the pinyin file, about 500 ms; the first use of jieba segmentation has to load the jieba dictionary files, about 4 s.
