words-match
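The words-match component is built on a trie (DFA) and uses Unix socket communication together with custom processes. It was developed to help you quickly deploy a content detection service.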
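To make the trie idea concrete, here is a minimal sketch of trie-based keyword matching. It is not the component's actual implementation: `SimpleTrie`, `add()` and `search()` are hypothetical names chosen for illustration, and the sketch assumes PHP 7.4+ with the mbstring extension.

```php
<?php
declare(strict_types=1);

/**
 * Minimal trie (prefix tree) for multi-keyword matching.
 * Illustrative only: SimpleTrie is a hypothetical class, not part of words-match.
 */
final class SimpleTrie
{
    /** @var array nested arrays keyed by character; '#end' marks a complete word */
    private array $root = [];

    /** Insert one word, creating a child node per character. */
    public function add(string $word): void
    {
        $node = &$this->root;
        foreach (mb_str_split($word) as $char) {
            if (!isset($node[$char])) {
                $node[$char] = [];
            }
            $node = &$node[$char];
        }
        // '#end' is four characters long, so it cannot collide with a single-character key.
        $node['#end'] = true;
    }

    /** Return a map of each matched word to the character offsets where it starts. */
    public function search(string $text): array
    {
        $chars = mb_str_split($text);
        $total = count($chars);
        $hits = [];
        for ($start = 0; $start < $total; $start++) {
            $node = $this->root;
            $matched = '';
            // Walk down the trie for as long as the text keeps matching.
            for ($i = $start; $i < $total; $i++) {
                $char = $chars[$i];
                if (!isset($node[$char])) {
                    break;
                }
                $node = $node[$char];
                $matched .= $char;
                if (isset($node['#end'])) {
                    $hits[$matched][] = $start;
                }
            }
        }
        return $hits;
    }
}

// Usage: load a few words, then scan a piece of text.
$trie = new SimpleTrie();
foreach (['java', 'golang', '代碼'] as $word) {
    $trie->add($word);
}
$hits = $trie->search('java 代碼 and golang');
```

Each lookup walks the text once per start position and never compares against the whole word list, which is why this kind of matcher copes with large dictionaries.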
Use cases
- Any product that handles text content has a use for it.
- Checking blog articles and comments
- Checking chat messages
- Blocking spam content
Requirements
None
Installation
composer require easyswoole/words-match
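Repository
Basic usage
Preparing the dictionary
When the service starts, it reads the dictionary file line by line. The first column of each line is the sensitive word; the remaining columns are auxiliary information.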
php※是世界上※最好的語言
java
golang
程序員
代碼
邏輯
Registering the service
<?php
namespace EasySwoole\EasySwoole;
use EasySwoole\Component\Di;
use EasySwoole\EasySwoole\AbstractInterface\Event;
use EasySwoole\EasySwoole\Swoole\EventRegister;
use EasySwoole\Http\Request;
use EasySwoole\Http\Response;
use EasySwoole\WordsMatch\WMServer;
class EasySwooleEvent implements Event
{
    public static function initialize()
    {
        date_default_timezone_set('Asia/Shanghai');
        Di::getInstance()->set(SysConst::HTTP_GLOBAL_ON_REQUEST, function (Request $request, Response $response): bool {
            // TODO: Implement onRequest() method.
            return true;
        });
        Di::getInstance()->set(SysConst::HTTP_GLOBAL_AFTER_REQUEST, function (Request $request, Response $response): void {
            // TODO: Implement afterRequest() method.
        });
    }
    public static function mainServerCreate(EventRegister $register)
    {
        // Configure words-match
        $wdConfig = new \EasySwoole\WordsMatch\Config();
        $wdConfig->setDict(__DIR__ . '/dictionary.txt'); // Path to the dictionary file
        $wdConfig->setMaxMEM('1024M'); // Maximum memory per process (M); defaults to 512 M
        $wdConfig->setTimeout(3.0); // Content detection timeout; defaults to 3.0 s
        $wdConfig->setWorkerNum(3); // Number of processes
        // $wdConfig->setSockDIR(sys_get_temp_dir()); // (Changing this is not recommended) Directory for the socket file; defaults to sys_get_temp_dir(), i.e. '/tmp'
        // Register the service
        WMServer::getInstance($wdConfig)->attachServer(ServerManager::getInstance()->getSwooleServer());
    }
}
Client usage
<?php
namespace App\HttpController;
use EasySwoole\Http\AbstractInterface\Controller;
use EasySwoole\WordsMatch\WMServer;
class Index extends Controller
{
    function detect()
    {
        // Content to check for sensitive words
        $content = 'php是世界上最好的語言';
        // Detection result: -1 means the check timed out; on a match, the detected sensitive words are returned
        $result = WMServer::getInstance()->detect($content, 3);
        var_dump($result);
        /**
         * Output:
         * array(1) {
              [0]=>
              object(EasySwoole\WordsMatch\Dictionary\DetectResult)#96 (5) {
                ["word"]=>
                string(30) "php是世界上最好的語言"
                ["location"]=>
                array(1) {
                  [0]=>
                  array(3) {
                    ["word"]=>
                    string(30) "php是世界上最好的語言"
                    ["length"]=>
                    int(12)
                    ["location"]=>
                    array(1) {
                      [0]=>
                      int(0)
                    }
                  }
                }
                ["count"]=>
                int(1)
                ["remark"]=>
                string(0) ""
                ["type"]=>
                int(1)
              }
         * }
         */
    }
}
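The comment in the controller above says that `detect()` returns -1 on timeout and the matched words otherwise. A caller might branch on that return value along the following lines. This is only a sketch: `summarizeDetection()` is a hypothetical helper, the handling of an empty result is an assumption, and `stdClass` objects stand in for `DetectResult` so the snippet runs without the library. It also assumes the `word` and `count` fields are publicly readable, as the `var_dump` output suggests.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn the value returned by detect() into a short message.
 * Assumes -1 means timeout and any other value is a list of result objects
 * exposing public "word" and "count" properties.
 */
function summarizeDetection($result): string
{
    if ($result === -1) {
        return 'detection timed out';
    }
    // Assumption: an empty or non-array result means nothing was matched.
    if (!is_array($result) || $result === []) {
        return 'no sensitive words found';
    }
    $parts = [];
    foreach ($result as $item) {
        $parts[] = sprintf('%s x%d', $item->word, $item->count);
    }
    return 'matched: ' . implode(', ', $parts);
}

// Stand-in for a DetectResult object, built by hand for illustration.
$fake = new stdClass();
$fake->word = 'java';
$fake->count = 2;

$timeoutMessage = summarizeDetection(-1);
$matchMessage = summarizeDetection([$fake]);
$emptyMessage = summarizeDetection([]);
```

In a real controller the argument would be the value returned by `WMServer::getInstance()->detect(...)`, and the message could be written to the response instead of being kept in a variable.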
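Benchmark results
The component was benchmarked with dictionaries of 15,000 and 130,000 words, with the service running its default 3 processes.
These figures are for reference only; verify performance against your own production workload.
Test machine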
MacBook Air (13-inch, 2017)
Processor: 1.8 GHz Intel Core i5
Memory: 8 GB 1600 MHz DDR3
15,000 words
Concurrency 10, 100 total requests
Concurrency Level: 10
Time taken for tests: 0.067 seconds
Complete requests: 100
Failed requests: 0
Non-2xx responses: 100
Total transferred: 17300 bytes
HTML transferred: 2600 bytes
Requests per second: 1492.49 [#/sec] (mean)
Time per request: 6.700 [ms] (mean)
Time per request: 0.670 [ms] (mean, across all concurrent requests)
Transfer rate: 252.15 [Kbytes/sec] received
Concurrency 100, 1,000 total requests
Concurrency Level: 100
Time taken for tests: 0.239 seconds
Complete requests: 1000
Failed requests: 0
Non-2xx responses: 1000
Total transferred: 173000 bytes
HTML transferred: 26000 bytes
Requests per second: 4189.17 [#/sec] (mean)
Time per request: 23.871 [ms] (mean)
Time per request: 0.239 [ms] (mean, across all concurrent requests)
Transfer rate: 707.74 [Kbytes/sec] received
130,000 words
Concurrency 10, 100 total requests
Concurrency Level: 10
Time taken for tests: 0.057 seconds
Complete requests: 100
Failed requests: 0
Non-2xx responses: 100
Total transferred: 17300 bytes
HTML transferred: 2600 bytes
Requests per second: 1751.71 [#/sec] (mean)
Time per request: 5.709 [ms] (mean)
Time per request: 0.571 [ms] (mean, across all concurrent requests)
Transfer rate: 295.94 [Kbytes/sec] received
Concurrency 100, 1,000 total requests
Concurrency Level: 100
Time taken for tests: 0.225 seconds
Complete requests: 1000
Failed requests: 0
Non-2xx responses: 1000
Total transferred: 173000 bytes
HTML transferred: 26000 bytes
Requests per second: 4444.84 [#/sec] (mean)
Time per request: 22.498 [ms] (mean)
Time per request: 0.225 [ms] (mean, across all concurrent requests)
Transfer rate: 750.93 [Kbytes/sec] received