Best way to detect the number of SMS messages needed to send a text


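I'm looking for PHP code or a library that I can call with a piece of text and that will tell me:

  1. Which encoding I need to use to send this text as an SMS (7-, 8-, or 16-bit)
  2. How many SMS messages it will take to send this text (it has to be smart enough to account for the "segmentation information", as described at http://ozekisms.com/index.php?owpn=612)

Do you know of any code or library that does this?

To be clear, I'm not looking for a way to send or convert SMS messages, only for something that gives me this information about the text.

Update:

OK, I wrote the code below and it seems to work fine. If you have a better or more optimized code/solution/library, please let me know.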

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";




function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
$gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text);
for ($i = 0;$i < $textlen; $i++){
    if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not able to detect \ in string
}
return true;
}

Tags: php, encoding, sms, ascii, ucs2

3 answers

9 votes

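I'm adding some extra information here because the previous answers are not quite correct.

These are the problems:

  • The current string encoding needs to be passed to the mbstring functions, otherwise the length may be computed incorrectly.
  • In 7-bit GSM encoding, the basic character set extension characters (^{}\[~]|€) each need 14 bits to encode, so each of them counts as two characters.
  • In UCS-2 encoding, you have to be wary of emoji and other characters outside the 16-bit BMP, because...
  • GSM with UCS-2 counts 16-bit characters, so if you have a 💩 character (U+1F4A9), and your carrier and phone quietly support UTF-16 rather than just UCS-2, it will be encoded as a surrogate pair of 16-bit characters in UTF-16 and therefore counts as two 16-bit characters toward the string length. mb_strlen will count it as a single character only.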
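
The surrogate-pair point can be checked directly. This is only a minimal sketch, assuming the mbstring extension is loaded and PHP 7.0 or later (for the "\u{...}" escape); the variable names are made up for this illustration.

```php
<?php
// U+1F4A9 lies outside the BMP, so UTF-16 needs two 16-bit units for it.
$poo = "\u{1F4A9}";

// mb_strlen counts code points, so it reports a single character.
$codePoints = mb_strlen($poo, 'UTF-8');

// Converting to UTF-16BE and dumping the bytes as hex shows the surrogate pair.
$utf16Hex = bin2hex(mb_convert_encoding($poo, 'UTF-16BE', 'UTF-8'));

echo $codePoints, "\n"; // 1
echo $utf16Hex, "\n";   // d83ddca9 (high surrogate D83D, low surrogate DCA9)
```

The hex dump is four bytes long, that is, two 16-bit units, which is what counts toward the 70/67 limits.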

How to count 7-bit characters:

So far I've come up with the following to count 7-bit characters:

// Internal encoding must be set to UTF-8,
// and the input string must be UTF-8 encoded for this to work correctly
function count_gsm_string($str)
{
    // Basic GSM character set (one 7-bit encoded char each).
    // The dollar sign is escaped so PHP does not parse "$¥..." as a variable name.
    $gsm_7bit_basic = "@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà";

    // Extended set (requires escape code before character thus 2x7-bit encodings per)
    $gsm_7bit_extended = "^{}\\[~]|€";

    $len = 0;
    $str_len = mb_strlen($str, 'UTF-8');

    for($i = 0; $i < $str_len; $i++) {
        // take one UTF-8 character at a time
        $c = mb_substr($str, $i, 1, 'UTF-8');
        if(mb_strpos($gsm_7bit_basic, $c, 0, 'UTF-8') !== FALSE) {
            $len++;
        } else if(mb_strpos($gsm_7bit_extended, $c, 0, 'UTF-8') !== FALSE) {
            $len += 2;
        } else {
            return -1; // cannot be encoded as GSM, immediately return -1
        }
    }

    return $len;
}

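How to count 16-bit characters:

  • Convert the string to a UTF-16 representation, using mb_convert_encoding($str, 'UTF-16', 'UTF-8'), so that emoji characters are preserved.
  • Do not convert to UCS-2, because that conversion is lossy with mb_convert_encoding.
  • Use count(unpack('C*', $utf16str)) to count the bytes, and divide by two to get the number of UCS-2 16-bit characters that count toward the GSM multipart length.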

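*Caveat emptor, a word on counting bytes:

  • Don't use strlen to count bytes. While it may work, strlen can be overloaded in PHP installations built with multibyte support (the mbstring.func_overload setting, which was removed in PHP 8.0), and it is also a candidate for future API changes.
  • Avoid mb_strlen($str, 'UCS-2'). It currently works, and will correctly return 2 for a pile of poo character (because it looks like two 16-bit UCS-2 characters), but its stablemate mb_convert_encoding is lossy when converting from >16-bit characters to UCS-2. Who's to say mb_strlen won't become lossy in the future?
  • Avoid mb_strlen($str, '8bit') / 2. It also currently works, and is recommended in a PHP docs comment as a way to count bytes. But in my opinion it suffers from the same problem as the UCS-2 technique above.
  • That leaves the safest current way (IMO): unpack the string into a byte array and count that.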

So, what does this look like?

// Internal encoding must be set to UTF-8,
// and the input string must be UTF-8 encoded for this to work correctly
function count_ucs2_string($str)
{
    $utf16str = mb_convert_encoding($str, 'UTF-16', 'UTF-8');
    // The C* format unpacks each byte as an unsigned 8-bit integer;
    // which format you choose doesn't actually matter as long as you get one value per byte
    $byteArray = unpack('C*', $utf16str);
    return count($byteArray) / 2;
}
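
As a sanity check on the byte-counting caveats, the sketch below counts the bytes of one UTF-16 string two ways. It assumes the mbstring extension and PHP 7.0 or later; the sample text and variable names are illustrative only, and 'UTF-16BE' is used here simply to make the byte order explicit.

```php
<?php
// Sample text: "hi" plus U+1F4A9, which needs a surrogate pair in UTF-16.
$text  = "hi\u{1F4A9}";
$utf16 = mb_convert_encoding($text, 'UTF-16BE', 'UTF-8');

// unpack('C*') returns one array element per byte of the input.
$bytesViaUnpack = count(unpack('C*', $utf16));

// mb_strlen with the '8bit' encoding also counts raw bytes.
$bytesViaMb = mb_strlen($utf16, '8bit');

// Two bytes per 16-bit unit.
$units = intdiv($bytesViaUnpack, 2);

echo $bytesViaUnpack, "\n"; // 8 (2 + 2 + 4 bytes)
echo $bytesViaMb, "\n";     // 8
echo $units, "\n";          // 4
```

Both counts agree here, so the preference for unpack is about guarding against future behaviour changes rather than a difference in today's results.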

Putting it all together:

function multipart_count($str)
{
    $one_part_limit = 160; // use a constant i.e. GSM::SMS_SINGLE_7BIT
    $multi_limit = 153; // again, use a constant
    $max_parts = 3; // ... constant

    $str_length = count_gsm_string($str);
    if($str_length === -1) {
        $one_part_limit = 70; // ... constant
        $multi_limit = 67; // ... constant
        $str_length = count_ucs2_string($str);
    }

    if($str_length <= $one_part_limit) {
        // fits in one part
        return 1;
    }

    if($str_length > ($max_parts * $multi_limit)) {
        // too long
        return -1; // or throw exception, or false, etc.
    }

    // divide the string length by multi_limit and round up to get number of parts
    // (ceil() returns a float, so cast to keep the return value an integer)
    return (int) ceil($str_length / $multi_limit);
}
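
To see the arithmetic at the boundaries, here is a compact, self-contained restatement of the approach with a few worked inputs. Treat it as a sketch under stated assumptions: the function names (gsm_septets, utf16_units, sms_parts) are invented for this example and are not taken from the linked library, and it needs mbstring plus PHP 7.4 or later for mb_str_split.

```php
<?php
// Returns the number of 7-bit septets, or -1 if a character is outside
// the GSM basic and extended sets listed in the answer above.
function gsm_septets(string $str): int
{
    $basic = "@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
           . "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà";
    $extended = '^{}\\[~]|€';
    $len = 0;
    foreach (mb_str_split($str, 1, 'UTF-8') as $c) {
        if (mb_strpos($basic, $c, 0, 'UTF-8') !== false) {
            $len += 1;
        } elseif (mb_strpos($extended, $c, 0, 'UTF-8') !== false) {
            $len += 2; // escape septet plus the character itself
        } else {
            return -1;
        }
    }
    return $len;
}

// Counts 16-bit units by halving the byte length of the UTF-16 form.
function utf16_units(string $str): int
{
    $utf16 = mb_convert_encoding($str, 'UTF-16BE', 'UTF-8');
    return intdiv(count(unpack('C*', $utf16)), 2);
}

// Returns the number of parts, or -1 if the text exceeds $maxParts.
function sms_parts(string $str, int $maxParts = 3): int
{
    $single = 160;
    $multi  = 153;
    $len = gsm_septets($str);
    if ($len === -1) {
        $single = 70;
        $multi  = 67;
        $len = utf16_units($str);
    }
    if ($len <= $single) {
        return 1;
    }
    if ($len > $maxParts * $multi) {
        return -1;
    }
    return (int) ceil($len / $multi);
}

echo sms_parts(str_repeat('a', 160)), "\n";               // 1 (160 septets)
echo sms_parts(str_repeat('a', 159) . '€'), "\n";         // 2 (159 + 2 = 161 septets)
echo sms_parts(str_repeat('я', 70)), "\n";                // 1 (70 sixteen-bit units)
echo sms_parts(str_repeat('я', 69) . "\u{1F4A9}"), "\n";  // 2 (69 + 2 = 71 units)
```

The second and fourth cases are the ones a plain mb_strlen count gets wrong: both strings are 160 and 70 characters long respectively, yet each spills into a second part.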

I turned it into a library...

https://bitbucket.org/solvam/smstools


4 votes

The best solution I have so far is the code posted in the update to the question above: getNumberOfSMSsegments() together with isGsm7bit().

-2 votes

  • Page 1: 160 bytes
  • Page 2: 146 bytes
  • Page 3: 153 bytes
  • Page 4: 153 bytes
  • Page 5: 153 bytes, ...

So, regardless of language:

// strlen($text) returns the length in bytes
$count = 0;
$len = strlen($text);
if ($len > 306) {
    // pages 1 and 2 hold 160 + 146 = 306 bytes; every further page holds 153
    $count = (int) ceil(($len - 306) / 153) + 2;
} else if ($len > 160) {
    $count = 2;
} else {
    $count = 1;
}