Counting words in a string

Question  Votes: 66  Answers: 20
function WordCount(str) {
  var totalSoFar = 0;
  for (var i = 0; i < WordCount.length; i++)
    if (str(i) === " ") { // if a space is found in str
      totalSoFar = +1; // add 1 to total so far
  }
  totalsoFar += 1; // add 1 to totalsoFar to account for extra space since 1 space = 2 words
}

console.log(WordCount("Random String"));

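I think I've got this down pretty well, except I think the if statement is wrong. How do I say: if str(i) contains a space, add 1?

Edit:

I found out (thanks to Blender) that I can do this with a lot less code: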

function WordCount(str) { 
  return str.split(" ").length;
}

console.log(WordCount("hello world"));
javascript
20 Answers
78
votes

Use square brackets, not parentheses:

str[i] === " "

Or charAt:

str.charAt(i) === " "

You can also do it with .split():

return str.split(' ').length;
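
As a minimal sketch of the fixes above (the names countSpacesPlusOne and countBySplit are made up for illustration), the loop below indexes the string with square brackets and iterates over str.length. Both variants only count single spaces, so leading, trailing, or repeated spaces inflate the result.

```javascript
// Hypothetical helper: count spaces with bracket indexing, then add 1.
function countSpacesPlusOne(str) {
  var total = 0;
  for (var i = 0; i < str.length; i++) {
    if (str[i] === " ") { // str.charAt(i) === " " behaves the same here
      total += 1;
    }
  }
  return total + 1; // n single spaces separate n + 1 pieces
}

// Hypothetical helper: the same count via split().
function countBySplit(str) {
  return str.split(" ").length;
}

console.log(countSpacesPlusOne("Random String")); // 2
console.log(countBySplit("Random String"));       // 2
console.log(countBySplit("Random  String "));     // 4, extra spaces produce empty pieces
```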

2
votes
function countWords(str) {
    var regEx = /([^\u0000-\u007F]|\w)+/g;
    var matches = str.match(regEx); // match() returns null when nothing matches
    return matches ? matches.length : 0;
}

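Explanation:

/([^\u0000-\u007F]|\w) matches a word character (\w) or any non-ASCII character, which is great: the regex does the heavy lifting for us. (This pattern is based on the following SO answer by @Landeeyo: https://stackoverflow.com/a/35743562/1806956)

+ matches the whole run of the previously specified word characters, so we are basically grouping word characters together.

/g means it keeps looking until the end of the string.

str.match(regEx) returns an array of the words found, so we count its length.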
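
To illustrate why the non-ASCII range matters, here is a small sketch comparing the pattern above with plain \w+. The function names are invented for this example, and the null guard is an addition, since match() returns null when nothing matches.

```javascript
// Pattern from the answer: runs of non-ASCII characters or \w characters.
function countWordsUnicode(str) {
  var matches = str.match(/([^\u0000-\u007F]|\w)+/g);
  return matches ? matches.length : 0;
}

// For comparison: \w alone only covers [A-Za-z0-9_].
function countWordsAscii(str) {
  var matches = str.match(/\w+/g);
  return matches ? matches.length : 0;
}

var sample = "na\u00EFve caf\u00E9"; // "naïve café"
console.log(countWordsUnicode(sample)); // 2
console.log(countWordsAscii(sample));   // 3 ("na", "ve", "caf")
console.log(countWordsUnicode(""));     // 0
```

One caveat of this approach: [^\u0000-\u007F] also matches non-ASCII punctuation, so such characters are treated as part of a word.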


2
votes

For those who want to use Lodash, the _.words function can be used:

var str = "Random String";
var wordCount = _.size(_.words(str));
console.log(wordCount);
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.min.js"></script>

2
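votes

Here is my approach: it simply splits the string by spaces, then loops over the array and increments the count if array[i] matches the given regex pattern.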

    function wordCount(str) {
        var stringArray = str.split(' ');
        var count = 0;
        for (var i = 0; i < stringArray.length; i++) {
            var word = stringArray[i];
            if (/[A-Za-z]/.test(word)) {
                count++
            }
        }
        return count
    }

Call it like this:

var str = "testing strings here's a string --..  ? // ... random characters ,,, end of string";
wordCount(str)

(Extra characters and spaces were added to show the accuracy of the function.)

The str above returns 10, which is correct!


1
votes

Here is a function that counts the number of words in HTML code:

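var text = $(this).val()
    .replace(/((&nbsp;)|(<[^>]*>))+/g, '') // remove html spaces and tags
    .replace(/\s+/g, ' ') // merge multiple spaces into one
    .trim(); // trim ending and beginning spaces (yes, this is needed)
// count the pieces between single spaces; counting the spaces themselves
// would be one short and would fail on a single word (match() returns null)
var count = text === '' ? 0 : text.split(' ').length;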

1
votes
let leng = yourString.split(' ').filter(a => a.trim().length > 0).length

1
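votes

I'm not sure whether this has been said before, or whether it is what's needed here, but couldn't you turn the string into an array and then find its length?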

let randomString = "Random String";

let stringWords = randomString.split(' ');
console.log(stringWords.length);

1
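votes

I think this answer covers all of the following:

  1. The number of characters in a given string
  2. The number of words in a given string
  3. The number of lines in a given string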

 function NumberOf() { 
		 var string = "Write a piece of code in any language of your choice that computes the total number of characters, words and lines in a given text. \n This is second line. \n This is third line.";

		 var length = string.length; //No of characters
		 var words = string.match(/\w+/g).length; //No of words
		 var lines = string.split(/\r\n|\r|\n/).length; // No of lines

		 console.log('Number of characters:',length);
		 console.log('Number of words:',words);
		 console.log('Number of lines:',lines);


}

NumberOf();
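  1. First, find the length of the given string with string.length
  2. Then find the number of words by matching the string: string.match(/\w+/g).length
  3. Finally, split the string into lines and count them: string.split(/\r\n|\r|\n/).length

I hope this helps those who are looking for these 3 answers.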
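
The three steps above can be sketched as a reusable function that takes the text as a parameter instead of hard-coding it. The name textStats and the returned object shape are assumptions made for this example.

```javascript
// Hypothetical helper that returns the three counts for any input string.
function textStats(text) {
  var wordMatches = text.match(/\w+/g); // null when there are no word characters
  return {
    characters: text.length,
    words: wordMatches ? wordMatches.length : 0,
    lines: text.split(/\r\n|\r|\n/).length // an empty string still counts as 1 line
  };
}

var stats = textStats("First line.\nThis is second line.\r\nThird");
console.log("Number of characters:", stats.characters);
console.log("Number of words:", stats.words);
console.log("Number of lines:", stats.lines);
```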


0
votes
<textarea name="myMessage" onkeyup="wordcount(this.value)"></textarea>
<script type="text/javascript">
var cnt;
function wordcount(count) {
var words = count.split(/\s/);
cnt = words.length;
var ele = document.getElementById('w_count');
ele.value = cnt;
}
document.write("<input type=text id=w_count size=4 readonly>");
</script>

0
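votes

I know it's late, but this regex should solve your problem. It matches the words in your string and returns how many there are, unlike the one you marked as the solution, which counts space-space-word as 2 words even though it is really just 1 word.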

function countWords(str) {
    var matches = str.match(/\S+/g);
    return matches ? matches.length : 0;
}
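
To make the difference concrete, the sketch below runs both approaches on a few tricky inputs. countBySpaceSplit and countByNonWhitespace are names chosen for this illustration only.

```javascript
// The split-based count from the question's edit.
function countBySpaceSplit(str) {
  return str.split(" ").length;
}

// The \S+ approach from this answer.
function countByNonWhitespace(str) {
  var matches = str.match(/\S+/g);
  return matches ? matches.length : 0;
}

var samples = ["word", "  word", "two  words", "line\nbreak", ""];
samples.forEach(function (sample) {
  console.log(
    JSON.stringify(sample),
    "split:", countBySpaceSplit(sample),
    "\\S+:", countByNonWhitespace(sample)
  );
});
```

Splitting on a single space overcounts whenever there are leading or repeated spaces, misses words separated by a newline, and reports 1 for an empty string.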

0
votes

There are a few mistakes in your code.

function WordCount(str) {
    var totalSoFar = 0;
    for (var i = 0; i < str.length; i++) {
        if (str[i] === " ") {
            totalSoFar += 1;
        }
    }
    return totalSoFar + 1; // you need to return something.
}
console.log(WordCount("Random String"));

There is also another simple way using regular expressions:

(text.split(/\b/).length - 1) / 2

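The exact value can be off by up to one word, but it also counts word boundaries that have no spaces around them, for example "word-word.word". And it does not count words that contain no letters or digits.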
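
The off-by-one comes from split(/\b/) never splitting at the very start or end of the string. As a sketch, the second function below pads the text with spaces so every word contributes exactly two boundaries; that padding is an assumption added for illustration, not part of the answer.

```javascript
// Formula from the answer: each word is delimited by two \b boundaries.
function countByBoundaries(text) {
  return (text.split(/\b/).length - 1) / 2;
}

// Assumed variant: pad with spaces so the first and last boundary also split.
function countByBoundariesPadded(text) {
  return ((" " + text + " ").split(/\b/).length - 1) / 2;
}

console.log(countByBoundaries("hello world"));             // 1 (one short)
console.log(countByBoundariesPadded("hello world"));       // 2
console.log(countByBoundariesPadded("word-word.word"));    // 3
console.log(countByBoundariesPadded("?? --"));             // 0, no letters or digits
```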


68
votes

Try these before reinventing the wheel

From Count number of words in string using JavaScript

function countWords(str) {
  return str.trim().split(/\s+/).length;
}

From http://www.mediacollege.com/internet/javascript/text/count-words.html

function countWords(s){
    s = s.replace(/(^\s*)|(\s*$)/gi,"");//exclude  start and end white-space
    s = s.replace(/[ ]{2,}/gi," ");//2 or more space to 1
    s = s.replace(/\n /,"\n"); // exclude newline with a start spacing
    return s.split(' ').filter(function(str){return str!="";}).length;
    //return s.split(' ').filter(String).length; - this can also be used
}

From Use JavaScript to count words in a string, WITHOUT using a regex (this will be the best approach)

function WordCount(str) {
     return str.split(' ')
            .filter(function(n) { return n != '' })
            .length;
}

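Notes from the author:

You can tweak this script to count words in whichever way you like. The important part is s.split(' ').length, which counts the spaces. The script attempts to remove all extra spaces (double spaces, etc.) before counting. If the text contains two words without a space between them, they are counted as one word, e.g. "First sentence.Start of next sentence".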
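
Two edge cases follow from that note, sketched below. countWordsTrimSplit is the first snippet of this answer under a different name, and countWordsSafe is a hypothetical variant that returns 0 for empty or whitespace-only input.

```javascript
// First snippet of the answer: trim, then split on runs of whitespace.
function countWordsTrimSplit(str) {
  return str.trim().split(/\s+/).length;
}

// Hypothetical variant: "".split(/\s+/) yields [""], so guard the empty case.
function countWordsSafe(str) {
  var trimmed = str.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

console.log(countWordsTrimSplit(""));                 // 1
console.log(countWordsSafe(""));                      // 0
console.log(countWordsSafe("  hello   world \n"));    // 2
// "sentence.Start" has no whitespace inside, so it counts as one word:
console.log(countWordsSafe("First sentence.Start of next sentence")); // 5
```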


0
votes
function totalWordCount() {
  var str ="My life is happy"
  var totalSoFar = 0;

  for (var i = 0; i < str.length; i++)
    if (str[i] === " ") { 
     totalSoFar = totalSoFar+1;
  }
  totalSoFar = totalSoFar+ 1; 
  return totalSoFar
}

console.log(totalWordCount());

17
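votes

Another way to count the words in a string. This code counts words that contain only alphanumeric characters and the "_", "’", "-", and "'" characters.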

function countWords(str) {
  var matches = str.match(/[\w\d\’\'-]+/gi);
  return matches ? matches.length : 0;
}
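
As a quick illustration of what this character class buys you, the sketch below compares it with plain \w+. The function names are invented for this example, and the class is written as [\w\u2019'-], which should be equivalent to the one above (\d is already covered by \w, and \u2019 is the ’ character).

```javascript
// Words may contain letters, digits, "_", "’", "'" and "-".
function countWordsWithApostrophes(str) {
  var matches = str.match(/[\w\u2019'-]+/g);
  return matches ? matches.length : 0;
}

// For comparison: \w+ breaks words at apostrophes and hyphens.
function countWordsPlain(str) {
  var matches = str.match(/\w+/g);
  return matches ? matches.length : 0;
}

console.log(countWordsWithApostrophes("It's a well-known fact")); // 4
console.log(countWordsPlain("It's a well-known fact"));           // 6
console.log(countWordsWithApostrophes("a -- b"));                 // 3, "--" counts as a word
```

The last line shows a limitation: a run made only of hyphens or apostrophes is also counted.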

16
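votes

After cleaning the string, you can match either non-whitespace characters or word boundaries.

Here are two simple regular expressions to capture the words in a string:

  • Sequences of non-whitespace characters: /\S+/g
  • Valid characters between word boundaries: /\b[a-z\d]+\b/g

The example below shows how to retrieve the word count from a string using these capture patterns.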

/*Redirect console output to HTML.*/document.body.innerHTML='';console.log=function(s){document.body.innerHTML+=s+'\n';};
/*String format.*/String.format||(String.format=function(f){return function(a){return f.replace(/{(\d+)}/g,function(m,n){return"undefined"!=typeof a[n]?a[n]:m})}([].slice.call(arguments,1))});

// ^ IGNORE CODE ABOVE ^
//   =================

// Clean and match sub-strings in a string.
function extractSubstr(str, regexp) {
    return str.replace(/[^\w\s]|_/g, '')
        .replace(/\s+/g, ' ')
        .toLowerCase().match(regexp) || [];
}

// Find words by searching for sequences of non-whitespace characters.
function getWordsByNonWhiteSpace(str) {
    return extractSubstr(str, /\S+/g);
}

// Find words by searching for valid characters between word-boundaries.
function getWordsByWordBoundaries(str) {
    return extractSubstr(str, /\b[a-z\d]+\b/g);
}

// Example of usage.
var edisonQuote = "I have not failed. I've just found 10,000 ways that won't work.";
var words1 = getWordsByNonWhiteSpace(edisonQuote);
var words2 = getWordsByWordBoundaries(edisonQuote);

console.log(String.format('"{0}" - Thomas Edison\n\nWord count via:\n', edisonQuote));
console.log(String.format(' - non-white-space: ({0}) [{1}]', words1.length, words1.join(', ')));
console.log(String.format(' - word-boundaries: ({0}) [{1}]', words2.length, words2.join(', ')));
body { font-family: monospace; white-space: pre; font-size: 11px; }

Finding unique words

You can also create a map of the words to get unique counts.

function cleanString(str) {
    return str.replace(/[^\w\s]|_/g, '')
        .replace(/\s+/g, ' ')
        .toLowerCase();
}

function extractSubstr(str, regexp) {
    return cleanString(str).match(regexp) || [];
}

function getWordsByNonWhiteSpace(str) {
    return extractSubstr(str, /\S+/g);
}

function getWordsByWordBoundaries(str) {
    return extractSubstr(str, /\b[a-z\d]+\b/g);
}

function wordMap(str) {
    return getWordsByWordBoundaries(str).reduce(function(map, word) {
        map[word] = (map[word] || 0) + 1;
        return map;
    }, {});
}

function mapToTuples(map) {
    return Object.keys(map).map(function(key) {
        return [ key, map[key] ];
    });
}

function mapToSortedTuples(map, sortFn, sortOrder) {
    return mapToTuples(map).sort(function(a, b) {
        return sortFn.call(undefined, a, b, sortOrder);
    });
}

function countWords(str) {
    return getWordsByWordBoundaries(str).length;
}

function wordFrequency(str) {
    return mapToSortedTuples(wordMap(str), function(a, b, order) {
        if (b[1] > a[1]) {
            return order[1] * -1;
        } else if (a[1] > b[1]) {
            return order[1] * 1;
        } else {
            return order[0] * (a[0] < b[0] ? -1 : (a[0] > b[0] ? 1 : 0));
        }
    }, [1, -1]);
}

function printTuples(tuples) {
    return tuples.map(function(tuple) {
        return padStr(tuple[0], ' ', 12, 1) + ' -> ' + tuple[1];
    }).join('\n');
}

function padStr(str, ch, width, dir) { 
    return (width <= str.length ? str : padStr(dir < 0 ? ch + str : str + ch, ch, width, dir)).substr(0, width);
}

function toTable(data, headers) {
    return $('<table>').append($('<thead>').append($('<tr>').append(headers.map(function(header) {
        return $('<th>').html(header);
    })))).append($('<tbody>').append(data.map(function(row) {
        return $('<tr>').append(row.map(function(cell) {
            return $('<td>').html(cell);
        }));
    })));
}

function addRowsBefore(table, data) {
    table.find('tbody').prepend(data.map(function(row) {
        return $('<tr>').append(row.map(function(cell) {
            return $('<td>').html(cell);
        }));
    }));
    return table;
}

$(function() {
    $('#countWordsBtn').on('click', function(e) {
        var str = $('#wordsTxtAra').val();
        var wordFreq = wordFrequency(str);
        var wordCount = countWords(str);
        var uniqueWords = wordFreq.length;
        var summaryData = [
            [ 'TOTAL', wordCount ],
            [ 'UNIQUE', uniqueWords ]
        ];
        var table = toTable(wordFreq, ['Word', 'Frequency']);
        addRowsBefore(table, summaryData);
        $('#wordFreq').html(table);
    });
});
table {
    border-collapse: collapse;
    table-layout: fixed;
    width: 200px;
    font-family: monospace;
}
thead {
    border-bottom: #000 3px double;
}
table, td, th {
    border: #000 1px solid;
}
td, th {
    padding: 2px;
    width: 100px;
    overflow: hidden;
}

textarea, input[type="button"], table {
    margin: 4px;
    padding: 2px;
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>

<h1>Word Frequency</h1>
<textarea id="wordsTxtAra" cols="60" rows="8">Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.</textarea><br />
<input type="button" id="countWordsBtn" value="Count Words" />
<div id="wordFreq"></div>
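
The word-map idea above does not depend on jQuery or the table rendering. As a minimal sketch, assuming an environment with the built-in Map, the same counting can be written as follows; wordFrequencyMap is a name made up for this example, and the cleaning step mirrors cleanString above.

```javascript
// Hypothetical helper: build a word -> count Map using the same cleaning rules.
function wordFrequencyMap(str) {
  var cleaned = str.replace(/[^\w\s]|_/g, '') // drop punctuation and underscores
    .replace(/\s+/g, ' ')
    .toLowerCase();
  var words = cleaned.match(/\b[a-z\d]+\b/g) || [];
  var counts = new Map();
  words.forEach(function (word) {
    counts.set(word, (counts.get(word) || 0) + 1);
  });
  return counts;
}

var freq = wordFrequencyMap("We can not dedicate -- we can not consecrate");
console.log(freq.size);      // 5 unique words
console.log(freq.get("we")); // 2
```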

13
votes

I think this method is more than you want

var getWordCount = function(v){
    var matches = v.match(/\S+/g) ;
    return matches?matches.length:0;
}

5
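votes

String.prototype.match returns an array (or null when there is no match), and we can then check its length.

I find this approach the most descriptive: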

var str = 'one two three four five';

str.match(/\w+/g).length;

4
votes

The easiest way I have found so far is to use a regex with split.

var calculate = function() {
  var string = document.getElementById('input').value;
  var length = string.split(/[^\s]+/).length - 1;
  document.getElementById('count').innerHTML = length;
};
<textarea id="input">My super text that does 7 words.</textarea>
<button onclick="calculate()">Calculate</button>
<span id="count">7</span> words
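
The "- 1" works because splitting on runs of non-whitespace always yields one more piece than there are words, whatever the surrounding whitespace looks like. A small sketch without the DOM (the function name is invented for this example):

```javascript
// Split on the words themselves; what remains are the gaps around them.
function countBySplittingOnWords(str) {
  return str.split(/[^\s]+/).length - 1; // n words leave n + 1 gaps (some empty)
}

console.log(countBySplittingOnWords("My super text that does 7 words.")); // 7
console.log(countBySplittingOnWords("  leading and trailing  "));         // 3
console.log(countBySplittingOnWords(""));                                 // 0
```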

3
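votes

The answer given by @7-isnotbad is very close, but it does not count single-word lines. Here is the fix, which seems to account for every possible combination of words, spaces, and newlines.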

function countWords(s){
    s = s.replace(/\n/g,' '); // newlines to space
    s = s.replace(/(^\s*)|(\s*$)/gi,''); // remove spaces from start + end
    s = s.replace(/[ ]{2,}/gi,' '); // 2 or more spaces to 1
    return s.split(' ').length; 
}

2
votes

There may be a more efficient way to do this, but this is what has worked for me.

function countWords(passedString){
  passedString = passedString.replace(/(^\s*)|(\s*$)/gi, '');
  passedString = passedString.replace(/\s\s+/g, ' '); 
  passedString = passedString.replace(/,/g, ' ');  
  passedString = passedString.replace(/;/g, ' ');
  passedString = passedString.replace(/\//g, ' ');  
  passedString = passedString.replace(/\\/g, ' ');  
  passedString = passedString.replace(/{/g, ' ');
  passedString = passedString.replace(/}/g, ' ');
  passedString = passedString.replace(/\n/g, ' ');  
  passedString = passedString.replace(/\./g, ' '); 
  passedString = passedString.replace(/[\{\}]/g, ' ');
  passedString = passedString.replace(/[\(\)]/g, ' ');
  passedString = passedString.replace(/[[\]]/g, ' ');
  passedString = passedString.replace(/[ ]{2,}/gi, ' ');
  var countWordsBySpaces = passedString.split(' ').length; 
  return countWordsBySpaces;

}

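It recognises all of the following as separate words:

abc,abc = 2 words, abc/abc/abc = 3 words (works with forward and backward slashes), abc.abc = 2 words, abc[abc]abc = 3 words, abc;abc = 2 words,

(Some other suggestions I tried count each of the examples above as only 1 word.) It also:

  • ignores all leading and trailing whitespace
  • counts a single letter followed by a new line as a word, which I found some of the suggestions on this page do not. For example, a a a a a (each on its own line) is sometimes counted as 0 words, and other functions count it as only 1 word instead of 5 words

If anyone has any ideas on how to improve it, or make it cleaner or more efficient, then please add your 2 cents! Hope this helps someone out.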
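
One possible simplification, offered as a sketch rather than a drop-in replacement: the chain of replace() calls can be collapsed into a single split on a character class that lists the same separators, followed by a filter that drops empty pieces. The function name is made up for this example. Unlike the original, it returns 0 for an empty string.

```javascript
// Hypothetical compact variant: split on runs of whitespace or the listed
// separators ( , ; / \ { } ( ) [ ] . ), then drop empty pieces.
function countWordsBySeparators(str) {
  return str.split(/[\s,;\/\\{}()\[\].]+/)
    .filter(function (piece) { return piece !== ""; })
    .length;
}

console.log(countWordsBySeparators("abc,abc"));       // 2
console.log(countWordsBySeparators("abc/abc/abc"));   // 3
console.log(countWordsBySeparators("abc[abc]abc"));   // 3
console.log(countWordsBySeparators("a\na\na\na\na")); // 5
```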
