What is a better way to design built-in statistics for a random generator?

Question

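Context: a random sentence generator

1) The function generateSentence() generates a random sentence and returns it as a string (works fine)

2) The function calculateStats() outputs the number of unique strings the function above can theoretically generate (this also works fine in this mockup, so be sure to read the disclaimer; I don't want to waste your time)

3) The generateStructure() function and the word lists in Dictionnary.lists keep growing over time

Quick mockup of the main generator function: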

function generateSentence() {
  var words = [];
  var structure = generateStructure();

  structure.forEach(function(element) {
    words.push(Dictionnary.getElement(element));
  });

  var fullText = words.join(" ");
  fullText = fullText.substring(0, 1).toUpperCase() + fullText.substring(1);
  fullText += ".";
  return fullText;
}

var Dictionnary = {
  getElement: function(listCode) {
    return randomPick(Dictionnary.lists[listCode]);
  },
  lists: {
    _location: ["here,", "at my cousin's,", "in Antarctica,"],
    _subject: ["some guy", "the teacher", "Godzilla"],
    _vTransitive: ["is eating", "is scolding", "is seeing"],
    _vIntransitive: ["is working", "is sitting", "is yawning"],
    _adverb: ["slowly", "very carefully", "with a passion"],
    _object: ["this chair", "an egg", "the statue of Liberty"],
  }
}

// returns an array of strings symbolizing types of sentence elements
// example : ["_location", "_subject", "_vIntransitive"]
function generateStructure() {
  var str = [];

  if (dice(6) > 5) {// the structure can begin with a location or not
    str.push("_location");
  }

  str.push("_subject");// the subject is mandatory

  // the verb can be of either type
  var verbType = randomPick(["_vTransitive", "_vIntransitive"]);
  str.push(verbType);

  if (dice(6) > 5) {// adverb is optional
    str.push("_adverb");
  }

  // the structure needs an object if the verb is transitive
  if (verbType == "_vTransitive") {
    str.push("_object");
  }

  return str;
}

// off-topic warning! don't mind the implementation here,
// just know it's a random pick in the array
function randomPick(sourceArray) {
  return sourceArray[dice(sourceArray.length) - 1];
}

// Same as above, not the point, just know it's a die roll (random integer from 1 to max)
function dice(max) {
  if (max < 1) { return 0; }
  return Math.round((Math.random() * max) + .5);
}

At some point I wanted to know how many different unique strings it can output, so I wrote something like this (again, heavily simplified):

function calculateStats() {// the "broken leg" function I'm trying to improve/replace
  var total = 0;
  // +1 in the next two lines accounts for 'no location' or 'no adverb'
  var nbOfLocations = Dictionnary.lists._location.length + 1;
  var nbOfAdverbs = Dictionnary.lists._adverb.length + 1;
  // the subject is mandatory in every structure, so it multiplies both counts
  var nbOfSubjects = Dictionnary.lists._subject.length;

  var nbOfTransitiveSentences =
    nbOfLocations *
    nbOfSubjects *
    Dictionnary.lists._vTransitive.length *
    nbOfAdverbs *
    Dictionnary.lists._object.length;
  var nbOfIntransitiveSentences =
    nbOfLocations *
    nbOfSubjects *
    Dictionnary.lists._vIntransitive.length *
    nbOfAdverbs;

  total = nbOfTransitiveSentences + nbOfIntransitiveSentences;
  return total;
}
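For the mockup, the reachable count factors neatly once the mandatory subject is counted as well. Writing L, S, V_t, V_i, A and O for the lengths of _location, _subject, _vTransitive, _vIntransitive, _adverb and _object, and assuming every combination yields a distinct string, a worked version of the count is:

```latex
N = \underbrace{(L+1)\,S\,V_t\,(A+1)\,O}_{\text{transitive}}
  + \underbrace{(L+1)\,S\,V_i\,(A+1)}_{\text{intransitive}}
  = (L+1)\,S\,(A+1)\,(V_t\,O + V_i)
  = 4 \cdot 3 \cdot 4 \cdot (3 \cdot 3 + 3) = 576
```

The +1 terms are the "no location" / "no adverb" options; the product form is also why the total explodes as the lists grow.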

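(Side note: don't worry about namespace pollution, type checking of input parameters, or that kind of thing; for the sake of clarity, consider this to live in a bubble.)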

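Important disclaimer: this is not about fixing the code I posted. It is a mockup, and it works. The real question is: "Given the future complexity of the possible structures, and the size and variety of the lists, what is a better strategy for computing statistics on these kinds of random structures than my clumsy calculateStats() function, which is hard to maintain, may have to handle astronomically large numbers*, and is error-prone?"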

*In the real tool there are currently 351,120 unique structures, and for sentences the total already exceeds 10^80.
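Counts above 10^80 are far beyond Number.MAX_SAFE_INTEGER (2^53 - 1), so a product of list lengths computed with plain Numbers stays finite but is no longer exact. A minimal sketch, using hypothetical sizes (80 slots of 10 words each), comparing Number with BigInt (ES2020):

```javascript
// Hypothetical sizes for one long structure: 80 slots, 10 words each.
const sizes = Array(80).fill(10);

// Plain Number product: finite, but far past the exact-integer range
const asNumber = sizes.reduce((acc, n) => acc * n, 1);
// BigInt product: exact at any size
const asBigInt = sizes.reduce((acc, n) => acc * BigInt(n), 1n);

console.log(Number.isSafeInteger(asNumber)); // false
console.log(asNumber + 1 === asNumber);      // true: adding 1 is lost to rounding
console.log(asBigInt === 10n ** 80n);        // true
console.log(asBigInt.toString().length);     // 81 (digits of 10^80)
```

Keeping every count as BigInt from the start avoids deciding later which totals are "small enough" for Number.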

Answer 1

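Since your sentence structure changes a lot (it already does in this small example, and I can only imagine how much it varies in the real code), I would do something like this: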

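First, I need some way to store all the possible sentence structures that exist for a given Dictionary... Maybe I would create a Language object that has a Dictionary as a property and to which I can add possible sentence structures (this part could probably be optimized, for example by finding a more programmatic way to generate all possible sentence structures, such as a rule engine). What do I mean by sentence structure? Following your example, I would call these sentence structures: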

[ 'location', 'transitive-verb', 'adverb', 'object' ] <- Transitive sentence
[ 'location', 'intransitive-verb', 'adverb' ] <- Intransitive sentence
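The Language object described above could look roughly like the sketch below. The names Language, addStructure and countSentences are hypothetical, invented for illustration; it reuses the two example structures (which, like the answer, leave out the subject and the "no location" / "no adverb" variants). Validating keys when a structure is added is one way to catch typos such as "instransitive-verb" before they corrupt the stats:

```javascript
// Hypothetical Language object: word lists plus the structures registered for them.
class Language {
  constructor(lists) {
    this.lists = lists;   // key -> array of words
    this.structures = []; // each structure: array of list keys
  }

  // Reject structures that reference unknown lists, so typos fail early
  addStructure(structure) {
    for (const key of structure) {
      if (!Array.isArray(this.lists[key])) throw new Error("Unknown list: " + key);
    }
    this.structures.push(structure);
    return this; // allow chaining
  }

  // Sum over structures of the product of list sizes (BigInt to stay exact)
  countSentences() {
    return this.structures.reduce(
      (total, s) => total + s.reduce((acc, key) => acc * BigInt(this.lists[key].length), 1n),
      0n
    );
  }
}

const language = new Language({
  location: ["here,", "at my cousin's,", "in Antarctica,"],
  "transitive-verb": ["is eating", "is scolding", "is seeing"],
  "intransitive-verb": ["is working", "is sitting", "is yawning"],
  adverb: ["slowly", "very carefully", "with a passion"],
  object: ["this chair", "an egg", "the statue of Liberty"],
})
  .addStructure(["location", "transitive-verb", "adverb", "object"])
  .addStructure(["location", "intransitive-verb", "adverb"]);

console.log(String(language.countSentences())); // 108
```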

You might find a way to generate structures like these... or hardcode them.
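One way to generate the structures instead of hardcoding them is to write a template per sentence shape, where each slot lists its alternatives and null means "leave this slot out", then expand each template into the Cartesian product of its slots. The template format and expandTemplate are hypothetical, made up for this sketch; the keys follow the question's Dictionnary.lists:

```javascript
// Expand a template into every concrete structure.
// Hypothetical format: a template is an array of slots; each slot is an
// array of alternatives, where an alternative is a list key or null
// (null = slot omitted).
function expandTemplate(template) {
  let results = [[]];
  for (const slot of template) {
    const next = [];
    for (const partial of results) {
      for (const choice of slot) {
        // Never mutate `partial`: build a new array when a key is added
        next.push(choice === null ? partial : [...partial, choice]);
      }
    }
    results = next;
  }
  return results;
}

// The mockup's two sentence shapes, with optional location and adverb
const templates = [
  [[null, "_location"], ["_subject"], ["_vTransitive"], [null, "_adverb"], ["_object"]],
  [[null, "_location"], ["_subject"], ["_vIntransitive"], [null, "_adverb"]],
];

const structures = templates.flatMap(expandTemplate);
console.log(structures.length); // 8
```

Each template yields 2^k structures for k optional slots, so this stays manageable as long as the number of optional slots per template is small.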

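But... why do I think this would improve the way you compute stats? Because by using map/reduce operations you minimize the per-sentence hardcoding and make it more scalable.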

How so?

Imagine that our structures are accessible globally, or through an object, or through the Dictionary itself:

// Somewhere in the code
const structures = [
  [ 'location', 'transitive-verb', 'adverb', 'object' ],
  [ 'location', 'intransitive-verb', 'adverb' ]
];
...
// In this example I just pass it as an argument
function calculateStats(structures) {
  const numberOfCombinations = structures.reduce((total, structure) => {
      // A structure yields the product of the sizes of its word lists
      const numberOfSentences = structure.reduce((acc, wordType) => {
          // For each word type, look up the list and take its length (no safety checks for unknown word types)
          return acc * Dictionary.lists[wordType].length;
      }, 1); // Initial accumulator: 1, the neutral element of multiplication

      // Counts of different structures add up (assuming no two structures can produce the same string)
      return total + numberOfSentences;
  }, 0); // Initial accumulator
  return numberOfCombinations;
}

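So instead of hardcoding every possible combination, we use functions that iterate over the different structures: you basically only need to add structures, and your calculateStats function should not grow.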
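Since the question worries that hand-written formulas are error-prone, one cheap safeguard is to cross-check the structure-driven count against brute-force enumeration while the lists are still small. A sketch, assuming the question's mockup lists and the eight structures generateStructure() can return; it skips the capitalization and final period that generateSentence() adds, since they do not change how many distinct strings there are:

```javascript
// Cross-check a sum-of-products count against brute force on small lists.
// Assumption: lists are small enough to enumerate every sentence.
const lists = {
  _location: ["here,", "at my cousin's,", "in Antarctica,"],
  _subject: ["some guy", "the teacher", "Godzilla"],
  _vTransitive: ["is eating", "is scolding", "is seeing"],
  _vIntransitive: ["is working", "is sitting", "is yawning"],
  _adverb: ["slowly", "very carefully", "with a passion"],
  _object: ["this chair", "an egg", "the statue of Liberty"],
};

// All structures generateStructure() can return (optional location/adverb)
const structures = [];
for (const loc of [[], ["_location"]]) {
  for (const adv of [[], ["_adverb"]]) {
    structures.push([...loc, "_subject", "_vTransitive", ...adv, "_object"]);
    structures.push([...loc, "_subject", "_vIntransitive", ...adv]);
  }
}

// Formula: product of list sizes per structure, summed over structures
const formulaCount = structures
  .map(s => s.reduce((acc, key) => acc * lists[key].length, 1))
  .reduce((a, b) => a + b, 0);

// Brute force: build every sentence of a structure, word by word
function allSentences(structure) {
  return structure.reduce(
    (partials, key) => partials.flatMap(p => lists[key].map(w => (p ? p + " " : "") + w)),
    [""]
  );
}
const distinct = new Set(structures.flatMap(allSentences));

console.log(formulaCount, distinct.size); // 576 576
```

If the two numbers ever disagree, either the formula is wrong or two structures can produce the same string, and both are worth knowing before the lists get too large to enumerate.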

If more complex calculations are needed, you would change the functions used in the reducers.

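I know very little about grammar or syntactic analysis, so you probably know more and can find ways to make this simpler or do "smarter calculations".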
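One "smarter calculation", offered here as a swapped-in technique rather than something from the answer: describe the generator's rules as a small grammar and count over the grammar itself, so the hundreds of thousands of structures never have to be listed. Sequences multiply, choices add, and an optional part counts as 1 + its count. The node shapes and countGrammar are hypothetical, invented for this sketch:

```javascript
// Count sentences directly from a grammar instead of enumerating structures.
// Hypothetical node shapes:
//   { list: key }      one word from lists[key]
//   { seq: [nodes] }   all nodes in order        -> counts multiply
//   { oneOf: [nodes] } exactly one of the nodes  -> counts add
//   { optional: node } the node or nothing       -> 1 + count
// Assumes different derivations never yield the same string.
function countGrammar(node, lists) {
  if (node.list !== undefined) return BigInt(lists[node.list].length);
  if (node.seq) return node.seq.reduce((acc, n) => acc * countGrammar(n, lists), 1n);
  if (node.oneOf) return node.oneOf.reduce((acc, n) => acc + countGrammar(n, lists), 0n);
  if (node.optional) return 1n + countGrammar(node.optional, lists);
  throw new Error("Unknown node: " + JSON.stringify(node));
}

const lists = {
  _location: ["here,", "at my cousin's,", "in Antarctica,"],
  _subject: ["some guy", "the teacher", "Godzilla"],
  _vTransitive: ["is eating", "is scolding", "is seeing"],
  _vIntransitive: ["is working", "is sitting", "is yawning"],
  _adverb: ["slowly", "very carefully", "with a passion"],
  _object: ["this chair", "an egg", "the statue of Liberty"],
};

// The question's generateStructure() rules, written as a grammar
const sentence = {
  seq: [
    { optional: { list: "_location" } },
    { list: "_subject" },
    {
      oneOf: [
        { seq: [{ list: "_vTransitive" }, { optional: { list: "_adverb" } }, { list: "_object" }] },
        { seq: [{ list: "_vIntransitive" }, { optional: { list: "_adverb" } }] },
      ],
    },
  ],
};

console.log(String(countGrammar(sentence, lists))); // 576
```

Counting visits each grammar node once, so it stays cheap however many structures the rules can produce; the trade-off is that generateStructure() and the grammar must be kept in sync, unless the generator is rewritten to walk the same grammar.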

I took the liberty of writing it in an ES6-ish style; if reduce is a strange animal to you, you can read more here, or use lodash / ramda / whatever ^^
