Question

I am trying to split this string so that I can count how many words of the same length it contains, using map and then reduce.

For example, for the sentence

SUPPOSING that Truth is a woman--what then?

I would get:

[
  {length:"1", number:"1"},
  {length:"2", number:"1"},
  {length:"4", number:"3"},
  {length:"5", number:"2"},
  {length:"9", number:"1"}
]

How can I do this?

dnickless · Answer 1 · May 17, 2018

The answer to your question depends a lot on your definition of what a word is. If it is a consecutive sequence of only the characters A-Z or a-z, then here is a completely insane approach which, however, gives you exactly the result you are asking for.

What this code does is the following:

1. Parse the input string character by character, replacing every non-matching character (anything that is neither A-Z nor a-z) with a space.
2. Concatenate the result back into one cleansed string that contains only valid characters and spaces.
3. Split the resulting string on the space character.
4. Calculate the length of every word found.
5. Group by length and count the instances.
6. Some beautification of the output.
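Outside MongoDB, the same steps can be sketched in plain JavaScript, which may make the pipeline easier to follow. This is only an illustration under assumptions: `wordLengthHistogram` is a made-up helper name, and the letter test uses a regular expression for A-Z and a-z rather than the `$cmp` range check the pipeline relies on.

```javascript
// Plain JavaScript walk-through of the six steps; runs in Node.js, no MongoDB needed.
// wordLengthHistogram is a hypothetical helper name, not part of any MongoDB API.
function wordLengthHistogram(text) {
  // Steps 1 and 2: keep letters, turn everything else into a space, join back.
  const cleansed = Array.from(text)
    .map((ch) => (/[A-Za-z]/.test(ch) ? ch : " "))
    .join("");

  // Step 3: split on the space character (this may yield empty strings).
  const words = cleansed.split(" ");

  // Step 4: map every word to its length, dropping the empty strings.
  const lengths = words.map((w) => w.length).filter((n) => n !== 0);

  // Step 5: group by length and count the occurrences.
  const counts = new Map();
  for (const n of lengths) {
    counts.set(n, (counts.get(n) || 0) + 1);
  }

  // Step 6: shape the output and sort it by length, ascending.
  return Array.from(counts, ([length, number]) => ({ length, number }))
    .sort((a, b) => a.length - b.length);
}

console.log(wordLengthHistogram("SUPPOSING that Truth is a woman--what then?"));
```

For the sentence from the question this returns the lengths 1, 2, 4, 5 and 9 with the counts 1, 1, 3, 2 and 1.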

Given the following input document

{
    "text" : "SUPPOSING that Truth is a woman--what then?"
}

the following pipeline

db.collection.aggregate({
    $project: { // lots of magic to calculate an array that will hold the lengths of all words
        "lengths": {
            $map: { // translate a given word into its length
                input: {
                    $split: [ // split cleansed string by space character
                        { $reduce: { // join the characters that are between A and z
                                input: {
                                    $map: { // to traverse the original input string character by character
                                        input: {
                                            $range: [ 0, { $strLenCP: "$text" } ] // we want to traverse the entire string from index 0 all the way until the last character
                                        },
                                        as: "index",
                                        in: {
                                            $let: {
                                                vars: {
                                                    "char": { // temp. result which will be reused several times below
                                                        $substrCP: [ "$text", "$$index", 1 ] // the single character we look at in this loop
                                                    }
                                                },
                                                in: {
                                                    $cond: [ // some value that depends on whether the character we look at is between 'A' and 'z'
                                                        { $and: [
                                                            { $eq: [ { $cmp: [ "$$char", "@" /* ASCII 64,  65  would be 'A' */] },  1 ] }, // is our character greater than or equal to 'A'
                                                            { $eq: [ { $cmp: [ "$$char", "{" /* ASCII 123, 122 would be 'z' */] }, -1 ] }  // is our character less than    or equal to 'z' 
                                                        ]},
                                                        '$$char', // in which case that character will be taken
                                                        ' ' // and otherwise a space character to add a word boundary
                                                    ]
                                                }
                                            }
                                        }
                                    }
                                },
                                initialValue: "", // starting with an empty string
                                in: {
                                    $concat: [ // we join all array values by means of concatenating
                                        "$$value", // the current value with
                                        "$$this"
                                    ]
                                }
                            }
                        },
                        " "
                    ]
                },
                as: "word",
                in: {
                    $strLenCP: "$$word" // we map a word into its length, e.g. "the" --> 3
                }
            }
        }
    }
}, {
    $unwind: "$lengths" // flatten the array which holds all our word lengths
}, {
    $group: {
        _id : "$lengths", // group by the length of our words
        "number": { $sum: 1 }  // count number of documents per group
    } 
}, {
    $match: {
        "_id": { $ne: 0 } // $split might leave us with strings of length 0 which we do not want in the result
    }
}, {
    $project: {
        "_id": 0, // remove the "_id" field
        "length" : "$_id", // length is our group key
        "number" : "$number" // and this is the number of findings
    }
}, {
    $sort: { "length": 1 } // sort by length ascending
})

yields the desired result

[
    { "length" : 1, "number" : 1.0 },
    { "length" : 2, "number" : 1.0 },
    { "length" : 4, "number" : 3.0 },
    { "length" : 5, "number" : 2.0 },
    { "length" : 9, "number" : 1.0 }
]
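One detail worth checking: the range test in the pipeline accepts every character strictly between "@" and "{", which in ASCII covers the letters plus the six characters that sit between "Z" and "a". The plain JavaScript sketch below lists them. It assumes MongoDB compares these single-character strings in code point order (no collation in effect), and the helper names are made up for illustration.

```javascript
// Hypothetical helper mirroring the pipeline's test: "@" < ch < "{".
function passesRangeTest(ch) {
  return ch > "@" && ch < "{";
}

// Stricter alternative: ASCII letters only.
function isAsciiLetter(ch) {
  return /^[A-Za-z]$/.test(ch);
}

// Collect every ASCII character on which the two tests disagree.
const disagreements = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (passesRangeTest(ch) !== isAsciiLetter(ch)) {
    disagreements.push(ch);
  }
}

console.log(disagreements.join(" ")); // [ \ ] ^ _ `
```

If such characters can occur in the text and should not count as part of a word, the `$cond` would probably need two separate range checks (A to Z and a to z) combined with `$or`.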

Vicctor · Answer 2 · May 16, 2018

This sample aggregation will count words of the same length. Hope it helps you:

db.some.remove({})
db.some.save({str:"red brown fox jumped over the hil"})

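var res = db.some.aggregate(
    [
    { $project : { word : { $split: ["$str", " "] }} },             // split the string on single spaces
    { $unwind : "$word" },                                           // one document per word
    { $project : { len : { $strLenCP: "$word" }} },                  // replace each word by its length
    { $group : { _id : { len : "$len"}, same: {$push:"$len"}}},      // collect equal lengths together
    { $project : { len : "$_id.len", count : {$size : "$same"} }}    // after $group the length lives in _id.len
    ]
)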

printjson(res.toArray());
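Note that this pipeline splits on a single space only, so punctuation stays attached to the words. A plain JavaScript sketch of the same computation (a hypothetical stand-in with a made-up name, not MongoDB code) makes the difference visible on the sentence from the question:

```javascript
// Hypothetical stand-in for the pipeline above: split on " ", then count lengths.
function countBySpaceSplit(str) {
  const counts = {};
  for (const word of str.split(" ")) {
    const len = Array.from(word).length; // counts code points, like $strLenCP
    counts[len] = (counts[len] || 0) + 1;
  }
  return counts;
}

// Sample string from this answer: four words of length 3, one each of 4, 5 and 6.
console.log(countBySpaceSplit("red brown fox jumped over the hil"));

// Sentence from the question: "woman--what" counts as one word of length 11,
// and "then?" is counted with length 5 because the "?" is kept.
console.log(countBySpaceSplit("SUPPOSING that Truth is a woman--what then?"));
```

Whether that is acceptable depends on the definition of a word. For text that contains only space-separated words the two answers should agree.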

How to split a string by more than one character in mongoDB
