python-3.x: Most common character in a string

des4xlb0 posted 4 months ago in Python

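Write a function that takes a string consisting of alphabetic characters as an input argument and returns the most common character. Ignore whitespace, i.e. do not count any whitespace as a character. Capitalization does not matter here, i.e. a lowercase character is equal to its uppercase counterpart. In case of a tie between certain characters, return the last character that has the highest count.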

Here is the updated code:

def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    print(new_string)
    length = len(new_string)
    print(length)
    count = 1
    j = 0
    higher_count = 0
    return_character = ""
    for i in range(0, len(new_string)):
        character = new_string[i]
        while (length - 1):
            if (character == new_string[j + 1]):
                count += 1
            j += 1
            length -= 1    
            if (higher_count < count):
                higher_count = count
    return (character)     

#Main Program
input_str = input("Enter a string: ")
result = most_common_character(input_str)
print(result)

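The above is my code. I am getting a "string index out of bound" error and I don't understand why. Also, the code only checks the occurrences of the first character; I don't know how to move on to the next character and keep the maximum count.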

The error I get when running the code:

> Your answer is NOT CORRECT Your code was tested with different inputs.
> For example when your function is called as shown below:
> 
> most_common_character ('The cosmos is infinite')
> 
> ############# Your function returns ############# e The returned variable type is: type 'str'
> 
> ######### Correct return value should be ######## i The returned variable type is: type 'str'
> 
> ####### Output of student print statements ###### thecosmosisinfinite 19

h9vpoimq #1

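You can use a regular expression pattern to find all the characters. \w matches any alphanumeric character and the underscore; for ASCII text this is equivalent to the set [a-zA-Z0-9_]. A + after \w means match one or more repetitions.
Finally, use Counter to tally the characters and most_common(1) to get the most frequent one.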

from collections import Counter
import re

s = "Write a function that takes a string consisting of alphabetic characters as input argument and returns the most common character. Ignore white spaces i.e. Do not count any white space as a character. Note that capitalization does not matter here i.e. that a lower case character is equal to a upper case character. In case of a tie between certain characters return the last character that has the most count"

>>> Counter(c.lower() for c in re.findall(r"\w", s)).most_common(1)
[('t', 46)]
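As a small illustration of the difference between \w and \w+ described above, the sketch below runs both patterns on a short sample string (the sample text and variable names are chosen here for illustration only). It assumes Python 3.7 or later.

```python
import re
from collections import Counter

text = "The cosmos is"  # sample input, chosen for illustration

# r"\w" matches one word character at a time, so we get single characters.
chars = re.findall(r"\w", text.lower())
print(chars)  # ['t', 'h', 'e', 'c', 'o', 's', 'm', 'o', 's', 'i', 's']

# r"\w+" matches runs of word characters, so we get whole words instead.
words = re.findall(r"\w+", text.lower())
print(words)  # ['the', 'cosmos', 'is']

# Counting the single characters gives per-character totals.
print(Counter(chars).most_common(1))  # [('s', 3)]
```

For counting characters, the single-character pattern is the one that is needed; counting the result of \w+ would tally words.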

Handling ties is a bit trickier.

def top_character(some_string):
    # Use r"\w" (not r"\w+") so we collect single characters, not whole words.
    joined_characters = re.findall(r"\w", some_string.lower())
    d = Counter(joined_characters)
    highest = max(d.values())
    top_characters = [c for c, n in d.items() if n == highest]
    if len(top_characters) == 1:
        return top_characters[0]
    # On a tie, return the tied character that appears last in the string.
    for c in reversed(joined_characters):
        if c in top_characters:
            return c

>>> top_character(s)
't'

>>> top_character('the the')
'e'


With your code above and the sentence 'The cosmos is infinite', you can see that 'i' occurs more often than 'e' (your function's output):

>>> Counter(c.lower() for c in "".join(re.findall(r"[\w]+", 'The cosmos is infinite'))).most_common(3)
[('i', 4), ('s', 3), ('e', 2)]


You can see the problem in this block of your code:

for i in range(0, len(new_string)):
    character = new_string[i]
    ...
return (character)


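You iterate over the sentence and assign each letter to the variable character, which is never reassigned anywhere else. When the loop finishes, character therefore always holds the last character of the string, and that is what the function returns.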
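One way to avoid this, sketched below under the assumption that a simple quadratic scan is acceptable, is to keep the best character seen so far in its own variable instead of returning the loop variable. The function name most_common_character_fixed is hypothetical, and str.count is used here in place of the manual inner loop.

```python
def most_common_character_fixed(input_str):
    # Lowercase and remove all whitespace, as the task requires.
    text = "".join(input_str.lower().split())
    best_char, best_count = "", 0
    for ch in text:
        n = text.count(ch)
        # ">=" lets a later character with the same count replace an earlier one.
        if n >= best_count:
            best_char, best_count = ch, n
    return best_char


print(most_common_character_fixed("The cosmos is infinite"))  # i
print(most_common_character_fixed("ab" * 3))                  # b
```

Using > instead of >= would return the first of the tied characters rather than the last.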

cu6pst1q #2

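Actually, your code is almost correct. You need to move count, j and length inside the for i in range(0, len(new_string)) loop, because they must start fresh on every iteration. Also, whenever count is at least higher_count, you need to save character as return_character and return that instead of character, because character = new_string[i] always ends up as the last character of the string.
I don't understand why you use j+1 and while length-1. After correcting those, the code now also handles the tie case.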

def most_common_character (input_str):
    input_str = input_str.lower()
    new_string = "".join(input_str.split())
    higher_count = 0
    return_character = ""

    for i in range(0, len(new_string)):
        count = 0
        length = len(new_string)
        j = 0
        character = new_string[i]
        while length > 0:
            if (character == new_string[j]):
                count += 1
            j += 1
            length -= 1    
            if (higher_count <= count):
                higher_count = count
                return_character = character
    return (return_character)


lf5gs5x2 #3

If we ignore the "tie" requirement, collections.Counter() works:

from collections import Counter
from itertools import chain

def most_common_character(input_str):
    return Counter(chain(*input_str.casefold().split())).most_common(1)[0][0]

Example:

>>> most_common_character('The cosmos is infinite')
'i'
>>> most_common_character('ab' * 3)
'a'
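To make the chain(*input_str.casefold().split()) expression easier to follow, here is a small sketch of its intermediate values. The sample string and variable names are chosen for illustration only.

```python
from itertools import chain

# casefold() lowercases aggressively; split() drops all whitespace.
words = "The Cosmos".casefold().split()
print(words)  # ['the', 'cosmos']

# chain(*words) walks each word in turn, yielding one character at a time.
chars = list(chain(*words))
print(chars)  # ['t', 'h', 'e', 'c', 'o', 's', 'm', 'o', 's']

# chain.from_iterable(words) yields the same characters without unpacking.
same = list(chain.from_iterable(words))
```

The resulting stream of characters is what Counter receives.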


To return the last character with the highest count, we can use collections.OrderedDict:

from collections import Counter, OrderedDict
from itertools import chain
from operator import itemgetter

class OrderedCounter(Counter, OrderedDict):
    pass

def most_common_character(input_str):
    counter = OrderedCounter(chain(*input_str.casefold().split()))
    return max(reversed(counter.items()), key=itemgetter(1))[0]


Example:

>>> most_common_character('ab' * 3)
'b'


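Notes: this solution assumes that max() returns the first of the characters with the highest count (hence the reversed() call, to get the last one), and that every character is a single Unicode code point. In general, you may want to use the \X regular expression (supported by the regex module) to extract "user-perceived characters" (extended grapheme clusters) from a Unicode string.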
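The assumption about max() can be checked with a short sketch. It relies on dictionaries (and therefore Counter) keeping insertion order, which holds on Python 3.7 and later; the variable names are illustrative.

```python
from collections import Counter
from operator import itemgetter

counter = Counter("ababab")      # 'a' and 'b' both occur 3 times
items = list(counter.items())    # first-seen order: [('a', 3), ('b', 3)]

# max() keeps the first maximal item it encounters.
first = max(items, key=itemgetter(1))
# Scanning in reverse order therefore yields the last maximal item.
last = max(reversed(items), key=itemgetter(1))

print(first, last)  # ('a', 3) ('b', 3)
```

Here "last" means last in order of first appearance, since that is the order in which the counter stores its keys.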

wfsdck30 #4

from collections import Counter

def most_common_character(s):
    char_count = Counter(s)
    return char_count.most_common(1)[0][0]

# Example
string21 = "programming is fun"
common_char = most_common_character(string21)
print("Most Common Character:", common_char)

Output: r
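Note that Counter(s) counts every character as-is, including spaces, and treats upper and lower case as different. A minimal sketch of a variant that normalizes the input first, as the question asks, could look like this (most_common_character_normalized is a hypothetical name; ties are still resolved in favor of the character seen first, so the "last character" rule is not covered):

```python
from collections import Counter


def most_common_character_normalized(s):
    # Hypothetical variant: lowercase and drop all whitespace before counting.
    cleaned = "".join(s.lower().split())
    return Counter(cleaned).most_common(1)[0][0]


# Without normalization, the space can win:
print(Counter("a b c").most_common(1))  # [(' ', 2)]

print(most_common_character_normalized("The cosmos is infinite"))  # i
```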
