Python: sort in constant space O(1)

Posted by ylamdve6 6 months ago in Python


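I want to sort a list in place using Python 3, with no extra space.
As far as I know, Python sorts lists either with sorted(myList), which creates a new sorted array and obviously takes O(N) extra space, or with myList.sort(), which uses Timsort and also has a worst-case space complexity of O(N).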
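To make the distinction concrete, here is a minimal sketch of how the two built-ins treat the list object. It only shows object identity and return values; it does not measure the temporary memory Timsort may allocate internally. The variable names are illustrative.

```python
data = [3, 1, 2]

# sorted() builds and returns a new list; the input is left untouched.
copy = sorted(data)
same_object_after_sorted = copy is data

# list.sort() rearranges the existing list object and returns None.
before_id = id(data)
result = data.sort()
same_object_after_sort = id(data) == before_id
```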
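I searched through the documentation but did not find any built-in function for a constant-space algorithm (selection sort, insertion sort, shell sort, heap sort, cocktail sort, etc.).
I know I can find implementations of these algorithms, but a built-in, hand-optimized implementation is the best I can hope to find.
Any suggestions are appreciated.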

python

Source: https://stackoverflow.com/questions/62329870/python-sort-in-constant-space-o1

2 Answers


#1 zysjyyx4

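Your best option is heap sort, which is both reasonably time-efficient (O(n log n) time complexity) and space-efficient (a guaranteed O(1) space complexity).
Python has a built-in module, heapq, that implements a binary heap, but it exports only one of the two functions needed for an in-place heap sort: heapify, which converts a list into a min heap. The other necessary function, _siftup, which bubbles up the smaller child from a given starting position (and so on until it reaches a leaf), is not exported.
Without _siftup, a heap sort can only be performed by popping the minimum values from the heap into a new list, which costs O(n) in space, as the heapq documentation itself suggests:
A heapsort can be implemented by pushing all values onto a heap and then popping off the smallest values one at a time: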

>>> from heapq import heappush, heappop
>>> def heapsort(iterable):
...     h = []
...     for value in iterable:
...         heappush(h, value)
...     return [heappop(h) for i in range(len(h))]
... 
>>> heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

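What's more, heapify converts a list into a min heap, which is not ideal for in-place sorting, because we want to swap the larger sorted items to the end rather than the other way around. To quote the heapq documentation:
Our pop method returns the smallest item, not the largest (called a "min heap" in textbooks; a "max heap" is more common in texts because of its suitability for in-place sorting).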
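As a small illustration of what "min heap" means here, the sketch below heapifies a list and checks the heap invariant. It uses only the public heapify function; the variable names are illustrative. Because the root holds the smallest item, an in-place sort that repeatedly moves the root to the end of the list produces descending order, which is why a max heap is the more natural fit.

```python
from heapq import heapify

values = [5, 1, 9, 3, 7]
heapify(values)  # rearranges the list in place into a min heap

# The root of a min heap is the smallest item.
smallest = values[0]

# Heap invariant: every parent is <= each of its children.
invariant_holds = all(
    values[(i - 1) // 2] <= values[i] for i in range(1, len(values))
)
```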
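Fortunately, to serve the needs of heapq.merge in reverse mode, heapq also implements a max heap through further unexported functions, including _heapify_max, the max-heap version of heapify, and _siftup_max, the max-heap version of _siftup.
However, heapq._siftup_max does not accept an end position as an argument, which is needed to limit the size of the heap so that the already-sorted items at the end of the list are preserved. To work around the missing end-position argument while keeping the space complexity at O(1), we can pass it a sliced memoryview of an array.array instead. You mentioned in the comments that your data is "usually integers, always fit in memory", so it can easily be loaded into an array of type 'q' (64-bit signed integers).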
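The reason a sliced memoryview keeps the space cost constant is that slicing it creates a new view onto the same buffer instead of copying the items, unlike slicing a list. A minimal sketch, with illustrative variable names:

```python
from array import array

arr = array('q', [10, 20, 30, 40])
view = memoryview(arr)

# Slicing a memoryview yields another view of the same memory, not a copy.
prefix = view[:2]
prefix[0] = 99  # writes through to the underlying array

length_of_prefix = len(prefix)
first_item = arr[0]
```

The slice reports a length of 2, so code that relies on len() sees a smaller heap, while every write still lands in the original array.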
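However, the heapq module tries to import from its C implementation, _heapq, when it is available, and _heapq._heapify_max specifically validates its argument as a list and rejects an array. The Python implementation of heapq._heapify_max has no such restriction. To import it, we first have to import _heapq ourselves and delete _heapify_max from the _heapq module object, so that heapq's own _heapify_max is not overridden when we import heapq: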

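# This must run before heapq is imported for the first time.
import sys
import _heapq
# Remove the C version so that heapq keeps its pure-Python _heapify_max.
del sys.modules['_heapq']._heapify_max
from heapq import _heapify_max, _siftup_max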

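Here is how to perform a heap sort with heapq's built-in functions: first heapify the array with _heapify_max, then repeatedly swap the largest number at the root with the leaf at the end and use _siftup_max to sift it down until all of its children are smaller: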

def heapsort(arr):
    _heapify_max(arr)
    view = memoryview(arr)
    for size in reversed(range(1, len(arr))):
        arr[0], arr[size] = arr[size], arr[0]
        _siftup_max(view[:size], 0)

Alternatively, you can perform the heap sort with a min heap, in which case you have to reverse the result at the end:

import sys
import _heapq
del sys.modules['_heapq'].heapify
from heapq import heapify, _siftup
from array import array

def heapsort(arr):
    view = memoryview(arr)
    heapify(view)
    for size in reversed(range(1, len(arr))):
        view[0], view[size] = view[size], view[0]
        _siftup(view[:size], 0)
    arr.reverse()

so that:

arr = array('q', [1, 5, 0, 7, 3, 6, 9, 8, 2, 4])
heapsort(arr)
print(arr)

outputs:

array('q', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

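Demos here: max heap and min heap.
Alternatively, as @KellyBundy pointed out in the comments, we can work around the missing end-position argument of heapq._siftup with a proxy object. The proxy limits the size of the heap through an artificially set size attribute, which it reports as the heap's length whenever its __len__ method is called.
Besides allowing any list as input, this approach has a further benefit over memoryview: we no longer need to import _heapq first to delete heapify, because we can now pass it the actual list: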

from heapq import heapify, _siftup

class heapsort:
    def __init__(self, lst):
        self.list = lst
        heapify(lst)
        for self.size in reversed(range(1, len(lst))):
            lst[0], lst[self.size] = lst[self.size], lst[0]
            _siftup(self, 0)
        lst.reverse()

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        return self.list[index]

    def __setitem__(self, index, value):
        self.list[index] = value

Demo of the proxy object here.
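Similarly, we can use a proxy object to make a heappop-based heap sort work in place, by having the proxy merely return the item at the end of the heap when popped, rather than actually removing it.
In this case, however, the C implementation of heappop not only validates the proxy object as a list object (which could be remedied by making the proxy inherit from list), but also accesses its length directly as a C attribute instead of calling our overridden __len__ method, so we have to delete the C implementation in order to call the Python version.
The benefit of this approach is that it sticks to publicly available APIs and is therefore the least susceptible to changes in heapq's implementation: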

import sys
import _heapq
del sys.modules['_heapq'].heappop
from heapq import heapify, heappop

class heapsort:
    def __init__(self, lst):
        heapify(lst)
        self.list = lst
        for self.size in range(len(lst), 0, -1):
            lst[self.size - 1] = heappop(self)
        lst.reverse()

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        return self.list[index]

    def __setitem__(self, index, value):
        self.list[index] = value

    def pop(self):
        return self.list[self.size - 1]

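Demo of the heappop version here.
Finally, note that if heapq ever changes its implementation so completely that none of the approaches above work (which is very unlikely, especially for the heappop-based solution), you can always implement heap sort from scratch: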

def heapsort(lst):
    for child in range(1, length := len(lst)):
        while lst[child] > lst[parent := int((child - 1) / 2)]:
            lst[child], lst[child := parent] = lst[parent], lst[child]
    for size in range(length - 1, 0, -1):
        lst[child := 0], lst[size] = lst[size], lst[0]
        while (parent := child) < size:
            if (child := 2 * parent + 1) < size - 1 and lst[child] < lst[child + 1]:
                child += 1
            if child < size and lst[parent] < lst[child]:
                lst[parent], lst[child] = lst[child], lst[parent]
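For readers who find the compact version above hard to follow, here is a more conventional sketch of the same idea, split into a sift-down helper and a driver. This is a swapped-in textbook formulation (bottom-up heap construction with early exit), not the code from the answer, and the names sift_down and heapsort_inplace are hypothetical. It uses only the < operator and O(1) extra space.

```python
def sift_down(lst, root, size):
    """Restore the max-heap property below root, considering only lst[:size]."""
    while (child := 2 * root + 1) < size:
        # Pick the larger of the two children, if a right child exists.
        if child + 1 < size and lst[child] < lst[child + 1]:
            child += 1
        # Stop as soon as the parent is not smaller than its larger child.
        if not lst[root] < lst[child]:
            return
        lst[root], lst[child] = lst[child], lst[root]
        root = child


def heapsort_inplace(lst):
    """Sort lst in ascending order in place (hypothetical helper name)."""
    n = len(lst)
    # Build a max heap bottom-up, starting from the last parent node.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(lst, root, n)
    # Repeatedly move the maximum to the end and shrink the heap by one.
    for size in range(n - 1, 0, -1):
        lst[0], lst[size] = lst[size], lst[0]
        sift_down(lst, 0, size)


numbers = [1, 5, 0, 7, 3, 6, 9, 8, 2, 4]
heapsort_inplace(numbers)
```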



#2 llycmphe

You mentioned insertion sort. Here's a simple and rather fast one that almost certainly takes only O(1) space:

from bisect import insort

for i in range(len(a)):
    insort(a, a.pop(i), 0, i)

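If the list was created as a longer list and you then removed elements in just such a way that the pop() triggers a realloc, this could take more than O(1) space if the realloc moves the data to a different memory area. The same applies if CPython changes its allocation strategy or if you use a different Python implementation. In any case, I think it is unlikely to do more than shift memory within the allocated space.
The pops and inserts do take linear time, but that is fast low-level memory movement. So while this does make the whole thing take O(n²) time, it is a comparatively fast O(n²), that is, one with a rather small hidden constant. The insertion points are found with binary search, so there are only O(n log n) comparisons.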
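The comparison count can be checked with a small sketch. The Counted wrapper class below is a hypothetical helper, not part of the answer; it counts every < comparison that insort performs. With n = 1000, each binary search over at most 999 items needs at most 10 comparisons, so the total should stay within 10 * n.

```python
import random
from bisect import insort


class Counted:
    """Hypothetical wrapper that counts how often items are compared with <."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.comparisons += 1
        return self.value < other.value


n = 1000
a = [Counted(random.randrange(n)) for _ in range(n)]

# Same loop as in the answer: pop item i and re-insert it into the sorted prefix.
for i in range(len(a)):
    insort(a, a.pop(i), 0, i)

values = [item.value for item in a]
total_comparisons = Counted.comparisons
```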
With a little testing (Attempt This Online!):

import random
from bisect import insort

def sort(a):
    for i in range(len(a)):
        insort(a, a.pop(i), 0, i)

for _ in range(5):
    a = random.choices(range(10000), k=10000)
    expect = sorted(a)
    sort(a)
    print(a == expect)

