Allocator ran out of memory - how to clear GPU memory from a TensorFlow dataset?

Posted by 6ioyuze2 8 months ago in Other

Assuming a NumPy array X_train of shape (4559552, 13, 22), the following code:

train_dataset = tf.data.Dataset \
    .from_tensor_slices((X_train, y_train)) \
    .shuffle(buffer_size=len(X_train) // 10) \
    .batch(batch_size)

works fine exactly once. When I re-run it (after slight modifications to X_train), it triggers an InternalError because the GPU runs out of memory:

2021-12-19 15:36:58.460497: W tensorflow/core/common_runtime/bfc_allocator.cc:457]
Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.71GiB requested by op _EagerConst

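It seems that the first time, it finds 100% of the GPU memory available, so everything works, but on subsequent runs the GPU memory is already almost full, hence the error.
From what I understand, simply clearing the GPU memory held by the old train_dataset should be enough to solve the problem, but I could not find any way to do that in TensorFlow. Currently, the only way to re-assign the dataset is to kill the Python kernel and re-run everything from the start.
Is there a way to avoid restarting the Python kernel from scratch and instead free the GPU memory, so that the new dataset can be loaded into it?
The dataset does not need the full GPU memory, so I would consider switching to a TFRecord-based approach only as a non-ideal solution (since it brings additional complexity).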

tensorflow

Source: https://stackoverflow.com/questions/70415426/allocator-ran-out-of-memory-how-to-clear-gpu-memory-from-tensorflow-dataset

2 Answers

t0ybt7op #1

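Try limiting how much GPU memory TensorFlow claims. The snippet below enables memory growth, so TensorFlow allocates GPU memory on demand instead of reserving nearly all of it at startup. Strictly speaking this is not a hard limit, and it has to run before the GPU is initialized.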

import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
if gpus:  # skip on machines without a visible GPU
    # Must be called before the GPU is initialized.
    tf.config.experimental.set_memory_growth(gpus[0], True)
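
The answer mentions a hard limit, while memory growth only changes when memory is allocated. As a rough sketch of what an actual cap could look like, the code below uses tf.config.set_logical_device_configuration with a memory_limit in megabytes. The helper name limit_gpu_memory and the 4096 MB value are assumptions made for illustration, not part of the original answer, and a cap does not by itself release memory that an old dataset already holds.

```python
import tensorflow as tf


def limit_gpu_memory(memory_limit_mb: int) -> bool:
    """Cap TensorFlow's allocation on the first GPU.

    Hypothetical helper for illustration. Returns True if a cap was set.
    """
    gpus = tf.config.list_physical_devices('GPU')
    if not gpus:
        return False  # no GPU visible, nothing to limit
    try:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
        )
    except (RuntimeError, ValueError):
        # Raised if the GPU is already initialized or the settings conflict.
        return False
    return True


# 4096 MB is an arbitrary illustrative value, not a recommendation.
applied = limit_gpu_memory(4096)
print("hard limit applied:", applied)
```

Like memory growth, this has to be configured at the start of the program, before any tensor is placed on the GPU, so in a notebook it still requires one kernel restart to take effect.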

Answered 8 months ago

xn1cxnb4 #2

Running a garbage collection seems to solve the problem without having to restart the kernel.

import gc
gc.collect()
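
To show why an explicit collection can matter, here is a minimal pure-Python sketch. FakeDataset is a hypothetical stand-in for an object that owns a large buffer and is caught in a reference cycle; it is not TensorFlow code. The assumption is that the old train_dataset is kept alive in a similar way until the collector runs.

```python
import gc
import weakref


class FakeDataset:
    """Hypothetical stand-in for an object that owns a large buffer."""

    def __init__(self, n):
        self.buffer = bytearray(n)
        self.self_ref = self  # reference cycle: refcounting alone cannot free it


dataset = FakeDataset(1_000_000)
probe = weakref.ref(dataset)  # lets us observe whether the object still exists

gc.disable()                  # make the demo deterministic
del dataset                   # drop the only external reference
alive_before = probe() is not None
gc.collect()                  # the cycle collector reclaims the object
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```

In the TensorFlow case the analogous sequence would be to run del train_dataset and then gc.collect() before building the new dataset. Whether the GPU memory is actually returned depends on no other reference (for example a notebook output variable) still pointing at the old tensors.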

Answered 8 months ago
