How do I convert LF to CRLF?

Posted by lpwwtiir in Unix, 4 months ago


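I found a list of most English words online, but the line breaks are Unix-style (the file is encoded in Unicode: UTF-8).
How do I convert the line breaks to CRLF so I can iterate over them? The program I'll use them in goes through each line of the file, so there has to be one word per line.
Here is part of the file: bitbackbitebackbiterbackbitersbackbitesbackbitingbackbittenbackboard
It should be: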

bit
backbite
backbiter
backbiters
backbites
backbiting
backbitten
backboard

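How can I convert my files to this form? Note: there are 26 files (one per letter) with about 80,000 words in total, so the program should be very fast.
I don't know where to start because I've never worked with Unicode. Thanks in advance!
Using rU as the mode (as suggested), with this in my code: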

with open(my_file_name, 'rU') as my_file:
    for line in my_file:
        new_words.append(str(line))
my_file.close()

I get this error:

Traceback (most recent call last):
  File "<pyshell#5>", line 1, in <module>
    addWords('B Words')
  File "D:\my_stuff\Google Drive\documents\SCHOOL\Programming\Python\Programming Class\hangman.py", line 138, in addWords
    for line in my_file:
  File "C:\Python3.3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 7488: character maps to <undefined>

Can anyone help me?

Source: https://stackoverflow.com/questions/13954840/how-do-i-convert-lf-to-crlf

5 Answers

fjnneemd1#

Instead of converting, you should be able to just open the file using Python's universal newline support:

f = open('words.txt', 'rU')

(Note the U.)
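On Python 3 the U flag is not needed (and it was removed in Python 3.11), because text mode already applies universal newline handling when newline is left at its default of None. A minimal sketch of the same idea, assuming a UTF-8 word list; the helper name read_words and the demo file name are hypothetical:

```python
import os
import tempfile


def read_words(path):
    """Return one word per line, whatever line ending the file uses."""
    # newline=None (the default) turns \n, \r\n and \r into \n while reading.
    with open(path, encoding="utf-8", newline=None) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# Demo: a throwaway file that mixes LF and CRLF endings.
demo_path = os.path.join(tempfile.mkdtemp(), "b_words.txt")
with open(demo_path, "wb") as f:
    f.write(b"backbite\nbackbiter\r\nbackboard\n")

words = read_words(demo_path)
print(words)
```

Passing encoding explicitly also avoids relying on the platform default, which is what triggered the cp1252 error in the question.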


m1m5dgzv2#

You can use the replace method of strings. For example:

txt.replace('\n', '\r\n')

Edit: in your case:

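# newline='' stops Python from turning the '\r\n' we write into '\r\r\n' on Windows.
with open('input.txt', encoding='utf-8') as inp, \
        open('output.txt', 'w', encoding='utf-8', newline='') as out:
    txt = inp.read()  # universal newlines: every line ending is read as '\n'
    txt = txt.replace('\n', '\r\n')
    out.write(txt)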


qco9c6ql3#

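You don't need to convert the line endings in the files in order to iterate over them. As NPE suggested, simply use Python's universal newlines mode.
The UnicodeDecodeError occurs because the files you are processing are encoded as UTF-8, but Python is using the cp1252 encoding to convert the bytes read from the file into Python 3 strings (that is, sequences of Unicode code points). Some of the bytes in these files cannot be decoded with cp1252, which raises the UnicodeDecodeError. The traceback shows this happening while the file is being iterated, before str(line) runs.
If the file is opened in binary mode, changing str(line) to line.decode('utf-8') avoids the UnicodeDecodeError; in text mode, the equivalent fix is to pass encoding='utf-8' to open(). See the "Text Vs. Data Instead Of Unicode Vs. 8-bit" writeup for more details.
Finally, you may also find Joel Spolsky's "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" useful.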
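The decoding failure can be reproduced in a few lines. This is only an illustration: the sample character below is an assumption chosen because its UTF-8 encoding contains the byte 0x8d from the traceback; the character actually present in the asker's file is unknown.

```python
# "\u010d" (c with caron) encodes to the two bytes C4 8D in UTF-8.
raw = "\u010d".encode("utf-8")

try:
    # Python's cp1252 codec has no mapping for byte 0x8d, so this raises.
    raw.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError:
    cp1252_failed = True

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("utf-8")
print(cp1252_failed, decoded == "\u010d")
```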


stszievb4#

You can use the Cereja package:

pip install cereja==1.2.0

You can substitute any other line-ending standard; see the package's filetools module.


ff29svar5#

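2023:

This is a very old question with very old answers written for Python 2.
The U option in open() was deprecated in Python 3 and removed entirely in Python 3.11.
In Python 3 you can use newline="\r\n" in open() to write \r\n
(no replace() needed).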

with open('input.txt') as file_in:
    txt = file_in.read()

with open('output.txt', 'w', newline='\r\n') as file_out:
    file_out.write(txt)

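But the error in the question shows there is also a problem involving cp1252.py,
which means Python tried to read the file as cp1252 instead of utf-8.
In Python 3 you may need to pass encoding="utf-8" to open():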

with open('input.txt', encoding="utf-8") as file_in:
    txt = file_in.read()

Doc: open()
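Since the question involves 26 files, the two options above can be combined into one batch step. The following is a rough sketch rather than a definitive tool: the function name convert_to_crlf, the *.txt pattern, and the directory names are all assumptions for illustration.

```python
import tempfile
from pathlib import Path


def convert_to_crlf(src_dir, dst_dir, pattern="*.txt"):
    """Copy every matching file from src_dir to dst_dir with CRLF endings."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for src_file in sorted(Path(src_dir).glob(pattern)):
        # Text-mode reading normalizes any existing line ending to \n.
        text = src_file.read_text(encoding="utf-8")
        out_file = dst / src_file.name
        # newline="\r\n" makes Python write every \n as \r\n.
        with open(out_file, "w", encoding="utf-8", newline="\r\n") as f:
            f.write(text)
        converted.append(out_file)
    return converted


# Demo on a throwaway directory.
work = Path(tempfile.mkdtemp())
(work / "in").mkdir()
(work / "in" / "b.txt").write_bytes(b"bit\nbackbite\n")
outputs = convert_to_crlf(work / "in", work / "out")
result = outputs[0].read_bytes()
print(result)
```

Writing to a separate output directory keeps the original files intact in case the encoding guess turns out to be wrong.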
You can also use bytes mode instead of text mode in open() (rb, wb).
In that case you need replace() with the bytes b"\n" and b"\r\n":

with open('input.txt', 'rb') as file_in:
    bytes_data = file_in.read()

bytes_data = bytes_data.replace(b'\n', b'\r\n')

with open('output.txt', 'wb') as file_out:
    file_out.write(bytes_data)

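This approach doesn't care whether the file uses utf-8 or cp1252.
But I don't know whether some characters in utf-8 or cp1252 use \n as part of their character code, which could produce errors in the text.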
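On that last concern: in UTF-8 every byte of a multi-byte character is 0x80 or higher, and cp1252 is a single-byte encoding, so in both of them the byte 0x0A is always a line feed (this would not hold for UTF-16 or UTF-32). A separate pitfall is that a plain replace() turns an existing \r\n into \r\r\n. One possible way around that is a regular expression that only matches a bare \n; the helper name lf_to_crlf is hypothetical:

```python
import re

# Match a \n that is not already preceded by \r.
_BARE_LF = re.compile(rb"(?<!\r)\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Convert bare LF to CRLF, leaving existing CRLF pairs untouched."""
    return _BARE_LF.sub(b"\r\n", data)


# Demo: input that mixes LF and CRLF endings.
mixed = b"bit\nbackbite\r\nbackboard\n"
converted = lf_to_crlf(mixed)
print(converted)
```

Because existing CRLF pairs are skipped, running the function twice on the same data gives the same result as running it once.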
