str.splitlines(keepends=False)
有时想处理一个在边界处具有不同换行符(’\n’、\n\n’、’\r’、’\r\n’)的语料库。要拆分成句子,而不是单个单词。可以使用 splitline 方法来执行此操作。当 keepends=True 时,文本中包含换行符;否则它们被排除在外
import nltk # You may have to `pip install nltk` to use this library.
macbeth = nltk.corpus.gutenberg.raw('shakespeare-macbeth.txt')
print(macbeth.splitlines(keepends=True)[:5])
Output:
['[The Tragedie of Macbeth by William Shakespeare 1603]\n', '\n', '\n', 'Actus Primus. Scoena Prima.\n', '\n']