Sentence boundaries with spaCy

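Every Doc instance (a parsed document) in spaCy exposes a .sents property, which iterates over the document's sentences and yields each one as a Span. This requires a pipeline that sets sentence boundaries, such as the dependency parser included in spaCy's pretrained models. See the spaCy docs on sentence iteration.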

Language: Python 3
Library: spacy

import spacy

# Set up the data.
# ~~~~~~~~~~~~~~~~

document = """\
Here is a string with multiple sentences.
I enjoy eating pizza and cheeseburgers.
Though typically not simultaneously.
"""


# Load a language model and parse a document.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

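# The 'en' shortcut was removed in spaCy v3, so load the model by its full name.
# Assumes the model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')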
doc = nlp(document)


# Iterate over the sentences.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~

for sentence in doc.sents:
    print('A sentence: %s' % sentence)

Notes

Each sentence is provided as a Span instance, and includes the same tokens as those found in the original Doc instance.

You can iterate over the tokens of a sentence directly (e.g., for token in sentence: ...), and a Span supports the same indexing and slicing as a Doc object.
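As a rough illustration of the notes above, the sketch below iterates over, indexes, and slices a sentence Span. It assumes spaCy v3 or later, and it swaps the pretrained model for a blank English pipeline with the rule-based "sentencizer" component so that no model download is needed. The example text and variable names are illustrative only.

```python
import spacy

# Assumption: spaCy v3+, where built-in components are added by name.
# A blank pipeline plus the rule-based sentencizer stands in for a
# pretrained model here; it splits sentences on punctuation only.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("I enjoy eating pizza. Though typically not with cheeseburgers.")

sentences = list(doc.sents)
first = sentences[0]

# Iterate over the tokens of the sentence.
tokens = [token.text for token in first]

# Index and slice the Span; positions are relative to the sentence.
first_word = first[0].text
head = first[0:2].text

# Span.start and Span.end are token offsets into the parent Doc,
# so slicing the Doc with them recovers the same tokens.
same_tokens = doc[first.start:first.end].text == first.text

print(tokens)
print(first_word, "|", head, "|", same_tokens)
```

Note that indices on a Span count from the start of the sentence, while Span.start and Span.end count from the start of the Doc.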
