Sentence boundaries with spaCy
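Every Doc instance (a parsed document) in spaCy supports a .sents iterator, which yields each sentence of the document as a span. Sentence boundaries must have been set by a pipeline component, which is the case for the trained English models.
spaCy docs: sentence iteration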
import spacy
# Set up the data.
# ~~~~~~~~~~~~~~~~
document = """\
Here is a string with multiple sentences.
I enjoy eating pizza and cheeseburgers.
Though typically not simultaneously.
"""
# Load a language model and parse a document.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
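# The 'en' shortcut only works in spaCy v2 and earlier; newer versions need
# the full package name (install with: python -m spacy download en_core_web_sm).
nlp = spacy.load('en_core_web_sm')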
doc = nlp(document)
# Iterate over the sentences.
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
for sentence in doc.sents:
    print('A sentence: %s' % sentence)
Notes
Each sentence is provided as a Span instance and contains the same tokens as the corresponding stretch of the original Doc instance.
You can iterate over the tokens of a sentence, or index and slice it, just as you would a Doc object (e.g., for token in sentence: ...).
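The notes above can be sketched as follows. This is a minimal illustration, not the only way to do it. It assumes spaCy v3 or later, and it swaps the trained model for a blank English pipeline with the rule-based sentencizer component so that nothing has to be downloaded. The variable names are chosen here for illustration only.

```python
import spacy

# Assumption: spaCy v3+. A blank pipeline plus the rule-based "sentencizer"
# sets sentence boundaries without requiring a trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("Here is a string with multiple sentences. "
          "I enjoy eating pizza and cheeseburgers.")

sentences = list(doc.sents)
first, second = sentences[0], sentences[1]

# Iterate over the tokens of a sentence, exactly as with a Doc.
first_tokens = [token.text for token in first]

# Indexing and slicing are relative to the start of the sentence.
first_word = first[0].text
first_three = first[0:3].text

# .start and .end are the sentence's token offsets in the parent Doc,
# so slicing the Doc with them recovers the same tokens.
same_tokens = doc[second.start:second.end].text == second.text

print(first_tokens)
print(first_word, "|", first_three, "|", same_tokens)
```

With a trained model such as en_core_web_sm, the sentence boundaries come from the model rather than from punctuation rules, so the splits may differ on less tidy text.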