# Node Parser Usage Pattern
Node parsers are a simple abstraction that takes a list of documents and chunks them into `Node` objects, such that each node is a specific chunk of the parent document. When a document is broken into nodes, all of its attributes are inherited by the child nodes (i.e. metadata, text and metadata templates, etc.). You can read more about `Node` and `Document` properties here.
## Getting Started

### Standalone Usage

Node parsers can be used on their own:
```python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

node_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

nodes = node_parser.get_nodes_from_documents(
    [Document(text="long text")], show_progress=False
)
```
### Transformation Usage

Node parsers can be included in any set of transformations with an ingestion pipeline:
```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import TokenTextSplitter

documents = SimpleDirectoryReader("./data").load_data()

pipeline = IngestionPipeline(transformations=[TokenTextSplitter(), ...])

nodes = pipeline.run(documents=documents)
```
### Index Usage

Or set it inside transformations or global settings to be used automatically when an index is constructed with `.from_documents()`:
```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("./data").load_data()

# global
from llama_index.core import Settings

Settings.text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)

# per-index
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)],
)
```
## Modules

See the full modules guide.