Basic Parsing
Opening a file
CrystalXMLSource accepts a file path (string or pathlib.Path):
from pathlib import Path

from crxml import CrystalXMLSource

# path string
src = CrystalXMLSource("report.xml")

# pathlib.Path
src = CrystalXMLSource(Path("report.xml"))
Parameters
| Param | Type | Default | Description |
|---|---|---|---|
| source | str \| Path | (required) | Path to CR XML file |
| row_tag | str | "Row" | XML tag for each record row |
The row_tag parameter lets you target a different repeating element if your
CR XML uses a non-standard tag name.
Iteration
CrystalXMLSource is iterable, and each row it yields is a dict[str, str].
Keys are the FieldName attribute values from the CR XML (e.g.
{Report.InvoiceNo}). Values are the raw text of the first
<FormattedValue> or <Value> child element.
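To make the row shape concrete, here is a minimal standard-library sketch that builds the same kind of dict from attribute-style CR XML. It is a stand-in written with xml.etree.ElementTree, not crxml's implementation: the iter_rows helper and the SAMPLE document are hypothetical, and it assumes that "first" means the first child in document order whose tag is FormattedValue or Value.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document in attribute style.
SAMPLE = b"""<Report>
  <Row>
    <Field FieldName="{Report.InvoiceNo}"><FormattedValue>INV-001</FormattedValue><Value>1</Value></Field>
    <Field FieldName="{Report.Amount}"><Value>123.45</Value></Field>
  </Row>
  <Row>
    <Field FieldName="{Report.InvoiceNo}"><Value>2</Value></Field>
    <Field FieldName="{Report.Amount}"><Value>67.80</Value></Field>
  </Row>
</Report>"""


def iter_rows(stream, row_tag="Row"):
    """Yield one dict[str, str] per row_tag element (stand-in, not crxml)."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != row_tag:
            continue
        row = {}
        for field in elem.iter("Field"):
            name = field.get("FieldName")
            if name is None:
                continue
            # Assumption: use the first child that is FormattedValue or Value.
            value = next(
                (child.text or "" for child in field
                 if child.tag in ("FormattedValue", "Value")),
                "",
            )
            row[name] = value
        yield row
        elem.clear()  # drop the finished row's children before moving on


rows = list(iter_rows(io.BytesIO(SAMPLE)))
for row in rows:
    print(row)
```

Passing a different row_tag to this helper mirrors what the row_tag parameter is described as doing: it changes which repeating element counts as a record.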
Schema inspection
Call .schema() to discover fields without consuming the stream.
The source reads the first batch of rows internally and caches it, so those
rows are not lost when you iterate afterwards. .schema() is safe to call
before building a pipeline.
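The peek-and-cache behaviour described here can be sketched in plain Python. The PeekableRows class below is hypothetical and only illustrates the pattern; the batch size, the return type of schema() (a list of field names), and the method bodies are assumptions, not crxml's actual API.

```python
from itertools import islice


class PeekableRows:
    """Hypothetical wrapper: inspect field names without losing rows."""

    def __init__(self, rows, batch_size=100):
        self._rows = iter(rows)
        self._batch_size = batch_size
        self._cache = []
        self._names = None

    def schema(self):
        # Pull the first batch once and keep it for later iteration.
        if self._names is None:
            self._cache = list(islice(self._rows, self._batch_size))
            names = []
            for row in self._cache:
                for key in row:
                    if key not in names:
                        names.append(key)
            self._names = names
        return list(self._names)

    def __iter__(self):
        # Replay cached rows first, then continue with the live stream.
        cached, self._cache = self._cache, []
        yield from cached
        yield from self._rows


source = PeekableRows(
    {"{Report.InvoiceNo}": str(i), "{Report.Amount}": "1.00"} for i in range(3)
)
fields = source.schema()   # peeks at the stream
rows = list(source)        # still sees every row, including the peeked ones
```

Because the peeked rows are replayed before the rest of the stream, calling schema() first does not change what a later loop sees.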
Memory model
The parser streams the file instead of loading it whole. The Rust backend
reuses internal buffers across rows and never materializes the full document.
RSS still grows with file content, but sublinearly: 22 MB for a 10 MB file and
75 MB for a 100 MB file, so it stays well below file size for large inputs.
pandas is imported lazily, so its memory cost is paid only when to_dataframe
is called.
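The difference between streaming rows and materializing them can be illustrated with tracemalloc from the standard library. This sketch does not measure crxml or its Rust backend; make_rows is a hypothetical generator standing in for a row source, and the absolute numbers will vary by Python version.

```python
import tracemalloc


def make_rows(n):
    """Hypothetical row source: yields n small dicts, one at a time."""
    for i in range(n):
        yield {"{Report.InvoiceNo}": str(i), "{Report.Amount}": "123.45"}


def stream(rows):
    # Consume rows one by one; each row is released before the next arrives.
    count = 0
    for _ in rows:
        count += 1
    return count


def materialize(rows):
    # Hold every row in memory at once.
    return len(list(rows))


def peak_bytes(consume, n):
    """Return the peak traced allocation while consume() processes n rows."""
    tracemalloc.start()
    consume(make_rows(n))
    _current, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


streaming_peak = peak_bytes(stream, 20_000)
materialized_peak = peak_bytes(materialize, 20_000)
print(f"streaming: {streaming_peak} B, materialized: {materialized_peak} B")
```

The streaming peak stays near the size of a single row, while the materialized peak grows with the number of rows.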
CR XML layout detection
Crystal Reports XML stores field names in two patterns, which can also be
mixed within one file:
- Attribute style:
  <Field FieldName="{Report.Amount}"><Value>123.45</Value></Field>
- Element style:
  <Field><FieldName>{Report.Amount}</FieldName><Value>123.45</Value></Field>
- Mixed: some fields use attributes, others use child elements
The parser detects both styles automatically; no configuration is needed.
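One way such detection could work is to check for the FieldName attribute first and fall back to a FieldName child element. The sketch below uses xml.etree.ElementTree; field_name and parse_fields are hypothetical helpers, and the lookup order is an assumption rather than a description of the parser's internals.

```python
import xml.etree.ElementTree as ET


def field_name(field):
    """Return the field name from either layout, or None if absent."""
    name = field.get("FieldName")            # attribute style
    if name is None:
        name = field.findtext("FieldName")   # element style
    return name


def parse_fields(xml_text):
    """Map field names to their Value text for every Field element."""
    root = ET.fromstring(xml_text)
    result = {}
    for field in root.iter("Field"):
        name = field_name(field)
        if name is not None:
            result[name] = field.findtext("Value") or ""
    return result


# Mixed layout: one attribute-style field and one element-style field.
MIXED = """<Row>
  <Field FieldName="{Report.Amount}"><Value>123.45</Value></Field>
  <Field><FieldName>{Report.InvoiceNo}</FieldName><Value>42</Value></Field>
</Row>"""

fields = parse_fields(MIXED)
print(fields)
```

Resolving the name per field, rather than once per file, is what lets a mixed document work without any configuration.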