DataStore I/O Operations
DataStore supports reading from and writing to various file formats and data sources.
Reading Data
CSV Files
Examples:
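A minimal sketch, assuming DataStore exposes a pandas-style `read_csv` factory; the import path and parameter names are illustrative:

```python
from datastore import DataStore  # hypothetical import path

# Basic read with an inferred schema
ds = DataStore.read_csv("sales.csv")

# Common options (names assumed to mirror pandas)
ds = DataStore.read_csv("sales.csv", sep=";", header=0)
```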
Parquet Files
Recommended for large datasets: Parquet is a columnar format with better compression than row-based formats such as CSV.
Examples:
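A sketch under the same assumption of a pandas-style factory; the `columns` parameter is illustrative but shows the columnar advantage described above:

```python
from datastore import DataStore  # hypothetical import path

# Read an entire Parquet file
ds = DataStore.read_parquet("events.parquet")

# Columnar format: read only the columns you need (parameter name assumed)
ds = DataStore.read_parquet("events.parquet", columns=["user_id", "ts"])
```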
JSON Files
Examples:
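A sketch assuming a pandas-style `read_json` factory; the `lines` flag for newline-delimited JSON is an assumption borrowed from pandas:

```python
from datastore import DataStore  # hypothetical import path

# Read a JSON array of records
ds = DataStore.read_json("users.json")

# Newline-delimited JSON, one object per line (parameter name assumed)
ds = DataStore.read_json("events.ndjson", lines=True)
```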
Excel Files
Examples:
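A sketch assuming a pandas-style `read_excel` factory; the `sheet_name` parameter is illustrative:

```python
from datastore import DataStore  # hypothetical import path

# First sheet by default
ds = DataStore.read_excel("report.xlsx")

# Select a sheet by name (parameter name assumed)
ds = DataStore.read_excel("report.xlsx", sheet_name="Q3")
```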
SQL Databases
Examples:
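A sketch assuming a `read_sql` factory that takes a query and a connection string; both the method name and the connection-string form are assumptions:

```python
from datastore import DataStore  # hypothetical import path

# Load the result set of a query (connection-string form assumed)
ds = DataStore.read_sql(
    "SELECT id, amount FROM orders WHERE amount > 100",
    "postgresql://user:pass@localhost:5432/shop",
)
```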
Other Formats
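Feather appears in the comparison table below; a reader for it is sketched here by analogy with the other factories and is an assumption:

```python
from datastore import DataStore  # hypothetical import path

# Feather (Arrow IPC) read; method name assumed by analogy
ds = DataStore.read_feather("frame.feather")
```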
Writing Data
to_csv
Export to CSV format.
Examples:
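A minimal sketch; `to_csv` is named above, but its options are assumed to mirror pandas:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_parquet("events.parquet")  # any loaded DataStore

ds.to_csv("events.csv")                        # default settings
ds.to_csv("events.csv", index=False, sep=";")  # option names assumed
```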
to_parquet
Export to Parquet format (recommended for large data).
Examples:
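A sketch; the `compression` option is an assumption, with its value taken from the compression table later on this page:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_csv("events.csv")

# Parquet preserves the schema; compression value from the table below
ds.to_parquet("events.parquet", compression="zstd")
```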
to_json
Export to JSON format.
Examples:
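A sketch; the `orient` and `lines` options are assumptions borrowed from pandas:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_csv("users.csv")

# One JSON object per row (option names assumed)
ds.to_json("users.json", orient="records", lines=True)
```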
to_excel
Export to Excel format.
Examples:
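A sketch; the `sheet_name` option is an assumption:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_csv("report.csv")

# Write one worksheet (option name assumed)
ds.to_excel("report.xlsx", sheet_name="Q3")
```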
to_sql
Export to a SQL database, or generate a SQL string.
Examples:
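A sketch; the parameters are assumed to mirror pandas' `to_sql`. The SQL-string-generation mode mentioned above is not shown, since its exact signature is not documented here:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_csv("orders.csv")

# Append rows to an existing table (parameter names assumed)
ds.to_sql(
    "orders",
    "postgresql://user:pass@localhost:5432/shop",
    if_exists="append",
)
```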
Other Export Methods
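Given the Feather recommendation below, an export counterpart presumably exists; this sketch is an assumption by analogy with the writers above:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_csv("frame.csv")

# Feather export; method name assumed by analogy
ds.to_feather("frame.feather")
```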
File Format Comparison
| Format | Read Speed | Write Speed | File Size | Schema | Best For |
|---|---|---|---|---|---|
| Parquet | Fast | Fast | Small | Yes | Large datasets, analytics |
| CSV | Medium | Fast | Large | No | Compatibility, simple data |
| JSON | Slow | Medium | Large | Partial | APIs, nested data |
| Excel | Slow | Slow | Medium | Partial | Sharing with non-tech users |
| Feather | Very Fast | Very Fast | Medium | Yes | Inter-process, pandas |
Recommendations
- For analytics workloads: Use Parquet
  - Columnar format allows reading only needed columns
  - Excellent compression
  - Preserves data types
- For data exchange: Use CSV or JSON
  - Universal compatibility
  - Human-readable
- For pandas interop: Use Feather or Arrow
  - Fastest serialization
  - Type preservation
Compression Support
Reading Compressed Files
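A sketch assuming compression is inferred from the file extension, with an explicit override; both behaviors are assumptions modeled on pandas:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_csv("logs.csv.gz")                       # inferred from .gz (assumed)
ds = DataStore.read_csv("logs.csv.zst", compression="zstd")  # explicit (parameter assumed)
```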
Writing Compressed Files
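A sketch; the `compression` parameter is assumed, with values from the table below:

```python
from datastore import DataStore  # hypothetical import path

ds = DataStore.read_parquet("events.parquet")

ds.to_csv("logs.csv.gz", compression="gzip")           # parameter name assumed
ds.to_parquet("events.parquet", compression="snappy")  # snappy: Parquet default per the table
```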
Compression Options
| Compression | Speed | Ratio | Use Case |
|---|---|---|---|
| snappy | Very Fast | Low | Default for Parquet |
| lz4 | Very Fast | Low | Speed priority |
| gzip | Medium | High | Compatibility |
| zstd | Fast | Very High | Best balance |
| bz2 | Slow | Very High | Maximum compression |
Streaming I/O
For very large files that don't fit in memory:
Chunked Reading
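A sketch assuming a pandas-style `chunksize` parameter that turns the reader into an iterator of chunks:

```python
from datastore import DataStore  # hypothetical import path

# Process a large CSV in fixed-size chunks (chunksize is an assumed parameter)
total_rows = 0
for chunk in DataStore.read_csv("huge.csv", chunksize=100_000):
    total_rows += len(chunk)
print(total_rows)
```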
Using ClickHouse Streaming
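DataStore's own streaming API is not shown here; as a stand-in, this sketch streams query results block by block with the third-party clickhouse-connect client:

```python
import clickhouse_connect  # third-party client, used here as a stand-in

client = clickhouse_connect.get_client(host="localhost")

# Stream result blocks instead of materializing the whole result set
rows = 0
with client.query_df_stream("SELECT * FROM events") as stream:
    for block in stream:   # each block is a pandas DataFrame
        rows += len(block)
print(rows)
```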
Remote Data Sources
HTTP/HTTPS
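A sketch assuming readers accept URLs directly, as pandas readers do:

```python
from datastore import DataStore  # hypothetical import path

# Fetch over HTTPS (URL support is an assumed behavior)
ds = DataStore.read_csv("https://example.com/data/sales.csv")
```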
S3
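A sketch assuming `s3://` URLs are supported, with credentials resolved from the environment as fsspec/s3fs-based readers typically do:

```python
from datastore import DataStore  # hypothetical import path

# Bucket path is illustrative; credentials come from the environment (assumed)
ds = DataStore.read_parquet("s3://my-bucket/events/2024/01.parquet")
```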
GCS, Azure, HDFS
See Factory Methods for cloud storage options.