.. image:: images/logo.png
   :width: 250
   :alt: pyxet logo

Pyxet Documentation
===================

pyxet is a Python library that provides a Pythonic interface for `XetHub `_. XetHub is a simple Git-based system capable of storing terabytes of ML data and models in a single repository. Its block-level data deduplication lets hundreds of versions of similar data be stored with little additional storage.

Join our `Discord `_ to get involved. To stay informed about updates, star this repo and sign up for `XetHub `_ to get the newsletter.

Features
--------

pyxet provides 3 components:

1. A `fsspec `_ interface that allows compatible libraries such as Pandas, Polars and DuckDB to directly access any version of any file in a Xet repository. See below for some examples.
2. A command line interface, inspired by the AWS CLI, that allows files to be uploaded to and downloaded from a Xet repository conveniently and efficiently.
3. A file system mount mechanism that allows any version of any Xet repository to be mounted. This works on Mac, Linux, and Windows 11 Pro.

Installation
------------

The easiest way to authenticate is to sign up on `XetHub `_ and obtain a username and access token. Write both down, since you will need them to log in.

Set up your virtualenv with:

.. code-block:: sh

   $ python -m venv .venv
   $ . .venv/bin/activate

Then, install pyxet with:

.. code-block:: sh

   $ pip install pyxet

Authentication
--------------

There are three ways to authenticate with XetHub:

Command Line
~~~~~~~~~~~~

.. code-block:: bash

   xet login -e <email> -u <username> -p <token>

``xet login`` writes the authentication information to ``~/.xetconfig``.

Environment Variable
~~~~~~~~~~~~~~~~~~~~

Environment variables are sometimes more convenient:

.. code-block:: bash

   export XET_USER_EMAIL=<email>
   export XET_USER_NAME=<username>
   export XET_USER_TOKEN=<token>

In Python
~~~~~~~~~

Finally, in a notebook or other non-persistent environment, you can authenticate directly from Python. Note that this must be the first thing you run, before any other operation:

.. code-block:: python

   import pyxet

   pyxet.login(<username>, <token>, <email>)

Quickstart
----------

Read a CSV file:

.. code-block:: python

   import pyxet         # make xet:// protocol available
   import pandas as pd  # assumes pip install pandas has been run

   df = pd.read_csv('xet://XetHub/titanic/main/titanic.csv')

Check out the rest of the documentation for detailed usage examples!

Encountering Issues?
====================

Please file a bug `here `_, or report it on our `Discord channel `_! We are constantly making improvements, especially to usability and performance.

---------------------------------------

.. toctree::
   :maxdepth: 2
   :caption: Introduction

   markdowns/quickstart
   markdowns/writing

.. toctree::
   :maxdepth: 2
   :caption: API Reference

   markdowns/filesystem
   markdowns/cli
   markdowns/mount

.. toctree::
   :maxdepth: 2
   :caption: Use Cases

   markdowns/collaboration
   markdowns/model_versioning

.. toctree::
   :maxdepth: 2
   :caption: API Documentation

   pyxet
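
The Quickstart URL ``xet://XetHub/titanic/main/titanic.csv`` appears to follow the layout ``xet://<user>/<repo>/<branch>/<path>``. That layout is inferred from this single example and is an assumption, not a documented specification. The sketch below shows how such a URL could be taken apart and pointed at a different version of the same file. ``XetPath`` and ``parse_xet_url`` are hypothetical helpers written for illustration; they are not part of pyxet, and the ``experiment`` branch name is made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class XetPath:
    """Hypothetical container for the pieces of a xet:// URL (not a pyxet class)."""
    user: str
    repo: str
    branch: str
    path: str

    def url(self) -> str:
        # Reassemble the URL in the assumed <user>/<repo>/<branch>/<path> order.
        return f"xet://{self.user}/{self.repo}/{self.branch}/{self.path}"


def parse_xet_url(url: str) -> XetPath:
    """Split a xet:// URL into user, repo, branch and file path.

    Assumes the branch name contains no '/' characters.
    """
    prefix = "xet://"
    if not url.startswith(prefix):
        raise ValueError(f"not a xet:// URL: {url!r}")
    # Split into at most four parts so that nested file paths stay intact.
    parts = url[len(prefix):].split("/", 3)
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected xet://<user>/<repo>/<branch>/<path>, got {url!r}")
    return XetPath(*parts)


main = parse_xet_url("xet://XetHub/titanic/main/titanic.csv")
# Point at the same file on a different (made-up) branch.
other = replace(main, branch="experiment")
print(other.url())  # xet://XetHub/titanic/experiment/titanic.csv
```

The resulting string could then be passed to ``pd.read_csv`` in the same way as the Quickstart URL, provided that branch exists in the repository.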