Python `Magika` Module
This guide provides documentation on how to use the magika Python module to identify file types from your code.
Quick Examples
The magika API is designed to be simple and intuitive. The following examples cover the most common use cases for identifying content from bytes, paths, and streams.
From bytes:

    >>> from magika import Magika
    >>> m = Magika()
    >>> res = m.identify_bytes(b'function log(msg) {console.log(msg);}')
    >>> print(res.output.label)
    javascript

From a file path:

    >>> from magika import Magika
    >>> m = Magika()
    >>> res = m.identify_path('./tests_data/basic/ini/doc.ini')
    >>> print(res.output.label)
    ini

From an open file stream:

    >>> from magika import Magika
    >>> m = Magika()
    >>> with open('./tests_data/basic/ini/doc.ini', 'rb') as f:
    ...     res = m.identify_stream(f)
    ...
    >>> print(res.output.label)
    ini

API Reference
Instantiating Magika
First, create an instance of the Magika class. The constructor accepts several optional arguments to customize its behavior.

    from pathlib import Path

    from magika import Magika, PredictionMode

    # Default instantiation
    magika = Magika()

    # Custom instantiation (model_dir is documented as a Path)
    magika_custom = Magika(
        model_dir=Path("/path/to/custom/model"),
        prediction_mode=PredictionMode.BEST_GUESS,
        no_dereference=True,
    )

Constructor Arguments:
- `model_dir` (`Path`, optional): Path to a directory containing a custom model. If not provided, defaults to the latest bundled model.
- `prediction_mode` (`PredictionMode`, optional): The prediction mode to use. Defaults to `PredictionMode.HIGH_CONFIDENCE`.
- `no_dereference` (`bool`, optional): If `True`, symbolic links will not be followed; their content type will be reported as `symlink`. Defaults to `False`.
Identifying Content
Once instantiated, the Magika object provides several methods for identifying content from different sources.
- `magika.identify_bytes(bytes)`: Identifies the content type of an in-memory bytes object.
- `magika.identify_path(path)`: Identifies the content type of a single file from its path (`str | os.PathLike`).
- `magika.identify_paths(paths)`: Identifies the content type for a list of file paths.
- `magika.identify_stream(stream)`: Identifies the content type from an already-open binary file-like object (e.g., the output of `open(file_path, 'rb')`). Note: 1) Magika will `seek()` around the stream; 2) the stream is not closed (closing is the responsibility of the caller).
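Because `identify_stream` seeks around the stream and leaves it open, a caller that keeps reading afterwards may want to restore the original offset. The following is a minimal sketch under that assumption: `identify_preserving_position` is a hypothetical helper name, not part of the magika API, and it only works for seekable streams.

```python
from typing import Any, BinaryIO


def identify_preserving_position(magika: Any, stream: BinaryIO) -> Any:
    """Identify a seekable binary stream, then restore the caller's offset.

    `magika` is expected to be a Magika instance; only its documented
    identify_stream() method is used. This helper is illustrative and is
    not part of the magika package.
    """
    start = stream.tell()  # remember where the caller was
    try:
        return magika.identify_stream(stream)
    finally:
        stream.seek(start)  # undo any seek() performed during identification
```

With the real library, this would be called as `identify_preserving_position(Magika(), f)` inside a `with open(path, 'rb') as f:` block; closing the file remains the caller's job.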
If you are dealing with large files, the identify_path, identify_paths, and identify_stream variants are generally better: their implementation seek()s around the file/stream to extract the needed features, without loading the entire content in memory.
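As an illustration of batch identification, the sketch below groups files by their detected label using `identify_paths`. The function name `group_paths_by_label` and the `"<error>"` bucket are assumptions made for this example; only the documented `identify_paths` method and the `ok`, `path`, and `output.label` result fields are relied on.

```python
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, Iterable, List


def group_paths_by_label(magika: Any, paths: Iterable[Path]) -> Dict[str, List[Path]]:
    """Group file paths by detected content type label.

    `magika` is expected to be a Magika instance. Paths that could not be
    identified are collected under the made-up key "<error>".
    """
    groups: Dict[str, List[Path]] = defaultdict(list)
    for result in magika.identify_paths(list(paths)):
        # Only read the prediction when the identification succeeded.
        key = str(result.output.label) if result.ok else "<error>"
        groups[key].append(result.path)
    return dict(groups)
```

A typical call might look like `group_paths_by_label(Magika(), Path("some_dir").rglob("*"))`, where `some_dir` is a placeholder.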
Understanding the Result
All identify_* methods return a MagikaResult object. This object acts as a wrapper that contains the prediction details and the status of the operation. You should always check if the operation was successful before accessing the prediction.

    >>> result = m.identify_path("path/to/file")
    >>> if result.ok:
    ...     print(f"File is a {result.output.description}")
    ...     print(f"MIME Type: {result.output.mime_type}")
    ... else:
    ...     print(f"Error: {result.status.message}")

Data Models
The MagikaResult object and its nested data classes provide detailed information about the scan.
Consult the Understanding the Output section for more context.
MagikaResult

    class MagikaResult:
        path: Path
        ok: bool
        status: Status
        prediction: MagikaPrediction
        # Shortcuts available only when result.ok is True
        dl: ContentTypeInfo
        output: ContentTypeInfo
        score: float

- `ok` (`bool`): `True` if the identification was successful, `False` otherwise.
- `status` (`Status`): Provides details on an error if `ok` is `False`.
- `prediction` (`MagikaPrediction`): The core prediction object, available only if `ok` is `True`.
- `dl`, `output`, `score`: For convenience, these are direct shortcuts to the corresponding fields within the `prediction` object.
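Since the prediction shortcuts are only available when `ok` is `True`, code that serializes results needs to branch on it. A minimal sketch follows; `summarize_result` and the dictionary keys are hypothetical choices for this example, and only the fields listed above are read.

```python
from typing import Any, Dict


def summarize_result(result: Any) -> Dict[str, Any]:
    """Flatten a MagikaResult-like object into a plain dict.

    The output and score shortcuts are read only when result.ok is True;
    otherwise only the status is reported.
    """
    summary: Dict[str, Any] = {"path": str(result.path), "ok": result.ok}
    if result.ok:
        summary["label"] = str(result.output.label)
        summary["mime_type"] = result.output.mime_type
        summary["score"] = result.score
    else:
        summary["status"] = str(result.status)
    return summary
```

The returned dict contains only strings, booleans, and floats, so it can be passed to `json.dumps` directly.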
MagikaPrediction
Contains the core deep learning model prediction and the final Magika output.

    class MagikaPrediction:
        dl: ContentTypeInfo
        output: ContentTypeInfo
        score: float
        overwrite_reason: OverwriteReason

- `dl` (`ContentTypeInfo`): The raw prediction from the deep learning model.
- `output` (`ContentTypeInfo`): The final prediction from “Magika the tool,” which considers the model’s prediction, its confidence score, and the selected prediction mode. This is the result most users should rely on.
- `score` (`float`): The model’s confidence score (from 0.0 to 1.0).
- `overwrite_reason` (`OverwriteReason`): Indicates why the deep learning model’s prediction was overwritten (e.g., low confidence).
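When debugging, it can help to see whether the final output differs from the raw model prediction. The sketch below compares `dl.label` with `output.label` and reports `overwrite_reason` only when they differ; `describe_prediction` is a hypothetical helper name, and the output format is an arbitrary choice for this example.

```python
from typing import Any


def describe_prediction(result: Any) -> str:
    """Return a one-line comparison of the model prediction and final output."""
    if not result.ok:
        return f"failed: {result.status}"
    pred = result.prediction
    line = f"{pred.output.label} (score {pred.score:.2f})"
    # Mention the raw model label only when the tool replaced it.
    if pred.dl.label != pred.output.label:
        line += f", model said {pred.dl.label}, overwritten: {pred.overwrite_reason}"
    return line
```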
ContentTypeInfo
Contains detailed metadata about a predicted content type.

    class ContentTypeInfo:
        label: ContentTypeLabel  # e.g., "python"
        mime_type: str           # e.g., "text/x-python"
        group: str               # e.g., "code"
        description: str         # e.g., "Python source"
        extensions: List[str]    # e.g., ["py", "pyc"]
        is_text: bool            # e.g., True

ContentTypeLabel
A string enum (StrEnum) of all possible content type labels. Because it’s a StrEnum, its members can be used and compared just like regular strings.
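The string-like comparison can be illustrated with a small stand-in enum. `DemoLabel` below is hypothetical and is not Magika's `ContentTypeLabel`; it is built from `str` and `Enum` so the sketch also runs on Python versions without `enum.StrEnum`, and it only demonstrates equality and membership checks against plain strings.

```python
from enum import Enum


class DemoLabel(str, Enum):
    """Stand-in with two members; NOT Magika's real ContentTypeLabel."""

    PYTHON = "python"
    INI = "ini"


def is_python(label) -> bool:
    # Works for an enum member and for a plain string alike.
    return label == "python"


def is_config(label) -> bool:
    # Tuple membership relies on the same string equality.
    return label in ("ini", "toml", "yaml")
```

In practice this means a check such as `res.output.label == "python"` should work without importing the enum.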

    class ContentTypeLabel(StrEnum):
        APK = "apk"
        BMP = "bmp"
        # ... and many more

Additional APIs
The Magika class also exposes a few helper methods:
- `get_output_content_types()`: Returns a list of all possible content type labels that Magika can return in the `output.label` field. This is the recommended way to get a definitive list of Magika’s possible outputs.
- `get_model_content_types()`: Returns a list of all possible content type labels the deep learning model can return (i.e., the possible values for `dl.label`, in addition to `undefined`). This is useful for debugging.
- `get_module_version()`: Returns the `magika` Python package version as a string.
- `get_model_version()`: Returns the name of the model being used as a string.
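One possible use of `get_output_content_types()` is validating a label supplied by a user (for example, in a filter option) before comparing it against results. The sketch below assumes the returned labels compare equal to plain strings, as described for `ContentTypeLabel`; `check_label` is a hypothetical helper name.

```python
from typing import Any


def check_label(magika: Any, label: str) -> str:
    """Return `label` if Magika can output it, otherwise raise ValueError.

    `magika` is expected to be a Magika instance; only the documented
    get_output_content_types() method is used.
    """
    known = magika.get_output_content_types()
    if label not in known:
        raise ValueError(f"unknown content type label: {label!r}")
    return label
```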
Development setup
This section is for contributors to the magika Python package.
- Project Management: `magika` uses `uv` for dependency management. To install all development dependencies, run: `cd python; uv sync`.
- Testing: To run the test suite, use `pytest`. You can exclude slow tests for faster runs: `cd python; uv run pytest tests -m "not slow"`. Refer to the GitHub Actions workflows for more testing examples.
- Packaging: We use `maturin` to build the Python package, which combines the Rust-based CLI with the Python source code. This process is automated in our Build and Release Python Package GitHub Action.
- Publishing: We publish to PyPI via GitHub Trusted Publishing. This is automated by the Build and Release Python Package GitHub Action, which publishes packages (binary wheels, pure-python wheels, and source distribution) to PyPI (or TestPyPI) after pushing a tag with `python-v*` (or `python-test-v*`) as prefix. This also takes care of attestation.