Understanding the Output
Whether you use the CLI or one of the language bindings (Python, Rust, JavaScript), Magika provides the same core prediction data. While many users only need the final content type label, detailed information is always available. The CLI offers flexible output formats like JSON, and the APIs provide dedicated result objects (e.g., the Python MagikaResult object).
The meaning of each field is best understood through an example.
```
$ magika tests_data/basic/javascript/code.js --json
[
  {
    "path": "tests_data/basic/javascript/code.js",
    "result": {
      "status": "ok",
      "value": {
        "dl": {
          "description": "JavaScript source",
          "extensions": [
            "js",
            "mjs",
            "cjs"
          ],
          "group": "code",
          "is_text": true,
          "label": "javascript",
          "mime_type": "application/javascript"
        },
        "output": {
          "description": "JavaScript source",
          "extensions": [
            "js",
            "mjs",
            "cjs"
          ],
          "group": "code",
          "is_text": true,
          "label": "javascript",
          "mime_type": "application/javascript"
        },
        "score": 0.9710000157356262
      }
    }
  }
]
```

This is how to interpret the output:
- `path` is the file path this prediction refers to (relevant when scanning multiple files at the same time).
- `result.status` indicates whether Magika was able to scan the sample. `ok` means the scan succeeded, in which case a `value` field is present with the details of the prediction.
- `score` indicates the confidence of the prediction.
- The `dl` block returns information about the prediction of the deep learning model. In this case, the model predicted `javascript`.
- The `output` block returns information about the prediction of “Magika the tool”, which, as discussed in previous sections, considers a number of aspects such as the prediction of the deep learning model, its confidence score, and the selected prediction mode. In the example above, the model’s confidence was high enough to be trustworthy, and thus the output of “Magika the tool” matches the content type inferred by the deep learning model.
- The `dl` and `output` blocks contain metadata about the predicted content type: a simple textual label suitable for automated processing (`label`), a human-readable description (`description`), the MIME type (`mime_type`), a list of extensions usually associated with the predicted content type (`extensions`), a high-level group (`group`), and a boolean that indicates whether the type is textual (`is_text`).
Here is how to interpret the output:
- `path`: The file path corresponding to this prediction.
- `result.status`: `ok` indicates a successful scan. If the status is not `ok`, the `value` field will be absent.
The value field is present on successful scans and contains the following details:
- `score`: The model’s confidence in this prediction.
- `dl`: Contains the raw prediction from the deep learning model.
- `output`: Contains the final prediction from “Magika the tool.” This result considers the model’s prediction, its confidence score, and the selected prediction mode. In this example, the model’s confidence was high, so the final output matches the model’s prediction.
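As a rough illustration of the status and value rules above, the following Python sketch parses a JSON report using only the standard library. The report is an abridged copy of the example output plus one hand-written failing entry; the path `hypothetical/unreadable.bin`, the status string `hypothetical_error`, and the helper name `summarize` are assumptions made for illustration, not values documented by Magika.

```python
import json

# Abridged copy of the example output, plus a hand-written entry with a
# non-"ok" status. The second entry is hypothetical: its path and status
# string are placeholders, not documented Magika values.
REPORT = """
[
  {
    "path": "tests_data/basic/javascript/code.js",
    "result": {
      "status": "ok",
      "value": {
        "dl": {"label": "javascript", "is_text": true},
        "output": {"label": "javascript", "is_text": true},
        "score": 0.9710000157356262
      }
    }
  },
  {
    "path": "hypothetical/unreadable.bin",
    "result": {"status": "hypothetical_error"}
  }
]
"""


def summarize(report_json):
    """Return (path, label, score) tuples; label and score are None on failure."""
    rows = []
    for entry in json.loads(report_json):
        result = entry["result"]
        if result["status"] != "ok":
            # There is no "value" field to read when the scan did not succeed.
            rows.append((entry["path"], None, None))
            continue
        value = result["value"]
        rows.append((entry["path"], value["output"]["label"], value["score"]))
    return rows


for path, label, score in summarize(REPORT):
    print(path, label, score)
```

Checking `status` before touching `value` keeps the loop from raising a `KeyError` on samples that could not be scanned.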
Within both dl and output, you will find:
- `label`: A simple, machine-readable content type label (e.g., `javascript`). The possible values for `dl.label` and `output.label` are documented in each model’s README.
- `description`: A human-readable description.
- `mime_type`: The corresponding MIME type.
- `group`: A high-level category (e.g., code, document, media).
- `is_text`: A boolean indicating if the content is textual.
- `extensions`: A list of common file extensions for this content type.
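If you consume the JSON output from Python, the six fields listed above map naturally onto a small typed record. The sketch below is one possible shape; `ContentTypeInfo` and `from_block` are hypothetical names introduced here, not part of Magika's Python API (which has its own result objects).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentTypeInfo:
    """Typed view of one `dl` or `output` block (hypothetical helper)."""

    label: str
    description: str
    mime_type: str
    group: str
    is_text: bool
    extensions: tuple

    @classmethod
    def from_block(cls, block):
        """Build a record from a parsed `dl` or `output` JSON object."""
        return cls(
            label=block["label"],
            description=block["description"],
            mime_type=block["mime_type"],
            group=block["group"],
            is_text=block["is_text"],
            # Stored as a tuple so the frozen record stays immutable.
            extensions=tuple(block["extensions"]),
        )


# The `output` block from the example, as a Python dict.
output_block = {
    "description": "JavaScript source",
    "extensions": ["js", "mjs", "cjs"],
    "group": "code",
    "is_text": True,
    "label": "javascript",
    "mime_type": "application/javascript",
}

info = ContentTypeInfo.from_block(output_block)
print(info.label, info.group, info.is_text)
```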
As mentioned previously, when the model is not used (e.g., for empty files), dl.label is set to undefined, and the output block will contain a generic content type like txt or unknown.
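A consumer of the JSON output can use this to tell whether the deep learning model contributed to a result. The sketch below assumes the placeholder appears in the JSON as the literal string `undefined`; both sample dictionaries are hand-written and reduced to the fields the check needs, and `model_was_used` is a hypothetical helper name.

```python
def model_was_used(value):
    """Return True when the `dl` block carries a real model prediction."""
    # Assumption: an unused model is reported as the literal string "undefined".
    return value["dl"]["label"] != "undefined"


# Hand-written, abridged `value` objects for illustration only.
from_model = {
    "dl": {"label": "javascript"},
    "output": {"label": "javascript"},
}
without_model = {
    "dl": {"label": "undefined"},
    "output": {"label": "unknown"},
}

print(model_was_used(from_model))
print(model_was_used(without_model))
```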
For most applications, you should use the output.label field, which is the default output of the CLI. The raw dl block is provided primarily for debugging and advanced use cases.
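To make the distinction concrete, here is a minimal sketch that tallies files by `output.label` and separately lists the paths where the raw `dl` label differs from the final one, which is the kind of debugging use the `dl` block suits. The entries are hand-written: the second one (path, labels, and score) is a hypothetical case invented for illustration, and the helper names are not part of Magika.

```python
from collections import Counter

# Hand-written, abridged entries. The second entry is hypothetical and only
# illustrates a case where the final label differs from the model's label.
ENTRIES = [
    {
        "path": "tests_data/basic/javascript/code.js",
        "result": {
            "status": "ok",
            "value": {
                "dl": {"label": "javascript"},
                "output": {"label": "javascript"},
                "score": 0.9710000157356262,
            },
        },
    },
    {
        "path": "hypothetical/notes.md",
        "result": {
            "status": "ok",
            "value": {
                "dl": {"label": "markdown"},
                "output": {"label": "txt"},
                "score": 0.4,
            },
        },
    },
]


def successful_values(entries):
    """Yield (path, value) pairs for entries whose scan succeeded."""
    for entry in entries:
        if entry["result"]["status"] == "ok":
            yield entry["path"], entry["result"]["value"]


def count_labels(entries):
    """Tally files by the final label, the field most applications want."""
    return Counter(v["output"]["label"] for _, v in successful_values(entries))


def disagreements(entries):
    """List paths where the tool's final label differs from the model's label."""
    return [
        path
        for path, v in successful_values(entries)
        if v["dl"]["label"] != v["output"]["label"]
    ]


print(count_labels(ENTRIES))
print(disagreements(ENTRIES))
```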
See also the FAQ for why it is best to integrate Magika’s results by focusing on label rather than other fields like mime_type.