Methodology · October 5, 2025 · 10 min read

Revisiting Statistical Forensics

ELA, noise topology, and entropy analysis for modern generators

DeepSight Research

error level analysis · noise analysis · entropy · statistical features · channel statistics · forensics

Abstract

Error Level Analysis (ELA), noise pattern extraction, and Shannon entropy were developed for an era of image compositing and splicing. Diffusion-generated images — synthesized whole, not composited — should theoretically render these techniques obsolete. We show that they remain discriminative, albeit for different reasons: the homogeneity of synthetic generation produces statistical signatures in noise uniformity, compression response, and entropy distribution that distinguish AI outputs from the heterogeneous statistics of real-world photographs.


Error Level Analysis, first described by Krawetz in 2007, was designed to detect spliced regions in photographs by exploiting the fact that JPEG re-compression produces differential error patterns. Regions saved at different quality levels or composited from different sources exhibit distinct ELA profiles. The technique was effective for an era when image manipulation meant cut-and-paste operations in Photoshop.

Diffusion-generated images present a different challenge. They are not composites. They are synthesized whole — every pixel generated by the same process, at the same time, through the same reverse diffusion trajectory. In theory, this should make ELA useless: there are no differentially compressed regions to detect. In practice, we observe something more interesting.

When a diffusion-generated image is re-compressed as JPEG, the error pattern is remarkably uniform. The coefficient of variation across the ELA map — the ratio of standard deviation to mean error — is consistently lower for synthetic images than for photographs. Real photographs contain regions of varying complexity — detailed textures, smooth gradients, high-contrast edges — that respond differently to quantization. Diffusion outputs exhibit a more homogeneous distribution of visual complexity at the microscale. The uniformity is the signal.
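The measurement can be sketched in a few lines. This is an illustrative implementation, not our production pipeline: the function name and the quality setting of 90 are assumptions, and it relies on Pillow and NumPy.

```python
import io
import numpy as np
from PIL import Image

def ela_coefficient_of_variation(img: Image.Image, quality: int = 90) -> float:
    """Re-compress the image as JPEG and measure how uniform the resulting
    error map is: std/mean of the per-pixel absolute error. Lower values
    indicate a more homogeneous compression response."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    recompressed = Image.open(buf)
    original = np.asarray(img.convert("RGB"), dtype=np.float64)
    recomp = np.asarray(recompressed, dtype=np.float64)
    error = np.abs(original - recomp)  # the ELA map
    mean = error.mean()
    return float(error.std() / mean) if mean > 0 else 0.0

# Usage with a synthetic test image:
rng = np.random.default_rng(0)
noisy = Image.fromarray(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8))
cv = ela_coefficient_of_variation(noisy)
```

In practice the CV would be compared against a calibrated threshold, or passed as a continuous feature to the fusion stage.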

This observation extends to noise analysis more broadly. Real images carry sensor noise — a pattern that varies with ISO sensitivity, sensor temperature, pixel location, and exposure time. Different regions of a photograph exhibit different noise characteristics because they were formed by different photons hitting different photosites under different conditions. Diffusion-generated images also contain noise, but it is process noise from the reverse diffusion trajectory, and it distributes differently. When we partition an image into blocks and measure the coefficient of variation of block-level noise magnitude, synthetic images consistently show lower variation. The noise is too uniform, too well-behaved, too independent of spatial context.
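A minimal sketch of the block-level measurement follows. The 3x3 box-filter residual used as a noise estimator is one simple choice among many (a hypothetical stand-in for whatever denoiser a real pipeline uses), and the block size of 16 is an assumption.

```python
import numpy as np

def blockwise_noise_cv(gray: np.ndarray, block: int = 16) -> float:
    """Coefficient of variation of block-level noise magnitude.
    Noise per block is estimated as the std of a high-pass residual
    (each pixel minus its 3x3 local mean)."""
    gray = gray.astype(np.float64)
    h, w = gray.shape
    padded = np.pad(gray, 1, mode="edge")
    # 3x3 box filter built from shifted sums (no external dependencies)
    local_mean = sum(
        padded[di:di + h, dj:dj + w] for di in range(3) for dj in range(3)
    ) / 9.0
    residual = gray - local_mean
    mags = np.array([
        residual[i:i + block, j:j + block].std()
        for i in range(0, h - block + 1, block)
        for j in range(0, w - block + 1, block)
    ])
    return float(mags.std() / mags.mean()) if mags.mean() > 0 else 0.0

rng = np.random.default_rng(1)
uniform_noise = rng.normal(128, 5, (128, 128))           # spatially uniform noise
varying_noise = uniform_noise.copy()
varying_noise[:, 64:] += rng.normal(0, 20, (128, 64))    # noisier right half

cv_uniform = blockwise_noise_cv(uniform_noise)
cv_varying = blockwise_noise_cv(varying_noise)
```

The spatially uniform field yields the lower CV, mirroring the "too well-behaved" noise profile described above.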

Shannon entropy provides a third statistical signal. The byte-level entropy of raw pixel data captures the overall information density of the image. AI-generated images — particularly those from diffusion models — tend to cluster in a narrower entropy band than photographs, which exhibit a wider distribution reflecting the sheer diversity of real-world scenes. The thresholds are not definitive by themselves, but they are discriminative: they move the probability estimate in a direction that meaningfully influences downstream analysis when combined with other signals.
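The entropy computation itself is straightforward; a sketch using a 256-bin byte histogram (the function name is illustrative):

```python
import numpy as np

def byte_entropy(pixels: np.ndarray) -> float:
    """Shannon entropy, in bits per byte, of raw 8-bit pixel data."""
    counts = np.bincount(pixels.ravel().astype(np.uint8), minlength=256)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

rng = np.random.default_rng(2)
flat = np.full((64, 64), 128, dtype=np.uint8)             # one value: 0 bits
uniform = rng.integers(0, 256, (64, 64), dtype=np.uint8)  # near-maximal entropy

e_flat = byte_entropy(flat)        # exactly 0.0
e_uniform = byte_entropy(uniform)  # approaches the 8-bit ceiling
```

Where a real photograph or a diffusion output falls between these two extremes is the feature of interest; the clustering described above refers to the distribution of this value across large image populations, not to a fixed cutoff.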

Channel-level statistics — kurtosis, skewness, and inter-channel correlation of the R, G, and B histograms — provide additional features. AI-generated images tend toward platykurtic distributions (negative excess kurtosis), reflecting a narrower effective dynamic range and fewer extreme pixel values than natural photographs. The effect is subtle and insufficient as a standalone detector, but in a multi-signal framework, even weak signals contribute to the composite confidence measure. The mathematics of Bayesian integration reward breadth of evidence, not just strength of individual signals.
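These channel features can be extracted with plain NumPy. The moment formulas below are the standard biased estimators; the dictionary layout is an assumption for illustration.

```python
import numpy as np

def channel_stats(rgb: np.ndarray) -> dict:
    """Per-channel skewness and excess kurtosis, plus mean pairwise
    inter-channel correlation, computed from raw pixel values."""
    chans = [rgb[..., c].ravel().astype(np.float64) for c in range(3)]
    stats = {}
    for name, x in zip("RGB", chans):
        z = (x - x.mean()) / x.std()
        stats[f"skew_{name}"] = float((z ** 3).mean())
        stats[f"kurt_{name}"] = float((z ** 4).mean() - 3.0)  # excess kurtosis
    corrs = [np.corrcoef(chans[a], chans[b])[0, 1]
             for a, b in ((0, 1), (0, 2), (1, 2))]
    stats["mean_corr"] = float(np.mean(corrs))
    return stats

# Usage: uniformly random pixels are strongly platykurtic (excess
# kurtosis near -1.2), with near-zero skew and channel correlation.
rng = np.random.default_rng(3)
s = channel_stats(rng.integers(0, 256, (64, 64, 3)))
```

Negative excess kurtosis flags a flatter-than-Gaussian histogram, the platykurtic tendency noted above.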

We emphasize that statistical forensics is not, by itself, a reliable detection method. Every threshold we describe has exceptions. Every statistical feature we extract can be confounded by aggressive post-processing, social media re-compression, or deliberate adversarial perturbation. The value of this layer lies not in its standalone accuracy but in its cost profile: zero API calls, sub-50-millisecond compute time, and complete independence from any external service. It is the layer that is always available, always fast, and always contributing signal — the baseline that ensures every image receives at least some level of genuine forensic analysis.
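The way weak signals compound can be sketched as naive-Bayes log-odds fusion. This is a toy illustration of the principle, not our fusion engine, and the likelihood ratios below are made-up placeholders, not calibrated values.

```python
import math

def fuse_log_odds(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine independent signals by adding each one's log-likelihood
    ratio to the prior log-odds, then convert back to a probability."""
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Three weak signals, each only mildly favoring "synthetic" (LR > 1):
posterior = fuse_log_odds(prior=0.5, likelihood_ratios=[1.4, 1.6, 1.3])
```

Each signal alone barely moves the estimate, but together they shift the posterior well above the prior, which is why breadth of evidence matters.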

The direction of improvement is clear: better features, better calibration, better integration with complementary signals. But we resist the temptation to overfit statistical features to current generators. The landscape shifts too quickly for brittle thresholds. Instead, we focus on features grounded in the physics of image formation — the fundamental difference between photons captured by a sensor and values computed by a neural network. These signals persist across generator architectures because they arise from the problem structure, not from implementation details that will change with the next model release.


References

  1. Krawetz, N. "A Picture's Worth… Digital Image Analysis and Forensics." Black Hat Briefings, 2007.
  2. Wang et al. "DIRE for Diffusion-Generated Image Detection." ICCV 2023.
  3. Chen et al. "LAID: Lightweight AI-Generated Image Detection in Spatial and Spectral Domains." arXiv:2507.05162, 2025.
  4. Zhang et al. "SCADET: A detection framework for AI-generated artwork integrating dynamic frequency attention and contrastive spectral analysis." PLOS ONE, 2024.


See the research in action

Our detection engine implements the techniques described in this paper. Upload an image and see multi-signal fusion at work.

Try the detector