KSCOPE: Paper Demo Showcase

Warning: This page contains NSFW content.
Please note that each 'AI Perception' image is a mock-up. Triggering the actual loading error requires running the sample within a specific framework.
This webpage is intended for S&P paper review purposes only. It is not for public use. Please do not disclose or distribute.

Single Medium, Multiple Perspectives

This page demonstrates the 11 types of semantic gaps identified in our paper. Below are attack samples where media players (Human Perception) and AI services (AI Perception) interpret the same file differently.


Virtual Cropping Ignorance

AI services ignore 'virtual crop' metadata (e.g., the 'clap' clean-aperture property in HEIC/AVIF) and process the entire image, while humans see only the cropped region.

Human Perception

AI Perception

Download Sample (r1_virtual_crop.avif)
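
To make the gap above concrete, here is a minimal sketch that applies a clean-aperture crop to a tiny pixel grid. It assumes the centered-crop arithmetic of the ISOBMFF 'clap' property (a crop size plus offsets measured from the image center). The function name `apply_clean_aperture` and the list-of-rows image are hypothetical and are not taken from the paper.

```python
from fractions import Fraction


def apply_clean_aperture(pixels, crop_w, crop_h, horiz_off=0, vert_off=0):
    """Return the region a crop-aware viewer would display.

    pixels is a list of rows. Offsets are measured from the image center,
    as in the 'clap' property. Assumption: the crop lands on whole pixels.
    """
    height, width = len(pixels), len(pixels[0])
    # Center of the clean aperture in pixel coordinates.
    center_x = Fraction(width - 1, 2) + Fraction(horiz_off)
    center_y = Fraction(height - 1, 2) + Fraction(vert_off)
    left = center_x - Fraction(crop_w - 1, 2)
    top = center_y - Fraction(crop_h - 1, 2)
    if left.denominator != 1 or top.denominator != 1:
        raise ValueError("this sketch only handles whole-pixel crops")
    left, top = int(left), int(top)
    return [row[left:left + crop_w] for row in pixels[top:top + crop_h]]


# "H" marks hidden border content, "b" marks the benign visible center.
full = [
    ["H", "H", "H", "H"],
    ["H", "b", "b", "H"],
    ["H", "b", "b", "H"],
    ["H", "H", "H", "H"],
]
human_view = apply_clean_aperture(full, crop_w=2, crop_h=2)
ai_view = full  # a decoder that ignores the crop sees every pixel
```

In this toy case the human view contains only "b" cells, while the crop-ignoring view also contains the "H" border.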

Mirror Flip Ignorance

AI ignores metadata-based mirroring (e.g., 'imir' in AVIF), leading to misinterpretation of orientation-sensitive data like charts.

Human Perception

AI Perception

Download Sample (r2_mirror_flip.avif)
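
The effect on a chart can be sketched with a toy bar chart stored as text rows. This is only an illustration: the helper names are hypothetical, and the flip direction used by the actual sample is an assumption (a left-right flip is shown here).

```python
def mirror_left_right(pixels):
    """Flip a list-of-rows image about its vertical axis."""
    return [row[::-1] for row in pixels]


def column_heights(pixels):
    """Read a toy bar chart: count filled cells ('#') in each column."""
    width = len(pixels[0])
    return [sum(1 for row in pixels if row[x] == "#") for x in range(width)]


# Pixel data as stored in the file (before any mirroring metadata is applied).
stored = [
    "#..",
    "##.",
    "###",
]

ai_reading = column_heights(stored)                        # ignores the flip
human_reading = column_heights(mirror_left_right(stored))  # applies the flip
```

The same stored pixels yield a falling trend for the reader that ignores the flip and a rising trend for the reader that applies it.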

Rotation Ignorance

Similar to mirroring, AI services fail to apply rotation metadata, causing misidentification of rotated content (e.g., CAPTCHAs).

Human Perception

AI Perception

Download Sample (r3_rotation.jpg)
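
As a sketch of what applying rotation metadata involves, the snippet below rotates a small pixel grid according to an EXIF-style Orientation value. It assumes the JPEG sample carries its rotation in the EXIF Orientation tag, which the page does not state. Only the pure-rotation values (1, 3, 6, 8) are handled, and `apply_exif_orientation` is a hypothetical helper.

```python
def apply_exif_orientation(pixels, orientation):
    """Return the grid as a metadata-aware viewer would display it.

    Handles only the rotation values of the EXIF Orientation tag:
    1 = as stored, 3 = 180 degrees, 6 = 90 degrees clockwise,
    8 = 90 degrees counter-clockwise.
    """
    if orientation == 1:
        return [list(row) for row in pixels]
    if orientation == 3:
        return [list(row[::-1]) for row in pixels[::-1]]
    if orientation == 6:
        # Clockwise: reverse the row order, then transpose.
        return [list(col) for col in zip(*pixels[::-1])]
    if orientation == 8:
        # Counter-clockwise: transpose, then reverse the row order.
        return [list(col) for col in zip(*pixels)][::-1]
    raise ValueError("orientation not handled in this sketch")


stored = [[1, 2],
          [3, 4]]
ai_view = stored                                # metadata ignored
human_view = apply_exif_orientation(stored, 6)  # metadata applied
```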

External Resource Ignorance

AI fails to process external resources (e.g., image-based subtitles in MKV or overlays in SVG), perceiving only the underlying content.

Human Perception

Video/SVG shows full-screen subtitle/overlay: "Harmful Content"

AI Perception

AI sees underlying video: "Benign Content"

Download Sample (r4_overlay.svg)
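
One way to see why an SVG can render differently across consumers is to list the resources it pulls in from outside the file. The sketch below does this with the Python standard library. The sample markup and the helper name `external_references` are illustrative assumptions, not the contents of the downloadable sample.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def external_references(svg_text):
    """List href targets that point outside the SVG document itself."""
    refs = []
    for elem in ET.fromstring(svg_text).iter():
        for key in ("href", XLINK_HREF):
            target = elem.attrib.get(key)
            # "#id" targets are internal; "data:" URIs are embedded inline.
            if target and not target.startswith(("#", "data:")):
                refs.append(target)
    return refs


svg = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink" width="100" height="100">
  <rect width="100" height="100" fill="white"/>
  <image xlink:href="overlay.png" width="100" height="100"/>
  <use href="#local-symbol"/>
</svg>"""

refs = external_references(svg)
```

A renderer that does not fetch the listed targets would show only the underlying content, which matches the gap described in this section.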

R5/R6

Improper Audio Downmix

AI services use naive downmixing (e.g., a simple average) for multi-channel audio, while humans hear a standard-compliant mix, enabling A2A attacks.

Human Perception (Browser)

"Your Honor, I plead guilty."

AI Perception (ASR)

"I refuse to admit guilt."

Download Sample (r5_audio_downmix.wav)
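
The difference between the two mixes can be sketched on a single 5.1 sample frame. This is a simplified illustration, not the paper's construction. It assumes the WAV channel order FL, FR, FC, LFE, BL, BR, and it uses the familiar ITU-R BS.775-style stereo coefficients (about 0.707 for center and surrounds, LFE dropped) as a stand-in for a standard-compliant mix.

```python
import math

ATTENUATION = 1 / math.sqrt(2)  # about 0.707, i.e. -3 dB


def naive_mono(frame):
    """Average every channel equally, LFE included."""
    return sum(frame) / len(frame)


def itu_style_stereo(frame):
    """Weighted 5.1-to-stereo downmix; the LFE channel is discarded."""
    fl, fr, fc, _lfe, bl, br = frame
    left = fl + ATTENUATION * fc + ATTENUATION * bl
    right = fr + ATTENUATION * fc + ATTENUATION * br
    return left, right


# A sample value placed in the front channels and inverted in the surrounds.
s = 0.5
frame = (s, s, 0.0, 0.0, -s, -s)

ai_sample = naive_mono(frame)           # front and surround cancel out
human_left, human_right = itu_style_stereo(frame)  # a residue survives
```

Because the two mixes weight the channels differently, content that cancels under one mix remains audible under the other.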

Improper Alpha Fusion (WebP)

AI improperly handles the alpha channel, leading to perception of different content than what humans see (e.g., moderation bypass).

Human Perception

AI Perception

Download Sample (r7_alpha_fusion.webp)
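
A per-pixel sketch of the two behaviors follows. It uses the standard "over" blend, out = alpha * foreground + (1 - alpha) * background, and assumes a white page background. The function names are hypothetical.

```python
def composite_over(rgba, background=(255, 255, 255)):
    """Blend one RGBA pixel over an opaque background colour."""
    r, g, b, a = rgba
    alpha = a / 255
    return tuple(
        round(alpha * channel + (1 - alpha) * bg)
        for channel, bg in zip((r, g, b), background)
    )


def drop_alpha(rgba):
    """What a pipeline sees if it simply discards the alpha channel."""
    return rgba[:3]


# Black colour data hidden behind a fully transparent alpha value.
pixel = (0, 0, 0, 0)
human_pixel = composite_over(pixel)  # blended over the white page
ai_pixel = drop_alpha(pixel)         # raw colour data, alpha ignored
```

Here the viewer that blends correctly shows white, while the pipeline that discards alpha sees black, so anything drawn in the hidden colour data is visible only to the latter.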

Improper Transparency Fusion

AI discards alpha or tRNS transparency data, while humans see the image correctly blended against a background (e.g., white).

Human Perception

AI Perception

Download Sample (r8_trns_fusion.png)
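
For palette-based PNGs, the tRNS chunk stores one alpha value per palette entry, and entries beyond the end of the chunk are treated as fully opaque. The sketch below expands indexed pixels with and without that table. The palette, the pixel indices, and the helper name `expand_palette` are made up for illustration.

```python
def expand_palette(indices, palette, trns=b""):
    """Expand palette indices to RGBA tuples.

    trns holds one alpha byte per palette entry; entries past its end
    default to 255 (opaque).
    """
    rgba = []
    for i in indices:
        r, g, b = palette[i]
        alpha = trns[i] if i < len(trns) else 255
        rgba.append((r, g, b, alpha))
    return rgba


palette = [(0, 0, 0), (255, 0, 0)]  # entry 0: black, entry 1: red
trns = bytes([0])                   # entry 0 is fully transparent
row = [0, 1, 0]

with_trns = expand_palette(row, palette, trns)  # transparency honoured
without_trns = expand_palette(row, palette)     # tRNS chunk discarded
```

With the table applied, the black pixels vanish into the background. Without it, they are treated as opaque and become part of the perceived image.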

Incorrect Content Choice

AI incorrectly selects the first track/frame from a multi-track file (e.g., HEIC), while humans see the primary track/frame.

Human Perception (Primary)

AI Perception (First)

Download Sample (r9_track_selection.heic)
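
The selection gap can be sketched with a hypothetical, already-parsed container. In HEIF the primary item is identified by the 'pitm' box; the dictionary layout and function names below are assumptions made only for this illustration.

```python
def pick_first(items):
    """Naive choice: whatever item appears first in the file."""
    return items[0]


def pick_primary(items, primary_item_id):
    """Viewer-style choice: the item the container marks as primary."""
    for item in items:
        if item["id"] == primary_item_id:
            return item
    raise KeyError(f"no item with id {primary_item_id}")


# Hypothetical parsed container: two images, the second marked as primary.
container = {
    "primary_item_id": 2,
    "items": [
        {"id": 1, "label": "benign decoy"},
        {"id": 2, "label": "content shown to the user"},
    ],
}

ai_choice = pick_first(container["items"])
human_choice = pick_primary(container["items"], container["primary_item_id"])
```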

R10

Deterministic Image Sampling

AI processes only the first frame of an animation (e.g., GIF), while humans see the persistent second frame.

Human Perception (Frame 2)

AI Perception (Frame 1)

Download Sample (r10_image_sampling.gif)
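
A small model of this behavior is sketched below. It assumes each frame is a (label, delay) pair, that a non-looping animation leaves its last frame on screen, and that a looping one is dominated by the frame with the longest delay. These are simplifications, and the delay values are invented.

```python
def first_frame(frames):
    """Deterministic sampling: always take frame 1."""
    return frames[0][0]


def persistent_frame(frames, loops_forever=False):
    """The frame a viewer spends the most time looking at."""
    if not loops_forever:
        # A non-looping animation stops on its final frame.
        return frames[-1][0]
    return max(frames, key=lambda frame: frame[1])[0]


# (label, delay in milliseconds); frame 1 flashes by, frame 2 persists.
frames = [("benign", 20), ("harmful", 60000)]

ai_view = first_frame(frames)
human_view = persistent_frame(frames)
```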

R11

Deterministic Video Sampling

AI deterministically samples a few frames (e.g., 1 per sec), while humans see the full video. Attackers can keep the sampled frames benign and place malicious content in the frames that are never sampled.

Human Perception

Full video (mostly malicious)

AI Perception

Sampled frames (all benign)

Download Sample (r11_video_sampling.mp4) (Warning: NSFW Content)
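
The sampling gap can be sketched numerically. The snippet assumes a sampler that takes the first frame of every second at a known frame rate. The frame rate, the clip length, and the function names are hypothetical.

```python
def sampled_indices(total_frames, fps, samples_per_second=1):
    """Frame indices picked by a fixed-rate sampler."""
    step = fps // samples_per_second
    return list(range(0, total_frames, step))


def label_frames(total_frames, fps):
    """Attacker's layout: benign on sampled frames, malicious elsewhere."""
    sampled = set(sampled_indices(total_frames, fps))
    return ["benign" if i in sampled else "malicious" for i in range(total_frames)]


TOTAL_FRAMES, FPS = 90, 30  # a three-second clip at 30 frames per second

labels = label_frames(TOTAL_FRAMES, FPS)
ai_view = [labels[i] for i in sampled_indices(TOTAL_FRAMES, FPS)]
malicious_share = labels.count("malicious") / len(labels)
```

In this layout the sampler sees only benign frames, even though 87 of the 90 frames shown to a human viewer are malicious.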

POC Video Showcase

Chatgpt-TRNS (Chatbot): Download

Gemini-alpha-avif (Chatbot): Download

Gemini-alpha-png (Chatbot): Download

Gemini-alpha-heic (Chatbot): Download

Qwen-mkv-multitrack (Chatbot): Download

Gemini-crop (Chatbot): Download

Grok-mirror (Chatbot): Download

Kimi-rotation (Chatbot): Download

Gemini-multiTrack (Chatbot): Download

Baidu-ocr (tRNS): Download

Azure-ocr (tRNS): Download

Qwen (Audio): Download

Tencent ASR (Audio): Download

Aliyun ASR (Audio): Download

Kimi (Audio): Download

Deepgram (Audio): Download

Gemini (Audio): Download

Aliyun Audio Moderation (Audio): Download (Warning: NSFW Content)

Tencent Audio Moderation (Audio): Download (Warning: NSFW Content)

Aliyun Video Audio Moderation (Audio): Download (Warning: NSFW Content)

Tencent Video Audio Moderation (Audio): Download (Warning: NSFW Content)

Baidu Content Moderation (Track): Download (Warning: NSFW Content)

Aliyun Content Moderation (Track): Download (Warning: NSFW Content)

Tencent Content Moderation (crop): Download (Warning: NSFW Content)

Tencent Content Moderation (Sampling): Download (Warning: NSFW Content)

Tencent Content Moderation (SVG): Download (Warning: NSFW Content)

Testing the Relationship Between Minimum Speech Intelligibility and Peak Amplitude

MacBook Test Download

OnePlus Test Download
