Smart TVs take snapshots of what you watch multiple times per second - eviltoast

By Jeremy Hsu on September 24, 2024


Popular smart TV models made by Samsung and LG can take multiple snapshots of what you are watching every second – even when they are being used as external displays for your laptop or video game console.

Smart TV manufacturers use these frequent screenshots, as well as audio recordings, in their automatic content recognition systems, which track viewing habits in order to target people with specific advertising. But researchers showed this tracking by some of the world’s most popular smart TV brands – Samsung TVs can take screenshots every 500 milliseconds and LG TVs every 10 milliseconds – can occur when people least expect it.

“When a user connects their laptop via HDMI just to browse stuff on their laptop on a bigger screen by using the TV as a ‘dumb’ display, they are unsuspecting of their activity being screenshotted,” says Yash Vekaria at the University of California, Davis. Samsung and LG did not respond to a request for comment.

Vekaria and his colleagues connected smart TVs from Samsung and LG to their own computer server. Their server, which was equipped with software for analysing network traffic, acted as a middleman to see what visual snapshots or audio data the TVs were uploading.

They found the smart TVs did not appear to upload any screenshots or audio data when streaming from Netflix or other third-party apps, when mirroring YouTube content streamed from a separate phone or laptop, or when sitting idle. But the smart TVs did upload snapshots when showing broadcasts from the TV antenna or content from an HDMI-connected device.

The researchers also discovered country-specific differences when users streamed the free ad-supported TV channel provided by Samsung or LG platforms. Such user activities were uploaded when the TV was operating in the US but not in the UK.

By recording user activity even when it’s coming from connected laptops, smart TVs might capture sensitive data, says Vekaria. For example, it might record if people are browsing for baby products or other personal items.

Customers can opt out of such tracking for Samsung and LG TVs. But the process requires customers to either enable or disable between six and 11 different options in the TV settings.

“This is the sort of privacy-intrusive technology that should require people to opt into sharing their data with clear language explaining exactly what they’re agreeing to, not baked into initial setup agreements that people tend to speed through,” says Thorin Klosowski at the Electronic Frontier Foundation, a digital privacy non-profit based in California.

https://www.newscientist.com/article/2449198-smart-tvs-take-snapshots-of-what-you-watch-multiple-times-per-second/ (paywall!!)

        • Aceticon@lemmy.world
          3 months ago

          I was curious enough to check, and with 2KB of SRAM that thing doesn’t have anywhere near enough memory to process a 320x200 RGB image, much less 1080p or 4K.

          Further, you definitely don’t want to send 2 images per second to a server in uncompressed format (even 1080p RGB with an encoding that loses a bit of color fidelity to use just two bytes per pixel adds up to about 4MB uncompressed per image), so it’s either using something with hardware compression or it’s using processing cycles for that.
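
          To put rough numbers on that, here is a minimal Python sketch of the uncompressed sizes. It assumes 2 bytes per pixel (something like an RGB565-style encoding, which is my reading of "two bytes per pixel") and 2 snapshots per second; the function names and the resolution table are illustrative only, not anything the TVs are known to use.

```python
# Back-of-the-envelope size of uncompressed snapshots.
# Assumption: 2 bytes per pixel unless stated otherwise (RGB565-style).

RESOLUTIONS = {
    "320x200": (320, 200),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def frame_bytes(width: int, height: int, bytes_per_pixel: int = 2) -> int:
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def upload_rate_bytes(width: int, height: int,
                      snapshots_per_second: int = 2,
                      bytes_per_pixel: int = 2) -> int:
    """Bytes per second needed to ship every snapshot uncompressed."""
    return frame_bytes(width, height, bytes_per_pixel) * snapshots_per_second


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        size = frame_bytes(w, h)
        rate = upload_rate_bytes(w, h)
        print(f"{name}: {size:,} bytes per frame, {rate:,} bytes per second")
```

          For 1080p this gives 4,147,200 bytes per frame, which is where the "about 4MB" comes from. Even a 320x200 frame at 3 bytes per pixel is 192,000 bytes, far more than 2KB of SRAM.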

          My expectation is that it’s not the snapshotting itself that would eat CPU cycles, it’s the compression.

          That said, I think you make a good point, just with the wrong example - I would’ve gone with: a thing capable of handling video decoding at 50 fps - i.e. one frame per 20ms - (even if it’s actually using hardware video decoding) can probably handle compressing and sending over the network two frames per second, though performance might suffer if they’re using a chip without hardware compression support and are using complex compression methods like JPEG instead of something simpler like LZW or similar.

          • Magnergy@lemmy.world
            3 months ago

            Why think of it as a compression problem? Isn’t the spy device already getting compressed video from some source? That makes it a filtering problem. You would set it to grab and ship key frames (or equivalent term) if you wanted a human to be able to see the intel. But for content matching, maybe count some interval of key frames and then grab the smallest difference frame between the next two key frames. Gives a nice, premade small data chunk. A few of those in sequence starts looking like a hash function (on a dark foggy night).

            Would want some way to sync up the frames that the spy device grabs and the ones grabbed when building the db to match against. Maybe resetting the key frame interval counter when some set of simple frames come through would be enough. Like anything with a uniform color across the whole image or something similar.

            Just spitballing here. I like your impulse to math this.

            • Aceticon@lemmy.world
              3 months ago

              We’re talking about fingerprinting stuff coming in via HDMI, not stuff being played by the “smart” part of the TV itself from some source.

              You would probably not need to actually sample images if it’s the TV’s processor that’s playing something from a source, because there are probably smarter approaches for most sources (for example, for a TV channel you probably just need to know the setting of the tuner, the location and the local time, and then get the data from the available Program Guide info, called EPG if I remember correctly).

              The problem is that anything might be coming over HDMI and it’s not compressed, so if they want to figure out what that is, it’s a much bigger problem.

              Your approach does sound like it would work if the Smart TV was playing some compressed video file, though.

              Mind you, I too am just “thinking out loud” rather than actually knowing what they do (or what I’m talking about ;))

              • Magnergy@lemmy.world
                3 months ago

                I assumed HDMI had some form of encoding, thanks for the correction. Looks like v 2.1 does.

                I think the syncing idea between the spy device and db is still useful. The video itself has stuff to use for reducing the search space by making sure they pick the same instants to fingerprint and exfiltrate.

          • interdimensionalmeme@lemmy.ml
            3 months ago

            I don’t think they will compress the screenshots and send them; more likely they run the content through a TensorFlow Lite model, or even just hash a few of the pixels to try for an ID match.

            • Aceticon@lemmy.world
              3 months ago

              Well, that makes sense, but it might be even more processor intensive unless they’re using an SoC that includes an NPU or similar.

              I doubt it’s a straightforward hash, because a hash database for video which includes all manner of small clips and has to somehow be able to match something missing over 90% of frames (if indeed the thing is sampling at 2 fps, then it only sees 2 frames out of every 25) would be huge.

              A rough calculation: take a system of hashes for groups of 13 frames in a row (so that at least one would be hit if sampling at 2 fps on a 25 fps system), storing just one block of 13 frame hashes per minute, each in a 5 byte value (so large enough to have about 1.1 trillion distinct values). That would store, in 1GB, enough hashes for roughly 137k 2h movies in hashes alone. So it would maybe be feasible if the system had 2GB+ of main memory, though even then I’m not so sure the CPU speed would be enough to search it every 500ms (though if the hashes are ordered by value in a long array and there’s a matching array of clip IDs, it might be doable, since there are some pretty good algorithms for that).
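
              Here is a minimal Python sketch of that sizing plus the sorted-array lookup, purely as an illustration of the idea and not of what any TV actually does. It assumes 1GB means 2**30 bytes; the constants mirror the numbers above, and the function names, the `(hash_value, clip_id)` entry shape and the clip IDs are all made up for the example.

```python
import bisect

# Numbers taken from the rough calculation above.
FRAMES_PER_BLOCK = 13    # consecutive frames hashed per block
HASH_BYTES = 5           # bytes per frame hash
BLOCKS_PER_MINUTE = 1    # one block of hashes stored per minute of video
MOVIE_MINUTES = 120      # a 2h movie

# Number of distinct values a 5 byte hash can take.
DISTINCT_HASHES = 2 ** (8 * HASH_BYTES)  # 1,099,511,627,776


def bytes_per_movie() -> int:
    """Hash storage needed for one 2h movie."""
    return FRAMES_PER_BLOCK * HASH_BYTES * BLOCKS_PER_MINUTE * MOVIE_MINUTES


def movies_per_budget(budget_bytes: int = 2 ** 30) -> int:
    """How many movies fit in the budget (assumed here: 1GB = 2**30 bytes)."""
    return budget_bytes // bytes_per_movie()


def build_index(entries):
    """Sort (hash_value, clip_id) pairs into two parallel arrays."""
    ordered = sorted(entries)
    hashes = [h for h, _ in ordered]
    clip_ids = [c for _, c in ordered]
    return hashes, clip_ids


def lookup(hashes, clip_ids, frame_hash):
    """Binary search the sorted hashes; return every clip ID that matches."""
    lo = bisect.bisect_left(hashes, frame_hash)
    hi = bisect.bisect_right(hashes, frame_hash)
    return clip_ids[lo:hi]


if __name__ == "__main__":
    print(bytes_per_movie(), "bytes per movie")
    print(movies_per_budget(), "movies in 1GB")
    hashes, clip_ids = build_index([(42, "clip-b"), (7, "clip-a"), (42, "clip-c")])
    print(lookup(hashes, clip_ids, 42))
```

              With these assumptions each movie costs 7,800 bytes of hashes. A binary search over a sorted array needs about log2(n) comparisons, so a couple of billion entries would take roughly 31 steps per lookup, which is why searching every 500ms looks plausible on paper.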

              • interdimensionalmeme@lemmy.ml
                3 months ago

                I would sample a few dozen equally spaced pixels out of the frame, then drop similar-value frames, and send that with a timestamp. In the cloud, you run those few pixels through a content recognition model.
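
                Roughly what I mean, as a hedged Python sketch. A frame here is just a list of rows of (r, g, b) tuples, and the 6x6 grid, the similarity threshold and all the function names are assumptions picked for illustration, not anything a real TV is known to do.

```python
def sample_pixels(frame, grid=6):
    """Pick grid x grid equally spaced pixels (36 by default) from a frame.

    frame is a list of rows, each row a list of (r, g, b) tuples.
    """
    height = len(frame)
    width = len(frame[0])
    # Centre of each grid cell, so samples are evenly spread over the image.
    ys = [(2 * i + 1) * height // (2 * grid) for i in range(grid)]
    xs = [(2 * j + 1) * width // (2 * grid) for j in range(grid)]
    return [frame[y][x] for y in ys for x in xs]


def distance(sample_a, sample_b):
    """Mean absolute difference per colour channel between two samples."""
    total = sum(
        abs(ca - cb)
        for pixel_a, pixel_b in zip(sample_a, sample_b)
        for ca, cb in zip(pixel_a, pixel_b)
    )
    return total / (len(sample_a) * 3)


def fingerprints(timed_frames, grid=6, threshold=8.0):
    """Keep (timestamp, sample) only when the sample changed noticeably.

    timed_frames is an iterable of (timestamp, frame) pairs.
    """
    kept = []
    last = None
    for timestamp, frame in timed_frames:
        sample = sample_pixels(frame, grid)
        if last is None or distance(sample, last) > threshold:
            kept.append((timestamp, sample))
            last = sample
    return kept


if __name__ == "__main__":
    black = [[(0, 0, 0)] * 12 for _ in range(12)]
    white = [[(255, 255, 255)] * 12 for _ in range(12)]
    result = fingerprints([(0.0, black), (0.5, black), (1.0, white)])
    print([ts for ts, _ in result])
```

                At 3 bytes per pixel, 36 samples is 108 bytes per kept frame before the timestamp, so what goes up the wire is tiny compared with a screenshot.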

                It doesn’t have to be especially accurate or know any niche content, the point is to make a psychomarketing profile of the customer like “car guy, watches tool reviews”.