X updates its terms to ban crawling and scraping - eviltoast

The new terms, which are effective from September 29, ban any kind of scraping or crawling without “prior written consent.”

NOTE: crawling or scraping the Services in any form, for any purpose without our prior written consent is expressly prohibited.

The previous version of the terms allowed crawling in accordance with robots.txt.

“NOTE: crawling the Services is permissible if done in accordance with the provisions of the robots.txt file, however, scraping the Services without our prior consent is expressly prohibited,” it read.

In the last few months, Twitter has also altered its robots.txt file — a file that gives instructions to robot crawlers about what parts of the site they are permitted to visit — to remove instructions for all crawler bots apart from Google.
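To illustrate how a crawler consults such a file, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `can_crawl` helper, and the `example.com` URL are all hypothetical; this is not X's actual file, only an assumed layout that allows one named crawler and disallows everyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT X's real file: one named crawler is
# allowed everywhere, and every other user agent is disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/status/1"  # placeholder URL
    for agent in ("Googlebot", "OtherBot"):
        print(agent, can_crawl(SAMPLE_ROBOTS_TXT, agent, page))
```

Note that robots.txt is advisory: a well-behaved crawler checks it before fetching, but nothing in the file technically blocks a request, which is why the terms of service matter separately.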

In 2015, Twitter confirmed that it had a firehose deal in place with Google to surface tweets in search results. It is not clear if the nature or terms of that deal have changed under the new management.

    • IHeartBadCode@kbin.social
      19 upvotes · 1 year ago

      Yeah, that’s literally UNENFORCEABLE. We just had a case last year indicating that you can scrape data from sites so long as the scraped data isn’t used for profit.

      Additionally, scrapers cannot legally be held to have agreed to the TOS. Simply typing in an address and receiving a page back doesn’t mean that anyone agrees to the TOS of the server that returned the page. It’s pretty much the same reason software couldn’t enforce the “if you don’t agree with the terms on the CD-ROM, then you cannot open the package the CD-ROM is in” clause. So the fact that X wrote that in their TOS has zero bearing on whether they can actually enforce it through the court system, and the answer there is likely going to be a big NAH.

      That’s based on the gate-up/gate-down test given by the courts. If a normal person can find a random “tweet” (are we still calling them that?) by typing a URL, the gate is up, and you cannot pick and choose who gets to enter. If you don’t want a random tweet being scraped, the gate must be down. That means nobody typing in a random URL can ever access that tweet; they have to go through the gatehouse to gain entry to the resource. But gate down means that no one is going to link to a tweet, because when readers click the link, instead of seeing the related information, they get handed a login page. X has been trying that, and news outlets are bitching that they’re not going to post tweets in their stories if Musk is just going to block everyone.

      The thing that X could argue is that someone is using their tweets for “profit,” which is exactly the case they’re trying with the ADL and the CCDH. They’re trying to argue that these not-for-profits are profiting off of convincing ad buyers not to buy ads. Which, if that sounds crazy, OH BOY IS IT. However, Musk’s lawyers have attempted to muddy the waters on what “PROFIT” is. So grab some popcorn for that one.

      The thing is, I get that Musk wants to hold tight copyright on the tweets and not surface a lot to others who might use that data for who knows what purpose. BUT you cannot have your cake and eat it too. Musk doesn’t get the best of both worlds. He can put everything behind a wall and attempt to enforce his TOS, but that’s still not really going to go well for his ADL/CCDH case. Or he can surface the tweets for the Internet to read. But he cannot have both. We’ve settled that in the courts, and Congress hasn’t made any kind of motion toward changing that standing.

  • Fisk400@feddit.nu
    English · 27 upvotes · 1 year ago

    Looks like he still can’t afford the hosting fees. I wonder if the income will improve once all search engines stop indexing it.

  • Rhaedas@kbin.social
    14 upvotes · 1 year ago

    “in any form”

    Is viewing and copying/pasting a manual form? I know the implied meaning is automation but as a legal document it should probably specify that. Unless the plan is to drive more people away, which does seem to be a trend.

  • AutoTL;DR@lemmings.world (bot)
    English · 8 upvotes · 1 year ago

    This is the best summary I could come up with:


    Elon Musk-owned X, formerly Twitter, has updated its terms of service to prohibit scraping and crawling — likely to fend off any AI models training on its data.

    The new terms, which are effective from September 29, ban any kind of scraping or crawling without “prior written consent.”

    At that time, Musk had said that it was a temporary measure because the site was getting “data pillaged so much that it was degrading service for normal users.”

    In April, he threatened to sue Microsoft for illegally using the social network’s data to train AI models.

    Earlier this month, X changed its privacy policy to state it might use public data to train AI models.

    Musk has previously noted during a Twitter space that xAI, a company founded in July, would use public data such as tweets to train its models.


    The original article contains 390 words, the summary contains 140 words. Saved 64%. I’m a bot and I’m open source!

  • Jaysyn@kbin.social
    6 upvotes · 2 downvotes · 1 year ago

    As long as they are publishing that information to the public internet, they don’t have a leg to stand on legally.

  • D1SoveR@sopuli.xyz
    English · 3 upvotes · 1 year ago

    I gather that the policy as written, apart from bulk data collection, also inadvertently prohibits the use of any alternative front-ends, such as Nitter? Does it also stop any archival (akin to the Wayback Machine) of their service?

    • geosoco@kbin.social (OP)
      2 upvotes · 1 year ago

      It sounds like it might, but whether it stops them from working or just creates liability so that X can sue is unclear. Likely the latter.