
October 13, 2010

When Is Data Scraping Breaking and Entering?

Can violating a website’s terms of use – usually a lengthy scroll of legalese which nobody reads – expose you to criminal liability under anti-hacking statutes?

This question has recently assumed new urgency due to civil and criminal actions against data scrapers and aggregators for their use of robots and spiders (automated software programs) to harvest data from public websites in violation of the sites’ terms of use, which typically prohibit such access. The statutory weapon most often used by site owners and prosecutors is the federal Computer Fraud and Abuse Act (CFAA), 18 U.S.C. §1030, which criminalizes accessing or using a computer system “without authorization,” or in a manner that exceeds authorized access, and causing damage. (The “damage” can, in fact, be nothing more than the costs of investigating and responding to unwanted access.)

As might be expected, the original purpose of the CFAA was to punish malicious hacking. However, the CFAA and similar statutes are increasingly being invoked for the proposition that accessing and using a public website in a manner that violates its terms of use is a computer crime analogous to breaking into a restricted system or area. In Facebook v. ConnectU, LLC, 489 F.Supp.2d 1087 (N.D.Cal. 2007), for example, a federal court held that using bots to access Facebook and gather millions of e-mail addresses for solicitation purposes, activity prohibited by a standard clause in Facebook’s terms of use, constituted use “without permission” in violation of California’s Comprehensive Computer Data Access and Fraud Act (§502 of the California Penal Code). In United States v. Lowson, federal prosecutors have targeted a New Jersey ticket scalping company that allegedly used bots to purchase concert tickets on Ticketmaster.com for resale.

Can I get prosecuted for not reading the unreadable?

The creative deployment of anti-hacking statutes to effectively criminalize a common industry practice (Google uses bots and spiders to index sites for its search engine) raises serious policy and constitutional questions. The Electronic Frontier Foundation (EFF), a leading public-interest digital rights organization and a participant in several of the cases, has argued that such an interpretation gives site owners the ability to unilaterally classify activity they don’t like (including competitive activity that may be in the public interest) as criminal, and denies site users the fair notice of what constitutes an offense that the Constitution requires (since no one surfing the web can be expected to read 12 pages of tortured legalese and understand what is illegal).

These issues came to a head in Facebook, Inc. v. Power Ventures, Inc., No. C-08-05780 (N.D. Cal. July 20, 2010), an important marker in the evolving law of data scraping. The facts are relatively simple. Power, through its website at Power.com, provided social media users with tools to aggregate their own profile and activity data across several social media sites. When users accessed Facebook through Power.com, the site deployed bots to access their data, violating the explicit terms of Facebook’s user agreement. Facebook warned Power that it considered such access a violation of California Penal Code §502. Facebook then used IP-blocking measures to prevent further requests from Power’s IP address. However, Power circumvented these measures so that Facebook could not recognize the source of the requests. A lawsuit inevitably followed, into which the EFF injected itself by filing a brief in support of Power.

In its complaint, Facebook claimed that Power’s actions constituted access to its site “without permission” in violation of §502. Power moved to dismiss the case. The federal district court for the Northern District of California denied the motion. It ruled that although Power’s access to Facebook in a way that breached the terms of use (i.e., through bots) was not “without permission,” gaining access by evading Facebook’s IP-blocking measures violated the statutory prohibition, because this artifice showed that Power was sufficiently aware its access was unauthorized for the imposition of criminal sanctions to pass constitutional muster. (Power’s disregard of Facebook’s cease-and-desist letter no doubt also contributed to the court’s conclusion.) Furthermore, although the ruling addressed §502 rather than the CFAA, the court used case law interpreting the CFAA’s “without authorization” language as guidance in analyzing the corresponding “without permission” language in the California computer crimes statute.

The Power Ventures ruling conflicts with the ConnectU decision from the same forum, meaning that clarification by the appellate courts is needed now more than ever. Given the EFF’s determination to push this issue as part of its crusade against (what it claims is) misuse of website user agreements to regulate behavior on the Internet, the Ninth Circuit will likely weigh in before long. Still, the Power Ventures decision is significant, because it undercuts the argument that website terms of use alone can define the scope of permitted access for purposes of anti-hacking and computer crime laws.

Scrape, but don’t get caught?

With all that said, the case sends a mixed message to data scrapers: go ahead and scrape, but don’t get caught (or stop if you do). For this reason the EFF was unsatisfied with the ruling, contending on its blog (as it had before the court) that if accessing a site in defiance of the terms of use is not “without permission,” accessing it in defiance of technical measures that enforce the same terms of use should also be permissible.

On the other hand, it is well established that website terms of use can form a binding contract between the user and the site owner, so user activity that breaches the contract and causes provable, quantifiable damage under the common law of contracts (which most data scraping does not) is still actionable. The EFF has repeatedly criticized terms of use as impenetrable, unilateral contracts of adhesion that ordinary web users can’t be expected to read and understand, and it could be heading toward the position that such terms are unenforceable, or enforceable only if they meet stringent requirements of clarity and conspicuousness. If that view prevailed, the resulting cost and risk burden might curtail free content and social networking tools on the Internet.

To scrape or not to scrape? In the wake of Power Ventures, that is certainly the question.