Tropo / Dave / Bookmarks : data

Data: Where can I get large datasets open to the public? - Q...
    http://www.quora.com/Data/Where-can-I-get-large-datasets-ope...
    tags: data

Where can I get large datasets open to the public? - Quora
    http://www.quora.com/Where-can-I-get-large-datasets-open-to-...
    tags: data

Hacker News | Free, Public Data Sets
    http://news.ycombinator.com/item?id=2165497
    tags: data

Google Public Data Explorer
    http://www.google.com/publicdata/directory
    tags: data

Tagged and Cleaned Wikipedia (TC Wikipedia) and its Ngram
    http://nlp.cs.nyu.edu/wikipedia-data/
    tags: wikipedia data ngram

Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD) : ...
    A collection of daily weather measurements (temperature, wind speed, humidity, pressure, &c.) from 9000+ weather stations around the world.
    http://aws.amazon.com/datasets/2759/185-0886336-5629151
    tags: data weather

Data & Research
    http://www.faa.gov/data_research/
    tags: data faa

Natural Earth
    Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.
    http://www.naturalearthdata.com/
    tags: data map earth geo gis

Eureqa | Cornell Computational Synthesis Laboratory
    Eureqa is a software tool for detecting equations and hidden mathematical relationships in your data. Its primary goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data.
    http://ccsl.mae.cornell.edu/eureqa
    tags: data visualization viz ai machinelearning ml

Home : ClueWeb09 Wiki
    http://boston.lti.cs.cmu.edu/clueweb09/wiki/tiki-index.php?p...
    tags: data

Gridded Population of the World
    Global Rural-Urban Mapping Project
    http://sedac.ciesin.columbia.edu/gpw/global.jsp
    tags: data population world

Summary of Results | 2002 National-Scale Air Toxics Assessme...
    Of the 181 air toxics included in the 2002 national-scale assessment, the risk characterization considers the risk of both cancer and noncancer effects from inhalation of 124 of these air toxics -- the subset of pollutants with health data based on chronic exposure. The purpose of this national-scale assessment is to understand these cancer risks and noncancer health effects in order to help the EPA and others to identify pollutants and source categories of greatest potential concern, and to set priorities for the collection of additional information to improve future assessments. The assessment represents a "snapshot" in time for characterizing risks from exposure to air toxics. The national-scale assessment is not designed to characterize risks sufficiently for regulatory action.
    http://www.epa.gov/ttn/atw/nata2002/risksum.html
    tags: data toxic

2002 Assessment Results | 2002 National-Scale Air Toxics Ass...
    These maps show the geographic patterns of estimated cumulative cancer or noncancer risk due to inhalation of air toxics. EPA developed these maps to inform both national and more localized efforts to collect air toxics information and characterize emissions (e.g., prioritize pollutants/geographic areas of interest for more refined data collection such as monitoring). See more information about limitations. These files not only present the risk levels for each tract, but also the HAPs contributing to risk levels and their percent contribution. An additional overlay provides the names and locations for each source identified in the 2002 NEI for that tract.
    http://www.epa.gov/ttn/atw/nata2002/tables.html
    tags: data toxic

IP address geolocation SQL database :: IPInfoDB
    The SQL database behind ipinfodb.com is offered for free. We offer the database in different formats (SQL, CSV), city or country precision, 3 or 4 IP digits precision and data in single or multiple tables. Available information in the database: ISO country code, country name, FIPS region code, region name, city, zipcode, latitude, longitude and GMT/DST timezone. The database is updated during the first week of each month.
    http://ipinfodb.com/ip_database.php
    tags: data ip geolocation

SDA Web Application
    http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss06
    tags: data

IP address geolocation SQL database
    http://www.iplocationtools.com/sql_database.php
    tags: data geo geolocation

IP address city geolocation HTTP API
    http://blogama.org/node/60
    tags: geo data

IR Datasets
    http://boston.lti.cs.cmu.edu/callan/Data/#Web
    tags: data ir

Supplement data. Data expo 09. ASA Statistics Computing and ...
    Supplemental data. Airports: airports.csv describes the locations of US airports, with the fields: iata (the international airport abbreviation code); name of the airport; city and country in which the airport is located; lat and long (the latitude and longitude of the airport). The majority of this data comes from the FAA, but a few extra airports (mainly military bases and US protectorates) were collected from other web sources by Ryan Hafen and Hadley Wickham.
    http://stat-computing.org/dataexpo/2009/supplemental-data.ht...
    tags: data

Data expo 09. ASA Statistics Computing and Graphics
    Data expo: Airline on-time performance. Have you ever been stuck in an airport because your flight was delayed or cancelled and wondered if you could have predicted it if you'd had more data? This is your chance to find out. The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, and it takes up 1.6 gigabytes of space compressed and 12 gigabytes when uncompressed. To make sure that you're not overwhelmed by the size of the data, we've provided two brief introductions to some useful tools: Linux command line tools and SQLite, a simple SQL database.
    http://stat-computing.org/dataexpo/2009/
    tags: data

Dictionaries - OpenOffice.org Wiki
    http://wiki.services.openoffice.org/wiki/Dictionaries#Englis...
    tags: spelling data

(theinfo)
    This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It's a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.
    http://theinfo.org/
    tags: data viz

RFE
    This guide is sponsored by the American Economic Association. It lists more than 2,000 resources in 97 sections and sub-sections available on the Internet of interest to academic and practicing economists, and those interested in economics. Almost all resources are also described. In selecting resources for RFE, I exercise some editorial judgment and select items that either offer a substantial amount of information, or are specialized to a given area. Those searching the Internet for economic information might also wish to try the Economics Search Engine (ESE). It indexes 11,000 economics web sites from around the world. Searches with it only return their contents.
    http://www.aeaweb.org/RFE/
    tags: data

Andrés Corrada-Emmanuel Research
    The Enron Email Dataset: Email research has taken a giant step forward with a positive side-effect from the whole Enron debacle. We now have, thanks to the foresight of a few researchers, an email corpus from an actual, living corporation.
    http://ciir.cs.umass.edu/~corrada/enron/
    tags: enron corpus data

NOAA/NGDC/MGG-Topography, Digital Terrain Data
    http://www.ngdc.noaa.gov/mgg/topo/topo.html
    tags: geo data

The National Map Seamless Server
    http://seamless.usgs.gov/
    tags: geo data

Terrain data images
    These subdirectories represent a viewable form of the NOAA GLOBE database. The data has been reduced from 16-bit resolution to 8-bit false color. For original data, contact Ed Falk or visit the GLOBE web site. Higher-resolution (1" and less) data is available from the USGS. See Ed Falk for some samples, including the Bay Area.
    http://peregrine/efalk/Terrain/
    tags: data geo

StockMorph - Sources of Stock Market Data
    http://www.stockmorph.com/sources-of-stock-market-data/
    tags: stock data nasdaq djia sp nyse

Welcome to NHGIS — National Historical Geographic Informat...
    http://www.nhgis.org/
    tags: data census

U.S. Census Bureau - TIGER/Line®
    http://www.census.gov/geo/www/tiger/
    tags: data geo census

The Memory Hole > OSHA's Lost Workday Injury and Illness Dat...
    OSHA's Lost Workday Injury and Illness Database: injury/illness rates for tens of thousands of companies identified by name. OSHA fought in court for 2 years to keep this material secret.
    http://www.thememoryhole.org/osha/lwdii.htm
    tags: data memoryhole ohsa

 

