Data: Where can I get large datasets open to the public? - Q...
http://www.quora.com/Data/Where-can-I-get-large-datasets-ope...
tags: data
Where can I get large datasets open to the public? - Quora
http://www.quora.com/Where-can-I-get-large-datasets-open-to-...
tags: data
Hacker News | Free, Public Data Sets
http://news.ycombinator.com/item?id=2165497
tags: data
Google Public Data Explorer
http://www.google.com/publicdata/directory
tags: data
Tagged and Cleaned Wikipedia (TC Wikipedia) and its Ngram
http://nlp.cs.nyu.edu/wikipedia-data/
tags: wikipedia data ngram
Daily Global Weather Measurements, 1929-2009 (NCDC, GSOD) : ...
A collection of daily weather measurements (temperature, wind speed, humidity, pressure, &c.) from 9000+ weather stations around the world.
http://aws.amazon.com/datasets/2759/185-0886336-5629151
tags: data weather
Data & Research
http://www.faa.gov/data_research/
tags: data faa
Natural Earth
Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.
http://www.naturalearthdata.com/
tags: data map earth geo gis
Eureqa | Cornell Computational Synthesis Laboratory
Eureqa is a software tool for detecting equations and hidden mathematical relationships in your data. Its primary goal is to identify the simplest mathematical formulas which could describe the underlying mechanisms that produced the data.
http://ccsl.mae.cornell.edu/eureqa
tags: data visualization viz ai machinelearning ml
Home : ClueWeb09 Wiki
http://boston.lti.cs.cmu.edu/clueweb09/wiki/tiki-index.php?p...
tags: data
Gridded Population of the World
Global Rural-Urban Mapping Project
http://sedac.ciesin.columbia.edu/gpw/global.jsp
tags: data population world
Summary of Results | 2002 National-Scale Air Toxics Assessme...
Of the 181 air toxics included in the 2002 national-scale assessment, the risk characterization considers the risk of both cancer and noncancer effects from inhalation of 124 of these air toxics -- the subset of pollutants with health data based on chronic exposure. The purpose of this national-scale assessment is to understand these cancer risks and noncancer health effects in order to help the EPA and others to identify pollutants and source categories of greatest potential concern, and to set priorities for the collection of additional information to improve future assessments. The assessment represents a "snapshot" in time for characterizing risks from exposure to air toxics. The national-scale assessment is not designed to characterize risks sufficiently for regulatory action.
http://www.epa.gov/ttn/atw/nata2002/risksum.html
tags: data toxic
2002 Assessment Results | 2002 National-Scale Air Toxics Ass...
These maps show the geographic patterns of estimated cumulative cancer or noncancer risk due to inhalation of air toxics. EPA developed these maps to inform both national and more localized efforts to collect air toxics information and characterize emissions (e.g., prioritize pollutants/geographic areas of interest for more refined data collection such as monitoring). See more information about limitations. These files not only present the risk levels for each tract, but also the HAPs contributing to risk levels and their percent contribution. An additional overlay provides the names and locations for each source identified in the 2002 NEI for that tract.
http://www.epa.gov/ttn/atw/nata2002/tables.html
tags: data toxic
IP address geolocation SQL database :: IPInfoDB
The SQL database behind ipinfodb.com is offered for free. We offer the database in different formats (SQL, CSV), city or country precision, 3 or 4 IP digits precision and data in single or multiple tables. Available information in the database : ISO country code, country name, FIPS region code, region name, city, zipcode, latitude, longitude and GMT/DST timezone. The database is updated during the first week of each month.
http://ipinfodb.com/ip_database.php
tags: data ip geolocation
SDA Web Application
http://sda.berkeley.edu/cgi-bin/hsda?harcsda+gss06
tags: data
IP address geolocation SQL database
http://www.iplocationtools.com/sql_database.php
tags: data geo geolocation
IP address city geolocation HTTP API
http://blogama.org/node/60
tags: geo data
IR Datasets
http://boston.lti.cs.cmu.edu/callan/Data/#Web
tags: data ir
Supplement data. Data expo 09. ASA Statistics Computing and ...
Supplemental data. Airports: airports.csv describes the locations of US airports, with the fields: iata (the international airport abbreviation code); name of the airport; city and country in which the airport is located; lat and long (the latitude and longitude of the airport). The majority of this data comes from the FAA, but a few extra airports (mainly military bases and US protectorates) were collected from other web sources by Ryan Hafen and Hadley Wickham.
http://stat-computing.org/dataexpo/2009/supplemental-data.ht...
tags: data
Data expo 09. ASA Statistics Computing and Graphics
Data expo: Airline on-time performance. Have you ever been stuck in an airport because your flight was delayed or cancelled and wondered if you could have predicted it if you'd had more data? This is your chance to find out. The data: The data consists of flight arrival and departure details for all commercial flights within the USA, from October 1987 to April 2008. This is a large dataset: there are nearly 120 million records in total, and it takes up 1.6 gigabytes of space compressed and 12 gigabytes when uncompressed. To make sure that you're not overwhelmed by the size of the data, we've provided two brief introductions to some useful tools: Linux command line tools and SQLite, a simple SQL database.
http://stat-computing.org/dataexpo/2009/
tags: data
Dictionaries - OpenOffice.org Wiki
http://wiki.services.openoffice.org/wiki/Dictionaries#Englis...
tags: spelling data
(theinfo)
This is a site for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them. It's a place where they can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects.
http://theinfo.org/
tags: data viz
RFE
This guide is sponsored by the American Economic Association. It lists more than 2,000 resources in 97 sections and sub-sections available on the Internet of interest to academic and practicing economists, and those interested in economics. Almost all resources are also described. In selecting resources for RFE, I exercise some editorial judgment and select items that either offer a substantial amount of information, or are specialized to a given area. Those searching the Internet for economic information might also wish to try the Economics Search Engine (ESE). It indexes 11,000 economics web sites from around the world. Searches with it only return their contents.
http://www.aeaweb.org/RFE/
tags: data
Andrés Corrada-Emmanuel Research
The Enron Email Dataset: Email research has taken a giant step forward with a positive side-effect from the whole Enron debacle. We now have, thanks to the foresight of a few researchers, an email corpus from an actual, living corporation.
http://ciir.cs.umass.edu/~corrada/enron/
tags: enron corpus data
NOAA/NGDC/MGG-Topography, Digital Terrain Data
http://www.ngdc.noaa.gov/mgg/topo/topo.html
tags: geo data
The National Map Seamless Server
http://seamless.usgs.gov/
tags: geo data
Terrain data images
These subdirectories represent a viewable form of the NOAA GLOBE database. The data has been reduced from 16-bit resolution to 8-bit false color. For original data, contact Ed Falk or visit the GLOBE web site. Higher-resolution (1" and less) data is available from the USGS. See Ed Falk for some samples, including the Bay Area.
http://peregrine/efalk/Terrain/
tags: data geo
StockMorph » Sources of Stock Market Data
http://www.stockmorph.com/sources-of-stock-market-data/
tags: stock data nasdaq djia sp nyse
Welcome to NHGIS — National Historical Geographic Informat...
http://www.nhgis.org/
tags: data census
U.S. Census Bureau - TIGER/Line®
http://www.census.gov/geo/www/tiger/
tags: data geo census
The Memory Hole > OSHA's Lost Workday Injury and Illness Dat...
OSHA's Lost Workday Injury and Illness Database: injury/illness rates for tens of thousands of companies identified by name. OSHA fought in court for 2 years to keep this material secret.
http://www.thememoryhole.org/osha/lwdii.htm
tags: data memoryhole osha