database

Pages tagged database:

Rules of Database App Aging - Push cx
http://push.cx/2009/rules-of-database-app-aging

"all fields become optional" etc. good stuff.

Rule 3. (too true!) Chatter Always Expands

This will be incomprehensible to non-developers in the audience, but oh god, this is so painfully, painfully true.

"I mentioned I’ve learned some rules of how database apps change over time, now that I’ve done a few dozen. They are: ... "

Home - Who Runs Gov - Government directory
http://whorunsgov.com/

Washington Post site about the Obama administration (participatory)

Awesome site provided by the Washington Post that provides in-depth information on many of Washington's inside elite.

WhoRunsGov.com offers a unique look at the world of Washington through its key players and personalities.

Washington Post experiment

WWW SQL Designer - default
http://ondras.zarovi.cz/sql/demo/?keyword=default

Visual relational database designer, done entirely in <canvas> and javascript. Rather swanky-looking.

wwwsqldesigner - Google Code
http://code.google.com/p/wwwsqldesigner/

This tool allows you to draw and create database schemas (E-R diagrams) directly in browser, without the need for any external programs (flash)

"WWW SQL Designer", which lets you design databases smoothly in the browser, is too amazing - IDEA*IDEA (life-hack blog by the admin of 百式)
http://www.ideaxidea.com/archives/2009/01/www_sql_designer.html

I want to use this next time.

DIY: How to write a book - Boing Boing
http://www.boingboing.net/2009/01/27/diy-how-to-write-a-b.html

How to write a book notes by Steven Johnson

Listable
http://www.listable.org/

via guthrie

Listable, create and share lists with JSON, SQL, and plaintext output

via guthrie

listable, list making, serving site

having so much fun with this already (via sccottt)

Reject Database for iPhone Developer
http://iphone-rejectdb.appspot.com/
BBC - Radio Labs - How we make websites
http://www.bbc.co.uk/blogs/radiolabs/2009/01/how_we_make_websites.shtml

how good is the BBC?

15 Websites to Trace People Online | MakeUseOf.com
http://www.makeuseof.com/tag/15-websites-to-trace-people-online/

There are many websites that search standard social networks like MySpace or Facebook, but Pipl is one resource that conducts a “deep web” dig for the name you’re looking for on “non-typical sites.” The search results from Pipl are pretty impressive. Y

the private world of yesterday is now an online world with open access to social networks, government databases, and public records.

for you, with love

dewey
http://deweymusic.org/

search engine for archive.org music

music library

I wrote a Perl script to crawl through the Live Music Archive and make an XML file of all the streamable songs, and now I'm putting the information from the XML file into MySQL databases to populate this interface. I'm also tweaking the interface to make it look nice and neat. Dewey is named after the Dewey Decimal System because it organizes the Live Music Archive library in the same way that the Dewey Decimal System does. I understand that sifting through that many artists is a bit daunting if you don't know what you're looking for. I've added Genre tags that are pulled from Last.fm to help you find your style, and a search function to help you find exactly what you want. While I look into better options, and as the database grows, let me suggest that you use the "Concerts From Today" tab to narrow down your listening possibilities while browsing. I hope to be adding rating, tagging and commenting functionality soon.

32 Tips To Speed Up Your MySQL Queries | AjaxLine
http://www.ajaxline.com/32-tips-to-speed-up-your-mysql-queries
How to Paginate Data with PHP - NETTUTS
http://nettuts.com/tutorials/php/how-to-paginate-data-with-php/

How to Paginate Data with PHP

from NETTUTS

An explanation of how to paginate data using PHP -- code is included.

Catalogue of Digitized Medieval Manuscripts: About Us
http://manuscripts.cmrs.ucla.edu/
twtbase - Twitter Application Database
http://twtbase.com/

Application Database

Is the Relational Database Doomed? - ReadWriteWeb
http://www.readwriteweb.com/archives/is_the_relational_database_doomed.php

Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud.

Article about where key/value databases should be used over relational databases, with some examples of dbs available.

Purpose of key/value databases. Is the paradigm changing these days?

Tokyo Cabinet: Beyond Key-Value Store - igvita.com
http://www.igvita.com/2009/02/13/tokyo-cabinet-beyond-key-value-store/

database blog

A database lib

BDB alternative, and very fast

Ask SM [PHP]: Form Validation, Converting MySQL to XML | Developer's Toolbox | Smashing Magazine
http://www.smashingmagazine.com/2009/02/05/ask-sm-php-questions/

Ask SM on PHP: form validation, MySQL to XML, and more ...

smashingmagazine.com: form validation & converting mysql to xml

InfoQ: CouchDB and Me
http://www.infoq.com/presentations/katz-couchdb-and-me

In this talk from RubyFringe, Damien Katz explains what drove him to create CouchDB, why he chose Erlang and what made him decide to sell his house to work on Free Software.

Very inspiring.

CodeProject: Visual Representation of SQL Joins. Free source code and programming help
http://www.codeproject.com/KB/database/Visual_SQL_Joins.aspx
Medpedia - Welcome
http://medpedia.com/
漢(オトコ)のコンピュータ道: 10 ways to speed up MySQL
http://nippondanji.blogspot.com/2009/02/mysql10.html

tuning tips

DB tuning

Araelium Group : Querious - MySQL Database Tool
http://www.araelium.com/querious/

Looks hot. Hopefully less crash prone than MySQL Administrator. Will try.

MySQL Database editing app for OS X

DeepPeep: discover the hidden web
http://www.deeppeep.org/index.jsp

DeepPeep is a search engine specialized in Web forms. The beta version currently tracks 13,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services.

Search engine for the invisible web

16 points for speeding up PostgreSQL
http://neta.ywcafe.net/000960.html

PostgreSQL has evolved tremendously over the last few years; as of 2009 its performance is on par with Oracle's or faster.

Want to reread this when I use PostgreSQL

New Search Technologies Mine the Web More Deeply - NYTimes.com
http://www.nytimes.com/2009/02/23/technology/internet/23search.html

"Now a new breed of technologies is taking shape that will extend the reach of search engines into the Web’s hidden corners. When that happens, it will do more than just improve the quality of search results — it may ultimately reshape the way many companies do business online."

Google now indexes a trillion web pages - but that's just a fraction of what's out there. So, what does it miss?

...Google is built for a static web...

Amazon Exposes 1 Terabyte of Public Data to Developers - ReadWriteWeb
http://www.readwriteweb.com/archives/amazon_exposes_1_terrabyte_of.php
How FriendFeed uses MySQL to store schema-less data - Bret Taylor's blog
http://bret.appspot.com/entry/how-friendfeed-uses-mysql

Interesting article about MySQL scalability problems.

Directed Edge News » Blog Archive » On Building a Stupidly Fast Graph Database
http://blog.directededge.com/2009/02/27/on-building-a-stupidly-fast-graph-database/

on-building-a-stupidly-fast-graph-database

connected to and things that connect to them. These are symmetrical — so creating a link from item A to item B, creates a reference from item B to item A.

New Search Technologies Mine the Web More Deeply - NYTimes.com
http://www.nytimes.com/2009/02/23/technology/internet/23search.html?_r=1

An interesting look at the daunting task of connecting/mining the interwebs.

Search engines are starting to penetrate databases that are set up to respond to typed queries.

how to search databases, semantic web

Plurk Open Source - LightCloud - Distributed and persistent key value database
http://opensource.plurk.com/LightCloud/

aid, here is what it takes to do 10.000 gets and sets:

HowFriendFeedUsesMySqlToStoreSchemaLessData - How FriendFeed uses MySQL to store schema-less data
http://hyuki.com/yukiwiki/wiki.cgi?HowFriendFeedUsesMySqlToStoreSchemaLessData

Japanese translation of yesterday's article

Dare Obasanjo aka Carnage4Life - Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes
http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx

"Database sharding is the process of splitting up a database across multiple machines to improve the scalability of an application. The justification for database sharding is that after a certain scale point it is cheaper and more feasible to scale a site horizontally by adding more machines than to grow it vertically by adding beefier servers."

SELECT Name, Address FROM Customers WHERE CustomerID = ?
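
A minimal sketch of how a lookup like the one quoted above might be routed once the Customers table is split across machines. It assumes hash-based partitioning on CustomerID, which is only one of several possible sharding schemes; the shard names and the helpers `shard_for` and `lookup_query` are hypothetical and do not come from the linked post.

```python
import hashlib

# Hypothetical shard names; a real deployment would map these to connection strings.
SHARDS = ["customers_db_0", "customers_db_1", "customers_db_2", "customers_db_3"]


def shard_for(customer_id, shards=SHARDS):
    """Pick a shard by hashing the key (hash-based partitioning, an assumption)."""
    digest = hashlib.md5(str(customer_id).encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def lookup_query(customer_id):
    """Return (shard, sql, params) for the single-row customer lookup."""
    sql = "SELECT Name, Address FROM Customers WHERE CustomerID = ?"
    return shard_for(customer_id), sql, (customer_id,)


if __name__ == "__main__":
    shard, sql, params = lookup_query(42)
    print(shard, sql, params)
```

The same key always maps to the same shard, so a single-customer query touches one machine. The trade-off of this simple modulo scheme is that changing the number of shards moves most keys.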

Jonathan Ellis's Programming Blog - Spyced: All you ever wanted to know about writing bloom filters
http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html
Full-text search with Ruby on Rails + MySQL - Dwango R&D Blog
http://info.dwango.co.jp/rd/2009/02/ruby-on-rails-mysql.html

Good summary

Adam Gotterer - How we cache at CollegeHumor
http://www.adamgotterer.com/2009/03/01/how-we-cache-at-collegehumor/

CollegeHumor

CollegeHumor memcache use

Monty says: Oops, we did it again (MySQL 5.1 released as GA with crashing bugs)
http://monty-says.blogspot.com/2008/11/oops-we-did-it-again-mysql-51-released.html

Monty (Michael Widenius) is the original developer of MySQL and has recently left Sun.

The reason I am asking you to be very cautious about MySQL 5.1 is that there are still many known and unknown fatal bugs in the new features that are still not addressed.

redis - Google Code
http://code.google.com/p/redis/

“Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements. In order to be very fast but at the same time persistent the whole dataset is taken in memory and from time to time and/or when a number of changes to the dataset are performed it is written asynchronously on disk. You may lose the last few queries, which is acceptable in many applications, but it is as fast as an in-memory DB (beta 6 of Redis includes initial support for master-slave replication in order to solve this problem by redundancy).”

A nice fast K/V data store, with some nice list/set features.

Amazon Web Services Blog: New AWS Public Data Sets - Economics, DBpedia, Freebase, and Wikipedia
http://aws.typepad.com/aws/2009/02/new-aws-public-data-sets-economics-dbpedia-freebase-and-wikipedia.html

We have just released four additional AWS public data sets, and have updated another one. In the Economics category, we have added a set of transportation databases from the US Bureau of Transportation Statistics. Data and statistics are provided for aviation, maritime, highway, transit, rail, pipeline, bike & pedestrian, and other modes of transportation, all in CSV format. I was able to locate employment data for our hometown airline and found out that they employed 9,322 full-time and 1,122 part-time employees as of the end of 2007. In the Encyclopedic category, we have added access to the DBpedia Knowledge Base, the Freebase Data Dump, and the Wikipedia Extraction, or WEX.

amazon

Performance, Scalabilty and Architecture - Java and .NET Application Performance Management (dynaTrace Blog) » Understanding Caching in Hibernate - Part One : The Session Cache
http://blog.dynatrace.com/2009/02/16/understanding-caching-in-hibernate-part-one-the-session-cache/
10 Ways to Automatically & Manually Backup MySQL Database | Noupe
http://www.noupe.com/php/10-ways-to-automatically-manually-backup-mysql-database.html
Everything You Need to Get Started With MySQL - NETTUTS
http://net.tutsplus.com/tutorials/php/everything-you-need-to-get-started-with-mysql/

Are Cloud Based Memory Architectures the Next Big Thing? | High Scalability
http://highscalability.com/are-cloud-based-memory-architectures-next-big-thing

We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon. Let's take a short trip down web architecture lane:
# It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database
# It's 1995: Scale-up the database.
# It's 1998: LAMP
# It's 1999: Stateless + Load Balanced + Database + SAN
# It's 2001: In-memory data-grid.
# It's 2003: Add a caching layer.
# It's 2004: Add scale-out and partitioning.
# It's 2005: Add asynchronous job scheduling and maybe a distributed file system.
# It's 2007: Move it all into the cloud.
# It's 2008: Cloud +

What makes Memory Based Architectures different from traditional architectures is that memory is the system of record. Also discusses Jim Starkey's NimbusDB.

13 Great WordPress Speed Tips & Tricks for MAX Performance | Noupe
http://www.noupe.com/wordpress/13-great-wordpress-speed-tips-tricks-for-max-performance.html

Performance is a key factor for any successful website. And since WordPress is becoming more popular than ever, it will only be at its best when raised in the

IP address geolocation SQL database | Share your knowledge!
http://blogama.org/node/58
Academic Reference and Research Index, accessing selected reference sites
http://www.academicindex.net/
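漢(オトコ)のコンピュータ道: 7 more ways to speed up MySQL
http://nippondanji.blogspot.com/2009/03/mysql7.html

Bonus: Sharding

The Web屋のネタ帳 article presents 16 points, but the 漢(オトコ)のコンピュータ道 article had only 10 methods, leaving it 6 short. Real men compete on numbers!! So today I am wringing out more material to present 7 more MySQL speed-up techniques.

7 more ways to speed up MySQL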

TheOfficialBoard
http://www.theofficialboard.com/

Search for organizational charts with The Official Board. We provide constantly updated organizational charts of the world’s 20 000 largest corporations. A strong personal network is the key to professional success. The Official Board is constantly developed and updated by our members in real time. That means our members are the first to know. Free registration is required. Most of the information can be accessed for free by our members. Then, by adding to the database, they become allowed to search for deeper information in each organizational chart.

View the organizational charts of the world's 20 000 largest corporations.

Welcome to The Official Board. We provide constantly updated organizational charts of the world's 20 000 largest corporations.

org charts of companies

Provides information about company management. Wiki

org charts

"We provide constantly updated organizational charts of the world’s 20 000 largest corporations."

See also the techcrunch article.

Most common passwords list from 3 databases
http://blog.jimmyr.com/Password_analysis_of_databases_that_were_hacked_28_2009.php

List of most commonly used passwords

A detailed password analysis of compromised passwords from myspace, phpbb, and singles.org

I wondered why the Singles.org passwords contained so many religious words; it turns out it is a dating site for Christians.

Media Database
http://www.trackvia.com/misc/media-database.htm

Includes country, publication, title, beat.

Media Database powered by TrackVia that features media on Twitter.

MediaOnTwitter database

A US database of (US) media types on Twitter

Amazon Elastic MapReduce
http://aws.amazon.com/elasticmapreduce/

There's a growing trend to provide some pretty awesome IT services over the internet. Seems to me that's the way it will mostly be in a decade's time - or less.

Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).

Who needs infrastructure? Keep your data somewhere else. Process your data somewhere else. You can now run your small data business out of your garage. Just photoshop a nice office for the investors.

Font specialty site fontnavi
http://fontnavi.jp/index.aspx

Handy for finding fonts

A site that deals in fonts. Paid only?

A handy font search site has opened! Quickly search among more than 3,000 Japanese fonts.

Test Center: Slacker databases break all the old rules | InfoWorld | Test Center | March 24, 2009 | By Peter Wayner
http://www.infoworld.com/archives/emailPrint.jsp?R=printThis&A=/article/09/03/24/12TC-databases_1.html

non-relational DB comparison

Alternatives to Windows, Mac, Linux and online applications | AlternativeTo.net
http://alternativeto.net/
Good Advice on Keeping Your Database Simple and Fast. - All Things Distributed
http://www.allthingsdistributed.com/2009/03/keeping_your_database_simple_and_fast.html

Keeping your database simple and fast is often difficult if you use higher level frameworks such as ActiveRecords in Ruby or Java object persistence technologies such as Hibernate. There is a lot of magic that is happening out of sight that you have no control over. If you then have to scale your application it is often the relational database that these technologies require that becomes the performance and scaling bottleneck. Often requiring complex custom implementations of partitioning and sharding to make it work. The AWS services Amazon S3 and Amazon SimpleDB were designed to handle the dominant storage usage patterns within Amazon and they greatly reduced our need to rely on relational storage for scaling our systems. But it is almost never the case that a single storage technique is used in applications and services that need to operate at enterprise scale. For example it is a common pattern that objects stored in S3 using a primary key, have a collection of secondary keys (e.g

IP address geolocation SQL database
http://www.iplocationtools.com/sql_database.php

Free downloadable IP address to long/lat SQL database

free IP address geolocation SQL database

IP Location tools :: IP Location Tools
http://www.iplocationtools.com/

IP Location tools

Make: Online : Free, unlimited IP address geolocation with MySQL
http://blog.makezine.com/archive/2009/03/free_unlimited_ip_address_geolocati.html?CMP=OTC-0D6B48984890

Add to the tender specification

For example, if you have an ip of 74.125.45.100 (google.com)
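
A minimal sketch of the usual lookup for databases of this kind, assuming each row stores an address range as a pair of integers. The excerpt above stops before showing the article's own query, so the table and column names (`ip_ranges`, `ip_start`, `ip_end`, `country`, `latitude`, `longitude`) and both helper functions are hypothetical.

```python
import ipaddress


def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 address to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(dotted))


def range_query(dotted):
    """Build a parameterized range lookup (schema names are assumptions)."""
    sql = (
        "SELECT country, latitude, longitude FROM ip_ranges "
        "WHERE ? BETWEEN ip_start AND ip_end LIMIT 1"
    )
    return sql, (ip_to_int(dotted),)


if __name__ == "__main__":
    # 74*256**3 + 125*256**2 + 45*256 + 100
    print(ip_to_int("74.125.45.100"))  # 1249717604
```

Storing ranges as integers lets the database answer the lookup with an ordinary indexed comparison instead of string matching.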

Free railway station data download: 駅データ.jp
http://www.ekidata.jp/
JSON's possibilities are about to expand dramatically! JSON Schema - 檜山正幸のキマイラ飼育記
http://d.hatena.ne.jp/m-hiyama/20090413/1239581682

JSON Schema

Search for Recipes by Ingredient - Recipe Puppy
http://www.recipepuppy.com/

thanks to lifehacker for this link

Search engine for finding recipes based on what you have in the house!

Some Notes on Distributed Key Stores « random($foo)
http://randomfoo.net/2009/04/20/some-notes-on-distributed-key-stores

Distributed Key Stores

(Anti RDBMS) Key-value stores

Persevere: The JSON database and JavaScript application server
http://www.persvr.org/
Welcome to Essential Evidence Plus
http://www.essentialevidenceplus.com/
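A thorough review of 50 bots you can use on Twitter!
http://ascii.jp/elem/000/000/410/410755/

Very interested in this. Will take a look.

This looks useful in lots of ways! Maybe I'll follow a few more.

Twitter, the tweet-sharing social network said to have about 10 million users, has a handy kind of service called a "bot". Just by following one on Twitter (adding it to the list of users whose posts you see) you can get the latest news, and just by addressing a bot with "@" you can get a recipe for dinner; used well, they are very convenient. But there are a daunting number of them, and it is hard to know which ones to use. So this time... Lately I have been adding nothing but bots to my follows.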

Jedi/Sector One's random thoughts - An overview of modern SQL-free databases
http://00f.net/2009/an-overview-of-modern-sql-free-databases
Official Google Blog: Adding search power to public data
http://googleblog.blogspot.com/2009/04/adding-search-power-to-public-data.html

All the data we've used in this first launch are produced and published by the U.S. Bureau of Labor Statistics and the U.S. Census Bureau's Population Division. They did the hard work! We just made the data a bit easier to find and use. Since Google's acquisition of Trendalyzer two years ago, we have been working on creating a new service that makes lots of data instantly available for intuitive, visual exploration.

Google launched a new search feature that makes it easy to find and compare public data. So for example, when comparing Santa Clara county data to the national unemployment rate, it becomes clear not only that Santa Clara's peak during 2002-2003 was really dramatic, but also that the recent increase is a bit more drastic than the national rate. If you go to Google.com and type in [unemployment rate] or [population] followed by a U.S. state or county, you will see the most recent estimates. Once you click the link, you'll go to an interactive chart that lets you add and remove data for different geographical areas.

http://www.google.com/publicdata?ds=usunemployment&met=unemployment_rate&idim=county:CN060850#met=unemployment_rate&idim=county:PS060900

Adding search power to public data, 4/28/2009 12:17:00 PM. Earthquakes are not the only thing that can shake Silicon Valley. After the dot-com bubble burst back in 2000 the unemployment rate of Santa Clara county went up to 9.1%. During the last couple of months, it has gone up again:

Google has launched a cool, if somewhat limited, new feature that makes it easier to search for and visualize statistics gleaned from public data. You can search for "unemployment rate" or "population" for any area in the United States and Google will provide you with information from the US Bureau of Labor Statistics and the Census Bureau.

"We just launched a new search feature that makes it easy to find and compare public data... If you go to Google.com and type in [unemployment rate] or [population] followed by a U.S. state or county, you will see the most recent estimates... Once you click the link, you'll go to an interactive chart that lets you add and remove data for different geographical areas."

Alternatives to SQL Databases [LWN.net]
http://lwn.net/Articles/328487/

Traditional SQL databases with "ACID" properties (Atomicity, Consistency, Isolation and Durability) give strong guarantees about what happens when data is stored and retrieved. These guarantees make it easier for application developers, freeing them from thinking about exactly how the data is stored and indexed, or even which database is running. However, these guarantees come with a cost.

iostat -x « domas mituzas: vaporware, inc.
http://dammit.lt/2009/03/11/iostat/
ActiveRecord Optimization with Scrooge - igvita.com
http://www.igvita.com/2009/02/27/activerecord-optimization-with-scrooge/

Plugin that monitors the fields you're actually using from queries you make and over time dynamically adjusts your queries to retrieve only the fields you need. Apparently includes some magic to go re-query for more fields if you attempt to use one you hadn't loaded in the trimmed query. Amazing-looking stuff, though since we're currently using a DB on the same machine, transferring lots of extra data isn't nearly as expensive.

Dynamic query optimization is a hotbed of research in the database industry. Each and every query you execute goes through a rigorous optimization phase which tries to squeeze every last bit of performance: deciding which indexes to use, the execution order and sort order to minimize the number of in-memory tables, etc. However, one thing the database has no access to is the application layer knowledge of which data the user is actually using after it is retrieved. Often times, the query fetches all of the columns when only a few are required, which is exactly the pattern that Lourens Naudé is seeking to optimize with his new plugin: scrooge.

Dynamic Query Optimization
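
A rough Python sketch of the idea described above: record which columns the application actually reads, then ask only for those on the next run. This is not Scrooge's Ruby implementation; `ColumnTracker`, `TrackedRow`, `fetch_users`, and the `users` table are hypothetical names chosen for illustration, and the automatic reload of a column missing from the trimmed query is left out.

```python
import sqlite3


class ColumnTracker:
    """Remembers which columns of a result set the application has read."""

    def __init__(self, all_columns):
        self.all_columns = list(all_columns)
        self.used = set()

    def select_list(self):
        # Nothing observed yet: fall back to fetching every column.
        cols = [c for c in self.all_columns if c in self.used] or self.all_columns
        return ", ".join(cols)


class TrackedRow:
    """Wraps a row and reports each column access to the tracker."""

    def __init__(self, row, tracker):
        self._row = row
        self._tracker = tracker

    def __getitem__(self, column):
        self._tracker.used.add(column)
        return self._row[column]


def fetch_users(conn, tracker):
    """Run the query with the current column list; return (sql, rows)."""
    conn.row_factory = sqlite3.Row
    sql = f"SELECT {tracker.select_list()} FROM users"
    return sql, [TrackedRow(r, tracker) for r in conn.execute(sql)]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT, email TEXT, bio TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'ann', 'ann@example.com', 'long text')")
    tracker = ColumnTracker(["id", "name", "email", "bio"])
    sql, rows = fetch_users(conn, tracker)
    print(sql)
    rows[0]["name"]  # the application only ever reads `name`
    sql, rows = fetch_users(conn, tracker)
    print(sql)
```

As the first comment on this bookmark notes, the saving matters most when the database sits across a network; on the same machine the extra columns are comparatively cheap.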

BASE: AN ACID ALTERNATIVE - ACM Queue
http://queue.acm.org/detail.cfm?id=1394128

Excellent description of BASE design patterns

If ACID provides the consistency choice for partitioned databases, then how do you achieve availability instead? One answer is BASE (basically available, soft state, eventually consistent).

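漢(オトコ)のコンピュータ道: A thorough explanation of MySQL's EXPLAIN!!
http://nippondanji.blogspot.com/2009/03/mysqlexplain.html
World History Lecture Notes
http://www.geocities.jp/timeway/

This is amazing! I liked world history, so I'll come back and read it from time to time!

High school world history lectures published in written form, reproduced as faithfully as possible, digressions included. Ancient, medieval, and modern history; Eastern and Western history. I talk to my students about world history. I would be glad if I can convey how interesting and enjoyable history is.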

Backup your Database in Git | Viget Extend
http://www.viget.com/extend/backup-your-database-in-git/

When you think about it, a database dump is just SQL code, so why not manage it the same way you manage the rest of your code — in a source code manager? Setting such a scheme up is dead simple. On your production server, with git installed:

Size estimation at DB design time - よねのはてな
http://d.hatena.ne.jp/yone098/20090512/1242088638

Collection of links on size calculation methods for various DBs

A memo of sites to use for size estimation at design time for commonly used DBs (Oracle/MySQL/PostgreSQL/SQLServer).

Data.gov
http://www.data.gov/

The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data. Visit today with us, but come back often. With your help, Data.gov will continue to grow and change in the weeks, months, and years ahead.

WOW "The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."

The new U.S. federal open data site is live! "Data.gov will open up the workings of government by making economic, healthcare, environmental, and other government information available on a single website, allowing the public to access raw data and transform it in innovative ways."

Drop ACID and Think About Data | High Scalability
http://highscalability.com/drop-acid-and-think-about-data

nice summary of different data stores...

With YQL Execute, the Internet becomes your database (Yahoo! Developer Network Blog)
http://developer.yahoo.net/blog/archives/2009/04/yql_execute.html

The Yahoo! Query Language lets you query, filter, and join data across any web data source or service on the web. Using our YQL web service, apps run faster with fewer lines of code and a smaller network footprint. YQL uses a SQL-like language because it is a familiar and intuitive method for developers to access data. YQL treats the entire web as a source of table data, enabling developers to select * from Internet.

YQL + Linked Data = possibilities

Execute elements run server-side JavaScript with E4X (na

How SQLite Is Tested
http://www.sqlite.org/testing.html

Describes the testing of SQLite. Great overview of various testing techniques and how they've been applied to a significant software project.

Socrata | Making Data Social
http://www.socrata.com/

"Opening government to new audiences and constituencies is the 21st century battle cry in societies everywhere. At the heart of this movement is open government data, readily accessible over the internet, in a form that maximizes comprehension, interactivity, participation, and sharing, delivered at a fraction of the cost of today's data download sites."

This used to be the site called blist.

AWESOME source of data sets, .csv

10 Essential SQL Tips for Developers - Nettuts+
http://net.tutsplus.com/tutorials/other/10-essential-sql-tips-for-developers/
MIT Database Systems (6.830) TA Course Notes - marcua's blog
http://blog.marcua.net/post/117671929/mit-database-systems-6-830-ta-course-notes

In Fall 2008, I had the pleasure of TAing Database Systems with Sam Madden, Mike Stonebraker, and Evan Jones. I figured that I could take notes to help students follow the lectures while clarifying any confusing points that were raised during discussion. It would also help me avoid the embarrassment of forgetting something mentioned during a lecture and having students explain it to me during office hours:). I decided to take notes in plain text, mostly out of laziness. This turned out to be a challenge for drawing things like query plans, but forced me to distill explanations into a conversational tone that provided an alternative to traditional diagrams. Some students in the class told me that they benefited from and enjoyed the notes, and so I decided to open them up for reuse

Why CouchDB?
http://books.couchdb.org/relax/why-couchdb

man, I really really wish I understood this stuff.

shows

“Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.”

ebook on why you would choose couchdb

CouchDB: Perform like a pr0n star
http://www.slideshare.net/mattetti/couchdb-perform-like-a-pr0n-star

Check out this SlideShare Presentation : CouchDB: Perform like a pr0n star http://tinyurl.com/cukfou [from http://twitter.com/josefrichter/statuses/1588959474]

Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
http://www.slideshare.net/iamcal/scalable-web-architectures-common-patterns-and-approaches-web-20-expo-nyc-presentation
A List Apart: Articles: Indexing the Web—It’s Not Just Google’s Business
http://www.alistapart.com/articles/indexing-the-web-its-not-just-googles-business/

a basic one about optimizing database query execution time

Indexing the Web

Performance comparison: key/value stores for language model counts - Brendan O'Connor's Blog
http://anyall.org/blog/2009/04/performance-comparison-keyvalue-stores-for-language-model-counts/

"Tokyo Tyrant is a server implemented on top of Cabinet that implements a similar key/value API except over sockets. It’s incredibly flexible; it was very easy to run it in several different configurations. The first one is to use an in-memory data store, and communicate using the memcached protocol. This is, of course, *exactly* comparable to Memcached — behaviorally indistinguishable! — and it does worse. The second option is to do that, except switch to an on-disk data store. It’s pretty ridiculous that that’s still the same speed — communication overhead is completely dominating the time. Fortunately, Tyrant comes with a binary protocol. Using that substantially improves performance past Memcached levels, though less than a direct in-process database. Yes, communication across processes incurs overhead. No news here, I guess."

Google Fusion Tables (Pre-Alpha)
http://tables.googlelabs.com/Home

Fusion Tables is an online platform with new technology that unifies various types of data and promises savings for businesses.

Official Google Research Blog: Google Fusion Tables
http://googleresearch.blogspot.com/2009/06/google-fusion-tables.html

Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to offer all the collaborators access to the same server, data sets get copied, emailed and ftp'd--resulting in multiple versions that get out of sync very quickly. Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the systems as we

Project Voldemort Blog : Building a terabyte-scale data cycle at LinkedIn with Hadoop and Project Voldemort
http://project-voldemort.com/blog/2009/06/building-a-1-tb-data-cycle-at-linkedin-with-hadoop-and-project-voldemort/

Not one of those "we're using hadoop, now we're cool" articles. Well written!

Hadoop

British Newspapers - Home
http://newspapers.bl.uk/blcs

Search British Newspapers from 1800-1900. Many with free content

Explore two million pages of 19th century newspapers

Neo4j - a Graph Database that Kicks Buttox | High Scalability
http://highscalability.com/neo4j-graph-database-kicks-buttox

If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships."

another graphbase
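
To make the "nodes and relationships both carry key-value pairs" idea concrete, here is a minimal in-memory sketch. It is not the Neo4j API; the class and method names (TinyGraph, add_node, relate, neighbours) are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int          # id of the node the relationship leaves
    end: int            # id of the node the relationship points to
    rel_type: str
    properties: dict = field(default_factory=dict)


class TinyGraph:
    """Hypothetical toy property graph; nothing here mirrors a real library."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def relate(self, start, end, rel_type, **properties):
        # Relationships are first-class: they hold their own key-value pairs.
        self.relationships.append(Relationship(start, end, rel_type, properties))

    def neighbours(self, node_id, rel_type):
        return [r.end for r in self.relationships
                if r.start == node_id and r.rel_type == rel_type]


graph = TinyGraph()
graph.add_node(1, name="Alice")
graph.add_node(2, name="Bob")
graph.relate(1, 2, "KNOWS", since=2009)
```

A real graph database adds indexing, persistence and traversal APIs on top of this shape; the sketch only shows where the properties live.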

SitePen Blog » JavaScriptDB: Persevere’s New High-Performance Storage Engine
http://www.sitepen.com/blog/2009/04/20/javascriptdb-perseveres-new-high-performance-storage-engine/

JavaScriptDB: Persevere’s New High-Performance Storage Engine April 20th, 2009 at 8:47 pm by Kris Zyp The latest beta of Persevere features a new native object storage engine called JavaScriptDB that provides high-end scalability and performance. Persevere now outperforms the common PHP and MySQL combination for accessing data via HTTP by about 40% and outperforms CouchDB by 249%. The new storage engine is designed and optimized specifically for persisting JavaScript and JSON data with dynamic object structures. It is also built for extreme scalability, with support for up to 9,000 petabytes of JSON/JS data in addition to any binary data.

IP address geolocation SQL database :: IPInfoDB
http://ipinfodb.com/ip_database.php

Complete (City)

The SQL database behind ipinfodb.com is offered for free. We offer the database in different formats (SQL, CSV), city or country precision, 3 or 4 IP digits precision and data in single or multiple tables. Available information in the database : ISO country code, country name, FIPS region code, region name, city, zipcode, latitude, longitude and GMT/DST timezone. The database is updated during the first week of each month.

Wescript
http://wescript.net/

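Something for Greasemonkey that handles automatic updates for you

Userscript management, updates, rankings of popular scripts, and so on.

Update checking and management for Greasemonkey scripts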

Wescript is a utility for userscript runtime environments, such as Greasemonkey. It's useful for finding popular userscripts and checking userscript updates.

Should you go Beyond Relational Databases? | Think Vitamin
http://thinkvitamin.com/dev/should-you-go-beyond-relational-databases/

Relational databases, such as MySQL, PostgreSQL and various commercial products, have served us well for many years. Lately, however, there has been a lot of discussion on whether the relational model is reaching the end of its life-span, and what may come after it.

Alternatives to SQL dbs - document, key-value, graph databases

braindump: NOSQL debrief
http://blog.oskarsson.nu/2009/06/nosql-debrief.html

First ever meeting of the NoSQL community. Lists all the presentations that were given.

No to SQL? Anti-database movement gains steam
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9135086

The meet-up in San Francisco last month had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party. The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive. Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data. "Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system]," said Jon Travis, principal engineer at Java toolmaker SpringSource, one of the 10 presenters at the NoSQL confab (PDF). NoSQL-based alternatives "just give you what you need," Travis said. Open source rises up The movement's chief champions are Web and Java developers, many of whom learned to get by at their cash-strapped startups without Ora

piece on an alternative approach to data management

A Comparison of Open Source Search Engines « zooie’s blog
http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/

A first step in investigating search engines.

Later this month we will be presenting a half day tutorial on Open Search at SIGIR. It’ll basically focus on how to use open source software and cloud services for building and quickly prototyping advanced search applications. Open Search isn’t just about building a Google-like search box on a free technology stack, but encouraging the community to extend and embrace search technology to improve the relevance of any application.

up and running with cassandra :: snax
http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/

Cassandra is a hybrid non-relational database in the same class as Google's BigTable. It is more featureful than a key/value store like Dynomite, but supports fewer query types than a document store like MongoDB. Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.

Backup2Mail — Send MySQL database backup to your mailbox
http://www.backup2mail.com/

Backup2Mail is a mini PHP application that creates regular backups of your MySQL database and sends them to a configurable e-mail address. The whole process is scheduled with the help of Cron, a Unix program that runs programs at scheduled times.

SQL Databases Are An Overapplied Solution (And What To Use Instead)
http://adam.blog.heroku.com/past/2009/7/8/sql_databases_are_an_overapplied_solution_and_what_to_use_instead/

Official Google Research Blog: Large-scale graph computing at Google
http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html

I want one of these! "We have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel). Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. "
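
The quoted description (iterations in which each vertex reads last round's messages and sends new ones) can be imitated on one machine. The sketch below is not Google's Pregel API, which the post does not show; the function name, the toy graph, and the assumption that every vertex has at least one outgoing edge are all illustrative.

```python
def pagerank_supersteps(out_edges, supersteps=30, damping=0.85):
    """Single-machine imitation of vertex-centric PageRank.

    out_edges maps each vertex to the list of vertices it links to.
    Assumes every vertex has at least one outgoing edge.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        # Message phase: each vertex splits its current rank evenly
        # across its outgoing edges.
        inbox = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets)
            for t in targets:
                inbox[t] += share
        # Compute phase: each vertex updates its own state using only
        # the messages it received in this iteration.
        rank = {v: (1 - damping) / n + damping * inbox[v] for v in out_edges}
    return rank


toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_supersteps(toy_graph)
```

In the real system the two phases run in parallel across many machines with a barrier between iterations; here they are just two loops.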

Kernel

So many things to learn and apply in business deals.

http://spinn3r.com/rank

Adding Simplicity - An Engineering Mantra: Shard Lessons
http://www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard-lessons.html

No, not SHARED lessons, I mean SHARD lessons. I have to admit that until about a year ago I didn't really know the term shards in relation to databases. Now don't confuse that with not understanding how databases can be horizontally scaled. I was introduced to that concept and helped to define the various ways it can be done but we just called it splits. Regardless of what you call it, there are some interesting challenges that are introduced. The well known challenges of consistency are discussed ad nauseam, even by me, so I'm not going there with this article. But besides that, there are some other lessons to learn when applying the pattern to your data.

Worth reading just for the section on intelligently designing shard counts. Great discussion on picking counts that smooth your cost step function
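
One common way to get the kind of smoothing the note mentions, offered here as an assumption rather than as the article's own recipe, is to fix a large number of logical shards and map them onto however many hosts you currently run. A count with many divisors stays evenly balanced at many different host counts, so capacity can grow in small steps. The count of 60 and the function names below are hypothetical.

```python
import zlib
from collections import Counter

LOGICAL_SHARDS = 60  # hypothetical; divisible by 2, 3, 4, 5, 6, 10, 12, ...


def logical_shard(key: str) -> int:
    # Stable hash of the key, reduced to a logical shard number.
    return zlib.crc32(key.encode("utf-8")) % LOGICAL_SHARDS


def host_for(shard: int, host_count: int) -> int:
    # Simple modulo placement of logical shards onto physical hosts.
    return shard % host_count


def shards_per_host(host_count: int) -> Counter:
    return Counter(host_for(s, host_count) for s in range(LOGICAL_SHARDS))


balance_at_4 = shards_per_host(4)
balance_at_7 = shards_per_host(7)
```

With 4 hosts every host carries 15 logical shards; with 7 hosts the split is uneven but differs by at most one shard.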

How b-tree database indexes work and how to tell if they are efficient (100' level) | mattfleming.com
http://mattfleming.com/node/192

A team member thought we should add an index on a 90 million row table to improve performance. The field on which he wanted to create this index had only four possible values. To which I replied that an index on a low cardinality field wasn't really going to help anything. My boss then asked me why wouldn't it help? I sputtered around for a response but ended up telling him that I'd get back to him with a reasonable explanation.

Imported from http://twitter.com/newsycombinator/status/2645303258 How b-tree database indexes work and how to tell if they are efficient http://bit.ly/dd6mf
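
The arithmetic behind "a low cardinality index won't help" is short enough to write down. This sketch assumes the four values are evenly distributed, which the excerpt does not say; the function names are illustrative.

```python
def rows_per_value(total_rows: int, distinct_values: int) -> float:
    # Average number of rows an equality lookup on one value must visit.
    return total_rows / distinct_values


def selectivity(distinct_values: int) -> float:
    # Fraction of the table matched by one value (uniform distribution assumed).
    return 1 / distinct_values


matched = rows_per_value(90_000_000, 4)
fraction = selectivity(4)
```

Under that assumption a lookup on one value still touches about a quarter of the 90 million rows, so walking the index and then fetching each row is unlikely to beat a plain table scan.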

TwitterAlikeExample - redis - Google Code
http://code.google.com/p/redis/wiki/TwitterAlikeExample

Case study on Redis

Facebook, Hadoop, and Hive | DBMS2 -- DataBase Management System Services
http://www.dbms2.com/2009/05/11/facebook-hadoop-and-hive/

Just wanted to add that even though there is a single point of failure the reliability due to software bugs has not been an issue and the dfs Namenode has been very stable. The Jobtracker crashes that we have seen are due to errant jobs - job isolation is not yet that great in hadoop and a bad query from a user can bring down the tracker (though the recovery time for the tracker is literally a few minutes). There is some good work happening in the community though to address those issues.

A few weeks ago, I posted about a conversation I had with Jeff Hammerbacher of Cloudera, in which he discussed a Hadoop-based effort at Facebook he previously directed. Subsequently, Ashish Thusoo and Joydeep Sarma of Facebook contacted me to expand upon and in a couple of instances correct what Jeff had said. They also filled me in on Hive, a data-manipulation add-on to Hadoop that they developed and subsequently open-sourced.

4store - Scalable RDF storage
http://4store.org/

4store was designed by Steve Harris and developed at Garlik to underpin their Semantic Web applications

4store is a fast, scalable clustered RDF database

4store is an efficient, scalable and stable RDF database

4store, an efficient, scalable and stable RDF database 4store was designed by Steve Harris and developed at Garlik to underpin their Semantic Web applications. It has been providing the base platform for around 3 years. At times holding and running queries over databases of 15GT, supporting a Web application used by thousands of people.

Social Media Brand Engagement Database - ENGAGEMENTdb
http://www.engagementdb.com/

Want to know not just what companies are doing on the social web but how well they're doing it? We have brought you just the tool to measure and monitor brand engagement: for the first time ever, ENGAGEMENTdb ranks the world's most valuable brands based on how they leverage social media to interact with customers.

My Thoughts on NoSQL - Die in a Fire - Eric Florenzano’s Blog
http://www.eflorenzano.com/blog/post/my-thoughts-nosql/

Over the past few years, relational databases have fallen out of favor for a number of influential people in our industry. I'd like to weigh in on that, but before doing so, I'd like to give my executive summary of the events leading up to this movement

Tokyo Cabinet

An overview of several open-source non-relational databases.

Thoughts on NoSQL, Tokyo Cabinet, CouchDB, Redis, and Cassandra.

The Soldier in Later Medieval England
http://www.icmacentre.ac.uk/soldier/database/index.php

Database of soldiers who fought in wars during the Medieval era, including the Hundred Years War. Not sure how to use this just yet...

A team led by Dr. Adrian Bell and Prof. Anne Curry, with funding from the Arts and Humanities Research Council, have put up a stunning new database of military service records of medieval soldiers serving between 1369 and 1453. While the database’s primary purpose seems to be exploring the lives of individual soldiers of note, there are a great many potential applications for large observation (large-n) quantitative studies of conflict and health. Variables in the database include: First Name, Last Name, Status, Rank, Captain’s Name, Commander’s Name, Year of Service, Nature of Activity, Reference Number, and Membrane. Read the project details for more information.

HadoopDB Project
http://db.cs.yale.edu/hadoopdb/hadoopdb.html

An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.

HadoopDB is:
1. A hybrid of DBMS and MapReduce technologies that targets analytical workloads
2. Designed to run on a shared-nothing cluster of commodity machines, or in the cloud
3. An attempt to fill the gap in the market for a free and open source parallel DBMS
4. Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems
5. As scalable as Hadoop, while achieving superior performance on structured data analysis workloads

DBMS Musings: Announcing release of HadoopDB (longer version)
http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html

my students Azza Abouzeid and Kamil Bajda-Pawlikowski developed HadoopDB. It's an open source stack that includes PostgreSQL, Hadoop, and Hive, along with some glue between PostgreSQL and Hadoop, a catalog, a data loader, and an interface that accepts queries in MapReduce or SQL and generates query plans that are processed partly in Hadoop and partly in different PostgreSQL instances spread across many nodes in a shared-nothing cluster of machines. In essence it is a hybrid of MapReduce and parallel DBMS technologies. But unlike Aster Data, Greenplum, Pig, and Hive, it is not a hybrid simply at the language/interface level. It is a hybrid at a deeper, systems implementation level. Also unlike Aster Data and Greenplum, it is free and open source.

Filesuffix.com the filename extension database
http://www.filesuffix.com/

An ideal site for anyone racking their brains looking for the right software to open an unknown file type. Its database gives detailed information on practically any extension: type, category, description, software that opens the file, and so on. If you use Firefox, there is also a search plugin that makes lookups easier. One of the site's biggest drawbacks is that the results favor proprietary software downloads over reliable, working free-software options.

filename extension database

Choosing a non-relational database; why we migrated from MySQL to MongoDB « Boxed Ice Blog
http://blog.boxedice.com/2009/07/25/choosing-a-non-relational-database-why-we-migrated-from-mysql-to-mongodb/
RethinkDB - The database for solid state drives.
http://www.rethinkdb.com/
Naming that tune in 140 characters or less :: Lyric Rat
http://lyricrat.com/

Type in some lyrics and the Lyric Rat will find the song for you. Also available as @LyricRat on Twitter. Tweet some lyrics.

TheUserManualSite.com - We found it so you don't have to!
http://www.manualsonline.com/

Locate hard-to-find user manuals, discover new features, and realize the potential of the products you rely on. ManualsOnline pairs self-help and product information with a growing community of engaged product owners.

use for manual help

Top 84 MySQL Performance Tips
http://lazytechie.com/top-84-mysql-performance-tips/

MySQL is a widely used and fast SQL database server. It is a client/server implementation that consists of a server daemon (mysqld) and many different client programs/libraries. Here are some very useful tips for all MySQL DBAs and developers; these tips were noted at MySQL Camp 2006 and suggested by MySQL community experts.

Don’t use DISTINCT when you have or could use GROUP BY

Don’t use deprecated features
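
For the DISTINCT versus GROUP BY tip, a small check that the two forms return the same rows. It runs on SQLite through Python's sqlite3 module rather than MySQL, so it says nothing about MySQL performance; the orders table and its data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("bob", 5), ("ann", 7)],
)

# Two ways of asking for the unique customers.
distinct_rows = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()
grouped_rows = conn.execute(
    "SELECT customer FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Once the query already groups, aggregates come along without a DISTINCT.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
```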

NoSQL: If Only It Was That Easy « Marked As Pertinent
http://bjclark.me/2009/08/04/nosql-if-only-it-was-that-easy/

Interesting: a study of the various alternative databases from a scalability angle

http://aws.amazon.com/s3/

data store scaling technologies

Tweet Blocker - Cleaning up the Twitterverse
http://tweetblocker.com/

Tool for detecting which followers may be spam

Use TweetBlocker to remove spam from your account. Developer API available.

Cleaning up the Twitterverse

Riak - A Decentralized Database
http://riak.basho.com/

Riak combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.

Performance, Scalabilty and Architecture - Java and .NET Application Performance Management (dynaTrace Blog) » Understanding Caching in Hibernate - Part Two : The Query Cache
http://blog.dynatrace.com/2009/02/16/understanding-caching-in-hibernate-part-two-the-query-cache/

In the last post I wrote on caching in Hibernate in general as well as on the behavior of the session cache. In this post we will have a closer look at the QueryCache. I will not explain the query cache in detail as there are very good articles like Hibernate: Truly Understanding the Second-Level and Query Caches.

GFS: Evolution on Fast-forward - ACM Queue
http://queue.acm.org/detail.cfm?id=1594206

Google File System

ACM Queue, August 7, 2009

SQL pie chart | code.openark.org
http://code.openark.org/blog/mysql/sql-pie-chart

Shown below is a (single query) SQL-generated pie chart. I will walk through the steps towards making this happen, and conclude with what, I hope you’ll agree, are real-world, useful usage samples.

ASCII art via SQL

Pie Chart in SQL

create an ascii art pie chart with a single sql query

Something like ASCII art, done in SQL.

DataSF - DataSF - Liberating City Data
http://www.datasf.org/

Why can't every city have this?

City of SF opens site containing datasets

"DataSF is a clearinghouse of datasets available from the City & County of San Francisco. While there is plenty of room for improvement, our goal in releasing this site is: 1) improve access to data, 2) help our community create innovative apps, 3) understand what datasets you'd like to see, 4) get feedback on the quality of our datasets."

Directed Edge - Home
http://directededge.com/index.html

Recommendations engine plug-in

Recommendation engine à la Amazon

Conversie van postcode naar straat + woonplaats
http://kvdb.net/projects/6pp/

Thanks to the data from http://openkvk.nl, #6pp already contains more than 50% of all postcodes in the Netherlands. Thanks to JWvdV for the import. Project 6PP opens up free geographic data in the Netherlands. Places, postcodes, streets and geo-coordinates are accessible as a wiki, web service and downloads.

Project 6PP opens up free geographic data in the Netherlands. Places, postcodes, streets and geo-coordinates are accessible as a wiki, web service and downloads.

Developers: Never Mind the APIs, Here's YQL Execute - ReadWriteWeb
http://www.readwriteweb.com/archives/theres_a_great_amount_of.php

Read: Developers: Never Mind the APIs, Here's YQL Execute [feedly] http://tr.im/koyE [from http://twitter.com/krisnelson/statuses/1693267224]

RWW's @jolieodell dares to tackle the powerful beast that is the new YQL Execute http://bit.ly/J1gxO and so far has lived to tell the tale [from http://twitter.com/marshallk/statuses/1680054262]

...includes explanation of what YQL is, starting with: a sophisticated solution that is agnostic across all Internet platforms and that lowers both the burden of labor and the barriers to entry for social and other web application developers

Make Firefox Faster by Vacuuming Your Database - Firefox - Lifehacker
http://lifehacker.com/5344418/make-firefox-faster-by-vacuuming-your-database

Components.classes["@mozilla.org/browser/nav-history-service;1"].getService(Components.interfaces.nsPIPlacesDatabase).DBConnection.executeSimpleSQL("VACUUM");
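
The one-liner above asks Firefox's Places database, which is stored in SQLite, to run VACUUM. To see what that statement does outside the browser, here is a sketch against a scratch SQLite file using Python's sqlite3 module. The file name, table and row count are arbitrary, and it assumes SQLite's default setting where auto_vacuum is off.

```python
import os
import sqlite3
import tempfile

# Scratch database in a temporary directory (the path is arbitrary).
path = os.path.join(tempfile.mkdtemp(), "scratch.sqlite")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE visits (url TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)",
    [("http://example.com/%d" % i,) for i in range(20000)],
)
conn.commit()

# Deleting rows frees pages inside the file but does not shrink it.
conn.execute("DELETE FROM visits")
conn.commit()
size_before = os.path.getsize(path)

# VACUUM rebuilds the file and drops the free pages.
conn.execute("VACUUM")
size_after = os.path.getsize(path)
conn.close()
```

The speed-up claimed for Firefox comes from the same effect: a smaller, defragmented file after lots of history has been added and expired.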

How XML Threatens Big Data : Dataspora Blog
http://dataspora.com/blog/xml-and-big-data/

Back in 2000, I went to France to build a genomics platform. A biotech hired me to combine their in-house genome data with that of public repositories like Genbank. The problem was the repositories, all with millions of records, each had their own format. It sounded like a massive, nightmarish data interoperability project. And an ideal fit for a hot new technology : XML

Three Rules for XML Rebels 1. Stop Inventing New Formats 2. Obey the Fifteen Minute Rule 3. Embrace Lazy Data Modeling

Un point de vue intéressant sur le xml, à rebours des conceptions en sciences de l'info (en tout cas les miennes)

Excellent thoughtful article on data bureaucracy and the limitations of XML.

Adminer
http://www.adminer.org/en/

Single-file PHP MySQL database administration tool

Adminer (formerly phpMinAdmin) is a full-featured MySQL management tool written in PHP. Conversely to phpMyAdmin, it consist of a single file ready to deploy to the target server.

an alternative to phpMyAdmin

csharp-sqlite - Project Hosting on Google Code
http://code.google.com/p/csharp-sqlite/
Vacuum Firefox databases for better performance, now with no restart - Mozilla Links
http://mozillalinks.org/wp/2009/08/vacuum-firefox-databases-for-better-performance-now-with-no-restart/

speed up

Components.classes["@mozilla.org/browser/nav-history-service;1"].getService(Components.interfaces.nsPIPlacesDatabase).DBConnection.executeSimpleSQL("VACUUM");

Petabytes on a budget: How to build cheap cloud storage | Backblaze Blog
http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

Build your own custom Backblaze Storage Pods: 67 terabyte 4U servers for $7,867.

みんなのきょうの料理 - NHK「きょうの料理」で放送された7年間の料理レシピや献立が探せる！
http://www.kyounoryouri.jp/

Easily search seven years of TV recipes! | みんなのきょうの料理

HealthBase - Powered by NetBase
http://healthbase.netbase.com/

Health meta search engine

This site aggregates search results from all sorts of medical sites so you get a lot of info in one place.

DataMasher
http://www.datamasher.org/

Infographics from public data

1. Pick a data set: Poverty Rate
2. Choose an operator: - × ÷
3. Pick another data set: Unemployment
Your Mashup! Poverty Rate, Unemployment

To empower people to discover and discuss government data through manipulation and mapping.

DataMasher is a tool that takes these vast quantities of information and allows you to whittle it down into simpler terms, offering an easy way to get hard data on certain topics without any intrusive media spin.

database_software [Internet Mindmap]
http://internetmindmap.com/database_software

Good list

Categorized list of many database software packages

List of database software classified by information management model (relational, XML, RDF...)

What Visualization Tool/Software Should You Use? – Getting Started | FlowingData
http://flowingdata.com/2009/09/03/what-visualization-toolsoftware-should-you-use-getting-started/
"Anonymized" data really isn't—and here's why not - Ars Technica
http://arstechnica.com/tech-policy/news/2009/09/your-secrets-live-online-in-databases-of-ruin.ars

birthdate

Digg the Blog » Blog Archive » Looking to the future with Cassandra
http://blog.digg.com/?p=966

answer is 3TB database???

"The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes."

Wow, cassandra uses a lot of disk space. Trade offs!
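
The "burden on reads rather than writes" quote and the disk-space remark are two sides of one trade-off. Here is a toy sketch of both approaches, using a made-up follow graph and timeline feature rather than Digg's actual schema; every name in it is hypothetical.

```python
from collections import defaultdict

# user -> set of people that user follows (made-up data)
follows = {"ann": {"bob", "cy"}, "bob": {"cy"}, "cy": set()}

# Approach 1: store each post once, do the work at read time.
posts = []


def post_read_heavy(author, text):
    posts.append((author, text))


def timeline_read_heavy(user):
    # Join-like scan on every read.
    return [text for author, text in posts if author in follows[user]]


# Approach 2: do the work at write time, store one copy per follower.
timelines = defaultdict(list)


def post_write_heavy(author, text):
    for user, followed in follows.items():
        if author in followed:
            timelines[user].append(text)


def timeline_write_heavy(user):
    # A read is now a single lookup.
    return list(timelines[user])


for author, text in [("cy", "hello"), ("bob", "hi")]:
    post_read_heavy(author, text)
    post_write_heavy(author, text)

stored_copies = sum(len(entries) for entries in timelines.values())
```

Two posts become three stored timeline entries in the second approach, which is the extra disk space the note complains about, in exchange for cheap reads.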

Dare Obasanjo aka Carnage4Life - Building Scalable Databases: Denormalization, the NoSQL Movement and Digg
http://www.25hoursaday.com/weblog/2009/09/10/BuildingScalableDatabasesDenormalizationTheNoSQLMovementAndDigg.aspx

As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.

bit on why non-SQL dbs are used in social networking sites

WTF is a SuperColumn? An Intro to the Cassandra Data Model — Arin Sarkissian
http://arin.me/code/wtf-is-a-supercolumn-cassandra-data-model

Nice detailed examples on NoSQL data modeling in Cassandra.
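
As a rough mental model of what the title is asking about, Cassandra's data model of that era can be pictured as nested maps. This is a sketch in plain Python dicts, not Cassandra client code; timestamps are left out, and the row keys, column names and values are invented.

```python
# Standard column family: row key -> column name -> value
users = {
    "ann": {"email": "ann@example.com", "city": "Oslo"},
}

# Super column family: row key -> super column name -> column name -> value
address_book = {
    "ann": {
        "bob": {"street": "1 Main St", "zip": "12345"},
        "cy": {"street": "2 Side St", "zip": "67890"},
    },
}


def get_column(column_family, row_key, column):
    return column_family[row_key][column]


def get_subcolumn(super_column_family, row_key, super_column, column):
    # A super column adds exactly one more level of nesting.
    return super_column_family[row_key][super_column][column]


city = get_column(users, "ann", "city")
bob_zip = get_subcolumn(address_book, "ann", "bob", "zip")
```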

bobby-tables.com: A guide to preventing SQL injection
http://bobby-tables.com/
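
The core advice for preventing SQL injection is to pass user input as a bound parameter instead of pasting it into the SQL string. A minimal sketch with Python's sqlite3 module follows; the students table and the hostile input are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

hostile_name = "Robert'); DROP TABLE students;--"

# The ? placeholder sends the value separately from the SQL text,
# so the quote and semicolon are stored as data, not executed.
conn.execute("INSERT INTO students (name) VALUES (?)", (hostile_name,))

stored = conn.execute("SELECT name FROM students").fetchall()
```

Building the same statement with string concatenation would hand the quote character to the SQL parser, which is the mistake the guide is about.
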
twitter公式ナビゲーター twinavi
http://twinavi.jp/

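A Twitter site for beginners.

A comprehensive navigation site for the much-talked-about service Twitter. It covers usage guides for beginners, recommended accounts, unique services built on Twitter, and more. Come here and you will learn everything about Twitter.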

MICDS Library | Home
http://www.micdslibrary.com/

library

QuirkeyBlog » Blog Archive » Sammy.js, CouchDB, and the new web architecture
http://www.quirkey.com/blog/2009/09/15/sammy-js-couchdb-and-the-new-web-architecture/
AggData | AggData
http://www.aggdata.com/

The goal of AggData is to play a small part in making this sought-out data more accessible, portable and reliable.

great source for aggregated data

AggData is short for aggregate data, which means a set of data that is collected together in one place. On this site, the AggData will come in the form of a list of records, where each record has details about a specific object in the group.

data aggregated by web scraping

another free data library.

30 Resources to Find the Data You Need | FlowingData
http://flowingdata.com/2009/10/01/30-resources-to-find-the-data-you-need/

Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for where you can find that data you're looking for. There's a lot out there. Universities Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.

漢(オトコ)のコンピュータ道: なぜMySQLのサブクエリは遅いのか。
http://nippondanji.blogspot.com/2009/03/mysql_25.html

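Why subqueries are not so great

So, we have looked at how MySQL processes subqueries; used with proper care, subqueries also run fast. It goes without saying that rewriting them as a JOIN is faster, but considering things like how easy the SQL is to maintain, some people will still want to write their logic with subqueries. If that is you, please keep the following in mind when using subqueries. * The type of subquery * The evaluation order of the outer query and the subquery * The number of rows fetched in the outer query * The indexes used by the subquery * The size of temporary tables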
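
To show what "rewriting a subquery as a JOIN" looks like, here is a small pair of equivalent queries. It runs on SQLite via Python's sqlite3 module, so it demonstrates only that the results match, not how MySQL executes either form; the authors and books tables are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO books VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 3, 'C');
""")

# Subquery form: authors who have at least one book.
with_subquery = conn.execute(
    "SELECT name FROM authors "
    "WHERE id IN (SELECT author_id FROM books) ORDER BY name"
).fetchall()

# JOIN form: DISTINCT is needed because ann matches two book rows.
with_join = conn.execute(
    "SELECT DISTINCT a.name FROM authors a "
    "JOIN books b ON b.author_id = a.id ORDER BY a.name"
).fetchall()
```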

Amazon Web Services Blog: Don't Forget: You Can Use Amazon SimpleDB For Free!
http://aws.typepad.com/aws/2009/10/dont-forget-you-can-use-amazon-simpledb-for-free.html
SQL Databases Don't Scale
http://adamblog.heroku.com/past/2009/7/6/sql_databases_dont_scale/
Graphs in the database: SQL meets social networks – techPortal
http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/
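
One simple way to store a graph in a relational database is a table of edges queried with a recursive common table expression. This is a sketch of that general approach on SQLite, not necessarily the model this article uses; the edges table and its data are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT);
INSERT INTO edges VALUES ('a','b'), ('b','c'), ('c','a'), ('c','d'), ('e','a');
""")

# Everything reachable from a start node. UNION (not UNION ALL) drops
# rows already seen, so the a -> b -> c -> a cycle terminates.
reachable = conn.execute(
    """
    WITH RECURSIVE reach(node) AS (
        SELECT ?
        UNION
        SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
    )
    SELECT node FROM reach ORDER BY node
    """,
    ("a",),
).fetchall()
```

Node e points into the cycle but is never reached from a, so it stays out of the result.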

"Graphs are ubiquitous. Social or P2P networks, thesauri, route planning systems, recommendation systems, collaborative filtering, even the World Wide Web itself is ultimately a graph! Given their importance, it’s surely worth spending some time in studying some algorithms and models to represent and work with them effectively. In this short article, we’re going to see how we can store a graph in a DBMS. Given how much attention my talk about storing a tree data structure in the db received, it’s probably going to be interesting to many. Unfortunately, the Tree models/techniques do not apply to generic graphs, so let’s discover how we can deal with them."

How Google Taught Me to Cache and Cash-In | High Scalability
http://highscalability.com/how-google-taught-me-cache-and-cash

A user named Apathy in this thread on how Reddit scales some of their features, shares some advice he learned while working at Google and other major companies. To be fair, I [Apathy] was working at Google at the time, and every job I held between 1995 and 2005 involved at least one of the largest websites on the planet. I didn't come up with any of these ideas, just watched other smart people I worked with who knew what they were doing and found (or wrote) tools that did the same things. But the theme is always the same: # Cache everything you can and store the rest in some sort of database (not necessarily relational and not necessarily centralized). How do you go about applying this strategy?

ing caches is a classic strategy for milking your servers as much as possible. First look for an exact match. If that's not foun
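
The "cache everything you can, look for an exact match first" advice is usually implemented as a look-aside cache. A minimal in-process sketch follows; the class name, the dict standing in for a database, and the counters are all illustrative, and a real deployment would put something like memcached where the dict is.

```python
class CacheAside:
    """Hypothetical look-aside cache wrapped around a slow lookup function."""

    def __init__(self, load_from_database):
        self._load = load_from_database
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        # First look for an exact match in the cache.
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # On a miss, fall back to the database and remember the answer.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value


database = {"user:1": "ann"}   # stand-in for the real data store
database_calls = []


def slow_lookup(key):
    database_calls.append(key)
    return database[key]


store = CacheAside(slow_lookup)
first = store.get("user:1")
second = store.get("user:1")
```

The second call never reaches the database. Invalidation and expiry, which a production cache needs, are left out of the sketch.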

PostgreSQL Tips and Tricks | gtuhl: startup technology
http://blog.gtuhl.com/2009/08/07/postgresql-tips-and-tricks/

Here’s a dozen tips for working with a PostgreSQL database. It is a sophisticated and powerful piece of software and just knowing a few rules of thumb before diving in can be a huge help. If you want more detail read the amazing documentation. My list of tips was very long so I just chopped off a dozen for this post.

Sql Antipatterns Strike Back
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed

Great presentation about good SQL practice.

interesting but it's a slideshow rather than an article

Factual
http://www.factual.com/

Service for basically creating shared databases: sounds quite interesting!

Factual is a platform where anyone can share and mash open data on any subject. For example, you might find a directory of California restaurants, a database of endocrinologists, or a list of American Idol finalists. We provide smart tools to help the community build and maintain a trusted source of structured data. And this data can be used through widgets and APIs to help application developers and content publishers be more innovative and productive.

open data edit

On line data collections

Data!

msarnoff.org ChipDB - integrated circuit quick reference
http://www.msarnoff.org/chipdb/

Online reference for integrated circuits. Very quick reference (as opposed to loading up the 30 MB PDF, which is a scan of a photocopied document that looks like it was stored in somebody's back pocket).

The End of a DBMS Era (Might be Upon Us) | blog@CACM | Communications of the ACM
http://cacm.acm.org/blogs/blog-cacm/32212-the-end-of-a-dbms-era-might-be-upon-us/fulltext

"Relational database management systems (DBMSs) have been remarkably successful in capturing the DBMS marketplace. To a first approximation they are “the only game in town,” and the major vendors (IBM, Oracle, and Microsoft) enjoy an overwhelming market share. They are selling “one size fits all”; i.e., a single relational engine appropriate for all DBMS needs. Moreover, the code line from all of the major vendors is quite elderly, in all cases dating from the 1980s. Hence, the major vendors sell software that is a quarter century old, and has been extended and morphed to meet today’s needs. In my opinion, these legacy systems are at the end of their useful life. They deserve to be sent to the “home for tired software.” Here’s why."

Google Public Sector
http://www.google.com/publicsector/

Tools for Public Sector

one-stop shop of tips and tools for the public sector from Google

Most people reach government and other public sector websites by using Google and other search engines. This site is a guide to the tools and best practices that can help you reach, communicate and engage with your community. Most of these tools are free, so they can also help you do more with less.

Google: Tools for Public Sector Organizations. Make your agency website, and the information it offers, easier to find.

Internet Archive: A Future for Books -- BookServer
http://www.archive.org/bookserver/

Referenced in Chronicle Wired

The widespread success of digital reading devices has proven that the world is ready to read books on screens. As the audience for digital books grows, we can evolve from an environment of single devices connected to single sources into a distributed system where readers can find books from sources across the Web to read on whatever device they have. Publishers are creating digital versions of their popular books, and the library community is creating digital archives of their printed collections. BookServer is an open system to find, buy, or borrow these books, just like we use an open system to find Web sites.

The BookServer is a growing open architecture for vending and lending digital books over the Internet. Built on open catalog and open book formats, the BookServer model allows a wide network of publishers, booksellers, libraries, and even authors to make their catalogs of books available directly to readers through their laptops, phones, netbooks, or dedicated reading devices. BookServer facilitates pay transactions, borrowing books from libraries, and downloading free, publicly accessible books.

This would be awesome to install on all of the school servers as part of plan ceibal.

NoSQL: Distributed and Scalable Non-Relational Database Systems | Linux Magazine
http://www.linux-mag.com/cache/7579/1.html

From @jesserobbins

Non-SQL oriented distributed databases are all the rage in some circles. They’re designed to scale from day 1 and offer reliability in the face of failures.

Business Information and News: Track, Connect and Share - Tracked.com
http://www.tracked.com/

Today, we are proud to launch Tracked.com, a new kind of business service. Tracked.com is the only website in the world where business information, communications and connections come together to enhance your business life.

Cassandra and Ruby: A Love Affair? | Engine Yard Blog
http://www.engineyard.com/blog/2009/cassandra-and-ruby-a-love-affair/

"Most of today’s up and coming key-value stores are more than just simple key-value stores. You saw this when we looked at Tokyo Cabinet which, in addition to simple key-value capabilities, adds more sophisticated abilities, such as database-like tables. In this post we’ll look at Cassandra — a modern key-value store that continues this trend. Cassandra was originally developed by Facebook and released to open source last year. The Facebook team describes Cassandra as (Google) BigTable running on top of an Amazon Dynamo-like infrastructure."

Most of today's up-and-coming key-value stores are more than just simple key-value stores. Cassandra is a modern key-value store that continues this trend.

Why I like Redis
http://simonwillison.net/2009/Oct/22/redis/

Like mongodb but lives in memory with replication and periodic store-to-disk. Like memcached but with data structures. Great for non-critical data or replicated critical data.

Facebook | Engineering @ Facebook's Notes
http://www.facebook.com/note.php?note_id=89508453919
OpenSecrets | OpenSecrets.org Goes OpenData - Capital Eye
http://www.opensecrets.org/news/2009/04/opensecretsorg-goes-opendata.html

Portal that tries to make public information about the secrets of Washington, DC.

RT @cshirky: RT THIS is a big deal. OpenSecrets.org releases 200 million [gov't] data records. Today. http://bit.ly/fdXS [from http://twitter.com/danielgillval/statuses/1512025784]

Measuring Link-Bait of Articles I have flagged in the past.

OpenSecrets.org opens up its data -- feel free to mash up information on campaign finance, lobbying, personal finances and much more

Amazon Relational Database Service (Amazon RDS)
http://aws.amazon.com/rds/

Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Amazon RDS gives you access to the full capabilities of a familiar MySQL database. This means the code, applications, and tools you already use today with your existing MySQL databases work seamlessly with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a user-defined retention period. You also benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your relational database instance via a single API call. As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use.

MongoDB: A Light in the Darkness! (Key Value Stores Part 5) | Engine Yard Blog
http://www.engineyard.com/blog/2009/mongodb-a-light-in-the-darkness-key-value-stores-part-5/

Really interesting article about MongoDB and its installation procedure

"MongoDB can be thought of as the goodness that erupts when a traditional key-value store collides with a relational database management system, mixing their essences into something that’s not quite either, but rather something novel and fascinating. -- MongoDB support is available in many languages, making it a good choice for a system that has to work in a polyglot environment; all of the major languages have support."

Recovery plans for when a master or slave fails and stops in a replicated MySQL setup - (ひ)メモ
http://d.hatena.ne.jp/hirose31/20091023/1256259405

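Blog post about a configuration that uses MySQL replication. I haven't read it properly yet. The configuration does not guarantee data integrity on failover. It describes the intended operational flow, so I'll use it as a reference when thinking through my own setup.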

data.australia.gov.au – beta
http://data.australia.gov.au/

data.australia.gov.au is the home of Australian government public information datasets. Like Data.gov, it has a wide variety of downloadable government data on topics such as crime, weather, and public lands--as well as some very Australian topics, such as the location and attributes of barbecues on public lands.

data.australia.gov.au is the home of Australian government public information datasets. We encourage you to make government information even more useful by mashing-up the data to create something new and exciting! Make sure you pay attention to the licence attached to the datasets you are interested in using. Each licence should make clear what you can and can’t do with the data. If you’re unsure, please contact the contributing agency.

Open database life: Should you use MyISAM or InnoDB?
http://opendatabaselife.blogspot.com/2009/10/myisaminnodb.html

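Summarizes the advantages and disadvantages of MySQL's InnoDB and MyISAM. A useful reference.

Unless there are special circumstances, I recommend the InnoDB Plugin included in the latest 5.1 release.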

fault-tolerance.png (PNG Image, 784x393 pixels)
http://browsertoolkit.com/fault-tolerance.png

(PNG image, 784x393 pixels)

All I want for Christmas is a SQL database with no JOINs, secondary indexes, UNIONs, views, character sets or anything else. Just exact and range primary key lookups, GROUP BY, ORDER BY, LIMIT and SQL_CALC_FOUND_ROWS.

AMIS Technology blog » Blog Archive » Oracle RDBMS 11gR2 - Solving a Sudoku using Recursive Subquery Factoring
http://technology.amis.nl/blog/6404/oracle-rdbms-11gr2-solving-a-sudoku-using-recursive-subquery-factoring

Solving a Sudoku using Recursive Subquery Factoring

Solving Sudoku with recursive processing in Oracle SQL

Dare Obasanjo aka Carnage4Life - Facebook Seattle Engineering Road Show: Mike Shroepfer on Engineering at Scale at Facebook
http://www.25hoursaday.com/weblog/2009/10/29/FacebookSeattleEngineeringRoadShowMikeShroepferOnEngineeringAtScaleAtFacebook.aspx

Article summarizing presentation by Facebook on some of their scaling challenges and solutions.

Rackspace Cloud Computing & Hosting | NoSQL Ecosystem
http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosystem/

Good introduction to the "NoSQL" space (initially not a fan of the term, but I guess it is going to stick...), highlighting the different designs used by the options in the space, and the benefits/drawbacks of those designs.

Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as “NoSQL databases.”

How to Secure Your New WordPress Installation | Digging into WordPress
http://digwp.com/2009/11/how-to-secure-your-new-wordpress-installation/

One of the best ways to ensure strong security for your WordPress-powered site is to secure its foundations during the installation process. Of course these techniques can be implemented at any point during the life of your site, but setting them before the game starts prevents headaches and saves time. We’ll start with the WordPress database.

SQL Databases Don't Scale
http://adam.blog.heroku.com/past/2009/7/6/sql_databases_dont_scale/

"Sharding kills most of the value of a relational database."

sql database db

Configuration settings for web developers (especially those who have been using MyISAM) when moving to InnoDB - kazuhoのメモ置き場
http://d.hatena.ne.jp/kazuhooku/20091029/1256775791

Tips for increasing InnoDB write speed, plus other settings

sudo hdparm -W 0 /dev/sda

Jonathan Ellis's Programming Blog - Spyced: CouchDB: not drinking the kool-aid
http://spyced.blogspot.com/2008/12/couchdb-not-drinking-kool-aid.html

Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages. It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.

pskomoroch's dataset Bookmarks on Delicious
http://delicious.com/pskomoroch/dataset

Resource list of public datasets

Jet Profiler for MySQL
http://www.jetprofiler.com/

Java desktop graphical MySQL profiler. Free version.

Real-time query performance and diagnostics tool for the MySQL database server.

Fixing Poor MySQL Default Configuration Values (by Jeremy Zawodny)
http://jeremy.zawodny.com/blog/archives/011421.html

4 good tips for improving MySQL performance.

MySQL configuration variables that have defaults which have proven to be problematic in a high-volume production environment

Understand the basics of databases! An introduction to PHP for people with no programming experience: CodeZine
http://codezine.jp/article/detail/3685
Couchdbkit - Welcome to the Couchdbkit project
http://couchdbkit.org/
Copyright Watch | Global Transparency in Copyright Law
http://www.copyright-watch.org/

Copyright Watch, hosted by the Electronic Frontier Foundation, is designed for the purposes of sharing and comparing the copyright laws of countries around the world. As the world has become connected through the Internet, the creation and global sharing of content has become very easy. At the same time, the misuse of copyrighted content has become easier too. Sometimes copyright violations may be the result of conflicting copyright laws. Copyright Watch aims to provide a place where copyright laws can be compared and changes to copyright laws can be updated. Applications for Education: Copyright Watch could be useful for teaching about the differences between copyright laws. Copyright Watch might also be useful as a part of a discussion about the purpose of copyright laws.

Copyright Watch collects and monitors copyright laws from all over the world.

Global Transparency in Copyright Law. "Copyright Watch was begun by an international group of copyright experts, drawn from the Access to Knowledge community. We’d like to thank Corporacion Innovarte, the Electronic Frontier Foundation, Electronic Information for Libraries (eIFL.net), the International Federation of Library Associations, Professor Michael Geist, the Third World Network, and the Bangalore Centre for Internet and Society for their support."

Fabulous website to check out if you are unsure of what copyright laws exist in which countries.

A true account of nearly zero-downtime MySQL failover (with video) - (ひ)メモ
http://d.hatena.ne.jp/hirose31/20091111/1257942168

Multi-master failover using keepalived --vrrp

Under the Covers of the Google App Engine Datastore ‎(2008 Google I/O Session Videos and Slides)‎
http://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore

Presentation on how the Google App Engine datastore implements filtering and sorting on top of Bigtable. Basically, all queries are translated to Bigtable prefix scans or range scans that need no in-memory postprocessing: every row returned from the scan is relevant to the query and already in query order. There's a built-in 'single property index' (actually two: one ascending and one descending), which can obviously be used for single-property searches, but also for queries consisting of only equals clauses, by doing multiple range scans and taking the intersection (not sure at which level this happens). More complex queries need specific pre-defined indexes. Index tables only have keys, no columns with values. Indexes are updated synchronously, so everything stays consistent (at the cost of contention problems?). Some mention of string-byte considerations when doing range queries. No fulltext queries. Ends with some talk on transactions.
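
To make the "queries become range scans" point concrete, here is a toy model in Python, not the Datastore's actual implementation. The index is assumed to be a sorted list of (property value, entity key) pairs, standing in for Bigtable's sorted rows; `build_index`, `range_scan`, `equals_scan`, and the sample entities are all invented for illustration.

```python
from bisect import bisect_left


def build_index(entities, prop):
    """Index rows hold only keys: sorted (property value, entity key) pairs."""
    return sorted(
        (props[prop], key) for key, props in entities.items() if prop in props
    )


def range_scan(index, low, high):
    """Entity keys whose value satisfies low <= value < high, in index order."""
    start = bisect_left(index, (low,))    # (low,) sorts before every (low, key)
    end = bisect_left(index, (high,))
    return [key for _, key in index[start:end]]


def equals_scan(index, value):
    """Prefix scan: seek to the first row for `value`, read until it changes."""
    result = []
    for row_value, key in index[bisect_left(index, (value,)):]:
        if row_value != value:
            break
        result.append(key)
    return result


entities = {
    "e1": {"color": "red", "size": 3},
    "e2": {"color": "blue", "size": 5},
    "e3": {"color": "red", "size": 5},
    "e4": {"color": "green", "size": 1},
}
color_index = build_index(entities, "color")
size_index = build_index(entities, "size")

# A query with only equals clauses: one scan per property, then intersect.
red = equals_scan(color_index, "red")
size_five = equals_scan(size_index, 5)
size_five_set = set(size_five)
both = [key for key in red if key in size_five_set]

print(red, size_five, both)
print(range_scan(size_index, 2, 5))
```

Every scan reads one contiguous slice of a sorted structure and returns rows already in order, so nothing is filtered or sorted afterwards. Where the real system performs the intersection is left open, as in the note above.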

EXPLAIN EXTENDED: efficient database queries in SQL.
http://explainextended.com/
In the Woods - A Closer Look at SQL Joins
http://blog.themeforest.net/tutorials/a-closer-look-at-sql-joins/

JOINs in MySQL

Lawnchair
http://brianleroux.github.com/lawnchair/

A client side JSON document store. Want to see this in Node.JS.

"Sorta like a couch except smaller and outside, also, a client side JSON document store. Perfect for webkit mobile apps that need a lightweight, simple and elegant persistence solution."

Top 20+ MySQL Best Practices - Nettuts+
http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/

5. Index and Use Same Column Types for Joins

20 Best practices

Pragmatic Programming Techniques: NOSQL Patterns
http://horicky.blogspot.com/2009/11/nosql-patterns.html

A nice overview of some of the more popular patterns in NoSQL architecture

Performance, Scalability and Architecture - Java and .NET Application Performance Management (dynaTrace Blog) » Understanding Caching in Hibernate - Part Three : The Second Level Cache
http://blog.dynatrace.com/2009/03/24/understanding-caching-in-hibernate-part-three-the-second-level-cache/

In particular I read a whitepaper several years ago a

In the last posts I already covered the session cache as well as the query cache. In this post I will focus on the second-level cache. The Hibernate Documentation provides a good entry point reading on the second-level cache. The key characteristi

Top 20+ MySQL Best Practices - Nettuts+
http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+nettuts+%28NETTUTS%29

Database operations often tend to be the main bottleneck for most web applications today. It's not only the DBAs (database administrators) that have to worry about these performance issues. We as programmers need to do our part by structuring tables properly, writing optimized queries and better code. Here are some MySQL optimization techniques for programmers.

» Scalable Web Applications Programming the new world: Programming your life and the net, one day at a time
http://blog.nickbelhomme.com/php/scalable-web-applications_158
窓の杜 - [REVIEW] "Vacuum Places": optimize Firefox's bloated internal database with a single button click
http://www.forest.impress.co.jp/docs/review/20090824_310430.html
"skyload", a load-testing tool for MySQL and Drizzle, is amazing! - id:kazuhookuのメモ置き場
http://d.hatena.ne.jp/kazuhooku/20090707/1246950315

Apparently handy for MySQL load testing.

Natural Earth
http://www.naturalearthdata.com/

Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.

Introducing Redis: a fast key-value database | Zen and the Art of Programming
http://antoniocangiano.com/2009/03/11/introducing-redis-a-key-value-database/
Extending Tokyo Cabinet DB with Lua - igvita.com
http://www.igvita.com/2009/07/13/extending-tokyo-cabinet-db-with-lua/

Tokyo Cabinet is a trove of hidden gems: the more you learn about it, the more you will appreciate the design and technical decisions behind it. By database standards it is a young project (started in 2007), but since it is a successor to the QDBM project developed by Hirabayashi-san (2000-2007), we could make the argument that it has been, in fact, nine years in the making.

漢(オトコ)のコンピュータ道: 10 techniques for using MySQL replication safely
http://nippondanji.blogspot.com/2009/03/mysql10.html

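So it could happen once every 2 years, which means some countermeasure is certainly a good idea. But if you replace the cables and equipment and set up an environment that exceeds the recommended values, it is once in 20 million years. Even with 10Gb Ethernet the probability is once in 2 million years... Still, taking countermeasures is better than taking none... Okuno-san, you have actually experienced a binary log being corrupted by the network, haven't you?

"1. Do not use multi-master replication. This is a very common misconception, but I occasionally see users who adopt a multi-master configuration because they want HA. Multi-master is a topology made up of two MySQL servers, in which each server replicates as both the master and the slave of the other. With multi-master, updates are possible on both hosts, but because an update made on one server is applied to the other asynchronously, updates can be lost if the server performing the updates crashes."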

assertTrue( ): NoSQL Required Reading
http://asserttrue.blogspot.com/2009/12/nosql-required-reading.html

Starting from Dynamo, ending with (roughly) follow @nosqlupdate on Twitter.

Materials that you need to read in order to get started with NoSQL

List of resources to read to get up-to-speed on the NoSQL movement.

Harish Mallipeddi's Blog - CouchDB naked
http://blog.poundbang.in/post/132952897/couchdb-naked

Good explanation of how CouchDB indexes.

how couchdb b-trees work internally

Why I think Mongo is to Databases what Rails was to Frameworks // RailsTips by John Nunemaker
http://railstips.org/2009/12/18/why-i-think-mongo-is-to-databases-what-rails-was-to-frameworks

Below are 7 Mongo and MongoMapper related features that I have found to be really awesome while working on switching Harmony, a new website management system by my company, Ordered List, to Mongo from MySQL.

The more I work with Mongo the more I am coming around to this way of thinking. I tell no lie when I say that I now approach Mongo with the same kind of excitement I first felt using Rails. For some, that may be enough, but for others, you probably require more than a feeling to check out a new technology

Top 20+ MySQL Best Practices | TuVinhSoft .,JSC
http://blog.tuvinh.com/top-20-mysql-best-practices/
Free Tools for the SQL Server DBA Part 2 - SQL Server Central
http://www.sqlservercentral.com/articles/Tools/64908
MySQL Showdown: Querious vs. Sequel Pro - TheAppleBlog
http://theappleblog.com/2009/02/27/mysql-showdown-querious-vs-sequel-pro/
Alex Miller - Hibernate query cache considered harmful?
http://tech.puredanger.com/2009/07/10/hibernate-query-cache/

Alex Miller's technical blog on Java, concurrency, programming, design, languages, and more

Hibernate and cache management

NoSQL with MySQL in Ruby - Friendly
http://friendlyorm.com/
Google: "We're Not Doing a Good Job with Structured Data" - ReadWriteWeb
http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php

That's something that's a bit troublesome - if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until such point Google either duplicates or acquires the invention.

Enabling a Google-like search from structured sources (databases)

Google and Yahoo approaching the structured Web

Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.

MySQL redundancy techniques at Yahoo! Auctions (Yahoo! JAPAN Tech Blog)
http://techblog.yahoo.co.jp/cat207/db/mysql_failover/

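A common setup, as common setups go. As MySQL Cluster, MySQL Proxy, and the like mature, this part of the architecture may become simpler and more robust.

"Dual master + software load balancer + Nagios." "When a master server fails, the health-check port of the slave directly attached to it is closed (by a Nagios event handler)."

Case study

Dual master, software load balancer, virtual DNS, gethostbyname rewriting; if one master goes down, the slaves directly under it are also taken down by a Nagios event handler to keep things consistent.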

Data Sets | GroupLens Research
http://www.grouplens.org/taxonomy/term/14
Another form of DB: what is a distributed key-value store? (1/3) - ＠IT
http://www.atmarkit.co.jp/fjava/rensai4/bigtable01/01.html

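An explanation of key-value stores. The "CAP theorem" shows that a distributed system cannot guarantee all three of the following at the same time: * data consistency (Consistency) * data availability (Availability) * data distribution (Partition-tolerance)

>The "distributed key-value store" is attracting attention as a database for the cloud era, distinct from the RDB. A thorough explanation of "Bigtable", arguably the leading example and the foundation technology behind many of Google's services. Hmm, I'm not so sure...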

Bigtable, SimpleDB, Tokyo Tyrant

Sql Antipatterns Strike Back
http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back

"Common blunders of SQL database design, queries, and software development. Presented as a tutorial at the MySQL Conference & Expo 2009."

Trees In The Database - Advanced data structures
http://www.slideshare.net/quipo/trees-in-the-database-advanced-data-structures?type=presentation

A presentation about modelling trees relationally and storing them in an SQL database.

Storing tree structures in a bi-dimensional table has always been problematic. The simplest tree models are usually quit

trees in database

Public Domain Classic
http://public-domain-archive.com/classic/

Nice! Please translate it into English if you can.

Vineet Gupta: NoSql Databases – Part 1 - Landscape
http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html

At Directi, we are taking a hard look at the way our applications need to store and retrieve data, and whether we really need to use a traditional RDBMS for all scenarios. This does not mean that we will eschew relational systems altogether. What it means is that we will use the best tool for the job – we will use non-relational options wherever needed and not throw everything at a relational database with a mindless one-size-fits-all approach. ... ... This post covers the current landscape of the NoSQL space. In a subsequent post, I intend to cover in more detail the various problem areas addressed by NoSQL systems and the specific algorithms used.

Really detailed description of a number of NoSQL solutions. Interesting reading on Cassandra and Voldemort.

The Scale-Out Blog: Simple HA with PostgreSQL Point-In-Time Recovery
http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html

A short guide on how to set up a PostgreSQL warm standby server for HA in order to replicate a database

FleetDB
http://fleetdb.org/

FleetDB is a schema-free database optimized for agile development.

Consensus Protocols: Two-Phase Commit at Paper Trail
http://hnr.dnsalias.net/wordpress/?p=90

Nice article on 2pc

terrastore - Project Hosting on Google Code
http://code.google.com/p/terrastore/
ウノウラボ Unoh Labs: How do you handle hierarchical structures in an RDB?
http://labs.unoh.net/2009/06/rdb.html

how to store tree structure on RDB

Hierarchical structures

SQL for Beginners - Nettuts+
http://net.tutsplus.com/tutorials/other/sql-for-beginners/

BrowserCouch Documentation
http://hg.toolness.com/browser-couch/raw-file/blog-post/index.html

"BrowserCouch is an attempt at an in-browser MapReduce implementation. It's written entirely in JavaScript and intended to work on all browsers, gracefully upgrading when support for better efficiency or feature set is detected. Not coincidentally, this library is intended to mimic the functionality of CouchDB on the client-side, and may even support integration with CouchDB in the future."

MySQL utility commands
http://www.electrictoolbox.com/mysql-utility-commands/
PHP Design - Biggest Database Oversights - Justin Carmony's Blog
http://www.justincarmony.com/blog/2008/10/25/php-design-biggest-database-oversights/

One in particular now has grown out of hand so bad that we've decided to start from scratch for a whole new version. Why? Let's say you have 3000+ php files, and your boss says "Hrm, we're seeing some problems with performance. Can you display at the bottom of each page the # of queries you use on that page?" If you coded your entire project like the example above, you would be totally screwed. You would have to find each and every mysql_query() and add some counter at the end. It would be a management nightmare. So how could you solve this problem?
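
One common answer to that closing question (not necessarily the one the post gives) is to send every query through a single wrapper, so cross-cutting features like a per-page query counter are added in one place. A minimal sketch in Python with the built-in sqlite3 module rather than PHP and mysql_query(); the `CountingDB` class and the `pages` table are hypothetical.

```python
import sqlite3


class CountingDB:
    """Single choke point for queries; counts every statement it runs."""

    def __init__(self, conn):
        self._conn = conn
        self.query_count = 0

    def query(self, sql, params=()):
        # Logging, timing, or caching could be added here without
        # touching any of the call sites.
        self.query_count += 1
        return self._conn.execute(sql, params).fetchall()


db = CountingDB(sqlite3.connect(":memory:"))
db.query("CREATE TABLE pages (id INTEGER PRIMARY KEY, title TEXT)")
db.query("INSERT INTO pages (title) VALUES (?)", ("home",))
rows = db.query("SELECT title FROM pages")

# What a page footer could display.
print(f"{db.query_count} queries used on this page")
```

With 3000+ files calling the database function directly, the counter means 3000+ edits; with a wrapper it is a one-line change.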

Google's architecture, designed on the assumption that data centers "go down" - Blog on Publickey
http://www.publickey.jp/blog/09/post_46.html

For later.

How much you can do with "ERMaster", a godsend of a DB design tool (1/3) - ＠IT
http://www.atmarkit.co.jp/fjava/rensai4/devtool11/devtool11_1.html

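Several free tools are available and I had been using them, but recently I learned of an Eclipse plugin called "ERMaster". Compared with other tools, ERMaster is an ER modeling tool that covers the details that matter: an intuitive, easy-to-understand UI (user interface), customizable table definition documents that can be exported to Excel, a dictionary feature, and more. This article introduces ERMaster.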

World Government Data | guardian.co.uk
http://www.guardian.co.uk/world-government-data

Tehgrauniad's search engine for government data sets.

more info : http://www.guardian.co.uk/news/datablog/2010/jan/07/government-data-world

The one-stop shop for World Government datasets from The Guardian.

The Guardian's search engine for world government data

Governments around the globe are opening up their data vaults – allowing you to check out the numbers for yourself. This is the Guardian’s gateway to that information. Search for government data here from the UK (including London), USA, Australia and New Zealand – and look out for new countries and places as we add them.

Collections Search Center, Smithsonian Institution
http://collections.si.edu/search/

Launched in Jan. 2010, SI's new collections search center contains more than 2 million searchable records and 265,900 resources (including images, videos, sound files, and electronic journals) from the Smithsonian's libraries, archives, and museums.

Search over 2 million records with 265,900 images, video and sound files, electronic journals and other resources from the Smithsonian's museums, archives

Federated search for the Smithsonian's museums archives, and libraries for images, video, sound files, and electronic journals

SIRIS - Smithsonian Institution Research Information System

recommended by sla

James on Software | Introducing Friendly: NoSQL With MySQL in Ruby
http://jamesgolick.com/2009/12/16/introducing-friendly-nosql-with-mysql-in-ruby.html

I've been a big proponent of NoSQL for a while. I have played with just about all of the new generation of data stores. We almost got cassandra running in production once, and we've been running mongodb in production for about six months now. But, here's the thing: as awesome as these new dbs are, they're still young. Our app generates a ton of data and gets pretty serious traffic. So, we started hitting walls quickly. To make a long story short, we decided to fall back to MySQL. It's battle hardened. We know its production characteristics and limitations. Backups are a science. We know we can count on it. But, we have a lot of data, and adding fields and indexes was starting to get painful. Flexible schemas are one of the things that attracted me to NoSQL in the first place. Then, I remembered this article about How FriendFeed uses MySQL to store schema-less data. So, I decided to implement the system they describe in the article. Since we put Friendly in to production, we've seen

Friendly makes MySQL look like a document store. When you save an object, it serializes all of its attributes to JSON and stores them in a single field. To query your data, Friendly creates and maintains indexes in separate tables. It even has write-through and read-through caching built right in.
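
The storage layout described above can be sketched roughly as follows. This is not Friendly's API (Friendly is a Ruby library on MySQL); it is a minimal Python illustration using the standard sqlite3 and json modules, and the table names and the `save` / `find_by_name` helpers are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# One table holds whole documents as JSON text in a single field.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, attributes TEXT)")
# A separate table acts as an application-maintained index on one attribute.
conn.execute("CREATE TABLE index_users_on_name (name TEXT, id INTEGER)")
conn.execute("CREATE INDEX idx_name ON index_users_on_name (name)")


def save(attrs):
    """Serialize all attributes to JSON and update the index table."""
    cur = conn.execute(
        "INSERT INTO users (attributes) VALUES (?)", (json.dumps(attrs),)
    )
    conn.execute(
        "INSERT INTO index_users_on_name (name, id) VALUES (?, ?)",
        (attrs["name"], cur.lastrowid),
    )
    return cur.lastrowid


def find_by_name(name):
    """Look up ids in the index table, then load the JSON documents."""
    rows = conn.execute(
        "SELECT u.attributes FROM index_users_on_name i "
        "JOIN users u ON u.id = i.id WHERE i.name = ? ORDER BY u.id",
        (name,),
    ).fetchall()
    return [json.loads(r[0]) for r in rows]


save({"name": "james", "age": 27})
save({"name": "ann", "langs": ["ruby"]})  # a new field needs no ALTER TABLE
found = find_by_name("ann")
```

The point of the design is that adding an attribute never touches the schema of the main table; only attributes you want to query by need an index table.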

Unlocking innovation | data.gov.uk
http://data.gov.uk/home

"Advised by Sir Tim Berners-Lee and Professor Nigel Shadbolt and others, government are opening up data for reuse. This site seeks to give a way into the wealth of government data and is under constant development. We want to work with you to make it better. We’re very aware that there are more people like you outside of government who have the skills and abilities to make wonderful things out of public data. These are our first steps in building a collaborative relationship with you.[...]"

It's here! The UK open data site is public!

9 Tips For Working with MySQL Databases » DevSnippets
http://devsnippets.com/article/9-tips-for-working-with-mysql-databases.html
Less Than Dot - Blog - The Ten Most Asked SQL Server Questions And Their Answers
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/the-ten-most-asked-sql-server-questions--1

Getting all rows from one table and only the latest from the child table

1) Selecting all the values from a table for a particular date 2) Search all columns in all the tables in a database for a specific value 3) Splitting string values 4) Select all rows from one table that don't exist in another table 5) Getting all rows from one table and only the latest from the child table 6) Getting all characters until a specific character 7) Return all rows with NULL values in a column 8) Row values to column (PIVOT) 9) Pad or remove leading zeroes from numbers 10) Concatenate Values From Multiple Rows Into One Column
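
Two of the questions listed above (4 and 5) can be illustrated with a small sketch. The linked article targets SQL Server; the example below uses SQLite through Python's standard sqlite3 module instead, with made-up `customers` and `orders` tables, so treat it as one possible phrasing of the queries rather than the article's answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, placed TEXT);
INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO orders VALUES
    (1, 1, '2010-01-05'), (2, 1, '2010-02-01'), (3, 2, '2010-01-20');
""")

# Question 4: rows in one table with no match in another (NOT EXISTS).
no_orders = conn.execute("""
    SELECT c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)
""").fetchall()

# Question 5: every parent row plus only its latest child row.
latest = conn.execute("""
    SELECT c.name, o.placed
    FROM customers c
    LEFT JOIN orders o
      ON o.customer_id = c.id
     AND o.placed = (SELECT MAX(placed) FROM orders WHERE customer_id = c.id)
    ORDER BY c.id
""").fetchall()
```

The LEFT JOIN keeps customers without any orders, which show up with a NULL (Python `None`) date.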

Released the distributed key-value store "kumofs"! - Sadayuki Furuhashi's diary
http://d.hatena.ne.jp/viver/20100118/p1
NHibernate Unit Testing
http://ayende.com/Blog/archive/2009/04/28/nhibernate-unit-testing.aspx

How to unit test NHibernate code using an in-memory SQLite database.

: base(typeof(Blog).Assembly)

Unit Testing NHibernate from Ayende

5 useful PHP functions for MySQL data fetching - AnyExample.com
http://www.anyexample.com/programming/php/5_useful_php_functions_for_mysql_data_fetching.xml

Very interesting recommendations for retrieving information from MySQL

13 Useful WordPress SQL Queries You Wish You Knew Earlier | Onextrapixel - Showcasing Web Treats Without Hitch
http://www.onextrapixel.com/2010/01/30/13-useful-wordpress-sql-queries-you-wish-you-knew-earlier/

WordPress is driven by a MySQL database. This is something active WordPress users would know. However, if you only just read about it here from us, here’s what you should know. MySQL is a free relational database management system

Bytepawn - Scalable Web Architectures and Application State
http://bytepawn.com/2009/06/17/scalable-web-architectures-and-application-state/

Note about Code-State-Cache-Data (CSCD) pattern in scalable web applications.

Short article propounding the use of a "Code-State-Cache-Data-Architecture" (CSCD) instead of just CD or CCD applications. Basically saying that you should forget about stateful apps if you want maximum performance...

Application state - Data you can restore from the database or afford to lose if server is restarted (logged in users). He recommends storing this in-memory. "Application state goes into an in-memory key-value store like Tokyo Tyrant. Cache data goes into Memcached. Persistent data goes into a database"

"What he needs is the insight to identify state, cached data and persistent data in his application. Application state goes into an in-memory key-value store like Tokyo Tyrant. Cache data goes into Memcached. Persistent data goes into a database. Note that the separation of code and application state may be beneficial later, because it allows you to scale easily by adding new memory servers. ... Let's call this the Code-State-Cache-Data (CSCD) pattern. What Damian originally had was a Code-Data (CD) pattern, and later he optimized to get a Code-Cache-Data (CCD) pattern"

Four ways to optimize paginated displays | MySQL Performance Blog
http://www.mysqlperformanceblog.com/2008/09/24/four-ways-to-optimize-paginated-displays/

A paginated display is one of the top optimization scenarios we see in the real world. Search results pages, leaderboards, and most-popular lists are good examples. You know the design pattern: display 20 results in some most-relevant order. Show a "next" and "previous" link. And usually, show how many items are in the whole list and how many pages of results there are. Rendering such a display can consume more resources than the entire rest of the site! As an example, I'm looking at slow log analysis results (with our microslow patches, set to log all queries) for one client; the slow log contains 6300 seconds' worth of queries, and the two main queries for the paginated display consumed 2850 and 380 seconds, respectively.

Tuning MySQL Performance with MySQLTuner | HowtoForge - Linux Howtos and Tutorials
http://www.howtoforge.com/tuning-mysql-performance-with-mysqltuner

Perl script for reporting back on your MySQL config.

Debugging and Tuning MySQL performance

Handy script to gather suggestions on mysql tuning

Bulletproof backups for MySQL | Carsonified
http://carsonified.com/blog/dev/bulletproof-backups-for-mysql/

Great comment on using XFS and snapshots to reduce downtime.

python-sqlparse - Google Code
http://code.google.com/p/python-sqlparse/

sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting SQL statements.

World Cinema Foundation
http://worldcinemafoundation.net/

Watch restored films online. Clips available directly, but by the look of it, you have to register to see the whole movie

The World Cinema Foundation is a natural expansion of my love for movies. Seventeen years ago, together with my fellow filmmakers, we created The Film Foundation to help preserve American cinema. Much has been accomplished and much work remains to be done, but The Film Foundation has created a base upon which we can build. There is now, I believe, a film preservation consciousness.

Mr. Moore gets to punt on sharding - (37signals)
http://www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding

I guess the conclusion is that there’s no use in preempting the technological progress of tomorrow. Machines will get faster and cheaper all the time, but you’ll still only have the same limited programming resources that you had yesterday. If you can spend them on adding stuff that users care about instead of prematurely optimizing for the future, you stand a better chance of being in business when that tomorrow finally rolls around.

From 37signals

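On managing session data with a database - Slow Dance
http://d.hatena.ne.jp/LukeSilvia/20090523/p1

InnoDB supports row-level locking. 60 million records...

Session mechanisms are inseparable from web applications. This post summarizes the insights gained from doing session management on a DB, and the conclusions drawn from them.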

25+ Alternative & Open Source Database Engines
http://www.webresourcesdepot.com/25-alternative-open-source-databases-engines/

Free Web Resources Everyday - WebResourcesDepot

Common Queries Tree
http://www.artfulsoftware.com/infotree/queries.php?&bw=1280

Common MySQL Queries

Common MySQL Queries (Extending Chapter 9 of Get it Done with MySQL 5&6)

Common Queries,Common MySQL Queries,Common SQL Queries

JSINQ - LINQ to Objects for JavaScript - Home
http://www.codeplex.com/jsinq

There are side-benefits to immersing yourself in MS-land. Like finding really solid data manipulation libraries written in javascript.

JSINQ is a complete implementation of LINQ to Objects for JavaScript. It allows you to write SQL-like queries against arrays, DOM node lists or your own custom enumerable types.

JSINQ is the JavaScript library that allows you to write SQL-like queries against arrays and DOM node lists. JSINQ is a complete implementation of LINQ to Objects in JavaScript. What that means is that if you know LINQ and you know JavaScript, you know JSINQ. JSINQ is both an API-compatible implementation of System.Linq.Enumerable and a complete query-expression compiler. That's right: you can write LINQ-style queries in JavaScript. And if that isn't enough: JSINQ is also very liberally licensed, well-documented, reasonably well-tested (the Enumerable-part) and currently in beta. So give it a go!

NYTimes Exposes 2.8 Million Articles in New API - ReadWriteWeb
http://www.readwriteweb.com/archives/nytimes_exposes_huge_api.php

The New York Times did just that this afternoon when it announced that it has released a new Application Programming Interface (API) offering every article the paper has written since 1981, 2.8 million articles. The API includes 28 searchable fields and updated content every hour.

paperplanes. A Collection Of Redis Use Cases
http://www.paperplanes.de/2010/2/16/a_collection_of_redis_use_cases.html

Almaz

Redis' particular way of treating data requires some rethinking how to store your data to benefit from speed, atomicity and its data types. I've already written about Redis in abundance, this post's purpose is to complement them with real-world scenarios. Maybe you can gather some ideas on how to deal with things.

Because Redis is practical.

WordPress : 10+ life saving SQL queries
http://www.catswhocode.com/blog/wordpress-10-life-saving-sql-queries

Basic SQL for managing your blog on WordPress

Even though there is a lot you can do inside WordPress, sometimes you need a quick solution to fix a specific problem. In those cases, working directly on the database can be a lifesaver. So here are 10 extremely useful SQL queries for WordPress.

A Comparison of Approaches to Large-Scale Data Analysis - MapReduce vs. DBMS Benchmarks
http://database.cs.brown.edu/sigmod09/

"The following information is meant to provide documentation on how others can recreate the benchmark trials used in our SIGMOD 2009 paper."

Leo's Chronicle: Database textbooks you really should know
http://leoclock.blogspot.com/2009/01/blog-post_07.html
chive - MySQL database management tool
http://www.chive-project.com/
Notes from a production MongoDB deployment « Boxed Ice Blog
http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/

Mongo DB Production

Interesting blog post detailing production experiences with mongodb.

Dealing with Duplicate Person Data - Proud to Use Perl
http://proudtouseperl.com/2009/04/dealing-with-duplicate-person-data.html

I've recently been working on a fairly large project that has contact information for almost 2 million people. These records contain details for both online and offline actions. Since the data can come from multiple sources there exist many duplicate records. Duplicate records mean more processing for our code, more storage space and more hassle for our clients who have to deal with these duplicates. All in all, bad things to leave lying around. In this article we'll look at some strategies that I used to identify and remove these duplicates. All code in this article are samples, and we'll leave the task of assembling them into a final working program up to the reader. CPAN is your Friend Like all good Perl projects, we will make heavy use of the CPAN. It makes our lives so much easier and every day I'm more in awe at the quality and breadth of solutions I find there. For this project we'll be using Text::LevenshteinXS, Lingua::EN::Nickname and Parallel::ForkManager. What is a Du

Funny to see people still using perl these days but great example
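
The article's code is Perl and relies on Text::LevenshteinXS; as a rough illustration of the underlying idea only, here is a plain-Python sketch of edit-distance matching on names. The `likely_duplicates` helper, the lowercase/strip normalization, and the threshold of 2 are assumptions made for this example, not the article's method.

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]


def likely_duplicates(names, max_distance=2):
    """Return pairs of names whose normalized forms are within max_distance."""
    norm = [n.strip().lower() for n in names]
    pairs = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            if levenshtein(norm[i], norm[j]) <= max_distance:
                pairs.append((names[i], names[j]))
    return pairs


people = ["Jon Smith", "John Smith", "Jane Doe", "john smith "]
pairs = likely_duplicates(people)
```

Comparing every pair is quadratic in the number of records, so for millions of people some form of blocking or parallel processing would be needed on top of this.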

Knowledge Innovation For Technology In Education
http://kite.missouri.edu/

creating databases

Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL
http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king

RT @kvz: Why Twitter is dropping MySQL in favor of Cassandra: http://bit.ly/dyeiXF

RT @DZone "Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL" http://dzone.com/WbTY

MyNoSQL: Please include anything I’ve missed.

Sphinx - text search The Pirate Bay way • The Register
http://www.theregister.co.uk/2009/05/08/dziuba_sphinx/

and it's on track to become the open source world's canonical answer to the question of text search. MySQL and Solr, the two popular solutions, are showing their age. MySQL introduced full-text search in late 2000 as a way to more intelligently search blobs of text stored in databases. You can work a full-text clause into a query, and MySQL will rank the result rows by how relevant it thinks they are to the query. MySQL uses textbook search algorithms and doesn't allow for a lot of relevance tuning. It's like a drawing from a five year old: The heart is in the right place, but everybody knows that kids suck at drawing. Implementation details aside, MySQL still suffers from scalability problems. Having ignored the trend of chip manufacturers to build multiple cores into CPUs, hoping that this unpleasant trend that required them to actually think about multi-threading would just blow over sooner or later, MySQL's ability to handle parallelism is, well, see the five year old's drawing.

Sphinx can index 10 megabytes of data per second and can search up to 100 gigabytes of text on a single processor. It also supports multi-machine distributed searching, as in the case of Craigslist.

Dennis Forbes on Software and Technology - Getting Real about NoSQL and the SQL-Isn't-Scalable Lie
http://www.yafla.com/dforbes/Getting_Real_about_NoSQL_and_the_SQL_Isnt_Scalable_Lie/

SQL is Scalable and NoSQL Isn’t For Everyone The point is one that I think all rational people already realize: The ACID RDBMS isn’t appropriate for every need, nor is the NoSQL solution.

"[Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to do with the debate. It would be more clearly called NoACID]"

PHP Tutorials Examples Introduction to PHP and MySQL
http://www.phpro.org/tutorials/Introduction-to-PHP-and-MySQL.html
HyperGraphDB - A Graph Database
http://www.kobrix.com/hgdb.jsp

HyperGraphDB is a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.

Metadata Database of Old Japanese Photographs from the Bakumatsu and Meiji Periods - [Search by subject]
http://oldphoto.lb.nagasaki-u.ac.jp/jp/category.html

A collection of old Japanese photographs, searchable by subject

I went to the Key-Value Store study group - blog.katsuma.tv
http://blog.katsuma.tv/2009/02/key_value_store_study.html

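"# LuxIO" # An ordinary B+-tree # Feature 1 * mapped index * the entire index section is mmapped o targets systems where the index section is smaller than physical memory # Feature 2 * long values * up to 4G * values larger than the node size (page size) are handled without extra overhead # Feature 3 * efficient append * a LinkedList data structure without padding # Suited to SSDs? # Use cases * when keys and values are both small and high-speed access is needed * constrained to databases smaller than physical memory * when you want to handle large values * when you want to keep appending to large values # Unsuitable workloads * workloads with many deletes * linking lots of small data o seek overhead is too large * read/write-heavy apps # Probably won't be distributed # May build a Hash # Would like to eliminate the read lock * emphasis on reads"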

On key-value style data design. Notes on the characteristics of several systems.

Urbantastic - Tech Tuesday: The Fiddly Bits
http://blog.urbantastic.com/post/81336210/tech-tuesday-the-fiddly-bits

# My own setup.

An architectural approach that uses mostly static HTML and JSON, powered by CouchDB.

In my last post I promised to talk a little about the technology that underlies Urbantastic. It’s not the usual suspects, so it’s worth some explanation.

Splitting static and dynamic data, moving the synthesis of the two to the client with javascript.

10 sql tips to speed up your database
http://www.catswhocode.com/blog/10-sql-tips-to-speed-up-your-database

SQL optimization

100 Time-Saving Search Engines for Serious Scholars | Online Universities
http://www.onlineuniversities.com/blog/2010/03/100-time-saving-search-engines-for-serious-scholars/

Undergraduates and grad students alike will appreciate the usefulness of these search engines that allow them to find books, journal articles and even primary source material for whatever kind of research they’re working on and that return only serious, academic results so time isn’t wasted on unprofessional resources.

CouchDB with CouchRest in 5 minutes « The Merbist
http://merbist.com/2009/05/17/couchdb-with-couchrest-in-5-minutes/

The other night, during our monthly SDRuby meetup, lots of people were very interested in learning more about CouchDB and Ruby. I tried to show what Couch was all about but I didn’t have time to show how to use CouchDB with Ruby. Here is me trying to do that in 10 minutes or less. I’ll assume you don’t have CouchDB installed.

Try jLinq Online
http://www.hugoware.net/TryOnline

jLinq is a Javascript library that makes working with complex arrays easy. jLinq was built based on adding functions to the core library so it is easily extended with your own custom code. jLinq is free and open source so you can contribute your creations and help improve the library!

jLINQ

"jLinq", a JavaScript library that lets you treat arrays like a database

Calil | Japan's largest library catalog search site
http://calil.jp/

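Mixed Search: searches the holdings of multiple libraries and Amazon's database at the same time

Calil is a service that lets you easily look up current lending status across more than 4,300 libraries and reading rooms nationwide.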

The best textbook on database performance that I know of - Diary of Dai Yamamoto @ Kronos
http://d.hatena.ne.jp/iad_otomamay/20090805/1249479181

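The Database Performance Improvement Textbook: Basic Principles

Whatever site you work at, the equation that leads to the right answer is the same, so the material carries over. I have tuned a variety of DBs, including Oracle, SQL Server, and MySQL, and since they are all based on RDB theory, tuning should be possible once you know the basic principles. The types of index scans and how to read execution plans are also covered clearly and in detail.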

Are Commercial Databases Worth It? - Coding the Wheel
http://www.codingthewheel.com/archives/are-commercial-databases-worth-it

I've worked with expensive SQL Server and Oracle setups for most of my career. I've defended them viciously against all comers and contrarians. I've participated in late-night guerilla flame wars and drunken bar brawls. And I've sought out with relentless tunnel vision those pieces of propaganda which support my foregone conclusion: that SQL Server and/or Oracle are (or were) the best choices for the organization. I used to be a commercial database advocate. These databases have put food on my table for a dozen years, you see. I am (or was) what you might call an entrenched practitioner, not necessarily an expert, but a practitioner. And in the manner of entrenched practitioners around the world, I've treated you heretics with the sadistic undercutting and poisonous rancor you've deserved! "MySQL?" I would sneer. "PostgreSQL? Thanks, but this is a serious project. We need a database we can depend on." Ahem.

googled "why pay for commercial database" and found this among other articles

Are Commercial Databases Worth It? http://bit.ly/Du96H (via @newsycombinator) [from http://twitter.com/tadej/statuses/1664387681]

Questions the wisdom of choosing a commercial DB such as Oracle or SQL Server over their open source alternatives.

ioannis cherouvim » Blog Archive » The * stupidest things I’ve done in my programming job
http://blog.cherouvim.com/the-stupidest-things-ive-done-in-my-programming-job/

EAV

I don't agree with all of them, but still...

Visual Guide to NoSQL Systems - Nathan Hurst's Blog
http://blog.nahurst.com/visual-guide-to-nosql-systems

Good discussion in the comments as well.

where is my milk from?
http://whereismymilkfrom.com/

RT @dogfishbeer:Thanks to @beerwars and @foodinc care about beer & food; now find out where milk in you fridge is from http://bit.ly/cVyDhn!

You'd be surprised. Did you know different brands of milk often come from the same dairy - and the same cows? Often, the same dairy provides milk for store and brand names, only differentiating them by their label! Most dairy products, especially milk, have a state and plant code. Go get the milk out of your fridge and find out which dairy it comes from.

Explore the world of configurators! — Configurator-Database
http://www.configurator-database.com/

This is the world's biggest configurator database, featuring over 500 web-based configurators.

This site is home to the world's biggest configurator database. Scan over 500 web-based configurators now and follow the up-to-date discussion of these configurators in our blog.

eBay’s two enormous data warehouses | DBMS2 -- DataBase Management System Services
http://www.dbms2.com/2009/04/30/ebays-two-enormous-data-warehouses/

Metrics on eBay’s main Teradata data warehouse include: * >2 petabytes of user data

Millions of queries per day

Statistics on eBay's database processing

mixi Engineers’ Blog » Super-easy full-text search in 3 lines
http://alpha.mixi.co.jp/blog/?p=1112

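Speaking of tag search and full-text search, Tokyo Dystopia already provides similar features. You might think TD is no longer needed now that TC supports tag search and full-text search, but that is not the case. As an inverted index library, TD has a far more efficient and scalable design, and its implementation is kept simple so that it is easy to customize for business needs. TC's inverted index, on the other hand, falls short of TD in performance and scalability, but its distinguishing feature is that it is extremely easy to adopt. If you already manage your data in a table DB, you can strengthen your search features in under a minute just by writing a setindex such-and-such statement.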

The Apache Cassandra Project
http://cassandra.apache.org/

A massively parallel database in the spirit of "Bigtable"; originated at Facebook

The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.

nothingmuch's most awesome Perl blog EVAR!!1one: Why I don't use CouchDB
http://blog.woobling.org/2009/05/why-i-dont-use-couchdb.html

Keep this as a reference to common couch FUD :)

High Scalability - High Scalability - Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
http://highscalability.com/blog/2010/3/23/digg-4000-performance-increase-by-sorting-in-php-rather-than.html

Scaling practices turn a relational database into a non-relational database. To scale at Digg they followed a set of practices very similar to those used at eBay. No joins, no foreign key constraints (to scale writes), primary key look-ups only, limited range queries, and joins were done in memory. When implementing the comment feature a 4,000 percent increase in performance was created by sorting in PHP instead of MySQL. All this effort required to make a relational database scale basically meant you were using a non-relational database anyway. So why not just use a non-relational database from the start?
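
To make the practices above concrete, here is a small sketch of "primary key look-ups only, joins in memory, sort in the application". It uses Python and the standard sqlite3 module rather than PHP and MySQL, and the tables and the `fetch_by_ids` helper are invented for the example; it shows the shape of the approach, not Digg's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO comments VALUES (1, 10, 5), (2, 11, 9), (3, 10, 7);
INSERT INTO users VALUES (10, 'ann'), (11, 'bob');
""")


def fetch_by_ids(table, ids):
    """Primary-key lookup only; table is a trusted constant, not user input."""
    marks = ",".join("?" for _ in ids)
    sql = f"SELECT * FROM {table} WHERE id IN ({marks})"
    return conn.execute(sql, list(ids)).fetchall()


comments = fetch_by_ids("comments", [1, 2, 3])
# "Join" in application memory: collect user ids, look them up, map by id.
users = dict(fetch_by_ids("users", sorted({c[1] for c in comments})))
joined = [(cid, users[uid], score) for cid, uid, score in comments]
# Sort in the application instead of using ORDER BY in the database.
ranked = sorted(joined, key=lambda row: row[2], reverse=True)
```

The database only ever answers cheap key lookups, and the CPU cost of joining and sorting moves to the application tier, which is usually easier to scale horizontally.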

As Digg started out with a MySQL oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you find useful:

RT @Sebdz: RT: @programmateur: Digg: 4000 % performance increase by sorting in PHP rather than MySQL (via @mrboo) - http://bit.ly/ckma10

♻ @n1k0: "Scaling practices turn a relational database into a non-relational database" http://n1k.li/4v (via @nsilberman)

Typically for relatively static data sets, relatively low query volumes, and relatively high latency requirements.

JDbMonitor - Monitor JDBC Performance For Slow SQL Queries
http://www.jdbmonitor.com/

JDbMonitor is a tool to monitor & analyse database performance for any Java application. Easily determine your application's database performance and analyse problems down to specific SQL statement.

Tool for monitoring JDBC database activity

Monitor JDBC Performance For Slow SQL Queries

Data Marketplace : Find, buy and sell data online
http://datamarketplace.com/

A place where one can buy and sell structured datasets online, e.g. Walmart locations in the US or weekly oil prices since 1970. If a dataset is not available, you can request it and bid an amount with a set deadline for delivery.

Find, buy and sell data online

Kazuho@Cybozu Labs: I've started building a distributed storage system named Pacific
http://developer.cybozu.co.jp/kazuho/2009/06/pacific-18c7.html
WTF is a SuperColumn? An Intro to the Cassandra Data Model — Arin Sarkissian
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

Introductory blog post about the Cassandra data model.

jStorage - simple JavaScript plugin to store data locally
http://www.jstorage.info/
PERCONA PERFORMANCE CONFERENCE 2009 SCHEDULE :: PERCONA
http://conferences.percona.com/percona-performance-conference-2009/schedule.html

These are all mysql oriented, but it sure seems like there are some fantastic principles to be pulling from here. Ex "covering indexes; orders-of-magnitude improvements." Or one on optimizing disk i/o.

Slides from a great variety of sessions @ Percona Performance Conference 2009 (April)

Slides of the conference are available online

I, Cringely . The Pulpit . Data Debasement | PBS
http://www.pbs.org/cringely/pulpit/2008/pulpit_20081003_005424.html

The second time through the Appistry team tossed the database, at least for its duties as a processing platform, instead keeping the transaction -- in fact ALL transactions -- in memory at the same time. This made the work flow into read-process-write (eventually). The database became more of an archive and suddenly a dozen commodity PCs could do the work of one Z-Series mainframe, saving a lot of power and money along the way.

Data | The World Bank
http://data.worldbank.org/

Site gathering a large amount of World Bank data.

Processing 1.2 million tweets per second: the Twitter system "now" - ＠IT
http://www.atmarkit.co.jp/news/201004/19/twitter.html

twitter DB fan out mail

Twitter's DB architecture

Redis tutorial, April 2010 - by Simon Willison
http://simonwillison.net/static/2010/redis-tutorial/

posted by thraxil: http://quimby.ccnmtl.columbia.edu/ircbot/web/?y=2010&m=04&d=26#20100426105402

Awesome tutorial.

These slides and notes were originally written to accompany a three hour Redis tutorial I gave at the NoSQL Europe conference on the 22nd of April 2010.

Typing show databases and the like in mysql every time is a pain → solved with a readline macro - (ひ)メモ
http://d.hatena.ne.jp/hirose31/20090531/1243777478
Bulk Data Downloads: A Breakthrough in Government Transparency - O'Reilly Radar
http://radar.oreilly.com/2009/03/bulk-data-downloads-government-transparency-breakthrough.html

Wow this is potentially huge! Thoughts? RT @timoreilly:Bulk Data Downloads:A Breakthrough in Government Transparency http://bit.ly/EizO3 [from http://twitter.com/jhelmus/statuses/1283585077]

On getting greater access to government documents and data, with an amendment now in the House

Thriving Organizational Patterns - Wagn
http://wagn.org/Thriving_Organizational_Patterns

a Wiki where people write together + a Database where people organize information + a Content Management System where people build cool websites = a Wagn, where people organize cool websites together

mixi Engineers’ Blog » A table database built on DBM
http://alpha.mixi.co.jp/blog/?p=290

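Put simply, a table database is a database that can store records made up of multiple columns, like a table in a relational database. It does not support complex features such as SQL or table joins, but it runs faster as a result. In other words, it is a relational-style database that runs at DBM speed (strictly speaking, it is not a relational database).

Tutorial on how to use Tokyo Cabinet DBM

Feels like another step closer to Bigtable. What I can't quite tell is which version of mikio's products to install, and how, to make operations easy.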

table database

Database Versioning
http://adam.blog.heroku.com/past/2009/3/2/database_versioning/

Migrations bother me. On one hand, migrations are the best solution we have for the problem of versioning databases. The scope of that problem includes merging schema changes from different developers, applying schema changes to production data, and creating a DRY representation of the schema. But even though migrations is the best solution we have, it still isn’t a very good one.

Check the brainstorming at the end. I love where he's going. Short version: a schema.yml file identified by its SHA1 hash. Migrations are for translating data between versions. Great comments at the end by the smart people in the community.
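
The "schema file identified by its SHA1 hash" idea can be sketched in a few lines. This is only an illustration in Python with the standard hashlib module: the whitespace normalization, the schema text, and the `migrations` registry keyed by (from, to) hashes are assumptions for the example, not something specified in the linked post.

```python
import hashlib


def schema_version(schema_text):
    """Identify a schema by the SHA-1 of its whitespace-normalized text."""
    lines = schema_text.strip().splitlines()
    normalized = "\n".join(line.rstrip() for line in lines)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


v1 = """
users:
  id: integer
  name: string
"""
v2 = v1 + "  email: string\n"

# Hypothetical registry: a migration translates data between two versions,
# looked up by the pair of schema hashes rather than by a sequence number.
migrations = {
    (schema_version(v1), schema_version(v2)):
        "ALTER TABLE users ADD COLUMN email TEXT",
}


def migration_for(old_schema, new_schema):
    """Return the registered migration between two schemas, if any."""
    key = (schema_version(old_schema), schema_version(new_schema))
    return migrations.get(key)


step = migration_for(v1, v2)
```

Because the version is derived from content, two developers who arrive at the same schema independently get the same identifier, with no numbering conflicts to merge.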

HBase vs Cassandra: why we moved « Bits and Bytes.
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/

The Twitter Engineering Blog: Introducing Gizzard, a framework for creating distributed datastores
http://engineering.twitter.com/2010/04/introducing-gizzard-framework-for.html
Zend_Acl part 3: creating and storing dynamic ACLs | CodeUtopia
http://codeutopia.net/blog/2009/02/18/zend_acl-part-3-creating-and-storing-dynamic-acls/

In this third post of the series, I’ll talk about using dynamic ACLs: How to store an ACL in a database, and construct it from there when needed. This post builds on the things introduced in part 1 and part 2.

Hibernate Performance Tuning | Javalobby
http://java.dzone.com/articles/hibernate-performance-tuning

First Level Cache (aka Transaction layer level cache)

net.sf.ehcache.hibernate.Provider

Performance tuning tips for Hibernate. Best article.

Steve Huffman on Lessons Learned at Reddit | Carsonified
http://carsonified.com/blog/dev/steve-huffman-on-lessons-learned-at-reddit/

Steve Huffman on Lessons Learned at Reddit By Keir Whitaker

Readings in Database Systems Web Supplement
http://redbook.cs.berkeley.edu/

This book is one of the fundamental database theory books available today. The site lists the papers featured in the book, along with various lecture notes. Need to track down some of these papers.

4 Steps To a Professional Database Design | ProgrammerFish - Everything that's programmed!
http://www.programmerfish.com/4-steps-to-a-professional-database-design/

Just as you require a blueprint to build a house, you will need a database blueprint in order to implement a database successfully.

Kurosawa Digital Archive
http://www.afc.ryukoku.ac.jp/Komon/kurosawa/index.html

The world's greatest director

Online archive of Akira Kurosawa

kurosawa archive

Free Kurosawa movies

Opened last year by Kyoto’s Ryukoku University, the archive honors Akira Kurosawa, Japan’s celebrated filmmaker who brought us The Seven Samurai, Rashomon, Ikiru, etc. and won an Oscar for Lifetime Achievement in 1989. What will you find here? A good 20,000 items. Screenplays, manuscripts, photos, sketches, newspaper clippings, notes, etc. You won’t find a larger Kurosawa collection on the web.

Akira Kurosawa Digital Archive

Cassandra By Example | Rackspace Cloud Computing & Hosting
http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example/

Maybe I should learn to use Cassandra someday.

SQL Server 2005 Paging – The Holy Grail - SQL Server Central
http://www.sqlservercentral.com/articles/T-SQL/66030/

The paging and ranking functions introduced in 2005 are old news by now, but the typical ROW_NUMBER OVER() implementation only solves part of the problem. Nearly every application that uses paging gives some indication of how many pages (or total records) are in the total result set. The challenge is to query the total number of rows and return only the desired records with a minimum of overhead. The holy grail solution would allow you to return one page of the results and the total number of rows with no additional I/O overhead. In this article, we're going to explore four approaches to this problem and discuss their relative strengths and weaknesses. For the purposes of comparison, we'll be using I/O as a relative benchmark.
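
A small sketch of the problem being described: one query that returns a page of rows together with the total row count. It runs against SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of SQL Server, and the items table, the fetch_page helper and the COUNT(*) OVER () approach are assumptions for illustration. It is not necessarily one of the article's four approaches, and it makes no claim about I/O cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item-{i:02d}",) for i in range(1, 26)],  # 25 hypothetical rows
)

def fetch_page(conn, page, page_size):
    """Return (names, total_rows) for a 1-based page number."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    sql = """
        SELECT name, total_rows FROM (
            SELECT name,
                   ROW_NUMBER() OVER (ORDER BY id) AS row_num,
                   COUNT(*) OVER () AS total_rows
            FROM items
        )
        WHERE row_num BETWEEN ? AND ?
        ORDER BY row_num
    """
    rows = conn.execute(sql, (first, last)).fetchall()
    # The total rides along on every returned row; an empty page
    # therefore reports 0, which is a limitation of this sketch.
    total = rows[0][1] if rows else 0
    return [name for name, _ in rows], total

names, total = fetch_page(conn, page=3, page_size=10)
```

Here the third page holds the last five of the 25 rows, and the total arrives in the same result set as the page itself.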

Electrical What ?!
http://electricalwhat.com/

Electrical What !? is a database of electronic components, created out of frustration with the difficulty of searching for schematic symbols through long lists with little information. It displays all electronic components in an easily scannable, cataloged format. What truly sets Electrical What !? apart from the average reference book, however, is the ability to search by appearance. With these tools, Electrical What !? hopes to make looking for electronic symbols a breeze.

Electrical symbols for schematics and print reading

Pillbox - prototype pill identification system
http://pillbox.nlm.nih.gov/index.html

Enables users to search for unknown solid-dosage medications (tablets/capsules) based on physical characteristics and images. The system combines high-resolution images of tablets and capsules with FDA-approved appearance information (imprint, shape, color, etc.) to enable users to visually search for and identify an unknown solid dosage pharmaceutical. This system is designed for use by emergency physicians, first responders, other health care providers, Poison Control Center staff, and concerned citizens.

A site that identifies pills

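The People Who Built It Because It Didn't Exist: ITpro
http://itpro.nikkeibp.co.jp/article/OPINION/20090216/324752/

"What distinguishes memcached is that it uses the physical memory of ordinary PC servers as the memory for caching data. To spread large volumes of data across the memory of multiple PC servers, it adopts an approach called a 'key-value data store'. Data is first denormalized and turned into 'keys' and their corresponding 'values' before being stored. Because the data is a set of key-value pairs, it can be distributed across multiple servers."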

README - redis - Google Code
http://code.google.com/p/redis/wiki/README

A database implementing a dictionary, where every key is associated with a value. Every single value has a type. The following types are supported: Strings, Lists, Sets, Sorted Set (since version 1.1).

Maybe the author is not the right person to make such a comparison?

Persistent in-memory key value database compared to memcached

...structures and algorithms. Indeed, both algorithms and data structures in Redis are properly chosen in order to obtain the best performance.

blog.TBODA.com | 5 Useful SQL Server Scripts
http://blog.tboda.com/post/5-Useful-SQL-Server-Scripts.aspx
Building a Data Warehouse with MySQL (Yahoo! JAPAN Tech Blog)
http://techblog.yahoo.co.jp/web/yahoo/mysql/

Batch

Vegetarian Recipe Search - Find Vegetarian Recipes
http://vegetarianrecipe.us/

Recommended by Lifehacker

Find Vegetarian recipes with this vegetarian recipe search engine.

Stack Overflow Creative Commons Data Dump - Blog - Stack Overflow
http://blog.stackoverflow.com/2009/06/stack-overflow-creative-commons-data-dump/

Awesome, Stack Overflow has released all of their public web data under a CC license.

mixi Engineers’ Blog » How to Implement a Web Chat in a 100-Line C Program
http://alpha.mixi.co.jp/blog/?p=1029

Tokyo Cabinet

Free Geolocation API tool : CodeDiesel
http://www.codediesel.com/tools/free-geolocation-api-tool/

http://iplocationtools.com/ip_query.php?ip=

Uses PHP and cURL to look up the details of an IP address and pin down a visitor's city, country, zip, latitude, longitude, etc.

http://iplocationtools.com

Web λ.0 - Functional programming for the Web: Sky is the limit
http://weblambdazero.blogspot.com/2008/09/sky-is-limit.html

Using tokyocabinet as backing store for Mnesia

This is only the 3rd blog post I found about mnesiaex and support for tokyocabinet. The comments are worth reading!

Life is beautiful: Pitfalls of Multithreaded Programming, Part 2
http://satoshi.blogs.com/life/2008/09/post-1.html

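I hadn't bookmarked this yet... > "Seen that way, it seems to me fundamentally wrong to apply changes to the database in response to Create/Update/Delete requests while keeping the client waiting (that is, while holding on to the thread or process needed to handle the HTTP request)." I agree with this, but even if you make it asynchronous and handle it Comet-style, there are cases that need consistency with other requests, so care will be needed to guarantee that.

Decomposing the problem. I think there are plenty of other, more thorough sources for the implementation details.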

Simple Wins : Daytime Running Lights
http://jchrisa.net/drl/_design/sofa/_show/post/Simple-Wins

Background on jchrisa's Toast (standalone chat app in CouchDB+JS+HTML)

The point is to show how CouchDB's "databasey" features, because they are implemented using HTTP, can be leveraged to make powerful end-user experiences, with just a minimum of code.

Map Fields - A Rails plugin to ease the importing of CSV files @ Ramblings on Rails
http://ramblingsonrails.com/map-fields-a-rails-plugin-to-ease-the-importing-of-csv-files

A very nice solution for importing CSV files; it handles the field mapping, which is a very common problem.

OMFG!!!! Manbabies are ready!!!!

Easy way to import CSV files

The Night "Key-Value Data Store" Developers Gathered in Force: ITpro
http://itpro.nikkeibp.co.jp/article/OPINION/20090226/325527/

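What surprised this reporter was that the key-value data stores currently being developed in Japan are not limited to these three. What is more, the developers are generally young. The nearly 80 engineers who attended the study session were of about the same age.

Key-value data stores (or key-value databases) are a technology attracting attention from web companies that have large numbers of users and large volumes of data and that struggle with database performance problems and high costs.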

mmalone's django-caching at master - GitHub
http://github.com/mmalone/django-caching/tree/master

"Mike Malone shares code used by Pownce to add QuerySet level caching to Django. It’s a smart implementation—a CachingQuerySet class inspects the arguments passed to get(), and if they’re just a straight forward exact PK lookup hits memcache for the object before hitting the database. Signals are used to invalidate the cache."

Some examples of transparently caching things in Django. An example Django app that uses custom managers, fields, and QuerySets to transparently cache objects.
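
A framework-free sketch of the pattern these notes describe: an exact primary-key lookup is served from a cache before the database is consulted, and saving a row invalidates its cache entry. The CachingStore class, the plain dicts standing in for memcache and the database, and the _on_saved hook standing in for a signal handler are all assumptions for illustration. This is not the django-caching code or the Django API.

```python
class CachingStore:
    """Read-through cache for exact primary-key lookups."""

    def __init__(self, database):
        self.database = database  # dict standing in for a DB table
        self.cache = {}           # dict standing in for memcache
        self.db_hits = 0          # counts simulated database queries

    def get(self, **filters):
        # Only a lone exact pk lookup is eligible for the cache.
        if set(filters) == {"pk"}:
            pk = filters["pk"]
            if pk in self.cache:
                return self.cache[pk]
            row = self._query(pk=pk)
            self.cache[pk] = row
            return row
        return self._query(**filters)

    def _query(self, **filters):
        self.db_hits += 1
        for pk, row in self.database.items():
            candidate = dict(row, pk=pk)
            if all(candidate.get(k) == v for k, v in filters.items()):
                return row
        raise KeyError(filters)

    def save(self, pk, row):
        self.database[pk] = row
        self._on_saved(pk)

    def _on_saved(self, pk):
        # Plays the role of a post-save signal handler: drop the stale entry.
        self.cache.pop(pk, None)

store = CachingStore({1: {"name": "ann"}})
store.get(pk=1)   # misses the cache, queries the database
store.get(pk=1)   # served from the cache
```

Lookups by anything other than a bare pk bypass the cache entirely, which keeps invalidation simple: only one key per row ever needs to be dropped.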

mmalone's django-caching app

What is data science? - O'Reilly Radar
http://radar.oreilly.com/2010/06/what-is-data-science.html

The future belongs to the companies who figure out how to collect and use data successfully. In this in-depth piece, O'Reilly editor Mike Loukides examines the unique skills and opportunities that flow from data science.

Covers aspects of business intelligence, text mining, and other statistical analysis.

SQLike - a small query engine
http://www.thomasfrank.se/sqlike.html

SQLike is a small (10 kB) query engine for JavaScript and ActionScript. Its functionality and syntax is similar to that of SQL and it can be used to query arrays of objects or arrays of arrays.

VoltDB: Fast, Scalable SQL RDBMS with ACID
http://www.voltdb.com/

SCALABLE, OPEN-SOURCE SQL DBMS WITH ACID

MySQL Format Date | date_format Tool
http://www.mysqlformatdate.com/

DESIGN

FamilySearch.org - Family History and Genealogy Records
http://fsbeta.familysearch.org/

Beta program to digitize records.

NoSQL at Twitter (NoSQL EU 2010)
http://www.slideshare.net/kevinweil/nosql-at-twitter-nosql-eu-2010

Twitter's NoSQL slides

A discussion of the different NoSQL-style datastores in use at Twitter, including Hadoop (with Pig for analysis), HBase, Cassandra, and FlockDB.

cassandra, thrift, hdfs, hbase, scribe, pig, lzo, flockdb

interesting presentation on #NoSQL at #twitter by @kevinweil http://bit.ly/99h8BK [from http://twitter.com/behi_at/statuses/13587582774]

Falsehoods Programmers Believe About Names: MicroISV on a Shoestring
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

Strange Phenomena and Yokai Image Database
http://www.nichibun.ac.jp/YoukaiGazouMenu/

Database of monsters from Japanese mythology.

Copyright (c)2010- International Research Center for Japanese Studies, Kyoto, Japan. All rights reserved. Might come in handy someday...

CriticalPast.com: Search over 57000 videos and 7 million photos
http://www.criticalpast.com/

View more than 57,000 historic videos and 7 million photos for FREE in one of the world's largest collections of royalty-free archival stock footage. Offering immediate downloads in more than 10 formats starting at just $1.97 (Consumer); $30 (Pro).

Archive of "over 57000 videos and 7 million photos", especially of the mid-20th century (1930s to '60s), "offering immediate downloads in more than 10 formats starting at just $1.97 (Consumer); $30 (Pro)."

Getting Started with HTML5 Local Databases « Dark Crimson Blog
http://blog.darkcrimson.com/2010/05/local-databases/

HTML 5 local databases. Gives step by step instructions for setting one up!

Starting with Safari 4, iPhone/iPad OS3, Chrome 5, and Opera 10.5 (Desktop), HTML5 Local Databases are now supported. I’ve been reading about local databases for quite some time and decided to do a write up with some basic examples on how to get started.

Membase.org
http://www.membase.org/

For those familiar with memcached, membase provides on-the-wire protocol compatibility, but adds disk persistence; hierarchical storage management; data replication; live cluster reconfiguration and rebalancing; and secure multi-tenancy with data partitioning. Like memcached, membase is simple, fast and elastic.

Persistent Key/Value Storage

From an O'Reilly news link.

Membase is an open-source (Apache 2.0 license) distributed, key-value database management system optimized for storing data behind interactive web applications. These applications must service many concurrent users; creating, storing, retrieving, aggregating, manipulating and presenting data in real-time. Supporting these requirements, membase processes data operations with quasi-deterministic low latency and high sustained throughput. It scales linearly from a single-server deployment to a cluster of thousands of machines. And because membase does not require creation of a schema before storing data, it is a flexible, cost-effective place to Store Lots of Stuff. The original membase source code was released as Open Source by NorthScale, Zynga and NHN to membase.org in June 2010.

Why you Should be using PHP’s PDO for Database Access | Nettuts+
http://net.tutsplus.com/tutorials/php/why-you-should-be-using-phps-pdo-for-database-access/

Many PHP programmers learned how to access databases by using either the mysql or mysqli extensions. Since PHP 5.1, there’s been a better way. PHP Data Objects (PDO) provide methods for prepared statements and working with objects that will make you far more productive!

This is an easy-to-learn tutorial for PDO.

ripe for SQL Injection!

A fast, fuzzy, full-text index using Redis | PlayNice.ly
http://playnice.ly/blog/2010/05/05/a-fast-fuzzy-full-text-index-using-redis/

PlayNice.ly is entirely based on a data-structure server called Redis. Redis is one of several new key-value databases which break away from traditional relational data architecture. It is simple, flexible, and blazingly fast. So why not use the tools we have already?

redis.smembers("word:" + metaphone("python"))

Interesting post about searching data in Redis using indexing and phonetic algorithms.
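
A toy version of the indexing idea, with plain Python sets standing in for Redis sets so that it runs without a server. The phonetic_key function is a deliberately crude stand-in written for this sketch and is not the real Metaphone algorithm. The FuzzyIndex class and the sample documents are also hypothetical; only the "word:" key prefix follows the snippet quoted above.

```python
from collections import defaultdict

def phonetic_key(word):
    """Crude stand-in for Metaphone: keep the first letter, drop later
    vowels (and y), and collapse repeated letters. NOT the real algorithm."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    key = word[0]
    for ch in word[1:]:
        if ch in "aeiouy":
            continue
        if ch != key[-1]:
            key += ch
    return key

class FuzzyIndex:
    def __init__(self):
        # Maps "word:<phonetic key>" to a set of document ids,
        # mirroring one Redis set per key.
        self.sets = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.split():
            key = phonetic_key(word)
            if key:
                self.sets["word:" + key].add(doc_id)

    def search(self, query):
        """Return ids of documents matching every word of the query."""
        results = None
        for word in query.split():
            ids = self.sets.get("word:" + phonetic_key(word), set())
            results = set(ids) if results is None else results & ids
        return results or set()

index = FuzzyIndex()
index.add("doc1", "Learning Python quickly")
index.add("doc2", "Ruby on Rails")
hits = index.search("pithon")  # misspelling still maps to the same key
```

With a real Redis server, each lookup in search would be a set read and the intersection could be done server-side; that part is left out here.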

5 Rails Plugins to Help Optimize Your MySQL | Purify Blog
http://blog.purifyapp.com/2010/06/15/optimise-your-mysql/

Bullet / SlimScrooge / Query Reviewer / Rails Indexes / Ambitious Query Indexer

What We Learned Developing a Large-Scale Browser Game in PHP
http://www.slideshare.net/ketaiorg/php-4638298
Migrating to CouchDB — CouchDB: The NoSQL Document Database
http://www.couch.io/migrating-to-couchdb
freebase-gridworks - Project Hosting on Google Code
http://code.google.com/p/freebase-gridworks/

"an open data cleansing tool"

Freebase Gridworks is a power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use. All in the comfort and privacy of your own computer.

Top 10 MySQL GUI Tools — DatabaseJournal.com
http://www.databasejournal.com/features/mysql/article.php/3880961/article.htm
InfoQ: Database Sharding Design in Yupoo's Architecture
http://www.infoq.com/cn/articles/yupoo-partition-database

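Like most Web 2.0 sites, Yupoo is built on a large amount of open-source software, including MySQL, PHP, nginx, Python, memcached, redis, Solr, Hadoop, and RabbitMQ. Yupoo's server-side development languages are mainly PHP and Python: PHP is used to write the web logic (dealing with users directly over HTTP), while Python is mainly used to develop internal services and background tasks. On the client side we use a great deal of JavaScript, and here we would like to thank the MooTools JS framework, which makes front-end development a real pleasure for us. In addition, we split image processing out of the PHP process into a service of its own. This service is based on nginx, but it exposes a REST API as an nginx module.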

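...B we set up both logical databases, shard_001 and shard_002; shard_001 on Node-A and shard_001 on Node-B together form one Shard, and at any given time only one of the logical databases is in the Active state.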
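
A small sketch of the layout in that excerpt: every shard exists as a same-named logical database on both nodes, and exactly one copy is active at a time. The modulo routing by user id, the starting assignment of active copies, and the function names are assumptions for illustration. The excerpt does not say how Yupoo routes requests or keeps the two copies in sync.

```python
NODES = ("Node-A", "Node-B")
SHARDS = ("shard_001", "shard_002")

# Which node currently holds the Active copy of each shard.
# The starting assignment is made up for this sketch.
active = {"shard_001": "Node-A", "shard_002": "Node-B"}

def shard_for(user_id):
    """Assumed routing rule: pick a shard by user id modulo shard count."""
    return SHARDS[user_id % len(SHARDS)]

def locate(user_id):
    """Return (node, logical database) that should serve this user."""
    shard = shard_for(user_id)
    return active[shard], shard

def switch_active(shard):
    """Flip a shard's Active copy to the other node and return that node."""
    current = active[shard]
    other = NODES[1] if current == NODES[0] else NODES[0]
    active[shard] = other
    return other

before = locate(4)              # served by the copy on Node-A
switch_active("shard_001")
after = locate(4)               # same shard, now served from Node-B
```

Because each shard always has a standby copy on the other node, switching the active side moves traffic without changing which shard a user maps to.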

HASHCRACK.COM - Reverse Hash Lookup for MD5, SHA1, MySQL, NTLM and Lanman-Password-Hashes
http://hashcrack.com/index.php
Reflections on MongoDB // Collective Idea
http://collectiveidea.com/blog/archives/2010/06/15/reflections-on-mongodb/

Reflections on MongoDB -- http://bit.ly/aHCUC9

Introduction to MySQL Triggers | Nettuts+
http://net.tutsplus.com/tutorials/databases/introduction-to-mysql-triggers/

Introduction to MySQL triggers