Rules of Database App Aging - Push cx
"all fields become optional" etc. good stuff.
Rule 3. (too true!) Chatter Always Expands
This will be incomprehensible to non-developers in the audience, but oh god, this is so painfully, painfully true.
"I mentioned I’ve learned some rules of how database apps change over time, now that I’ve done a few dozen. They are: ..."
Home - Who Runs Gov - Government directory
Washington Post site about the Obama administration (collaborative)
Awesome site from the Washington Post with in-depth information on many of Washington's insider elite.
WhoRunsGov.com offers a unique look at the world of Washington through its key players and personalities.
Washington Post experiment
WWW SQL Designer - default
This tool allows you to draw and create database schemas (E-R diagrams) directly in the browser, without the need for any external programs (such as Flash).
『WWW SQL Designer』, which lets you breeze through database design in the browser, is seriously amazing - IDEA*IDEA ~ Lifehack blog of the Hyakushiki admin ~
I want to use this next time.
DIY: How to write a book - Boing Boing
Notes on how to write a book, by Steven Johnson
Listable
Listable, create and share lists with JSON, SQL, and plaintext output
listable, list making, serving site
having so much fun with this already (via sccottt)
Reject Database for iPhone Developer
how good is the BBC?
15 Websites to Trace People Online | MakeUseOf.com
There are many websites that search standard social networks like MySpace or Facebook, but Pipl is one resource that conducts a “deep web” dig for the name you’re looking for on “non-typical sites.” The search results from Pipl are pretty impressive. Y
the private world of yesterday is now an online world with open access to social networks, government databases, and public records.
for you, with love
dewey
search engine for archive.org music
I wrote a Perl script to crawl through the Live Music Archive and make an XML file of all the streamable songs, and now I'm putting the information from the XML file into MySQL databases to populate this interface. I'm also tweaking the interface to make it look nice and neat. Dewey is named after the Dewey Decimal System because it organizes the Live Music Archive library in the same way that the Dewey Decimal System does. I understand that sifting through that many artists is a bit daunting if you don't know what you're looking for. I've added Genre tags that are pulled from Last.fm to help you find your style, and a search function to help you find exactly what you want. While I look into better options, and as the database grows, let me suggest that you use the "Concerts From Today" tab to narrow down your listening possibilities while browsing. I hope to be adding rating, tagging and commenting functionality soon.
32 Tips To Speed Up Your MySQL Queries | AjaxLine
How to Paginate Data with PHP
An explanation of how to paginate data using PHP; code is included.
Catalogue of Digitized Medieval Manuscripts: About Us
Application database
Is the Relational Database Doomed? - ReadWriteWeb
Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud.
Article about where key/value databases should be used over relational databases, with some examples of dbs available.
The purpose of key/value databases. Is the paradigm changing these days?
Tokyo Cabinet: Beyond Key-Value Store - igvita.com
A database lib
BDB alternative, and very fast
Ask SM [PHP]: Form Validation, Converting MySQL to XML | Developer's Toolbox | Smashing Magazine
Ask SM about PHP: form validation, MySQLtoXML and more ...
smashingmagazine.com: form validation & converting MySQL to XML
InfoQ: CouchDB and Me
In this talk from RubyFringe, Damien Katz explains what drove him to create CouchDB, why he chose Erlang and what made him decide to sell his house to work on Free Software.
Very inspiring.
CodeProject: Visual Representation of SQL Joins. Free source code and programming help
DB tuning
Araelium Group : Querious - MySQL Database Tool
Looks hot. Hopefully less crash prone than MySQL Administrator. Will try.
MySQL Database editing app for OS X
DeepPeep: discover the hidden web
DeepPeep is a search engine specialized in Web forms. The current beta version tracks 13,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services.
Search engine for the invisible web
16 Points for Speeding Up PostgreSQL
Worth rereading whenever I use PostgreSQL
New Search Technologies Mine the Web More Deeply - NYTimes.com
"Now a new breed of technologies is taking shape that will extend the reach of search engines into the Web’s hidden corners. When that happens, it will do more than just improve the quality of search results — it may ultimately reshape the way many companies do business online."
Google now indexes a trillion web pages - but that's just a fraction of what's out there. So, what does it miss?
...Google is built for a static web...
Amazon Exposes 1 Terabyte of Public Data to Developers - ReadWriteWeb
Interesting article about MySQL scalability problems.
Directed Edge News » Blog Archive » On Building a Stupidly Fast Graph Database
connected to and things that connect to them. These are symmetrical, so creating a link from item A to item B creates a reference from item B to item A.
New Search Technologies Mine the Web More Deeply - NYTimes.com
An interesting look at the daunting task of connecting/mining the interwebs.
Search engines are starting to penetrate databases that are set up to respond to typed queries.
How to search databases; semantic web
Plurk Open Source - LightCloud - Distributed and persistent key value database
aid, here is what it takes to do 10,000 gets and sets:
HowFriendFeedUsesMySqlToStoreSchemaLessData - How FriendFeed uses MySQL to store schema-less data
Japanese translation of yesterday's post
Dare Obasanjo aka Carnage4Life - Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes
"Database sharding is the process of splitting up a database across multiple machines to improve the scalability of an application. The justification for database sharding is that after a certain scale point it is cheaper and more feasible to scale a site horizontally by adding more machines than to grow it vertically by adding beefier servers."
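A minimal sketch of the simplest routing scheme, hash-modulo sharding, assuming string keys and a fixed shard count. The helper names `shard_for` and `moved_fraction` are hypothetical, not from the Carnage4Life post. It hashes with `hashlib` because Python's built-in `hash()` of a `str` is randomized per process.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number in [0, num_shards) using a stable hash.

    Hypothetical helper for illustration; hashlib is used instead of the
    built-in hash(), whose value for str changes between processes.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)
```

The catch with plain modulo routing is growth: going from 4 to 5 shards keeps a key in place only when `h % 4 == h % 5`, which is 4 residues out of 20, so roughly 80% of keys move. That is one reason directory-based lookups and consistent hashing come up in sharding discussions.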
SELECT Name, Address FROM Customers WHERE CustomerID= ?", conn);
Jonathan Ellis's Programming Blog - Spyced: All you ever wanted to know about writing bloom filters
A good summary
Adam Gotterer - How we cache at CollegeHumor
CollegeHumor memcache use
Monty says: Oops, we did it again (MySQL 5.1 released as GA with crashing bugs)
Monty (Michael Widenius) is the original developer of MySQL and has recently left Sun.
The reason I am asking you to be very cautious about MySQL 5.1 is that there are still many known and unknown fatal bugs in the new features that are still not addressed.
redis - Google Code
“Redis is a key-value database. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists and sets with atomic operations to push/pop elements. In order to be very fast but at the same time persistent, the whole dataset is kept in memory and from time to time and/or when a number of changes to the dataset are performed it is written asynchronously on disk. You may lose the last few queries, which is acceptable in many applications, but it is as fast as an in-memory DB (beta 6 of Redis includes initial support for master-slave replication in order to solve this problem by redundancy).”
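A minimal sketch of the persistence trade-off the quote describes: the whole dataset lives in memory and is written to disk after a number of changes, so a crash loses whatever changed since the last snapshot. This is not Redis. The `TinyStore` class, the JSON snapshot file and `save_every` are hypothetical; the snapshot is written synchronously (Redis writes asynchronously), only strings and lists are supported, and there is no locking, so it is not thread-safe.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory key/value store with lists, snapshotted every `save_every` changes."""

    def __init__(self, path: str, save_every: int = 100):
        self.path = path
        self.save_every = save_every
        self.changes = 0  # writes since the last snapshot
        self.data = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.data = json.load(f)

    def _changed(self) -> None:
        self.changes += 1
        if self.changes >= self.save_every:
            self.save()

    def save(self) -> None:
        # Write to a temp file in the same directory, then atomically replace the old snapshot.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)
        self.changes = 0

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self._changed()

    def get(self, key: str):
        return self.data.get(key)

    def lpush(self, key: str, value: str) -> None:
        self.data.setdefault(key, []).insert(0, value)
        self._changed()

    def rpop(self, key: str):
        items = self.data.get(key)
        if not items:
            return None
        value = items.pop()
        self._changed()
        return value
```

With `save_every=2`, a `set` followed by two `lpush` calls leaves the second push only in memory; reopening the file shows the state as of the second change.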
A nice fast K/V data store, with some nice list/set features.
Amazon Web Services Blog: New AWS Public Data Sets - Economics, DBpedia, Freebase, and Wikipedia
We have just released four additional AWS public data sets, and have updated another one. In the Economics category, we have added a set of transportation databases from the US Bureau of Transportation Statistics. Data and statistics are provided for aviation, maritime, highway, transit, rail, pipeline, bike & pedestrian, and other modes of transportation, all in CSV format. I was able to locate employment data for our hometown airline and found out that they employed 9,322 full-time and 1,122 part-time employees as of the end of 2007. In the Encyclopedic category, we have added access to the DBpedia Knowledge Base, the Freebase Data Dump, and the Wikipedia Extraction, or WEX.
amazon
Performance, Scalability and Architecture - Java and .NET Application Performance Management (dynaTrace Blog) » Understanding Caching in Hibernate - Part One : The Session Cache
Everything You Need to Get Started With MySQL - NETTUTS
Are Cloud Based Memory Architectures the Next Big Thing? | High Scalability
We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon. Let's take a short trip down web architecture lane:
# It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database
# It's 1995: Scale-up the database.
# It's 1998: LAMP
# It's 1999: Stateless + Load Balanced + Database + SAN
# It's 2001: In-memory data-grid.
# It's 2003: Add a caching layer.
# It's 2004: Add scale-out and partitioning.
# It's 2005: Add asynchronous job scheduling and maybe a distributed file system.
# It's 2007: Move it all into the cloud.
# It's 2008: Cloud +
What makes Memory Based Architectures different from traditional architectures is that memory is the system of record. Also discussed: Jim Starkey's NimbusDB.
13 Great WordPress Speed Tips & Tricks for MAX Performance | Noupe
Performance is a key factor for any successful website. And since WordPress is becoming more popular than ever, it will only be at its best when raised in the
13 Great WordPress Speed Tips & Tricks for MAX Performance
IP address geolocation SQL database | Share your knowledge!
Search for organizational charts with The Official Board. We provide constantly updated organizational charts of the world’s 20 000 largest corporations. A strong personal network is the key to professional success. The Official Board is constantly developed and updated by our members in real time. That means our members are the first to know. Free registration is required. Most of the information can be accessed for free by our members. Then, by adding to the database, they become allowed to search for deeper information in each organizational chart.
View the organizational charts of the world's 20 000 largest corporations.
Welcome to The Official Board. We provide constantly updated organizational charts of the world's 20 000 largest corporations.
org charts of companies
Provides information about a company's management. Wiki
"We provide constantly updated organizational charts of the world’s 20 000 largest corporations."
See also the TechCrunch article.
Most common passwords list from 3 databases
List of most commonly used passwords
A detailed password analysis of compromised passwords from myspace, phpbb, and singles.org
Includes country, publication, title, beat.
Media Database powered by TrackVia that features media on Twitter.
A database of US media outlets on Twitter
Amazon Elastic MapReduce
There's a growing trend to provide some pretty awesome IT services over the internet. Seems to me that's the way it will mostly be in a decade's time - or less.
Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).
Who needs infrastructure? Keep your data somewhere else. Process your data somewhere else. You can now run your small data business out of your garage. Just photoshop a nice office for the investors.
Font-specialist site fontnavi (フォントナビ)
A handy font search site has opened! Quickly search more than 3,000 Japanese fonts.
Test Center: Slacker databases break all the old rules | InfoWorld | Test Center | March 24, 2009 | By Peter Wayner
Non-relational DB comparison
Alternatives to Windows, Mac, Linux and online applications | AlternativeTo.net
Keeping your database simple and fast is often difficult if you use higher level frameworks such as ActiveRecords in Ruby or Java object persistence technologies such as Hibernate. There is a lot of magic that is happening out of sight that you have no control over. If you then have to scale your application it is often the relational database that these technologies require that becomes the performance and scaling bottleneck. Often requiring complex custom implementations of partitioning and sharding to make it work. The AWS services Amazon S3 and Amazon SimpleDB were designed to handle the dominant storage usage patterns within Amazon and they greatly reduced our need to rely on relational storage for scaling our systems. But it is almost never the case that a single storage technique is used in applications and services that need to operate at enterprise scale. For example it is a common pattern that objects stored in S3 using a primary key, have a collection of secondary keys (e.g
allthingsdistributed.com Database DatabaseSimpleandFast
IP address geolocation SQL database
Free downloadable IP address to long/lat SQL database
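The usual trick in these IP-to-location tables is to store each range as two integers and find the range that contains an address. A minimal sketch, assuming non-overlapping (start, end, label) rows sorted by start; the ranges below are the RFC 5737 documentation blocks with made-up labels, not real geolocation data. In SQL the same lookup is typically a `BETWEEN` against the two integer columns.

```python
import bisect
import ipaddress

# Hypothetical rows: (first address, last address, label), sorted by first address.
# These are RFC 5737 documentation ranges with invented labels, not real locations.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "example-location-A"),
    ("198.51.100.0", "198.51.100.255", "example-location-B"),
    ("203.0.113.0", "203.0.113.255", "example-location-C"),
]


def ip_to_int(ip: str) -> int:
    """Dotted quad a.b.c.d -> a*256**3 + b*256**2 + c*256 + d."""
    return int(ipaddress.IPv4Address(ip))


TABLE = [(ip_to_int(first), ip_to_int(last), label) for first, last, label in RANGES]
STARTS = [row[0] for row in TABLE]


def lookup(ip: str):
    """Return the label of the range containing ip, or None if no range matches."""
    n = ip_to_int(ip)
    i = bisect.bisect_right(STARTS, n) - 1  # last range starting at or before n
    if i >= 0 and TABLE[i][0] <= n <= TABLE[i][1]:
        return TABLE[i][2]
    return None
```

Binary search over the sorted starts keeps each lookup at O(log n), which is roughly what an index on the start column gives a database.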
free IP address geolocation SQL database
IP Location tools :: IP Location Tools
IP Location tools
Make: Online : Free, unlimited IP address geolocation with MySQL
Add to the tender specification
For example, if you have an IP of 126.96.36.199 (google.com)
Railway station data, free download 『駅データ．ｊｐ』
JSON schema
Search for Recipes by Ingredient - Recipe Puppy
thanks to lifehacker for this link
Search engine for finding recipes based on what you have in the house!
Some Notes on Distributed Key Stores « random($foo)
Distributed Key Stores
An in-depth review of 50 bots you can use on Twitter! Twitter, the tweet-sharing social network said to have about 10 million users, has handy services called "bots". Just follow one on Twitter (add it to the list of users whose posts you share) and you get the latest news; address a post to a bot with "@" and it tells you a recipe for dinner. Used well, they are extremely convenient. But there are frighteningly many of them, and it's hard to tell which ones to use. So this time... gin-oi2: [summary, twitter] Lately I've been following nothing but bots
Jedi/Sector One's random thoughts - An overview of modern SQL-free databases
All the data we've used in this first launch are produced and published by the U.S. Bureau of Labor Statistics and the U.S. Census Bureau's Population Division. They did the hard work! We just made the data a bit easier to find and use. Since Google's acquisition of Trendalyzer two years ago, we have been working on creating a new service that make lots of data instantly available for intuitive, visual exploration.
Google launched a new search feature that makes it easy to find and compare public data. So for example, when comparing Santa Clara county data to the national unemployment rate, it becomes clear not only that Santa Clara's peak during 2002-2003 was really dramatic, but also that the recent increase is a bit more drastic than the national rate. If you go to Google.com and type in [unemployment rate] or [population] followed by a U.S. state or county, you will see the most recent estimates. Once you click the link, you'll go to an interactive chart that lets you add and remove data for different geographical areas.
Adding search power to public data 4/28/2009 12:17:00 PM Earthquakes are not the only thing that can shake Silicon Valley. After the dot-com bubble burst back in 2000 the unemployment rate of Santa Clara county went up to 9.1%. During the last couple of months, it has gone up again:
Google has launched a cool, if somewhat limited, new feature that makes it easier to search for and visualize statistics gleaned from public data. You can search for "unemployment rate" or "population" for any area in the United States and Google will provide you with information from the US Bureau of Labor Statistics and the Census Bureau.
"We just launched a new search feature that makes it easy to find and compare public data... If you go to Google.com and type in [unemployment rate] or [population] followed by a U.S. state or county, you will see the most recent estimates... Once you click the link, you'll go to an interactive chart that lets you add and remove data for different geographical areas."
Alternatives to SQL Databases [LWN.net]
Traditional SQL databases with "ACID" properties (Atomicity, Consistency, Isolation and Durability) give strong guarantees about what happens when data is stored and retrieved. These guarantees make it easier for application developers, freeing them from thinking about exactly how the data is stored and indexed, or even which database is running. However, these guarantees come with a cost.
iostat -x « domas mituzas: vaporware, inc.
Plugin that monitors the fields you're actually using from queries you make and over time dynamically adjusts your queries to retrieve only the fields you need. Apparently includes some magic to re-query for more fields if you attempt to use one you hadn't loaded in the trimmed query. Amazing-looking stuff, though since we're currently using a DB on the same machine, transferring lots of extra data isn't nearly as expensive.
Dynamic query optimization is a hotbed of research in the database industry. Each and every query you execute goes through a rigorous optimization phase which tries to squeeze out every last bit of performance: deciding which indexes to use, the execution order and sort order to minimize the number of in-memory tables, etc. However, one thing the database has no access to is the application-layer knowledge of which data the user actually uses after it is retrieved. Oftentimes, the query fetches all of the columns when only a few are required, which is exactly the pattern that Lourens Naudé is seeking to optimize with his new plugin: scrooge.
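A minimal sketch of the idea behind scrooge, in Python with sqlite3 rather than Ruby and ActiveRecord (this is not the plugin's code or API): remember which columns callers actually read, trim the `SELECT` list on later calls, and re-query the full row if a trimmed column is touched. Assumptions: one `ColumnTracker` per query site, every table has an `id` primary key, and table and column names are trusted identifiers, since they are interpolated into the SQL while only values are bound as parameters.

```python
import sqlite3


class TrackedRow:
    """Row wrapper that reports every column read back to its tracker."""

    def __init__(self, tracker: "ColumnTracker", values: dict):
        self._tracker = tracker
        self._values = values  # only the columns that were actually selected

    def __getitem__(self, column: str):
        self._tracker.used.add(column)
        if column not in self._values:
            # This column was trimmed from the SELECT: re-query the whole row by id.
            self._tracker.refetches += 1
            self._values = self._tracker.load_full(self._values["id"])
        return self._values[column]


class ColumnTracker:
    """Learns which columns callers use and selects only those next time."""

    def __init__(self, conn: sqlite3.Connection, table: str, columns: list):
        self.conn = conn
        self.table = table        # trusted identifier, interpolated into SQL
        self.columns = columns    # every column of the table; must include "id"
        self.used = set()         # columns read so far
        self.refetches = 0        # how often a trimmed column forced a re-query
        self.last_columns = None  # SELECT list of the most recent fetch()

    def _select(self, cols, where, params):
        sql = f"SELECT {', '.join(cols)} FROM {self.table} WHERE {where}"
        return [dict(zip(cols, row)) for row in self.conn.execute(sql, params)]

    def load_full(self, row_id):
        return self._select(self.columns, "id = ?", (row_id,))[0]

    def fetch(self, where: str, params=()):
        # First call selects every column; later calls only the columns seen in use, plus id.
        cols = sorted(self.used | {"id"}) if self.used else list(self.columns)
        self.last_columns = cols
        return [TrackedRow(self, values) for values in self._select(cols, where, params)]
```

After a first call where only `name` is read, the next call selects just `id, name`; reading `email` then costs one extra query and widens the column set for later calls.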
Dynamic Query Optimization
BASE: AN ACID ALTERNATIVE - ACM Queue
Excellent description of BASE design patterns
If ACID provides the consistency choice for partitioned databases, then how do you achieve availability instead? One answer is BASE (basically available, soft state, eventually consistent).
漢(オトコ)のコンピュータ道: A thorough explanation of MySQL's EXPLAIN!!
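A high-school world history class, published in written form and reproduced as faithfully as possible, tangents included. Ancient, medieval, modern, Eastern and Western history. I talk world history with my students. I'd be glad if I can convey how interesting and fun history is.
Backup your Database in Git | Viget Extend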
When you think about it, a database dump is just SQL code, so why not manage it the same way you manage the rest of your code, in a source code manager? Setting such a scheme up is dead simple. On your production server, with git installed:
Size estimation when designing a DB - よねのはてな
The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government. Although the initial launch of Data.gov provides a limited portion of the rich variety of Federal datasets presently available, we invite you to actively participate in shaping the future of Data.gov by suggesting additional datasets and site enhancements to provide seamless access and use of your Federal data. Visit today with us, but come back often. With your help, Data.gov will continue to grow and change in the weeks, months, and years ahead.
WOW "The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."
The new U.S. federal open data site is live! "Data.gov will open up the workings of government by making economic, healthcare, environmental, and other government information available on a single website, allowing the public to access raw data and transform it in innovative ways."
Drop ACID and Think About Data | High Scalability
nice summary of different data stores...
With YQL Execute, the Internet becomes your database (Yahoo! Developer Network Blog)
The Yahoo! Query Language lets you query, filter, and join data across any web data source or service on the web. Using our YQL web service, apps run faster with fewer lines of code and a smaller network footprint. YQL uses a SQL-like language because it is a familiar and intuitive method for developers to access data. YQL treats the entire web as a source of table data, enabling developers to select * from Internet.
YQL + Linked Data = possibilities
Describes the testing of SQLite. Great overview of various testing techniques and how they've been applied to a significant software project.
Socrata | Making Data Social
"Opening government to new audiences and constituencies is the 21st century battle cry in societies everywhere. At the heart of this movement is open government data, readily accessible over the internet, in a form that maximizes comprehension, interactivity, participation, and sharing, delivered at a fraction of the cost of today's data download sites."
This used to be the site called blist.
AWESOME source of data sets, .csv
10 Essential SQL Tips for Developers - Nettuts+
Computer Science and Artificial Intelligence Laboratory. MIT Database Systems (6.830) TA Course Notes. In Fall 2008, I had the pleasure of TAing Database Systems with Sam Madden, Mike Stonebraker, and Evan Jones. I figured that I could take notes to help students follow the lectures while clarifying any confusing points that were raised during discussion. It would also help me avoid the embarrassment of forgetting something mentioned during a lecture and having students explain it to me during office hours:). I decided to take notes in plain text, mostly out of laziness. This turned out to be a challenge for drawing things like query plans, but forced me to distill explanations into a conversational tone that provided an alternative to traditional diagrams. Some students in the class told me that they benefited from and enjoyed the notes, and so I decided to open them up for reuse.
Why CouchDB?
man, I really really wish I understood this stuff.
“Django may be built for the Web, but CouchDB is built of the Web. I’ve never seen software that so completely embraces the philosophies behind HTTP. CouchDB makes Django look old-school in the same way that Django makes ASP look outdated.”
ebook on why you would choose CouchDB
CouchDB: Perform like a pr0n star
Check out this SlideShare Presentation : CouchDB: Perform like a pr0n star http://tinyurl.com/cukfou [from http://twitter.com/josefrichter/statuses/1588959474]
Scalable Web Architectures: Common Patterns and Approaches - Web 2.0 Expo NYC
a basic one about optimizing database query execution time
Indexing the Web
Performance comparison: key/value stores for language model counts - Brendan O'Connor's Blog
"Tokyo Tyrant is a server implemented on top of Cabinet that implements a similar key/value API except over sockets. It’s incredibly flexible; it was very easy to run it in several different configurations. The first one is to use an in-memory data store, and communicate using the memcached protocol. This is, of course, *exactly* comparable to Memcached (behaviorally indistinguishable!) and it does worse. The second option is to do that, except switch to an on-disk data store. It’s pretty ridiculous that that’s still the same speed: communication overhead is completely dominating the time. Fortunately, Tyrant comes with a binary protocol. Using that substantially improves performance past Memcached levels, though less than a direct in-process database. Yes, communication across processes incurs overhead. No news here, I guess."
Google Fusion Tables (Pre-Alpha)
Fusion Tables is an online platform with new technology that unifies different kinds of data and promises savings for businesses.
Official Google Research Blog: Google Fusion Tables
Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to offer all the collaborators access to the same server, data sets get copied, emailed and ftp'd--resulting in multiple versions that get out of sync very quickly. Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the systems as we
Project Voldemort Blog : Building a terabyte-scale data cycle at LinkedIn with Hadoop and Project Voldemort
Not one of those "we're using hadoop, now we're cool" articles. Well written!
Hadoop
British Newspapers - Home
Search British Newspapers from 1800-1900. Many with free content
Explore two million pages of 19th century newspapers
Neo4j - a Graph Database that Kicks Buttox | High Scalability
If you are Digg or LinkedIn you can build your own speedy graph database to represent your complex social network relationships. For those of more modest means Neo4j, a graph database, is a good alternative. A graph is a collection of nodes (things) and edges (relationships) that connect pairs of nodes. Slap properties (key-value pairs) on nodes and relationships and you have a surprisingly powerful way to represent most anything you can think of. In a graph database "relationships are first-class citizens. They connect two nodes and both nodes and relationships can hold an arbitrary amount of key-value pairs. So you can look at a graph database as a key-value store, with full support for relationships."
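A minimal in-memory sketch of the property-graph model described above: nodes and relationships both carry key/value properties, and each relationship is an object in its own right rather than just a pointer. The `Graph`, `Node` and `Relationship` names are hypothetical and this is not Neo4j's API; per-node adjacency lists stand in for whatever storage a real graph database uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node
    end: Node
    type: str
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}
        self.outgoing = defaultdict(list)  # node id -> relationships starting at that node
        self._next_id = 0

    def add_node(self, **props) -> Node:
        node = Node(self._next_id, props)
        self.nodes[node.id] = node
        self._next_id += 1
        return node

    def relate(self, start: Node, end: Node, rel_type: str, **props) -> Relationship:
        # The relationship is a first-class object with its own properties.
        rel = Relationship(start, end, rel_type, props)
        self.outgoing[start.id].append(rel)
        return rel

    def neighbors(self, node: Node, rel_type=None):
        """Nodes reachable over one outgoing relationship, optionally filtered by type."""
        return [r.end for r in self.outgoing[node.id]
                if rel_type is None or r.type == rel_type]
```

Traversal only follows the adjacency list of the current node, so one hop costs time proportional to that node's degree rather than to a join over the whole relationship table.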
The SQL database behind ipinfodb.com is offered for free. We offer the database in different formats (SQL, CSV), city or country precision, 3 or 4 IP digits precision and data in single or multiple tables. Available information in the database : ISO country code, country name, FIPS region code, region name, city, zipcode, latitude, longitude and GMT/DST timezone. The database is updated during the first week of each month.
Wescript
Wescript is a utility for userscript runtime environments, such as Greasemonkey. It's useful for finding popular userscripts and checking userscript updates.
Should you go Beyond Relational Databases? | Think Vitamin
Relational databases, such as MySQL, PostgreSQL and various commercial products, have served us well for many years. Lately, however, there has been a lot of discussion on whether the relational model is reaching the end of its life-span, and what may come after it.
Alternatives to SQL dbs - document, key-value, graph databases
braindump: NOSQL debrief
First ever meeting of the NoSQL community. Lists all the presentations that were given.
No to SQL? Anti-database movement gains steam
The meet-up in San Francisco last month had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party. The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive. Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data. "Relational databases give you too much. They force you to twist your object data to fit a RDBMS [relational database management system]," said Jon Travis, principal engineer at Java toolmaker SpringSource, one of the 10 presenters at the NoSQL confab (PDF). NoSQL-based alternatives "just give you what you need," Travis said. Open source rises up The movement's chief champions are Web and Java developers, many of whom learned to get by at their cash-strapped startups without Ora
piece on an alternative approach to data management
A Comparison of Open Source Search Engines « zooie’s blog
A first step in investigating search engines.
Later this month we will be presenting a half day tutorial on Open Search at SIGIR. It’ll basically focus on how to use open source software and cloud services for building and quickly prototyping advanced search applications. Open Search isn’t just about building a Google-like search box on a free technology stack, but encouraging the community to extend and embrace search technology to improve the relevance of any application.
up and running with cassandra :: snax
Cassandra is a hybrid non-relational database in the same class as Google's BigTable. It is more featureful than a key/value store like Dynomite, but supports fewer query types than a document store like MongoDB. Cassandra was started by Facebook and later transferred to the open-source community. It is an ideal runtime database for web-scale domains like social networks.
Backup2Mail - Send MySQL database backup to your mailbox
Backup2Mail is a mini PHP application that creates regular backups of your MySQL database and sends them to a configurable e-mail address. The whole process is scheduled with the help of Cron, a Unix program that runs programs at scheduled times.
Send MySQL database backup to your mailbox
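A rough sketch of the same dump-and-mail idea in Python, with sqlite3 standing in for MySQL (Backup2Mail itself is PHP, and this is not its code). It dumps the database to SQL text and attaches it to an email. The function names and addresses are hypothetical; actually sending would go through something like `smtplib.SMTP(host).send_message(msg)`, and scheduling through cron, both left out here. The SQL text produced is the same kind of dump the Viget post suggests committing to git.

```python
import sqlite3
from email.message import EmailMessage


def dump_database(conn: sqlite3.Connection) -> str:
    """Return the whole database as SQL text (CREATE and INSERT statements)."""
    return "\n".join(conn.iterdump())


def build_backup_email(conn: sqlite3.Connection, sender: str, recipient: str,
                       filename: str = "backup.sql") -> EmailMessage:
    """Build (but do not send) an email with the SQL dump attached."""
    msg = EmailMessage()
    msg["Subject"] = f"Database backup: {filename}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Database backup attached.")
    msg.add_attachment(dump_database(conn).encode("utf-8"),
                       maintype="application", subtype="sql", filename=filename)
    return msg
```

Mailing a full dump is only practical while the database stays small; past that, a compressed attachment or a different transport would be needed.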
http://www.backup2mail.com/
SQL Databases Are An Overapplied Solution (And What To Use Instead)
Official Google Research Blog: Large-scale graph computing at Google
I want one of these! "We have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel). Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. "
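A single-machine sketch of the vertex-centric, superstep-by-superstep style the quote describes, applied to PageRank. This is not Pregel's API: `pregel_pagerank` and the dict-of-lists graph format are hypothetical. The update `0.15/N + 0.85 * sum(incoming messages)` is standard damped PageRank, and the sketch runs a fixed number of supersteps (30, an arbitrary choice) instead of having vertices vote to halt. Rank held by vertices without outgoing edges is simply dropped.

```python
def pregel_pagerank(out_edges: dict, supersteps: int = 30, damping: float = 0.85) -> dict:
    """PageRank computed superstep by superstep, vertex-centric, on one machine.

    out_edges maps each vertex to the list of vertices it links to; every
    target must also be a key.
    """
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}
    inbox = {v: [] for v in out_edges}
    for step in range(supersteps):
        outbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            # Each vertex reads only the messages sent to it in the previous superstep...
            if step > 0:
                value[v] = (1 - damping) / n + damping * sum(inbox[v])
            # ...and sends messages that will be delivered in the next superstep.
            if targets:
                share = value[v] / len(targets)
                for t in targets:
                    outbox[t].append(share)
        inbox = outbox  # barrier: swap mailboxes between supersteps
    return value
```

Because each vertex reads only its own inbox from the previous superstep and writes only to the next one, the inner loop could run on all vertices in parallel without changing the result; that independence is what lets the real system spread vertices across machines.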
So many things to learn and apply in business deals.
http://spinn3r.com/rank
Adding Simplicity - An Engineering Mantra: Shard Lessons
No, not SHARED lessons, I mean SHARD lessons. I have to admit that until about a year ago I didn't really know the term shards in relation to databases. Now don't confuse that with not understanding how databases can be horizontally scaled. I was introduced to that concept and helped to define the various ways it can be done but we just called it splits. Regardless of what you call it, there are some interesting challenges that are introduced. The well known challenges of consistency are discussed ad nauseam, even by me, so I'm not going there with this article. But besides that, there are some other lessons to learn when applying the pattern to your data.
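One common reading of "picking shard counts that smooth your cost step function", which may or may not match the article's exact advice, is to hash keys into a fixed number of logical shards and choose that number with many divisors, so the shards can be spread evenly over many different server counts as you grow. A minimal Python sketch under that assumption; the MD5-based `shard_for`, round-robin placement, and all function names are illustrative choices, not taken from the article.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to one of num_shards logical shards using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def even_server_counts(num_shards: int) -> list[int]:
    """Server counts over which num_shards shards can be spread with no remainder."""
    return [s for s in range(1, num_shards + 1) if num_shards % s == 0]


def shards_per_server(num_shards: int, num_servers: int) -> Counter:
    """How many logical shards each server hosts under round-robin placement."""
    return Counter(shard % num_servers for shard in range(num_shards))
```

With 12 shards you can run 1, 2, 3, 4, 6 or 12 servers with equal load; with 10 shards, moving to 4 servers leaves two of them carrying 3 shards and two carrying 2, i.e. 50% more data on the busier machines.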
Worth reading just for the section on intelligently designing shard counts. Great discussion on picking counts that smooth your cost step function.
How b-tree database indexes work and how to tell if they are efficient (100' level) | mattfleming.com
A team member thought we should add an index on a 90 million row table to improve performance. The field on which he wanted to create this index had only four possible values. To which I replied that an index on a low cardinality field wasn't really going to help anything. My boss then asked me why wouldn't it help? I sputtered around for a response but ended up telling him that I'd get back to him with a reasonable explanation.
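A back-of-the-envelope way to see the point: assuming the four values are spread roughly evenly (the post doesn't say), every equality lookup through the index still matches about a quarter of the 90 million rows, and fetching that many rows via index lookups can easily cost more than scanning the table. Whether a given optimizer would even use such an index depends on the database and its statistics. The helper below is a hypothetical illustration of that arithmetic, not code from the post.

```python
def index_lookup_stats(total_rows: int, distinct_values: int) -> tuple[float, float]:
    """Average rows matched by one equality lookup, and that as a fraction of the table.

    Assumes values are uniformly distributed; skewed data behaves differently.
    """
    rows_per_value = total_rows / distinct_values
    return rows_per_value, rows_per_value / total_rows


low_card = index_lookup_stats(90_000_000, 4)            # the four-valued column from the post
high_card = index_lookup_stats(90_000_000, 90_000_000)  # a unique column, for contrast
```

`low_card` comes out as 22,500,000 rows (25% of the table) per lookup, while the unique column narrows each lookup to a single row.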
Imported from http://twitter.com/newsycombinator/status/2645303258 How b-tree database indexes work and how to tell if they are efficient http://bit.ly/dd6mf
TwitterAlikeExample - redis - Google Code
Case study on Redis
Facebook, Hadoop, and Hive | DBMS2 -- DataBase Management System Services
Just wanted to add that even though there is a single point of failure the reliability due to software bugs has not been an issue and the dfs Namenode has been very stable. The Jobtracker crashes that we have seen are due to errant jobs - job isolation is not yet that great in hadoop and a bad query from a user can bring down the tracker (though the recovery time for the tracker is literally a few minutes). There is some good work happening in the community though to address those issues.
A few weeks ago, I posted about a conversation I had with Jeff Hammerbacher of Cloudera, in which he discussed a Hadoop-based effort at Facebook he previously directed. Subsequently, Ashish Thusoo and Joydeep Sarma of Facebook contacted me to expand upon, and in a couple of instances correct, what Jeff had said. They also filled me in on Hive, a data-manipulation add-on to Hadoop that they developed and subsequently open-sourced.
4store - Scalable RDF storage
4store is a fast, scalable clustered RDF database
4store is an efficient, scalable and stable RDF database
4store, an efficient, scalable and stable RDF database: 4store was designed by Steve Harris and developed at Garlik to underpin their Semantic Web applications. It has been providing the base platform for around 3 years, at times holding and running queries over databases of 15GT (15 billion triples), supporting a Web application used by thousands of people.
Social Media Brand Engagement Database - ENGAGEMENTdb
Want to know not just what companies are doing on the social web but how well they're doing it? We have brought you just the tool to measure and monitor brand engagement: for the first time ever, ENGAGEMENTdb ranks the world's most valuable brands based on how they leverage social media to interact with customers.
My Thoughts on NoSQL - Die in a Fire - Eric Florenzano’s Blog
Over the past few years, relational databases have fallen out of favor for a number of influential people in our industry. I'd like to weigh in on that, but before doing so, I'd like to give my executive summary of the events leading up to this movement.
An overview of several open-source non-relational databases.
Thoughts on NoSQL, Tokyo Cabinet, CouchDB, Redis, and Cassandra.
The Soldier in Later Medieval England
Database of soldiers who fought in wars during the Medieval era, including the Hundred Years War. Not sure how to use this just yet...
A team led by Dr. Adrian Bell and Prof. Anne Curry, with funding from the Arts and Humanities Research Council, has put up a stunning new database of military service records of medieval soldiers serving between 1369 and 1453. While the database’s primary purpose seems to be exploring the lives of individual soldiers of note, there are a great many potential applications for large-observation (large-n) quantitative studies of conflict and health. Variables in the database include: First Name, Last Name, Status, Rank, Captain’s Name, Commander’s Name, Year of Service, Nature of Activity, Reference Number, and Membrane. Read the project details for more information.
HadoopDB Project
An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads.
HadoopDB is: 1. A hybrid of DBMS and MapReduce technologies that targets analytical workloads 2. Designed to run on a shared-nothing cluster of commodity machines, or in the cloud 3. An attempt to fill the gap in the market for a free and open source parallel DBMS 4. Much more scalable than currently available parallel database systems and DBMS/MapReduce hybrid systems. 5. As scalable as Hadoop, while achieving superior performance on structured data analysis workloads.
DBMS Musings: Announcing release of HadoopDB (longer version)
my students Azza Abouzeid and Kamil Bajda-Pawlikowski developed HadoopDB. It's an open source stack that includes PostgreSQL, Hadoop, and Hive, along with some glue between PostgreSQL and Hadoop, a catalog, a data loader, and an interface that accepts queries in MapReduce or SQL and generates query plans that are processed partly in Hadoop and partly in different PostgreSQL instances spread across many nodes in a shared-nothing cluster of machines. In essence it is a hybrid of MapReduce and parallel DBMS technologies. But unlike Aster Data, Greenplum, Pig, and Hive, it is not a hybrid simply at the language/interface level. It is a hybrid at a deeper, systems implementation level. Also unlike Aster Data and Greenplum, it is free and open source.
Filesuffix.com the filename extension database
An ideal site for anyone racking their brains trying to find the right software to open an unknown file type. Its database gives detailed information on practically any extension: type, category, description, software that can open the file, etc. And if you use Firefox, there is a search plugin that makes lookups easier. One of the site's biggest drawbacks is that its results favor downloading proprietary software over reliable, working free-software options.
filename extension database
Choosing a non-relational database; why we migrated from MySQL to MongoDB « Boxed Ice Blog
Type in some lyrics and the Lyric Rat will find the song for you. Also available as @LyricRat on Twitter. Tweet some lyrics.
TheUserManualSite.com - We found it so you don't have to!
Locate hard-to-find user manuals, discover new features, and realize the potential of the products you rely on. ManualsOnline pairs self-help and product information with a growing community of engaged product owners.
use for manual help
MySQL is a widely used and fast SQL database server. It is a client/server implementation that consists of a server daemon (mysqld) and many different client programs/libraries. Here are some very useful tips for MySQL DBAs and developers, collected at MySQL Camp 2006 from suggestions by MySQL community experts.
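One of the Camp tips is "Don't use DISTINCT when you have or could use GROUP BY." For a plain list of unique values the two forms return the same rows, and GROUP BY additionally lets you aggregate per group. The sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL, with a hypothetical `orders` table; it only shows that the results match and says nothing about relative speed, which depends on the engine and version.

```python
import sqlite3


def compare_distinct_and_group_by():
    """Run DISTINCT and GROUP BY versions of the same query on a toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)],
    )
    distinct = conn.execute(
        "SELECT DISTINCT customer FROM orders ORDER BY customer").fetchall()
    grouped = conn.execute(
        "SELECT customer FROM orders GROUP BY customer ORDER BY customer").fetchall()
    # GROUP BY also lets you aggregate per group, which DISTINCT cannot.
    totals = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return distinct, grouped, totals
```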
Don’t use DISTINCT when you have or could use GROUP BY.
Don’t use deprecated features.
NoSQL: If Only It Was That Easy « Marked As Pertinent
Interesting: a study of the various alternative databases from a scalability standpoint.
data store scaling technologies
Tweet Blocker - Cleaning up the Twitterverse
A tool for detecting which followers might be spam.
Use TweetBlocker to remove spam from your account. Developer API available.
Riak - A Decentralized Database
Riak combines a decentralized key-value store, a flexible map/reduce engine, and a friendly HTTP/JSON query interface to provide a database ideally suited for Web applications.
Performance, Scalability and Architecture - Java and .NET Application Performance Management (dynaTrace Blog) » Understanding Caching in Hibernate - Part Two : The Query Cache
In the last post I wrote about caching in Hibernate in general, as well as about the behavior of the session cache. In this post we will take a closer look at the QueryCache. I will not explain the query cache in detail, as there are very good articles such as Hibernate: Truly Understanding the Second-Level and Query Caches.
GFS: Evolution on Fast-forward - ACM Queue
Google File System
ACM Queue, August 7, 2009
SQL pie chart | code.openark.org
Shown below is a (single query) SQL-generated pie chart. I will walk through the steps towards making this happen, and conclude with what, I hope you’ll agree, are real-world, useful usage samples.
ASCII art via SQL
Pie Chart in SQL
create an ascii art pie chart with a single sql query
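The article builds its chart in a single SQL query; the Python sketch below is not that query, only one way the underlying geometry can work: treat each character cell as a point, keep it if it lies inside the circle, and use `math.atan2` to find which slice's angular range it falls into. The function name, the 2:1 horizontal stretch (to compensate for tall terminal characters), and marking slices by their label's first letter are assumptions of this sketch. Slices start at three o'clock and run clockwise on screen.

```python
import math


def ascii_pie(shares: dict[str, float], radius: int = 6) -> str:
    """Draw a pie chart as text, marking each slice with its label's first letter."""
    total = sum(shares.values())
    # Cumulative upper angle (radians) of each slice, in the dict's order.
    bounds = []
    acc = 0.0
    for label, value in shares.items():
        acc += 2 * math.pi * value / total
        bounds.append((acc, label[0]))

    lines = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-2 * radius, 2 * radius + 1):
            xs = x / 2  # characters are about twice as tall as they are wide
            if xs * xs + y * y > radius * radius:
                row.append(" ")
                continue
            angle = math.atan2(y, xs) % (2 * math.pi)
            # First slice whose upper bound exceeds this angle; the fallback
            # covers floating-point rounding at the very end of the circle.
            mark = next((m for limit, m in bounds if angle < limit), bounds[-1][1])
            row.append(mark)
        lines.append("".join(row).rstrip())
    return "\n".join(lines)


print(ascii_pie({"apples": 1, "bananas": 1}))
```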
Something like ASCII art, done in SQL.
DataSF - DataSF - Liberating City Data
Why can't every city have this?
City of SF opens site containing datasets
"DataSF is a clearinghouse of datasets available from the City & County of San Francisco. While there is plenty of room for improvement, our goal in releasing this site is: 1) improve access to data, 2) help our community create innovative apps, 3) understand what datasets you'd like to see, 4) get feedback on the quality of our datasets."
Directed Edge - Home
Recommendations engine plug-in
Recommendation engine à la Amazon
Postcode to street + town conversion
Thanks to the data from http://openkvk.nl, #6pp now covers more than 50% of all postcodes in the Netherlands. Thanks to JWvdV for the import. Project 6PP opens up free geographic data in the Netherlands. Places, postcodes, streets, and geo-coordinates are available as a wiki, a web service, and downloads.
Developers: Never Mind the APIs, Here's YQL Execute - ReadWriteWeb
Read: Developers: Never Mind the APIs, Here's YQL Execute [feedly] http://tr.im/koyE [from http://twitter.com/krisnelson/statuses/1693267224]
RWW's @jolieodell dares to tackle the powerful beast that is the new YQL Execute http://bit.ly/J1gxO and so far has lived to tell the tale [from http://twitter.com/marshallk/statuses/1680054262]
...includes explanation of what YQL is, starting with: a sophisticated solution that is agnostic across all Internet platforms and that lowers both the burden of labor and the barriers to entry for social and other web application developers
Make Firefox Faster by Vacuuming Your Database - Firefox - Lifehacker
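The JavaScript one-liner saved with this bookmark asks Firefox's own Places connection to run SQLite's `VACUUM`, which rebuilds the database file and drops unused pages. To do the same from outside the browser, a minimal Python sketch with the standard `sqlite3` module could look like this. It assumes Firefox is closed, so nothing else is writing to `places.sqlite`, and that you supply the profile path yourself, since its location varies by OS and profile; the function name is hypothetical.

```python
import sqlite3


def vacuum_sqlite(path: str) -> None:
    """Rebuild an SQLite file (e.g. Firefox's places.sqlite) to reclaim free pages."""
    # VACUUM cannot run inside a transaction, so open the connection in
    # autocommit mode (isolation_level=None).
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
```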
Components.classes["@mozilla.org/browser/nav-history-service;1"].getService(Components.interfaces.nsPIPlacesDatabase).DBConnection.executeSimpleSQL("VACUUM");
How XML Threatens Big Data : Dataspora Blog
Back in 2000, I went to France to build a genomics platform. A biotech hired me to combine their in-house genome data with that of public repositories like Genbank. The problem was the repositories, all with millions of records, each had their own format. It sounded like a massive, nightmarish data interoperability project. And an ideal fit for a hot new technology : XML
Three Rules for XML Rebels 1. Stop Inventing New Formats 2. Obey the Fifteen Minute Rule 3. Embrace Lazy Data Modeling
An interesting take on XML, running counter to the usual views in information science (mine, at any rate).
Excellent, thoughtful article on data bureaucracy and the limitations of XML.
Adminer
Single-file PHP MySQL database administration tool
Adminer (formerly phpMinAdmin) is a full-featured MySQL management tool written in PHP. Unlike phpMyAdmin, it consists of a single file ready to deploy to the target server.
an alternative to phpMyAdmin
csharp-sqlite - Project Hosting on Google Code
Vacuum Firefox databases for better performance, now with no restart
Components.classes["@mozilla.org/browser/nav-history-service;1"].getService(Components.interfaces.nsPIPlacesDatabase).DBConnection.executeSimpleSQL("VACUUM");
Petabytes on a budget: How to build cheap cloud storage | Backblaze Blog
Build your own custom Backblaze Storage Pods: 67 terabyte 4U servers for $7,867.
Minna no Kyō no Ryōri: search seven years of recipes and menus broadcast on NHK's "Kyō no Ryōri"!
Easily search seven years of TV recipes! | Minna no Kyō no Ryōri
HealthBase - Powered by NetBase
Health meta search engine
This site aggregates search results from all sorts of medical sites so you get a lot of info in one place.
- Powered by NetBase
DataMasher
Infographics of public data
1. Pick a data set (e.g., Poverty Rate). 2. Choose an operator (such as -, × or ÷). 3. Pick another data set (e.g., Unemployment). Your mashup: Poverty Rate combined with Unemployment.
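DataMasher's own implementation isn't described here, but the three steps amount to combining two datasets keyed by the same units (for example, states) with one arithmetic operator. A hypothetical Python sketch: `mash`, the operator table, and the sample numbers in the usage note are made up for illustration. Only keys present in both datasets are kept, and `÷` raises `ZeroDivisionError` if the second dataset has a zero for a shared key.

```python
import operator

# The four operators offered by this sketch; the symbols are just dict keys.
OPERATORS = {"+": operator.add, "-": operator.sub,
             "×": operator.mul, "÷": operator.truediv}


def mash(first: dict[str, float], second: dict[str, float], op: str) -> dict[str, float]:
    """Combine two datasets keyed by the same units with one arithmetic operator."""
    combine = OPERATORS[op]
    return {key: combine(first[key], second[key])
            for key in sorted(first.keys() & second.keys())}
```

With made-up inputs `{"State A": 12.0, "State B": 20.0}` and `{"State A": 4.0, "State B": 5.0, "State C": 9.0}`, `mash(..., "÷")` gives `{"State A": 3.0, "State B": 4.0}`; "State C" is dropped because it appears in only one dataset.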
To empower people to discover and discuss government data through manipulation and mapping.
DataMasher is a tool that takes these vast quantities of information and allows you to whittle it down into simpler terms, offering an easy way to get hard data on certain topics without any intrusive media spin.
database_software [Internet Mindmap]
A categorized list of many database software packages.
List of database software, classified by information-management model (relational, XML, RDF...)
What Visualization Tool/Software Should You Use? – Getting Started | FlowingData
birthdate
Digg the Blog » Blog Archive » Looking to the future with Cassandra
answer is 3TB database???
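The Digg post's argument is that the relational mindset puts the cost of computation on reads rather than writes. In feed-style systems, the usual way to move that cost to writes is "fan-out on write": when someone posts, copy the item into every follower's precomputed feed, so a read is a single lookup; the price is many more writes and duplicated storage, which fits the disk-space remark. This in-memory Python sketch contrasts it with fan-out on read. The class and method names are hypothetical and this is not Digg's Cassandra schema.

```python
from collections import defaultdict


class FanOutOnWrite:
    """Do the work at write time: push each new post into every follower's feed."""

    def __init__(self) -> None:
        self.followers: dict[str, set[str]] = defaultdict(set)
        self.feeds: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def follow(self, follower: str, author: str) -> None:
        self.followers[author].add(follower)

    def post(self, author: str, text: str) -> None:
        for follower in self.followers[author]:   # one write per follower
            self.feeds[follower].insert(0, (author, text))

    def feed(self, user: str, limit: int = 10) -> list[tuple[str, str]]:
        return self.feeds[user][:limit]            # a read is a single lookup


class FanOutOnRead:
    """Do the work at read time: gather and merge followees' posts on every read."""

    def __init__(self) -> None:
        self.following: dict[str, set[str]] = defaultdict(set)
        self.posts: dict[str, list[tuple[int, str]]] = defaultdict(list)
        self._seq = 0  # global counter used to order posts

    def follow(self, follower: str, author: str) -> None:
        self.following[follower].add(author)

    def post(self, author: str, text: str) -> None:
        self._seq += 1
        self.posts[author].append((self._seq, text))    # a single write

    def feed(self, user: str, limit: int = 10) -> list[tuple[str, str]]:
        merged = [(seq, author, text)
                  for author in self.following[user]
                  for seq, text in self.posts[author]]
        merged.sort(reverse=True)                        # newest first
        return [(author, text) for _, author, text in merged[:limit]]
```

Both classes return the same feed; they differ only in whether the merging work happens when posting or when reading.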
"The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes."
Wow, Cassandra uses a lot of disk space. Trade-offs!
Dare Obasanjo aka Carnage4Life - Building Scalable Databases: Denormalization, the NoSQL Movement and Digg
As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.
A bit on why non-SQL DBs are used in social networking sites.
WTF is a SuperColumn? An Intro to the Cassandra Data Model — Arin Sarkissian
Nice detailed examples of NoSQL data modeling in Cassandra.
bobby-tables.com: A guide to preventing SQL injection
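bobby-tables.com's central advice is to pass user input as query parameters instead of splicing it into SQL strings. A minimal sketch with Python's built-in `sqlite3` module (the `students` table and `find_student` helper are hypothetical): the `?` placeholder keeps the classic `Robert'); DROP TABLE students;--` name as plain data.

```python
import sqlite3


def find_student(conn: sqlite3.Connection, name: str) -> list[tuple[int, str]]:
    """Look up students by name, passing the name as a bound parameter."""
    # Unsafe alternative (don't do this): f"... WHERE name = '{name}'"
    cur = conn.execute("SELECT id, name FROM students WHERE name = ?", (name,))
    return cur.fetchall()
```

Because the driver sends the value separately from the SQL text, quotes and semicolons in `name` can never change the structure of the statement.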
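A comprehensive navigation site for Twitter, the much-talked-about service. It covers a how-to guide for beginners, recommended accounts, unique services built on Twitter, and more. Come here and you'll learn everything there is to know about Twitter.
MICDS Library | Home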
library
QuirkeyBlog » Blog Archive » Sammy.js, CouchDB, and the new web architecture
The goal of AggData is to play a small part in making this sought-out data more accessible, portable and reliable.
great source for aggregated data
AggData is short for aggregate data, which means a set of data that is collected together in one place. On this site, the AggData will come in the form of a list of records, where each record has details about a specific object in the group.
data aggregated by web scraping
another free data library.
30 Resources to Find the Data You Need | FlowingData
Let's say you have this idea for a visualization or application, or you're just curious about some trend. But you have a problem. You can't find the data, and without the data, you can't even start. This is a guide and a list of sources for where you can find that data you're looking for. There's a lot out there. Universities: Being a graduate student, I always look to the library for books and resources. Many libraries are amping up their technology and have some expansive data archives. Many statistics departments also tend to keep a list of data somewhere.
漢(オトコ)のコンピュータ道: Why are MySQL subqueries slow?
So we've looked at how MySQL processes subqueries: used with proper care, subqueries can run fast too. It goes without saying that rewriting them as JOINs is faster, but when you consider things like how easy the SQL is to maintain, some people will still want to write their queries with subqueries. If that's you, please keep the following in mind when using subqueries: the type of subquery; the order in which the outer query and the subquery are evaluated; the number of rows fetched by the outer query; the indexes used by the subquery; and the size of temporary tables.
Amazon Web Services Blog: Don't Forget: You Can Use Amazon SimpleDB For Free!
"Graphs are ubiquitous. Social or P2P networks, thesauri, route planning systems, recommendation systems, collaborative filtering, even the World Wide Web itself is ultimately a graph! Given their importance, it’s surely worth spending some time in studying some algorithms and models to represent and work with them effectively. In this short article, we’re going to see how we can store a graph in a DBMS. Given how much attention my talk about storing a tree data structure in the db received, it’s probably going to be interesting to many. Unfortunately, the Tree models/techniques do not apply to generic graphs, so let’s discover how we can deal with them."
How Google Taught Me to Cache and Cash-In | High Scalability
A user named Apathy in this thread on how Reddit scales some of their features shares some advice he learned while working at Google and other major companies. To be fair, I [Apathy] was working at Google at the time, and every job I held between 1995 and 2005 involved at least one of the largest websites on the planet. I didn't come up with any of these ideas, just watched other smart people I worked with who knew what they were doing and found (or wrote) tools that did the same things. But the theme is always the same: cache everything you can and store the rest in some sort of database (not necessarily relational and not necessarily centralized). How do you go about applying this strategy?
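"Cache everything you can and store the rest in a database" is commonly implemented as a cache-aside (lazy-loading) read path: check the cache, and on a miss load from the store and remember the result; after a write, invalidate the cached entry. A minimal in-process Python sketch; the class name is hypothetical, and a real deployment would use something like memcached with expiry and eviction, which this unbounded dict does not have.

```python
from typing import Any, Callable, Dict, Hashable


class CacheAside:
    """Read path: try the cache first, fall back to the store on a miss."""

    def __init__(self, load_from_store: Callable[[Hashable], Any]) -> None:
        self._load = load_from_store
        self._cache: Dict[Hashable, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        value = self._load(key)      # e.g. a database query
        self._cache[key] = value     # remember it for next time
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after writing `key` to the store so the next read reloads it."""
        self._cache.pop(key, None)
```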
ing caches is a classic strategy for milking your servers as much as possible. First look for an exact match. If that's not foun
PostgreSQL Tips and Tricks | gtuhl: startup technology
Here are a dozen tips for working with a PostgreSQL database. It is a sophisticated and powerful piece of software, and just knowing a few rules of thumb before diving in can be a huge help. If you want more detail, read the amazing documentation. My list of tips was very long, so I just chopped off a dozen for this post.
Sql Antipatterns Strike Back
Great presentation about good SQL practice.
interesting but it's a slideshow rather than an article
Factual
Service for basically creating shared databases: sounds quite interesting!
Factual is a platform where anyone can share and mash open data on any subject. For example, you might find a directory of California restaurants, a database of endocrinologists, or a list of American Idol finalists. We provide smart tools to help the community build and maintain a trusted source of structured data. And this data can be used through widgets and APIs to help application developers and content publishers be more innovative and productive.
open data edit
On line data collections
Data!
msarnoff.org ChipDB - integrated circuit quick reference
Online reference for integrated circuits. Very quick reference (as opposed to loading up the 30 MB PDF, which is a scan of a photocopied document that looks like it was stored in somebody's back pocket).
The End of a DBMS Era (Might be Upon Us) | blog@CACM | Communications of the ACM
"Relational database management systems (DBMSs) have been remarkably successful in capturing the DBMS marketplace. To a first approximation they are “the only game in town,” and the major vendors (IBM, Oracle, and Microsoft) enjoy an overwhelming market share. They are selling “one size fits all”; i.e., a single relational engine appropriate for all DBMS needs. Moreover, the code line from all of the major vendors is quite elderly, in all cases dating from the 1980s. Hence, the major vendors sell software that is a quarter century old, and has been extended and morphed to meet today’s needs. In my opinion, these legacy systems are at the end of their useful life. They deserve to be sent to the “home for tired software.” Here’s why."
Google Public Sector
Tools for Public Sector
one-stop shop of tips and tools for the public sector from Google
Most people reach government and other public sector websites by using Google and other search engines. This site is a guide to the tools and best practices that can help you reach, communicate and engage with your community. Most of these tools are free, so they can also help you do more with less.
Google: Tools for Public Sector Organizations. Make your agency website, and the information it offers, easier to find.
Internet Archive: A Future for Books -- BookServer
Referenced in Chronicle Wired
The widespread success of digital reading devices has proven that the world is ready to read books on screens. As the audience for digital books grows, we can evolve from an environment of single devices connected to single sources into a distributed system where readers can find books from sources across the Web to read on whatever device they have. Publishers are creating digital versions of their popular books, and the library community is creating digital archives of their printed collections. BookServer is an open system to find, buy, or borrow these books, just like we use an open system to find Web sites.
The BookServer is a growing open architecture for vending and lending digital books over the Internet. Built on open catalog and open book formats, the BookServer model allows a wide network of publishers, booksellers, libraries, and even authors to make their catalogs of books available directly to readers through their laptops, phones, netbooks, or dedicated reading devices. BookServer facilitates pay transactions, borrowing books from libraries, and downloading free, publicly accessible books.
This would be awesome to install on all of the school servers as part of Plan Ceibal.
NoSQL: Distributed and Scalable Non-Relational Database Systems | Linux Magazine
Non-SQL oriented distributed databases are all the rage in some circles. They’re designed to scale from day 1 and offer reliability in the face of failures.
Business Information and News: Track, Connect and Share - Tracked.com
Today, we are proud to launch Tracked.com, a new kind of business service. Tracked.com is the only website in the world where business information, communications and connections come together to enhance your business life.
By http://bit.ly/Tweets2Delicious
Cassandra and Ruby: A Love Affair? | Engine Yard Blog
"Most of today’s up and coming key-value stores are more than just simple key-value stores. You saw this when we looked at Tokyo Cabinet which, in addition to simple key-value capabilities, adds more sophisticated abilities, such as database-like tables. In this post we’ll look at Cassandra — a modern key-value store that continues this trend. Cassandra was originally developed by Facebook and released to open source last year. The Facebook team describes Cassandra as (Google) BigTable running on top of an Amazon Dynamo-like infrastructure."
Most of today's up-and-coming key-value stores are more than just simple key-value stores. Cassandra is a modern key-value store that continues this trend.
Why I like Redis
Like MongoDB, but lives in memory, with replication and periodic store-to-disk. Like memcached, but with data structures. Great for non-critical data or replicated critical data.
Facebook | Engineering @ Facebook's Notes
A portal that tries to make information about the secrets of Washington, DC public.
RT @cshirky: RT THIS is a big deal. OpenSecrets.org releases 200 million [gov't] data records. Today. http://bit.ly/fdXS [from http://twitter.com/danielgillval/statuses/1512025784]
Measuring Link-Bait of Articles I have flagged in the past.
OpenSecrets.org opens up its data -- feel free to mash up information on campaign finance, lobbying, personal finances and much more
Amazon Relational Database Service (Amazon RDS)
Amazon Relational Database Service (Amazon RDS) is a web service that makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while managing time-consuming database administration tasks, freeing you up to focus on your applications and business. Amazon RDS gives you access to the full capabilities of a familiar MySQL database. This means the code, applications, and tools you already use today with your existing MySQL databases work seamlessly with Amazon RDS. Amazon RDS automatically patches the database software and backs up your database, storing the backups for a user-defined retention period. You also benefit from the flexibility of being able to scale the compute resources or storage capacity associated with your relational database instance via a single API call. As with all Amazon Web Services, there are no up-front investments required, and you pay only for the resources you use.
MySQL
MongoDB: A Light in the Darkness! (Key Value Stores Part 5) | Engine Yard Blog
Really interesting article about MongoDB and about the installation procedure
"MongoDB can be thought of as the goodness that erupts when a traditional key-value store collides with a relational database management system, mixing their essences into something that’s not quite either, but rather something novel and fascinating. -- MongoDB support is available in many languages, making it a good choice for a system that has to work in a polyglot environment; all of the major languages have support."
A recovery plan for when the master or a slave fails in a replicated MySQL setup - (ひ)メモ
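A blog post about a configuration built on MySQL replication. I haven't read it properly yet. The configuration does not guarantee data integrity on failover. It lays out the intended operating procedure, so it's a useful reference when working out your own.
data.australia.gov.au – beta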
data.australia.gov.au is the home of Australian government public information datasets. Like Data.gov, it has a wide variety of downloadable government data on topics such as crime, weather, and public lands--as well as some very Australian topics, such as the location and attributes of barbecues on public lands.
data.australia.gov.au is the home of Australian government public information datasets. We encourage you to make government information even more useful by mashing-up the data to create something new and exciting! Make sure you pay attention to the licence attached to the datasets you are interested in using. Each licence should make clear what you can and can’t do with the data. If you’re unsure, please contact the contributing agency.
Open database life: Which should you use, MyISAM or InnoDB?
Unless there are special circumstances, I recommend the InnoDB Plugin included in the latest 5.1 release.
fault-tolerance.png (PNG Image, 784x393 pixels)
All I want for Christmas is a SQL database with no JOINs, secondary indexes, UNIONs, views, character sets or anything else. Just exact and range primary key lookups, GROUP BY, ORDER BY, LIMIT and SQL_CALC_FOUND_ROWS.
AMIS Technology blog » Blog Archive » Oracle RDBMS 11gR2 - Solving a Sudoku using Recursive Subquery Factoring
Solving a Sudoku using Recursive Subquery Factoring
Dare Obasanjo aka Carnage4Life - Facebook Seattle Engineering Road Show: Mike Shroepfer on Engineering at Scale at Facebook
Article summarizing a presentation by Facebook on some of their scaling challenges and solutions.
Rackspace Cloud Computing & Hosting | NoSQL Ecosystem
Good introduction to the "NoSQL" space (initially not a fan of the term, but I guess it is going to stick...), highlighting the different designs used by the options in the space, and the benefits/drawbacks of those designs.
Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as “NoSQL databases.”
How to Secure Your New WordPress Installation | Digging into WordPress
One of the best ways to ensure strong security for your WordPress-powered site is to secure its foundations during the installation process. Of course these techniques can be implemented at any point during the life of your site, but setting them before the game starts prevents headaches and saves time. We’ll start with the WordPress database.
SQL Databases Don't Scale
"Sharding kills most of the value of a relational database."
sql database db
Settings to review when web developers (especially those who have been using MyISAM) switch to InnoDB - kazuhoのメモ置き場
sudo hdparm -W 0 /dev/sda
Jonathan Ellis's Programming Blog - Spyced: CouchDB: not drinking the kool-aid
Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages. It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.
pskomoroch's dataset Bookmarks on Delicious
Resource list of public datasets
Jet Profiler for MySQL
A real-time query performance and diagnostics tool for the MySQL database server.
Java desktop graphical MySQL profiler. Free version.
Real-time query performance and diagnostics tool for the MySQL database server.
Fixing Poor MySQL Default Configuration Values (by Jeremy Zawodny)
4 good tips for improving MySQL performance.
MySQL configuration variables that have defaults which have proven to be problematic in a high-volume production environment.
Understanding database basics! An introduction to PHP for people with no programming experience: CodeZine
Copyright Watch, hosted by the Electronic Frontier Foundation, is designed for the purposes of sharing and comparing the copyright laws of countries around the world. As the world has become connected through the Internet the creation and global sharing of content has become very easy. At the same time the misuse of copyrighted content has become easier too. Sometimes copyright violations may be the result of conflicting copyright laws. Copyright Watch aims to provide a place where copyright laws can be compared and changes to copyright laws can be updated. Applications for Education: Copyright Watch could be useful for teaching about the differences between copyright laws. Copyright Watch might also be useful as a part of a discussion about the purpose of copyright laws.
Copyright Watch collects and monitors copyright laws from all over the world.
Global Transparency in Copyright Law. "Copyright Watch was begun by an international group of copyright experts, drawn from the Access to Knowledge community. We’d like to thank Corporacion Innovarte, the Electronic Frontier Foundation, Electronic Information for Libraries (eIFL.net), the International Federation of Library Associations, Professor Michael Geist, the Third World Network, and the Bangalore Centre for Internet and Society for their support."
Fabulous website to check out if you are unsure which copyright laws exist in which countries.
A real-world account: MySQL failover with almost no downtime (video included) - (ひ)メモ
Multi-master failover with keepalived --vrrp.
Under the Covers of the Google App Engine Datastore (2008 Google I/O Session Videos and Slides)
Presentation on how the Google App Engine datastore implements filtering and sorting on top of Bigtable. Basically, all queries are translated to Bigtable prefix scans or range scans without any in-memory post-processing: every row returned from the scan is relevant to the query and already in the right order. There's a built-in single-property index (two, actually: one ascending and one descending), which can be used for single-property searches, but also for queries consisting only of equality clauses, by doing multiple range scans and taking the intersection (not sure at which level this happens). More complex queries need specific predefined indexes. Index tables only have keys, no columns with values. Indexes are updated synchronously, so everything stays consistent (at the cost of contention problems?). Some mention of string-byte considerations when doing range queries. No full-text queries. Ends with some talk on transactions.
EXPLAIN EXTENDED: efficient database queries in SQL.
JOINs in MySQL
Lawnchair
A client-side JSON document store. Want to see this in Node.js.
Sorta like a couch except smaller and outside, also, a client side JSON document store. Perfect for webkit mobile apps that need a lightweight, simple and elegant persistence solution.
Top 20+ MySQL Best Practices - Nettuts+
5. Index and Use Same Column Types for Joins
20 Best practices
Pragmatic Programming Techniques: NOSQL Patterns
A nice overview of some of the more popular patterns in NoSQL architecture.
Performance, Scalability and Architecture - Java and .NET Application Performance Management (dynaTrace Blog) » Understanding Caching in Hibernate - Part Three : The Second Level Cache
In particular I read a whitepaper several years ago a
In the last posts I already covered the session cache as well as the query cache. In this post I will focus on the second-level cache. The Hibernate Documentation provides a good entry point reading on the second-level cache. The key characteristi…
Top 20+ MySQL Best Practices - Nettuts+
Database operations often tend to be the main bottleneck for most web applications today. It's not only the DBAs (database administrators) that have to worry about these performance issues. We as programmers need to do our part by structuring tables properly, writing optimized queries and better code. Here are some MySQL optimization techniques for programmers.
» Scalable Web Applications Programming the new world: Programming your life and the net, one day at a time
Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.
Introducing Redis: a fast key-value database | Zen and the Art of Programming
Tokyo Cabinet is a trove of hidden gems: the more you learn about it, the more you will appreciate the design and technical decisions behind it. By database standards it is a young project (started in 2007), but since it is a successor to the QDBM project developed by Hirabayashi-san (2000-2007), we could make the argument that it has been, in fact, nine years in the making.
漢(オトコ)のコンピュータ道: 10 techniques for using MySQL replication safely
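The notes in this entry weigh how rarely a binary-log event might be corrupted on the network, and whether it is worth guarding against anyway. One generic guard is an end-to-end checksum on every replicated event, because TCP's 16-bit checksum can occasionally let corruption through. Below is a minimal sketch under stated assumptions. It uses a made-up frame layout (length, CRC32, payload), which is NOT MySQL's actual binary log format.

```python
import struct
import zlib

# Hypothetical frame layout (not MySQL's binlog format):
# 4-byte payload length, 4-byte CRC32 of the payload, then the payload itself.
HEADER = struct.Struct(">II")


def encode_event(payload: bytes) -> bytes:
    """Prefix one replication event with its length and CRC32 checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_event(frame: bytes) -> bytes:
    """Return the payload, or raise ValueError if it fails the checksum."""
    length, expected_crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != expected_crc:
        raise ValueError("corrupted replication event")
    return payload


frame = encode_event(b"UPDATE t SET x = 1 WHERE id = 7")
damaged = bytearray(frame)
damaged[-1] ^= 0x01  # flip a single bit to simulate corruption in transit
try:
    decode_event(bytes(damaged))
except ValueError as exc:
    print("rejected:", exc)
```

CRC32 detects every single-bit error, so the flipped frame is rejected. The replica would then re-request the event instead of silently applying a damaged one.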
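So it could happen about once every two years; it certainly seems worth taking some countermeasure. But if you replace the cables and equipment and set up an environment that exceeds the recommended specifications, it drops to once in 20 million years, and even with 10 Gigabit Ethernet the probability is once in 2 million years... Still, taking countermeasures is better than not taking them... Okuno-san, you have actually had a binary log corrupted by the network, haven't you?
"1. Don't use multi-master replication. This is a very common misconception, but I occasionally see users who adopt a multi-master configuration because they want HA. Multi-master is a topology made up of two MySQL servers in which each server is both the master and the slave of the other, replicating to it. In multi-master, updates can be made on both hosts, but because updates made on one server are applied to the other server asynchronously, updates may be lost if the server processing the updates crashes."
assertTrue( ): NoSQL Required Reading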
Starting from Dynamo, ending with (roughly) follow @nosqlupdate on Twitter.
Materials that you need to read in order to get started with NoSQL
List of resources to read to get up-to-speed on the NoSQL movement.
Harish Mallipeddi's Blog - CouchDB naked
Good explanation of how CouchDB indexes.
How CouchDB B-trees work internally
Why I think Mongo is to Databases what Rails was to Frameworks // RailsTips by John Nunemaker
Below are 7 Mongo and MongoMapper related features that I have found to be really awesome while working on switching Harmony, a new website management system by my company, Ordered List, to Mongo from MySQL.
The more I work with Mongo the more I am coming around to this way of thinking. I tell no lie when I say that I now approach Mongo with the same kind of excitement I first felt using Rails. For some, that may be enough, but for others, you probably require more than a feeling to check out a new technology.
Top 20+ MySQL Best Practices | TuVinhSoft .,JSC
Alex Miller's technical blog on Java, concurrency, programming, design, languages, and more
Hibernate and cache management
NoSQL with MySQL in Ruby - Friendly
That's something that's a bit troublesome - if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until such point Google either duplicates or acquires the invention.
Enabling a Google-like search from structured sources (databases)
Google and Yahoo approaching the structured Web
Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.
MySQL redundancy techniques at Yahoo! Auctions (Yahoo! JAPAN Tech Blog)
It's a common setup, as setups go. As MySQL Cluster, MySQL Proxy and the like mature, this part of the architecture may become simpler and more robust.
“Dual master + software load balancer + Nagios”; “If a master server fails, (a Nagios event handler) closes the health-check port of the slave directly connected to it.”
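The failover rule quoted here can be modeled as a small routing function. When a master fails, the slaves attached to it are taken out of the load balancer's pool, so reads stop going to replicas that can no longer receive updates. Below is a minimal Python sketch. The host names and the `MASTER_OF` mapping are hypothetical. In the described setup this decision happens in a Nagios event handler that closes a health-check port, not in application code.

```python
# Hypothetical topology: each slave replicates from exactly one master.
MASTER_OF = {
    "slave-a1": "master-a",
    "slave-a2": "master-a",
    "slave-b1": "master-b",
}


def readable_slaves(master_up: dict[str, bool]) -> list[str]:
    """Return the slaves the load balancer may still route reads to.

    A slave whose master is down is treated as failed too (its health-check
    port would be closed), because it can no longer receive updates and would
    serve increasingly stale data.
    """
    return sorted(slave for slave, master in MASTER_OF.items() if master_up[master])


print(readable_slaves({"master-a": False, "master-b": True}))
```

Taking the whole branch out at once keeps reads consistent with the surviving master. The cost is reduced read capacity until the failed master's slaves are repointed.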
Dual master, software load balancer, virtual DNS, rewriting gethostbyname; if one master goes down, a Nagios event handler also takes down the slaves directly under it to keep things consistent.
Data Sets | GroupLens Research
An explanation of key-value stores. The “CAP theorem” shows that a distributed system cannot guarantee all three of the following at the same time:
* Consistency of data
* Availability of data
* Partition tolerance
Bigtable, SimpleDB, Tokyo Tyrant
Sql Antipatterns Strike Back
"Common blunders of SQL database design, queries, and software development. Presented as a tutorial at the MySQL Conference & Expo 2009."
none
Trees In The Database - Advanced data structures
A presentation about modelling trees relationally and storing them in an SQL database.
Storing tree structures in a bi-dimensional table has always been problematic. The simplest tree models are usually quit
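The simplest relational tree model is the adjacency list, where each row stores its parent's id. Its weakness is that fetching a whole subtree needs recursion. Below is a sketch using Python's sqlite3 and a recursive common table expression (SQLite 3.8.3 or later). The `category` table and its rows are made up for illustration. This is only one of several models (nested sets and closure tables are others) and not necessarily the one the presentation recommends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES category(id),  -- NULL for the root
        name      TEXT NOT NULL
    );
    INSERT INTO category VALUES
        (1, NULL, 'Databases'),
        (2, 1,    'Relational'),
        (3, 1,    'NoSQL'),
        (4, 2,    'MySQL'),
        (5, 3,    'CouchDB'),
        (6, 3,    'Cassandra');
""")


def subtree(root_id):
    """Return (name, depth) for root_id and all its descendants.

    Rows are ordered by their materialized path, which yields depth-first
    order as long as names contain no characters that sort before '/'.
    """
    return conn.execute("""
        WITH RECURSIVE tree(id, name, depth, path) AS (
            SELECT id, name, 0, name FROM category WHERE id = ?
            UNION ALL
            SELECT c.id, c.name, t.depth + 1, t.path || '/' || c.name
            FROM category AS c JOIN tree AS t ON c.parent_id = t.id
        )
        SELECT name, depth FROM tree ORDER BY path
    """, (root_id,)).fetchall()


for name, depth in subtree(1):
    print("  " * depth + name)
```

The MySQL versions of that era had no recursive CTEs; MySQL gained `WITH RECURSIVE` only in 8.0. That limitation is one reason alternative models such as nested sets were popular.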
trees in database
パブリックドメイン・クラシック (Public Domain Classics)
Nice! Please translate it into English if you can.
Vineet Gupta: NoSql Databases – Part 1 - Landscape
At Directi, we are taking a hard look at the way our applications need to store and retrieve data, and whether we really need to use a traditional RDBMS for all scenarios. This does not mean that we will eschew relational systems altogether. What it means is that we will use the best tool for the job – we will use non-relational options wherever needed and not throw everything at a relational database with a mindless one-size-fits-all approach. ... ... This post covers the current landscape of the NoSQL space. In a subsequent post, I intend to cover in more detail the various problem areas addressed by NoSQL systems and the specific algorithms used.
Really detailed description of a number of NoSQL solutions. Interesting reading on Cassandra and Voldemort.
The Scale-Out Blog: Simple HA with PostgreSQL Point-In-Time Recovery
A short guide to setting up a warm standby PostgreSQL server for HA, to replicate a database.
FleetDB
FleetDB is a schema-free database optimized for agile development.
Consensus Protocols: Two-Phase Commit at Paper Trail
Nice article on 2PC.
terrastore - Project Hosting on Google Code
how to store tree structures in an RDB
Hierarchical structures
SQL for Beginners - Nettuts+
BrowserCouch is an attempt at an in-browser MapReduce implementation.
One in particular has now grown so far out of hand that we've decided to start from scratch for a whole new version. Why? Let's say you have 3000+ PHP files, and your boss says "Hrm, we're seeing some problems with performance. Can you display at the bottom of each page the # of queries you use on that page?" If you coded your entire project like the example above, you would be totally screwed. You would have to find each and every mysql_query() and add some counter at the end. It would be a management nightmare. So how could you solve this problem?
Google's architecture, designed on the assumption that a data center will “go down” - Blog on Publickey
[to read later] What the amazing DB design tool “ERMaster” can do (1/3) - ＠IT
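Several tools are available for free, and I had been using them, but I recently learned about an Eclipse plugin called “ERMaster”. Compared with other tools, ERMaster is an ER modeling tool that takes care of all the fine details: an intuitive, easy-to-understand UI (user interface), customizable table definition documents that can be exported to Excel, a dictionary feature, and more. This article introduces ERMaster.
World Government Data | guardian.co.uk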
Tehgrauniad's search engine for government data sets.
more info : http://www.guardian.co.uk/news/datablog/2010/jan/07/government-data-world
The one-stop shop for World Government datasets from The Guardian.
The Guardian's search engine for world government data
Governments around the globe are opening up their data vaults – allowing you to check out the numbers for yourself. This is the Guardian’s gateway to that information. Search for government data here from the UK (including London), USA, Australia and New Zealand – and look out for new countries and places as we add them.
Collections Search Center, Smithsonian Institution
Launched in January 2010, the Smithsonian Institution's new Collections Search Center contains more than 2 million searchable records and 265,900 resources (including images, videos, sound files, and electronic journals) from the Smithsonian's libraries, archives, and museums.
Search over 2 million records with 265,900 images, video and sound files, electronic journals and other resources from the Smithsonian's museums, archives
Federated search of the Smithsonian's museums, archives, and libraries for images, video, sound files, and electronic journals
SIRIS - Smithsonian Institution Research Information System
recommended by sla
James on Software | Introducing Friendly: NoSQL With MySQL in Ruby
I've been a big proponent of NoSQL for a while. I have played with just about all of the new generation of data stores. We almost got cassandra running in production once, and we've been running mongodb in production for about six months now. But, here's the thing: as awesome as these new dbs are, they're still young. Our app generates a ton of data and gets pretty serious traffic. So, we started hitting walls quickly. To make a long story short, we decided to fall back to MySQL. It's battle hardened. We know its production characteristics and limitations. Backups are a science. We know we can count on it. But, we have a lot of data, and adding fields and indexes was starting to get painful. Flexible schemas are one of the things that attracted me to NoSQL in the first place. Then, I remembered this article about How FriendFeed uses MySQL to store schema-less data. So, I decided to implement the system they describe in the article. Since we put Friendly into production, we've seen
Friendly makes MySQL look like a document store. When you save an object, it serializes all of its attributes to JSON and stores them in a single field. To query your data, Friendly creates and maintains indexes in separate tables. It even has write-through and read-through caching built right in.
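The storage scheme described here is: each object is serialized to JSON in a single column, and every queryable attribute gets its own index table. Below is a minimal Python sketch of that scheme, with sqlite3 standing in for MySQL. The table names `entities` and `index_user_id` and the single indexed attribute are illustrative assumptions, not Friendly's actual schema. Caching is left out.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Every object lives in one table as an opaque JSON blob.
    CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL);
    -- Each queryable attribute gets its own small index table.
    CREATE TABLE index_user_id (
        user_id   INTEGER NOT NULL,
        entity_id TEXT NOT NULL,
        PRIMARY KEY (user_id, entity_id)
    );
""")


def save(doc: dict) -> str:
    """Serialize the whole document to JSON and keep the index table in sync."""
    entity_id = doc.get("id") or uuid.uuid4().hex
    doc = {**doc, "id": entity_id}
    with conn:  # one transaction for the blob and its index rows
        conn.execute("INSERT OR REPLACE INTO entities VALUES (?, ?)",
                     (entity_id, json.dumps(doc)))
        conn.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity_id,))
        if "user_id" in doc:
            conn.execute("INSERT INTO index_user_id VALUES (?, ?)",
                         (doc["user_id"], entity_id))
    return entity_id


def find_by_user(user_id: int) -> list[dict]:
    """Look up ids through the index table, then load and decode the blobs."""
    rows = conn.execute("""
        SELECT e.body FROM index_user_id AS i
        JOIN entities AS e ON e.id = i.entity_id
        WHERE i.user_id = ?
    """, (user_id,))
    return [json.loads(body) for (body,) in rows]


save({"user_id": 1, "title": "Hello", "tags": ["nosql", "mysql"]})
save({"user_id": 2, "title": "Other"})
print(find_by_user(1))
```

Adding a new field such as `tags` needs no `ALTER TABLE`. Only making a field queryable requires a new index table, which can be built and backfilled in the background.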
Introducing Friendly: NoSQL With MySQL in Ruby Dec 16 2009 I've been a big proponent of NoSQL for a while. I have played with just about all of the new generation of data stores. We almost got cassandra running in production once, and we've been running mongodb in production for about six months now.
Unlocking innovation | data.gov.uk
"Advised by Sir Tim Berners-Lee and Professor Nigel Shadbolt and others, government are opening up data for reuse. This site seeks to give a way into the wealth of government data and is under constant development. We want to work with you to make it better. We’re very aware that there are more people like you outside of government who have the skills and abilities to make wonderful things out of public data. These are our first steps in building a collaborative relationship with you.[...]"
That's it! The UK open data site is now public!
9 Tips For Working with MySQL Databases » DevSnippets
Getting all rows from one table and only the latest from the child table
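The task noted here ("all rows from one table and only the latest from the child table") is the classic greatest-n-per-group problem. Below is one portable way to write it, shown with Python's sqlite3. The `customer` and `purchase` tables are invented for illustration, and this is not necessarily the query the DevSnippets post uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchase (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        bought_at   TEXT NOT NULL,   -- ISO-8601 dates sort correctly as text
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES
        (1, 1, '2010-01-05', 'book'),
        (2, 1, '2010-02-10', 'lamp'),
        (3, 2, '2010-01-20', 'pen');
""")

# The derived table finds each customer's latest date; joining back on that
# date picks the matching child row. LEFT JOINs keep customers with no rows.
# If two purchases share the latest date, both are returned.
LATEST_PER_CUSTOMER = """
    SELECT c.name, p.item, p.bought_at
    FROM customer AS c
    LEFT JOIN (SELECT customer_id, MAX(bought_at) AS latest
               FROM purchase
               GROUP BY customer_id) AS m
           ON m.customer_id = c.id
    LEFT JOIN purchase AS p
           ON p.customer_id = m.customer_id AND p.bought_at = m.latest
    ORDER BY c.id
"""
rows = conn.execute(LATEST_PER_CUSTOMER).fetchall()
print(rows)
```

An index on `purchase (customer_id, bought_at)` lets both the grouping subquery and the join back be answered from the index.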
1) Selecting all the values from a table for a particular date
2) Search all columns in all the tables in a database for a specific value
3) Splitting string values
4) Select all rows from one table that don't exist in another table
5) Getting all rows from one table and only the latest from the child table
6) Getting all characters until a specific character
7) Return all rows with NULL values in a column
8) Row values to column (PIVOT)
9) Pad or remove leading zeroes from numbers
10) Concatenate Values From Multiple Rows Into One Column
Released the distributed key-value store “kumofs”! - 古橋貞之の日記 (Sadayuki Furuhashi's diary)
How to unit test NHibernate code using an in memory SQL Lite database.
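The same idea carries over to other stacks: point the data-access code at a throwaway in-memory SQLite database for each test. Below is a minimal Python analogue using unittest and sqlite3. The `UserRepository` class is hypothetical, and this is not the NHibernate setup from the linked post. One caveat: SQLite's SQL dialect differs from the production database, so dialect-specific queries still need integration tests.

```python
import sqlite3
import unittest


class UserRepository:
    """Hypothetical data-access class standing in for the code under test."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def add(self, name: str) -> int:
        cur = self.conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def name_of(self, user_id: int):
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        return row[0] if row else None


class UserRepositoryTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: fast, isolated, nothing to clean up.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        self.repo = UserRepository(self.conn)

    def tearDown(self):
        self.conn.close()

    def test_round_trip(self):
        user_id = self.repo.add("alice")
        self.assertEqual(self.repo.name_of(user_id), "alice")

    def test_missing_user_returns_none(self):
        self.assertIsNone(self.repo.name_of(42))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserRepositoryTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```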
Unit Testing NHibernate from Ayende
5 useful PHP functions for MySQL data fetching - AnyExample.com
Very interesting: recommendations for retrieving data from MySQL.
13 Useful WordPress SQL Queries You Wish You Knew Earlier | Onextrapixel - Showcasing Web Treats Without Hitch
WordPress is driven by a MySQL database. This is something active WordPress users would know. However, if you only just read about it here from us, here’s what you should know. MySQL is a free relational database management system.
Bytepawn - Scalable Web Architectures and Application State
Note about Code-State-Cache-Data (CSCD) pattern in scalable web applications.
Short article propounding the use of a "Code-State-Cache-Data-Architecture" (CSCD) instead of just CD or CCD applications. Basically saying that you should forget about stateful apps if you want maximum performance...
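The recommendation quoted in this entry splits data three ways: application state goes to an in-memory key-value store, cache data to Memcached, and persistent data to a database. Below is a minimal sketch of the cache/persistent half of that split, written as a cache-aside read path. `TTLCache` is a hypothetical stand-in for Memcached and a plain dict stands in for the database. It illustrates the data flow only and is not the article's code.

```python
import time


class TTLCache:
    """Stand-in for Memcached: a dict whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self.ttl, self.clock, self._data = ttl, clock, {}

    def get(self, key):
        hit = self._data.get(key)
        if hit is None or hit[1] < self.clock():
            self._data.pop(key, None)  # missing or expired
            return None
        return hit[0]

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def delete(self, key):
        self._data.pop(key, None)


class ProfileService:
    def __init__(self, database, cache):
        self.db = database  # persistent data: the source of truth
        self.cache = cache  # cache data: safe to lose at any time
        self.db_reads = 0

    def get_profile(self, user_id):
        profile = self.cache.get(user_id)
        if profile is None:  # cache miss: read the database, then fill the cache
            self.db_reads += 1
            profile = self.db[user_id]
            self.cache.set(user_id, profile)
        return profile

    def update_profile(self, user_id, profile):
        self.db[user_id] = profile  # write the source of truth first
        self.cache.delete(user_id)  # then invalidate the cached copy


svc = ProfileService({1: {"name": "Damian"}}, TTLCache(ttl=30))
svc.get_profile(1)
svc.get_profile(1)
print(svc.db_reads)  # 1: the second read was served from the cache
```

Because the cache only ever holds copies, losing it (for example, by restarting a Memcached server) costs performance but not data.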
Application state - Data you can restore from the database or afford to lose if server is restarted (logged in users). He recommends storing this in-memory. "Application state goes into an in-memory key-value store like Tokyo Tyrant. Cache data goes into Memcached. Persistent data goes into a database"
"What he needs is the insight to identify state, cached data and persistent data in his application. Application state goes into an in-memory key-value store like Tokyo Tyrant. Cache data goes into Memcached. Persistent data goes into a database. Note that the separation of code and application state may be beneficial later, because it allows you to scale easily by adding new memory servers. ... Let's call this the Code-State-Cache-Data (CSCD) pattern. What Damian originally had was a Code-Data (CD) pattern, and later he optimized to get a Code-Cache-Data (CCD) pattern"
Four ways to optimize paginated displays | MySQL Performance Blog
A paginated display is one of the top optimization scenarios we see in the real world. Search results pages, leaderboards, and most-popular lists are good examples. You know the design pattern: display 20 results in some most-relevant order. Show a "next" and "previous" link. And usually, show how many items are in the whole list and how many pages of results there are. Rendering such a display can consume more resources than the entire rest of the site! As an example, I'm looking at slow log analysis results (with our microslow patches, set to log all queries) for one client; the slow log contains 6300 seconds' worth of queries, and the two main queries for the paginated display consumed 2850 and 380 seconds, respectively.
Rendering such a display can consume more resources than the entire rest of the site!
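One widely used technique for deep result pages is keyset ("seek") pagination. It is not necessarily one of the four approaches the article describes. Instead of `LIMIT 20 OFFSET n`, which makes the database read and discard n rows, you remember the sort key of the last row shown and ask for the rows after it. Below is a sketch with sqlite3. The `post` table and its scores are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
conn.executemany("INSERT INTO post (id, score) VALUES (?, ?)",
                 [(i, (i * 37) % 100) for i in range(1, 201)])
# The index matches the sort order, so each page is a short index range scan.
conn.execute("CREATE INDEX post_score_id ON post (score DESC, id DESC)")

PAGE_SIZE = 20


def first_page():
    return conn.execute(
        "SELECT id, score FROM post ORDER BY score DESC, id DESC LIMIT ?",
        (PAGE_SIZE,)).fetchall()


def next_page(last_score, last_id):
    """Seek past the last row shown instead of skipping OFFSET rows.

    The id tie-breaker makes the order total, so no row is skipped or
    repeated when several posts share a score.
    """
    return conn.execute(
        """SELECT id, score FROM post
           WHERE score < ? OR (score = ? AND id < ?)
           ORDER BY score DESC, id DESC LIMIT ?""",
        (last_score, last_score, last_id, PAGE_SIZE)).fetchall()


page1 = first_page()
last_id, last_score = page1[-1]
page2 = next_page(last_score, last_id)
print(page2[:3])
```

The trade-off is that users can only step to the next or previous page, not jump straight to page 37. Showing a total page count still needs a separate, possibly cached, count.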
Tuning MySQL Performance with MySQLTuner | HowtoForge - Linux Howtos and Tutorials
Perl script for reporting back on your MySQL config.
Debugging and Tuning MySQL performance
Handy script to gather suggestions on MySQL tuning.
Bulletproof backups for MySQL | Carsonified
Great comment on using XFS and snapshots to reduce downtime.
python-sqlparse - Google Code
sqlparse is a non-validating SQL parser for Python. It provides support for parsing, splitting and formatting SQL statements.
Support for parsing, splitting and formatting SQL statements.
World Cinema Foundation
Watch restored films online. Clips available directly, but by the look of it, you have to register to see the whole movie
The World Cinema Foundation is a natural expansion of my love for movies. Seventeen years ago, together with my fellow filmmakers, we created The Film Foundation to help preserve American cinema. Much has been accomplished and much work remains to be done, but The Film Foundation has created a base upon which we can build. There is now, I believe, a film preservation consciousness.
Mr. Moore gets to punt on sharding - (37signals)
I guess the conclusion is that there’s no use in preempting the technological progress of tomorrow. Machines will get faster and cheaper all the time, but you’ll still only have the same limited programming resources that you had yesterday. If you can spend them on adding stuff that users care about instead of prematurely optimizing for the future, you stand a better chance of being in business when that tomorrow finally rolls around.
On managing session data with a database - Slow Dance
Session handling is inseparable from web applications. This post summarizes what I learned from managing sessions in a database, and the conclusions I drew from it.
25+ Alternative & Open Source Database Engines
RT @tweetlicius: 25+ Alternative & Open Source Database Engines - http://bit.ly/cRDaOW
Free Web Resources Everyday - WebResourcesDepot
Common Queries Tree
Common MySQL Queries
Common MySQL Queries (Extending Chapter 9 of Get it Done with MySQL 5&6)
The New York Times did just that this afternoon when it announced that it has released a new Application Programming Interface (API) offering every article the paper has written since 1981, 2.8 million articles. The API includes 28 searchable fields and updated content every hour.
paperplanes. A Collection Of Redis Use Cases
Redis' particular way of treating data requires some rethinking of how to store your data to benefit from its speed, atomicity and data types. I've already written about Redis in abundance; this post's purpose is to complement those posts with real-world scenarios. Maybe you can gather some ideas on how to deal with things.
Because Redis is practical.
WordPress : 10+ life saving SQL queries
Basic SQL for managing your WordPress blog
Even though there are plenty of things you can do within WordPress, sometimes you need a quick solution to fix a specific problem. In those cases, working directly on the database can be a lifesaver. So here are 10 extremely useful SQL queries for WordPress.
A Comparison of Approaches to Large-Scale Data Analysis - MapReduce vs. DBMS Benchmarks
"The following information is meant to provide documentation on how others can recreate the benchmark trials used in our SIGMOD 2009 paper."
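To make the MapReduce-versus-DBMS comparison concrete at toy scale, below is the same aggregation (total ad revenue per source IP) written twice. The first version is a single SQL `GROUP BY`. The second is a hand-rolled map, shuffle and reduce in Python. The table, column names and data are invented for illustration. Nothing here reproduces the paper's setup or says anything about its performance results; it only shows how much plumbing the declarative version hides.

```python
import sqlite3
from collections import defaultdict
from itertools import chain

visits = [("10.0.0.1", 0.50), ("10.0.0.2", 1.25), ("10.0.0.1", 2.00), ("10.0.0.3", 0.75)]

# Declarative: one GROUP BY query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_visits (source_ip TEXT, ad_revenue REAL)")
conn.executemany("INSERT INTO user_visits VALUES (?, ?)", visits)
sql_result = dict(conn.execute(
    "SELECT source_ip, SUM(ad_revenue) FROM user_visits GROUP BY source_ip"))


# Hand-written MapReduce: map each record to (key, value) pairs ...
def map_phase(record):
    source_ip, revenue = record
    yield source_ip, revenue


# ... and reduce all values that share a key.
def reduce_phase(key, values):
    return key, sum(values)


groups = defaultdict(list)  # the "shuffle": group mapped pairs by key
for key, value in chain.from_iterable(map(map_phase, visits)):
    groups[key].append(value)
mr_result = dict(reduce_phase(k, vs) for k, vs in groups.items())

print(sql_result == mr_result)
```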
Leo's Chronicle: Database textbooks you should definitely know
Mongo DB Production
Interesting blog post detailing production experiences with MongoDB.
Dealing with Duplicate Person Data - Proud to Use Perl
I've recently been working on a fairly large project that has contact information for almost 2 million people. These records contain details for both online and offline actions. Since the data can come from multiple sources there exist many duplicate records. Duplicate records mean more processing for our code, more storage space and more hassle for our clients who have to deal with these duplicates. All in all, bad things to leave lying around. In this article we'll look at some strategies that I used to identify and remove these duplicates. All code in this article is sample code, and we'll leave the task of assembling it into a final working program up to the reader.
CPAN is your Friend
Like all good Perl projects, we will make heavy use of the CPAN. It makes our lives so much easier and every day I'm more in awe at the quality and breadth of solutions I find there. For this project we'll be using Text::LevenshteinXS, Lingua::EN::Nickname and Parallel::ForkManager.
What is a Du
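The article pairs edit distance (Text::LevenshteinXS) with nickname normalization (Lingua::EN::Nickname). Below is a minimal Python sketch of those two ideas. The `NICKNAMES` map and the one-edit threshold are hypothetical placeholders, and this is not the article's Perl code. Comparing every pair among two million records is impractical, so a real run would first group candidates, for example by a shared last-name prefix, and only compare within groups.

```python
# Hypothetical nickname table; the article uses Lingua::EN::Nickname for this.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize(first: str, last: str) -> tuple[str, str]:
    """Lower-case, trim, and map known nicknames to a canonical first name."""
    first, last = first.strip().lower(), last.strip().lower()
    return NICKNAMES.get(first, first), last


def likely_duplicates(a, b, max_distance=1) -> bool:
    """Treat two (first, last) names as duplicates if, after normalization,
    both the first and the last names are within `max_distance` edits."""
    fa, la = normalize(*a)
    fb, lb = normalize(*b)
    return levenshtein(fa, fb) <= max_distance and levenshtein(la, lb) <= max_distance


print(likely_duplicates(("Bob", "Smith"), ("Robert", "Smyth")))
```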
Funny to see people still using Perl these days, but a great example.
Knowledge Innovation For Technology In Education
creating databases
Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL
RT @kvz: Why Twitter is dropping MySQL in favor of Cassandra: http://bit.ly/dyeiXF
RT @DZone "Cassandra @ Twitter: An Interview with Ryan King « MyNoSQL" http://dzone.com/WbTY
MyNoSQL: Please include anything I’ve missed.
Sphinx - text search The Pirate Bay way • The Register
and it's on track to become the open source world's canonical answer to the question of text search. MySQL and Solr, the two popular solutions, are showing their age. MySQL introduced full-text search in late 2000 as a way to more intelligently search blobs of text stored in databases. You can work a full-text clause into a query, and MySQL will rank the result rows by how relevant it thinks they are to the query. MySQL uses textbook search algorithms and doesn't allow for a lot of relevance tuning. It's like a drawing from a five year old: The heart is in the right place, but everybody knows that kids suck at drawing. Implementation details aside, MySQL still suffers from scalability problems. Having ignored the trend of chip manufacturers to build multiple cores into CPUs, hoping that this unpleasant trend that required them to actually think about multi-threading would just blow over sooner or later, MySQL's ability to handle parallelism is, well, see the five year old's drawing.
Sphinx can index 10 megabytes of data per second and can search up to 100 gigabytes of text on a single processor. It also supports multi-machine distributed searching, as in the case of Craigslist.
Dennis Forbes on Software and Technology - Getting Real about NoSQL and the SQL-Isn't-Scalable Lie
SQL is Scalable and NoSQL Isn’t For Everyone The point is one that I think all rational people already realize: The ACID RDBMS isn’t appropriate for every need, nor is the NoSQL solution.
"[Though as Michael Stonebraker points out, SQL the query language actually has remarkably little to actually to do with the debate. It would be more clearly called NoACID]"
PHP Tutorials Examples Introduction to PHP and MySQL
HyperGraphDB is a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects; it can also be used as an embedded object-oriented database for projects of all sizes.
Metadata database of old Japanese photographs from the Bakumatsu and Meiji periods - [Search by subject]
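A collection of old Japanese photographs, searchable by subject.
I went to the Key-Value Store study session - blog.katsuma.tv
"# LuxIO (ラックスIO)
# An ordinary B+-tree
# Feature 1
* mapped index
* the entire index portion is mmapped
  o targets systems where the index portion is smaller than physical memory
# Feature 2
* long values
* up to 4 GB
* values larger than the node size (page size) are handled without extra overhead
# Feature 3
* efficient appends
* a linked-list data structure without padding
# Suited to SSDs?
# Use cases
* when both keys and values are small and fast access is needed
* restricted to databases smaller than physical memory
* when you want to handle large values
* when you want to keep appending large values
# Not suited to
* workloads with many deletes
* linking many small pieces of data
  o the seek overhead is too large
* apps with heavy reads and writes
# Probably won't be distributed
# Might add a hash
# Want to eliminate the read lock
* focus on reads"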
On key-value data design. Notes on the characteristics of several systems.
Urbantastic - Tech Tuesday: The Fiddly Bits
# My own setup.
An architectural approach that uses mostly static HTML and JSON, powered by CouchDB.
In my last post I promised to talk a little about the technology that underlies Urbantastic. It’s not the usual suspects, so it’s worth some explanation.
Annotated link http://www.diigo.com/bookmark/http%3A%2F%2Fblog.urbantastic.com%2Fpost%2F81336210%2Ftech-tuesday-the-fiddly-bits
10 sql tips to speed up your database http://bit.ly/9uIi6k #sql
100 Time-Saving Search Engines for Serious Scholars | Online Universities
Undergraduates and grad students alike will appreciate the usefulness of these search engines that allow them to find books, journal articles and even primary source material for whatever kind of research they’re working on and that return only serious, academic results so time isn’t wasted on unprofessional resources.
CouchDB with CouchRest in 5 minutes « The Merbist
The other night, during our monthly SDRuby meetup, lots of people were very interested in learning more about CouchDB and Ruby. I tried to show what Couch was all about but I didn’t have time to show how to use CouchDB with Ruby. Here is me trying to do that in 10 minutes or less. I’ll assume you don’t have CouchDB installed.
Try jLinq Online
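Calil (カーリル) is a service that lets you easily search the current lending status at more than 4,300 libraries and reading rooms across Japan.
The best textbook on database performance that I know of - 山本大＠クロノスの日記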
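The equation that leads to the right answer is the same at every site, so the knowledge applies everywhere. I've tuned all kinds of databases, including Oracle, SQL Server, and MySQL, but they are all based on relational database theory, so if you know the basic principles you should be able to tune any of them. The explanations of the types of index scans and of how to read execution plans are clear and detailed.
Are Commercial Databases Worth It? - Coding the Wheel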
I've worked with expensive SQL Server and Oracle setups for most of my career. I've defended them viciously against all comers and contrarians. I've participated in late-night guerrilla flame wars and drunken bar brawls. And I've sought out with relentless tunnel vision those pieces of propaganda which support my foregone conclusion: that SQL Server and/or Oracle are (or were) the best choices for the organization. I used to be a commercial database advocate. These databases have put food on my table for a dozen years, you see. I am (or was) what you might call an entrenched practitioner, not necessarily an expert, but a practitioner. And in the manner of entrenched practitioners around the world, I've treated you heretics with the sadistic undercutting and poisonous rancor you've deserved! "MySQL?" I would sneer. "PostgreSQL? Thanks, but this is a serious project. We need a database we can depend on." Ahem.
googled "why pay for commercial database" and found this among other articles
Are Commercial Databases Worth It? http://bit.ly/Du96H (via @newsycombinator) [from http://twitter.com/tadej/statuses/1664387681]
Questions whether it makes sense to choose a commercial DB such as Oracle or SQL Server over its open source alternatives.
ioannis cherouvim » Blog Archive » The * stupidest things I’ve done in my programming job
I don't agree with all of them, but still...
Visual Guide to NoSQL Systems - Nathan Hurst's Blog
Good discussion in the comments as well.
where is my milk from?
RT @dogfishbeer:Thanks to @beerwars and @foodinc care about beer & food; now find out where milk in you fridge is from http://bit.ly/cVyDhn!
You'd be surprised. Did you know different brands of milk often come from the same dairy - and the same cows? Often, the same dairy provides milk for store and brand names, only differentiating them by their label! Most dairy products, especially milk, have a state and plant code. Go get the milk out of your fridge and find out which dairy it comes from.
Explore the world of configurators! — Configurator-Database
This is the world's biggest configurator database, featuring over 500 web-based configurators.
This site is home to the world's biggest configurator database. Scan over 500 web-based configurators now and follow the up-to-date discussion of these configurators in our blog.
eBay’s two enormous data warehouses | DBMS2 -- DataBase Management System Services
Metrics on eBay’s main Teradata data warehouse include: * >2 petabytes of user data
Millions of queries per day
Statistics on eBay's database processing
mixi Engineers’ Blog » Super-easy full-text search in 3 lines
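Speaking of tag search and full-text search, Tokyo Dystopia already provides similar functionality. Now that TC supports tag search and full-text search, you might think TD is no longer needed, but that is not the case. As an inverted-index library, TD is designed to be far more efficient and scalable, and its implementation is kept simple so that it is easy to customize for business needs. TC's inverted index, on the other hand, is inferior to TD in performance and scalability, but its defining feature is that it is extremely easy to adopt. If you are already managing your data in a table database, you can strengthen its search capability in under a minute just by writing a setindex ... statement.
The Apache Cassandra Project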
A massively parallel database in the "Bigtable" spirit, originally from Facebook.
The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model.
nothingmuch's most awesome Perl blog EVAR!!1one: Why I don't use CouchDB
Keep this as a reference to common couch FUD :)
High Scalability - Digg: 4000% Performance Increase by Sorting in PHP Rather than MySQL
Scaling practices turn a relational database into a non-relational database. To scale at Digg they followed a set of practices very similar to those used at eBay: no joins, no foreign key constraints (to scale writes), primary key look-ups only, limited range queries, and joins done in memory. When implementing the comment feature, a 4,000 percent increase in performance was achieved by sorting in PHP instead of MySQL. All this effort required to make a relational database scale basically meant you were using a non-relational database anyway. So why not just use a non-relational database from the start?
As Digg started out with a MySQL-oriented architecture and has recently been moving full speed to Cassandra, his observations on some of their lessons learned and the motivation for the move are especially valuable. Here are some of the key takeaways you may find useful:
RT @Sebdz: RT: @programmateur: Digg: 4000 % performance increase by sorting in PHP rather than MySQL (via @mrboo) - http://bit.ly/ckma10
♻ @n1k0: "Scaling practices turn a relational database into a non-relational database" http://n1k.li/4v (via @nsilberman)
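The Digg practices described above (primary-key lookups only, no SQL joins, joins and sorting done in application memory) can be illustrated with a small Python sketch. The rows and field names are hypothetical; the point is only that the JOIN and the ORDER BY move out of the database and into application code.

```python
from operator import itemgetter

# Hypothetical rows, as if fetched by separate primary-key lookups
# (no SQL JOIN and no ORDER BY on the database side).
comments = [
    {"id": 3, "user_id": 10, "score": 5, "text": "+1"},
    {"id": 1, "user_id": 11, "score": 9, "text": "great"},
    {"id": 2, "user_id": 10, "score": 7, "text": "agreed"},
]
users_by_id = {10: {"id": 10, "name": "alice"}, 11: {"id": 11, "name": "bob"}}


def join_and_sort(comments, users_by_id):
    """Attach each comment's author name in memory, then sort by score, highest first."""
    joined = [
        {**c, "author": users_by_id[c["user_id"]]["name"]}
        for c in comments
        if c["user_id"] in users_by_id  # skip comments whose author lookup missed
    ]
    joined.sort(key=itemgetter("score"), reverse=True)
    return joined


for row in join_and_sort(comments, users_by_id):
    print(row["id"], row["author"], row["score"])
```

The trade-off is that the application now does work the database would otherwise do, in exchange for queries that are trivial to partition across servers.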
Typically for relatively static data sets, relatively low query volumes, and relatively high latency requirements.
JDbMonitor - Monitor JDBC Performance For Slow SQL Queries
JDbMonitor is a tool to monitor & analyse database performance for any Java application. Easily determine your application's database performance and analyse problems down to specific SQL statement.
Tool for monitoring JDBC database activity
Data Marketplace : Find, buy and sell data online
A place where you can buy and sell structured datasets online, e.g. Walmart locations in the US or weekly oil prices since 1970. If a dataset is not available, you can request it and bid an amount with a set deadline for delivery.
Kazuho@Cybozu Labs: On starting to build a distributed storage system called Pacific
These are all MySQL-oriented, but it sure seems like there are some fantastic principles to pull from here, e.g. "covering indexes; orders-of-magnitude improvements", or the one on optimizing disk I/O.
Slides from a great variety of sessions @ Percona Performance Conference 2009 (April)
Slides from the conference are available online.
I, Cringely . The Pulpit . Data Debasement | PBS
The second time through the Appistry team tossed the database, at least for its duties as a processing platform, instead keeping the transaction -- in fact ALL transactions -- in memory at the same time. This made the work flow into read-process-write (eventually). The database became more of an archive and suddenly a dozen commodity PCs could do the work of one Z-Series mainframe, saving a lot of power and money along the way.
Data | The World Bank
A site gathering a large collection of World Bank data.
The “now” of Twitter's system, processing 1.2 million tweets per second - @IT
twitter DB fan-out email
Twitter's DB architecture
Redis tutorial, April 2010 - by Simon Willison
posted by thraxil: http://quimby.ccnmtl.columbia.edu/ircbot/web/?y=2010&m=04&d=26#20100426105402
These slides and notes were originally written to accompany a three-hour Redis tutorial I gave at the NoSQL Europe conference on the 22nd of April 2010.
Tired of typing show databases and the like every time in mysql → solved with readline macros - (ひ)メモ
Wow this is potentially huge! Thoughts? RT @timoreilly:Bulk Data Downloads:A Breakthrough in Government Transparency http://bit.ly/EizO3 [from http://twitter.com/jhelmus/statuses/1283585077]
On getting greater access to government documents and data, with an amendment now in the House
Thriving Organizational Patterns - Wagn
a Wiki (where people write together) + a Database (where people organize information) + a Content Management System (where people build cool websites) = a Wagn (where people organize cool websites together)
mixi Engineers’ Blog » A table database built on DBM
A tutorial on how to use Tokyo Cabinet's DBM
table database
Database Versioning
Migrations bother me. On one hand, migrations are the best solution we have for the problem of versioning databases. The scope of that problem includes merging schema changes from different developers, applying schema changes to production data, and creating a DRY representation of the schema. But even though migrations is the best solution we have, it still isn’t a very good one.
Check the brainstorming at the end. I love where he's going. Short version: a schema.yml file identified by its SHA1 hash. Migrations are for translating data between versions. Great comments at the end by the smart people in the community.
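The brainstormed idea (identify a schema file by its SHA1 hash, and use migrations only to translate data between versions) could look roughly like this Python sketch. The schema contents, the `split_name` migration and the `MIGRATIONS` registry are hypothetical illustrations, not taken from the article; a real tool would also need to find multi-step paths between versions.

```python
import hashlib


def schema_version(schema_text: str) -> str:
    """Identify a schema by the SHA-1 hex digest of its (hypothetical) schema.yml contents."""
    return hashlib.sha1(schema_text.encode("utf-8")).hexdigest()


SCHEMA_V1 = "users:\n  id: integer\n  name: string\n"
SCHEMA_V2 = "users:\n  id: integer\n  first_name: string\n  last_name: string\n"
V1 = schema_version(SCHEMA_V1)
V2 = schema_version(SCHEMA_V2)


def split_name(row):
    """Hypothetical data migration: split a single name column into two."""
    first, _, last = row["name"].partition(" ")
    return {"id": row["id"], "first_name": first, "last_name": last}


# Migrations only translate *data* from one schema hash to another.
MIGRATIONS = {(V1, V2): split_name}


def migrate(rows, from_version, to_version):
    if from_version == to_version:
        return rows
    step = MIGRATIONS[(from_version, to_version)]  # KeyError if no direct path is registered
    return [step(r) for r in rows]
```

Because the version is derived from the schema file itself, two developers who arrive at the same schema independently end up with the same identifier, which is part of the appeal for merging.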
HBase vs Cassandra: why we moved « Bits and Bytes.
The Twitter Engineering Blog: Introducing Gizzard, a framework for creating distributed datastores
In this third post of the series, I’ll talk about using dynamic ACLs: How to store an ACL in a database, and construct it from there when needed. This post builds on the things introduced in part 1 and part 2.
Hibernate Performance Tuning | Javalobby
First Level Cache (aka Transaction layer level cache)
Performance tuning tips for Hibernate. Best article.
Steve Huffman on Lessons Learned at Reddit | Carsonified
Steve Huffman on Lessons Learned at Reddit, by Keir Whitaker
Readings in Database Systems Web Supplement
This book is one of the fundamental database theory books available today. The site lists the papers featured in the book, along with various lecture notes. Need to track down some of these papers.
4 Steps To a Professional Database Design | ProgrammerFish - Everything that's programmed!
Just as you need a blueprint to build a house, you need a database blueprint in order to implement a database successfully.
Kurosawa Digital Archive
The world's greatest director
Online archive of Akira Kurosawa
Free Kurosawa movies
Opened last year by Kyoto’s Ryukoku University, the archive honors Akira Kurosawa, Japan’s celebrated filmmaker who brought us The Seven Samurai, Rashomon, Ikiru, etc. and won an Oscar for Lifetime Achievement in 1989. What will you find here? A good 20,000 items. Screenplays, manuscripts, photos, sketches, newspaper clippings, notes, etc. You won’t find a larger Kurosawa collection on the web.
Akira Kurosawa Digital Archive
Cassandra By Example | Rackspace Cloud Computing & Hosting
Maybe I should learn to use Cassandra someday.
SQL Server 2005 Paging – The Holy Grail - SQL Server Central
The paging and ranking functions introduced in 2005 are old news by now, but the typical ROW_NUMBER OVER() implementation only solves part of the problem. Nearly every application that uses paging gives some indication of how many pages (or total records) are in the total result set. The challenge is to query the total number of rows and return only the desired records with a minimum of overhead. The holy grail solution would allow you to return one page of the results and the total number of rows with no additional I/O overhead. In this article, we're going to explore four approaches to this problem and discuss their relative strengths and weaknesses. For the purposes of comparison, we'll be using I/O as a relative benchmark.
Electrical What ?!
Frustration with the difficulty of searching for schematic symbols in long lists with little information led to the creation of Electrical What !?, a database of electronic components. Electrical What !? displays all electronic components in an easily scannable, cataloged format. However, what truly sets Electrical What !? apart from your average reference book is the ability to search by appearance. With these tools, Electrical What !? hopes to make looking up electronic symbols a breeze.
Electrical symbols for schematics and print reading
Pillbox - prototype pill identification system
Enables users to search for unknown solid-dosage medications (tablets/capsules) based on physical characteristics and images. The system combines high-resolution images of tablets and capsules with FDA-approved appearance information (imprint, shape, color, etc.) to enable users to visually search for and identify an unknown solid dosage pharmaceutical. This system is designed for use by emergency physicians, first responders, other health care providers, Poison Control Center staff, and concerned citizens.
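A site that identifies pills
The people who built it because it didn't exist: ITpro
"A defining feature of memcached is that it uses the physical memory of ordinary PC servers as the memory for caching data. To spread large volumes of data across the memory of multiple PC servers, it adopts an approach called a 'key-value data store': data is first denormalized and turned into a 'key' and its corresponding 'value' before being stored. Turning data into key-value pairs is what allows it to be distributed across multiple servers."
README - redis - Google Code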
A database implementing a dictionary, where every key is associated with a value. Every single value has a type. The following types are supported:
* Strings
* Lists
* Sets
* Sorted Sets (since version 1.1)
Maybe the author isn't the right person to make such a comparison?
Persistent in-memory key value database compared to memcached
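To make the README's data model concrete (every key maps to a value, and every value has a type), here is a toy in-memory Python sketch. It is not Redis and uses no Redis client; the method names only loosely echo Redis commands (SET/GET, LPUSH, SADD, ZADD, ZRANGE), and the error on a type mismatch imitates Redis's wrong-type behaviour in spirit only.

```python
class TypedKV:
    """Toy in-memory dictionary where each key's value carries a type tag."""

    def __init__(self):
        self._data = {}  # key -> (type_name, value)

    def _check(self, key, kind):
        # Raise if the key exists but holds a value of a different type.
        if key in self._data and self._data[key][0] != kind:
            raise TypeError(f"key {key!r} holds a {self._data[key][0]}, not a {kind}")

    def _container(self, key, kind, factory):
        # Return the mutable container for key, creating an empty one on first write.
        self._check(key, kind)
        if key not in self._data:
            self._data[key] = (kind, factory())
        return self._data[key][1]

    def set(self, key, value):
        # Like a string SET: overwrites whatever the key held before.
        self._data[key] = ("string", str(value))

    def get(self, key):
        self._check(key, "string")
        return self._data[key][1] if key in self._data else None

    def lpush(self, key, value):
        items = self._container(key, "list", list)
        items.insert(0, value)
        return len(items)

    def sadd(self, key, member):
        members = self._container(key, "set", set)
        added = member not in members
        members.add(member)
        return int(added)

    def zadd(self, key, score, member):
        scores = self._container(key, "zset", dict)  # member -> score
        added = member not in scores
        scores[member] = score
        return int(added)

    def zrange(self, key, start, stop):
        # Members ordered by (score, member); stop is inclusive and -1 means "last".
        self._check(key, "zset")
        scores = self._data[key][1] if key in self._data else {}
        ordered = sorted(scores, key=lambda m: (scores[m], m))
        end = None if stop == -1 else stop + 1
        return ordered[start:end]
```

Tagging each value with its type is what lets a single store offer per-type operations (push to a list, add to a set) while rejecting an operation aimed at the wrong kind of value.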
…structures and algorithms. Indeed, both the algorithms and the data structures in Redis are carefully chosen to obtain the best performance.
blog.TBODA.com | 5 Useful SQL Server Scripts
batch
Vegetarian Recipe Search - Find Vegetarian Recipes
Recommended by Lifehacker
Find Vegetarian recipes with this vegetarian recipe search engine.
Stack Overflow Creative Commons Data Dump - Blog - Stack Overflow
Awesome: Stack Overflow has released all of its public web data under a CC license.
mixi Engineers’ Blog » How to implement a web chat in a 100-line C program
Tokyo Cabinet
Free Geolocation API tool : CodeDiesel
Uses PHP and cURL to look up the details of a visitor's IP address and pin down their city, country, ZIP code, latitude, longitude, etc.
http://iplocationtools.com
Web λ.0 - Functional programming for the Web: Sky is the limit
Using tokyocabinet as backing store for Mnesia
This is only the 3rd blog post I found about mnesiaex and support for tokyocabinet. The comments are worth reading!
Life is beautiful: Pitfalls of multithreaded programming, part 2
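So I hadn't bookmarked this yet… > Seen that way, it seems fundamentally wrong to me to modify the database in response to Create/Update/Delete requests while keeping the client waiting (that is, while holding on to the thread or process needed to handle the HTTP request). I agree with this, but even if you make it asynchronous and handle it Comet-style, there are still cases that need consistency with other requests, so care must be taken to guarantee it there.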
Splitting up the problem. For implementation details, I think there are plenty of other, more thorough sources.
Simple Wins : Daytime Running Lights
Background on jchrisa's Toast (standalone chat app in CouchDB+JS+HTML)
The point is to show how CouchDB's "databasey" features, because they are implemented using HTTP, can be leveraged to make powerful end-user experiences, with just a minimum of code.
Map Fields - A Rails plugin to ease the importing of CSV files @ Ramblings on Rails
A very nice solution for importing CSV files; it handles the column mapping, which is a very common problem.
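Map Fields itself is a Rails plugin; to illustrate the mapping problem it addresses, here is a hedged Python sketch that maps arbitrary CSV headers onto model field names using only the standard library. `FIELD_MAP`, `import_rows` and the sample data are hypothetical and not part of Map Fields.

```python
import csv
import io

# Hypothetical mapping from the headers found in an uploaded CSV file
# to the attribute names of the model being imported.
FIELD_MAP = {"E-mail Address": "email", "Full Name": "name", "Phone #": "phone"}


def import_rows(csv_text, field_map):
    """Yield dicts keyed by model attribute names; unmapped columns are dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield {field_map[header]: value for header, value in row.items() if header in field_map}


sample = "Full Name,E-mail Address,Notes\nAda Lovelace,ada@example.com,first programmer\n"
for record in import_rows(sample, FIELD_MAP):
    print(record)
```

In an interactive importer the mapping would typically be chosen by the user after previewing the file's headers, rather than hard-coded.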
OMFG!!!! Manbabies are ready!!!!
Easy way to import CSV files
The night the “key-value data store” developers all gathered: ITpro
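Key-value data stores (or key-value databases) are a technology attracting attention from web companies that have large numbers of users and large volumes of data and are struggling with database performance problems and high costs.
mmalone's django-caching at master - GitHub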
"Mike Malone shares code used by Pownce to add QuerySet level caching to Django. It’s a smart implementation—a CachingQuerySet class inspects the arguments passed to get(), and if they’re just a straight forward exact PK lookup hits memcache for the object before hitting the database. Signals are used to invalidate the cache."
Some examples of transparently caching things in Django. An example Django app that uses custom managers, fields, and QuerySets to transparently cache objects.
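The CachingQuerySet idea described above (serve exact primary-key `get()` lookups from a cache first, and invalidate on writes) can be sketched without Django. This is a hypothetical plain-Python illustration: a dict stands in for memcached and an explicit `save()` call stands in for Django's signals; none of these class or method names come from django-caching.

```python
class FakeDB:
    """Stand-in for the database; counts queries so cache hits are visible."""

    def __init__(self, rows):
        self.rows = dict(rows)  # pk -> row dict
        self.queries = 0

    def fetch(self, **lookup):
        self.queries += 1
        for row in self.rows.values():
            if all(row.get(field) == value for field, value in lookup.items()):
                return row
        raise LookupError(f"no row matches {lookup}")


class CachingManager:
    """Serve exact primary-key get() lookups from a cache before hitting the DB."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache  # a plain dict playing the role of memcached

    @staticmethod
    def _pk_from(lookup):
        # Only a single exact lookup on "pk" or "id" is treated as cacheable.
        if len(lookup) == 1:
            field, value = next(iter(lookup.items()))
            if field in ("pk", "id"):
                return value
        return None

    def get(self, **lookup):
        pk = self._pk_from(lookup)
        if pk is None:
            return self.db.fetch(**lookup)  # anything else goes straight to the DB
        key = f"obj:{pk}"
        if key not in self.cache:
            self.cache[key] = self.db.fetch(id=pk)
        return self.cache[key]

    def save(self, row):
        # Write to the DB, then invalidate the cached copy: the job a
        # post-save signal handler would do in Django.
        self.db.rows[row["id"]] = row
        self.cache.pop(f"obj:{row['id']}", None)
```

Invalidating instead of updating the cached copy keeps the write path simple; the next `get()` repopulates the cache from the database.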
mmalone's django-caching app
What is data science? - O'Reilly Radar
The future belongs to the companies who figure out how to collect and use data successfully. In this in-depth piece, O'Reilly editor Mike Loukides examines the unique skills and opportunities that flow from data science.
Covers aspects of business intelligence, text mining, and other statistical analysis.
SQLike - a small query engine
SCALABLE, OPEN-SOURCE SQL DBMS WITH ACID
MySQL Format Date | date_format Tool
DESIGN
FamilySearch.org - Family History and Genealogy Records
Beta program to digitize records.
NoSQL at Twitter (NoSQL EU 2010)
Twitter's NoSQL slides
A discussion of the different NoSQL-style datastores in use at Twitter, including Hadoop (with Pig for analysis), HBase, Cassandra, and FlockDB.
Cassandra, Thrift, HDFS, HBase, Scribe, Pig, LZO, FlockDB
interesting presentation on #NoSQL at #twitter by @kevinweil http://bit.ly/99h8BK [from http://twitter.com/behi_at/statuses/13587582774]
Falsehoods Programmers Believe About Names: MicroISV on a Shoestring
This blog is about the business aspects of running Bingo Card Creator, a small software company.
Database of images of kaii and yōkai (strange phenomena and supernatural creatures)
A database of monsters from Japanese mythology.
Copyright (c) 2010- International Research Center for Japanese Studies, Kyoto, Japan. All rights reserved. Tags: database, useful, web service. Might come in handy someday…
CriticalPast.com: Search over 57000 videos and 7 million photos
View more than 57,000 historic videos and 7 million photos for FREE in one of the world's largest collections of royalty-free archival stock footage. Offering immediate downloads in more than 10 formats starting at just $1.97 (Consumer); $30 (Pro).
Archive of "over 57000 videos and 7 million photos", especially from the mid-20th century (1930s to '60s), "offering immediate downloads in more than 10 formats starting at just $1.97 (Consumer); $30 (Pro)."
Getting Started with HTML5 Local Databases « Dark Crimson Blog
HTML5 local databases: step-by-step instructions for setting one up!
Starting with Safari 4, iPhone/iPad OS3, Chrome 5, and Opera 10.5 (Desktop), HTML5 Local Databases are now supported. I’ve been reading about local databases for quite some time and decided to do a write up with some basic examples on how to get started.
Membase.org
For those familiar with memcached, membase provides on-the-wire protocol compatibility, but adds disk persistence; hierarchical storage management; data replication; live cluster reconfiguration and rebalancing; and secure multi-tenancy with data partitioning. Like memcached, membase is simple, fast and elastic.
Persistent Key/Value Storage
from oreilly news link
Membase is an open-source (Apache 2.0 license) distributed, key-value database management system optimized for storing data behind interactive web applications. These applications must service many concurrent users; creating, storing, retrieving, aggregating, manipulating and presenting data in real-time. Supporting these requirements, membase processes data operations with quasi-deterministic low latency and high sustained throughput. It scales linearly from a single-server deployment to a cluster of thousands of machines. And because membase does not require creation of a schema before storing data, it is a flexible, cost-effective place to Store Lots of Stuff. The original membase source code was released as Open Source by NorthScale, Zynga and NHN to membase.org in June 2010.
Why you Should be using PHP’s PDO for Database Access | Nettuts+
Many PHP programmers learned how to access databases by using either the mysql or mysqli extensions. Since PHP 5.1, there’s been a better way. PHP Data Objects (PDO) provide methods for prepared statements and working with objects that will make you far more productive!
This is an easy-to-learn tutorial for PDO.
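PDO is PHP, but the core benefit of prepared statements (binding parameters instead of pasting user input into SQL strings, which invites SQL injection) is language-independent. For comparison, here is a sketch using Python's built-in sqlite3 module and its `?` placeholders; the table and data are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so the quote breaks out
# of the string literal and the OR clause matches every row.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{hostile}'"
).fetchall()

# Safe: the ? placeholder sends the value separately from the SQL text,
# so it is only ever compared as a plain string.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

conn.close()
```

The same pattern applies with PDO: prepare a statement with placeholders once, then bind or pass the values on execution.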
ripe for SQL Injection!
A fast, fuzzy, full-text index using Redis | PlayNice.ly
PlayNice.ly is entirely based on a data-structure server called Redis. Redis is one of several new key-value databases which break away from traditional relational data architecture. It is simple, flexible, and blazingly fast. So why not use the tools we have already?
redis.smembers("word:" + metaphone("python"))
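The approach above keeps, for each phonetic code, a Redis set of the items containing a word with that code, and looks words up with SMEMBERS. The sketch below mimics that in plain Python: a dict of sets stands in for Redis, and because Metaphone is not in the standard library, the simpler American Soundex algorithm is swapped in. `soundex`, `add_document` and `search` are hypothetical names.

```python
from collections import defaultdict

# American Soundex digit groups (a stand-in for the Metaphone used in the article).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return (result + "000")[:4]


index = defaultdict(set)  # stands in for Redis sets named "word:<code>"


def add_document(doc_id, text):
    for word in text.split():
        code = soundex(word)
        if code:
            index["word:" + code].add(doc_id)


def search(word):
    # Analogous to redis.smembers("word:" + metaphone(word)), with Soundex swapped in.
    return index.get("word:" + soundex(word), set())
```

Misspellings that sound alike map to the same key, so "Pithon" and "Python" land in the same set; the price is occasional false matches between unrelated words that share a code.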
Interesting post about being able to search data in Redis using indexing and phonetic algorithms.
5 Rails Plugins to Help Optimize Your MySQL | Purify Blog
Bullet / SlimScrooge / Query Reviewer / Rails Indexes / Ambitious Query Indexer
What I learned developing a large-scale browser game in PHP
"an open data cleansing tool"
Freebase Gridworks is a power tool that allows you to load data, understand it, clean it up, reconcile it internally, augment it with data coming from Freebase, and optionally contribute your data to Freebase for others to use. All in the comfort and privacy of your own computer.
Top 10 MySQL GUI Tools — DatabaseJournal.com
InfoQ: Database sharding design in the 又拍网 (Yupoo) architecture
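…B, we created the same two logical databases, shard_001 and shard_002, on both. shard_001 on Node-A and shard_001 on Node-B together form one Shard, and only one of those logical databases is Active at any given time.
HASHCRACK.COM - Reverse Hash Lookup for MD5, SHA1, MySQL, NTLM and Lanman-Password-Hashes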
Reflections on MongoDB -- http://bit.ly/aHCUC9
Introduction to MySQL Triggers | Nettuts+