Pages tagged

Behind the scenes of Hatena Bookmark's full-text search feature - moratorium

http://d.hatena.ne.jp/kzk/20081216

pfi++. I had been wondering how they did it (I assumed something like index-fabric or fat-btree). I wonder what algorithm the CSA load balancing uses (at search time, all nodes are queried ...

Main page - Introduction to Genetic Algorithms - Tutorial with Interactive Java Applets

http://www.obitko.com/tutorials/genetic-algorithms/index.php

... the area of genetic algorithms is very wide, it is not possible to cover everything in these pages. But you should get some idea of what genetic algorithms are and what they could be useful for. Do not expect any sophisticated mathematic...

MIT’s Introduction to Algorithms, Lectures 17, 18 and 19: Shortest Path Algorithms - good coders code, great reuse

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-twelve/

Finding a Loop in a Singly Linked List

http://ostermiller.org/find_loop_singly_linked_list.html

Could be useful for detecting cyclic redirects between web pages!
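One well-known way to find such a loop is Floyd's "tortoise and hare" check. The sketch below is only an illustration and is not taken from the linked page; the `Node` class and the `has_loop` name are made up for this note.

```python
class Node:
    """Minimal singly linked list node (hypothetical helper for this note)."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def has_loop(head):
    """Return True if following .next from head ever revisits a node."""
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next          # moves one step
        fast = fast.next.next     # moves two steps
        if slow is fast:          # the fast pointer caught up: there is a cycle
            return True
    return False                  # the fast pointer fell off the end: no cycle
```

The same idea carries over to the redirect case in the comment above: treat "follow the redirect" as `.next` and compare URLs instead of node identity.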

Many methods of finding a loop in a singly linked list, with code.

5 Problems of Recommender Systems - ReadWriteWeb

http://www.readwriteweb.com/archives/5_problems_of_recommender_systems.php

Earlier this week we posted a Guide to Recommender Systems, as part of our series on recommendation technologies. In this post we look at some of the challenges in building or deploying a recommender system. And yes, Napoleon Dynamite is one of them.

5 points that are critical success factors for recommender systems

Welcome to Pyevolve documentation! - Pyevolve v0.5 documentation

http://pyevolve.sourceforge.net/index.html

Pyevolve was developed to be a complete genetic algorithm framework written in pure Python.

How Not To Sort By Average Rating

http://www.evanmiller.org/how-not-to-sort-by-average-rating.html

I always like it when people call out other people for bad math....
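For the rating-sort problem this entry is about, one commonly used "score" is the lower bound of the Wilson score interval on the fraction of positive ratings. The sketch below assumes simple up/down votes and a 95% confidence level (z = 1.96); the function name is made up for this note.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    positive: number of positive ratings, total: number of all ratings.
    z = 1.96 corresponds to roughly 95% confidence (an assumption here).
    """
    if total == 0:
        return 0.0
    p = positive / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)
```

Sorting by this value ranks an item with 100 positive ratings out of 110 above an item with a single positive rating, which a plain average would not do.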

You are a web programmer. You have users. Your users rate stuff on your site. You want to put the highest-rated stuff at the top and lowest-rated at the bottom. You need some sort of "score" to sort by.

Explaining Complement Naive Bayes, which the new Hatena Bookmark also uses - 射撃しつつ前転

http://d.hatena.ne.jp/tkng/20081217/1229475900

The new Hatena Bookmark automatically classifies bookmark entries into categories, and the algorithm used for this classification is apparently Complement Naive Bayes. Today I will introduce this algorithm.
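A minimal sketch of the Complement Naive Bayes idea, assuming bag-of-words documents and add-one smoothing. It is not Hatena's implementation, and it leaves out the weight normalisation and TF-IDF transforms that fuller versions use. For each class, word weights are estimated from the documents of all the *other* classes, and a document goes to the class whose complement fits it worst.

```python
import math
from collections import Counter, defaultdict


def train_cnb(docs, alpha=1.0):
    """docs: list of (label, tokens). Returns per-class log weights."""
    vocab = sorted({tok for _, toks in docs for tok in toks})
    total = Counter()
    per_class = defaultdict(Counter)
    for label, toks in docs:
        total.update(toks)
        per_class[label].update(toks)
    weights = {}
    for label, counts in per_class.items():
        complement = total - counts  # word counts from every other class
        denom = sum(complement.values()) + alpha * len(vocab)
        weights[label] = {
            tok: math.log((complement[tok] + alpha) / denom) for tok in vocab
        }
    return weights


def classify(weights, tokens):
    """Pick the class whose complement explains the tokens worst."""
    def score(label):
        w = weights[label]
        return sum(w[tok] for tok in tokens if tok in w)
    return min(weights, key=score)
```

Using the complement makes the estimates less sensitive to classes that have few training documents, which is the usual argument for this variant.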

Complement Naive Bayes

Jonathan Ellis's Programming Blog - Spyced: All you ever wanted to know about writing bloom filters

http://spyced.blogspot.com/2009/01/all-you-ever-wanted-to-know-about.html

Google's New Search Engine Rankings Place Heavy Emphasis on Branding : SEO Book.com

http://www.seobook.com/google-branding

Graph of NP-Complete Problems

http://page.mi.fu-berlin.de/aneumann/npc.html

Leo's Chronicle: When to give up on regular expressions

http://leoclock.blogspot.com/2009/01/blog-post_27.html

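Parsing that regular expressions cannot handle.

Less about giving up than about the right tool for the job. Regular expressions get along poorly with nesting, so do proper parsing instead.

pHash.org: Home of pHash, the open source perceptual hash library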

http://www.phash.org/

a fingerprint of an audio, video or image file that is mathematically based on the audio or visual content contained within. Unlike cryptographic hash functions which rely on the avalanche effect of small changes in input leading to drastic changes in the output, perceptual hashes are "close" to one another if the inputs are visually or auditorily similar
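To make "close hashes for similar inputs" concrete, here is a sketch of a very simple perceptual hash, the "average hash". This is an assumption-laden illustration and not pHash's own algorithm or API: the image is just a small 2-D list of grayscale values, and closeness is measured as Hamming distance.

```python
def average_hash(pixels):
    """pixels: 2-D list of grayscale values. One bit per pixel: above mean or not."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Brightening every pixel by the same amount leaves the hash unchanged, while inverting the image flips every bit; a cryptographic hash would change drastically in both cases.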

A method for producing hashes so that similar objects produce similar hashes. Applied mostly to audio and video. Doesn't appear to be a general theory. Rather, special cases for files such as pictures and audio are derived. This is an open source library that runs under Linux only at the moment.

Longest common subsequence

http://wordaligned.org/articles/longest-common-subsequence

Starting with a list of runners ordered by finishing time, select a sublist of runners who are getting younger. What is the longest such sublist?

Longest common subsequence
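The runners puzzle in this entry can be phrased as a longest common subsequence problem: the longest "getting younger" sublist is the LCS of the ages in finishing order and the same ages sorted from oldest to youngest. A minimal memoised sketch follows; the function names are made up here, and de-duplicating the ages to get a strictly decreasing result is an assumption of this note.

```python
from functools import lru_cache


def lcs(xs, ys):
    """Longest common subsequence of two sequences, via memoised recursion."""
    xs, ys = tuple(xs), tuple(ys)

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == len(xs) or j == len(ys):
            return ()
        if xs[i] == ys[j]:
            return (xs[i],) + solve(i + 1, j + 1)
        a, b = solve(i + 1, j), solve(i, j + 1)
        return a if len(a) >= len(b) else b

    return list(solve(0, 0))


def longest_getting_younger(ages):
    """Longest strictly decreasing sublist of ages (in finishing order)."""
    return lcs(ages, sorted(set(ages), reverse=True))
```

The recursion is fine for short lists; long inputs would need an iterative table to stay within Python's recursion limit.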

Taking a brief step back, this article is the third of a series. In the first episode we posed a puzzle: Starting with a list of runners ordered by finishing time, select a sublist of runners who are getting younger. What is the longest such sublist? In the second episode we coded up a brute force solution which searched all possible sublists to find an optimal solution. Although the code was simple and succinct, its exponential complexity made it unsuitable for practical use. In this episode we’ll discuss an elegant algorithm which solves our particular problem as a special case. On the way we’ll visit dynamic programming, Python decorators, version control and genetics.

MIT’s Introduction to Algorithms, Lectures 20 and 21: Parallel Algorithms - good coders code, great reuse

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-thirteen/

Lectures

This is the thirteenth post in an article series about MIT’s lecture course “Introduction to Algorithms.” In this post I will review lectures twenty and twenty-one on parallel algorithms. These lectures cover the basics of multithreaded programming and multithreaded algorithms.

Easy AI with Python (#115) - PyCon 2009 - Chicago - A Conference for the Python Community

http://us.pycon.org/2009/conference/schedule/event/71/

Survey several basic AI techniques implemented with short, open-source Python code recipes. Appropriate for educators and programmers who want to experiment with AI and apply the recipes to their own problem domains. For each technique, learn the basic operating principle, discuss an approach using Python, and review a worked-out example. We'll cover database mining using neural nets, automated categorization with a naive Bayesian classifier, solving popular puzzles with depth-first and breadth-first searches, solving more complex puzzles with constraint propagation, and playing a popular game using a probing search strategy.

Probably the most beautiful code I have ever seen. Lovely algorithms in elegant style.

Some AI examples made in Python. Discusses the AI techniques and the code.

Sorting Algorithm Animations

http://www.sorting-algorithms.com/

Recommended sites for people who want to learn Bayes - ダウンロードたけし（寅年）の日記

http://d.hatena.ne.jp/download_takeshi/20090408/1239146640

Bayesian theory

I tried visualizing "K-means", the standard clustering algorithm - てっく煮ブログ

http://d.hatena.ne.jp/nitoyon/20090409/kmeans_visualise

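Interesting

1. Assign a random cluster to each point.
2. Compute the centroid of each cluster.
3. Reassign each point to the cluster whose centroid is nearest.
4. If nothing changed, stop. As long as something changed, go back to 2.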
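The four steps above can be sketched directly in Python. This is an illustration, not the code from the linked visualisation; re-seeding an empty cluster from a random point and the `max_iter` cap are assumptions added so the sketch always terminates.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """points: non-empty list of equal-length tuples. Returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 1: assign a random cluster to every point.
    labels = [rng.randrange(k) for _ in points]
    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each cluster.
        centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                # Assumption: an empty cluster is re-seeded from a random point.
                centroids.append(tuple(rng.choice(points)))
        # Step 3: move each point to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Step 4: stop when nothing changed, otherwise repeat from step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids
```

On two well-separated groups of points with `k=2`, the loop settles after a few rounds with each group in its own cluster.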

K-means method (K平均法)

SEOmoz | How Google's Rankings Algorithm Has Changed Over Time

http://www.seomoz.org/blog/how-googles-rankings-algorithm-has-changed-over-time-

Imager::AnimeFace, which detects and analyzes anime faces in Perl - デー

http://d.hatena.ne.jp/ultraist/20090412/1239497216

So, how would I go about generating "gununu" images automatically? I suppose I should read this properly.

hatful of hollow - Visualising Sorting Algorithms

http://www.hatfulofhollow.com/posts/code/visualisingsorting/index.html

via the cairo graphics lib, see cairographics.org

Cool, visual way of showing sorting algorithms

Static images of sorting algorithms, pretty neat!

"This whole thing started partly as an excuse to get familiar with the Cairo graphics library. It produces beautiful, clean images, and appears to be both portable and well designed. It also comes with a set of Python bindings that are maintained as part of the project itself - a big plus in my books. Firefox 3 will use Cairo as its standard rendering back end, which will instantly make it one of the most widely used vector graphics libraries out there. "

I dislike animated sorting algorithm visualisations - there's too much of an air of hocus-pocus about them. Something impressive and complicated happens on screen, but more often than not the audience is left mystified. I think their creators must also know that they have precious little explanatory value, because the better ones are sexed up with play-by-play doodles, added, one feels, as an apologetic afterthought by some particularly dorky sportscaster. Nevertheless I've been unable to find a single attempt to visualise a sorting algorithm statically (if you know of any, please drop me a line). So, presented below are the results of a pleasant evening with some nice Scotch and the third volume of Knuth. First, here's a taster - a static visualisation of heapsort: Heapsort I think these simple static visualisations are much clearer than most animated attempts - and they have the added benefit of also being, to my not entirely unbiased eye, rather beautiful.

How SimCity Works

http://simlabo.main.jp/educate/material/simshikumi.htm

By Will Wright

Will Wright reveals how SimCity works

aM laboratory

http://lab.andre-michelle.com/karplus-strong-guitar

B-tree - naoyaのはてなダイアリー

http://d.hatena.ne.jp/naoya/20090412/btree

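Argues that B-trees suit slow storage such as hard disks, while suffix arrays suit fast storage such as SSDs.

... so next is Chapter 18. Chapter 18's ...

The development team reveals an overview of Google Wave's implementation - ＠IT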

http://www.atmarkit.co.jp/news/200906/01/wave.html

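A toy for geeks.

Is Wave only for real-time, synchronous communication? First, the "real-time" aspect. Because we were so struck by the demonstration of chat and games running in an HTML page with only milliseconds of perceptible delay, the report article seems to have overemphasized this point. Some readers came away with the impression that Wave is "only for real-time, synchronous communication". That is a misunderstanding. Wave also supports asynchronous communication, just like email. That is, if you create a new Wave and write a Wave (message) addressed to someone, it is first sent to the server. The recipient (a Wave participant) may be online or offline at that moment. If online, they can open and read the Wave right away; if offline, a list of the Waves they are involved in is displayed, like an email inbox (probably ordered by last-update time) ...

Reactions to the report article ("[Detailed report] What exactly is Google Wave?") show that people have a variety of questions. So here I would like to summarize some additional information: what I heard directly from the Wave project leader, explanations the development team gave in other sessions, and what I could glean from the online documentation.

Understanding Ternary Trees | PC Plus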

http://www.pcplus.co.uk/node/3074/

Ternary trees are the fastest way to search for data strings, at least in hardcore programming terms, but how exactly do they work?

Yury Lifshits | Algorithmic Problems Around the Web

http://yury.name/algoweb/

Summary of how to make games and their algorithms, by genre - Logic Edge

http://d.hatena.ne.jp/seikenn/20090627/1246028707

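There is also a by-language version

I have compiled how to make games, and the algorithms involved, organized by genre. Please use it for game development or for studying programming. Please also read the list of game programming courses by language.

The power of automatic image completion based on the vast number of images on the Web - A Successful Failure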

http://d.hatena.ne.jp/LM-7/20090629/1246282979

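According to the paper, when they first tried a database of 10,000 images the quality of the generated images was disappointing, but when the database was increased to 2 million images the quality improved dramatically. The total number of images uploaded to the Web grows every day (images posted to Flickr passed 3 billion last year) ...

A scene completion technique presented at SIGGRAPH 2007 by James Hays and Alexei A. Efros of Carnegie Mellon University in the US, which removes unwanted objects from an image and generates a natural-looking result. It searches images on the Web for ones similar to the target image and replaces the masked region entirely with the match, producing a completed image that does not look out of place. The finished images look completely natural. Scary...

Wow...

This is amazing.

Genuinely impressive

Software Updates: Courgette (Chromium Developer Documentation)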

http://dev.chromium.org/developers/design-documents/software-updates-courgette

We want smaller updates because it narrows the window of vulnerability. If the update is a tenth of the size, we can push ten times as many per unit of bandwidth. We have enough users that this means more users will be protected earlier. A secondary benefit is that a smaller update will work better for users who don't have great connectivity.

Software Updates: Courgette

nice update tool for smaller patches

VERY TINY BINARY DIFFS released by Google and Open Sourced.

A new differential compression algorithm for making Google Chrome updates significantly smaller. Courgette transforms the program into the primitive assembly language and does the diffing at the assembly level

A diff of the disassembly?

server:
    asm_old = disassemble(original)
    asm_new = disassemble(update)
    asm_new_adjusted = adjust(asm_new, asm_old)
    asm_diff = bsdiff(asm_old, asm_new_adjusted)
    transmit asm_diff

client:
    receive asm_diff
    asm_old = disassemble(original)
    asm_new_adjusted = bspatch(asm_old, asm_diff)
    update = assemble(asm_new_adjusted)

MIT’s Introduction to Algorithms, Lectures 22 and 23: Cache Oblivious Algorithms - good coders code, great reuse

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-fourteen/

Solve Computational Geometry Problems | PC Plus

http://www.pcplus.co.uk/node/3089/

From the simple to the intricate, geometry is an inescapable part of graphics programming.

iPhone Sudoku Grab: How does it all work?

http://sudokugrab.blogspot.com/2009/07/how-does-it-all-work.html

How he solved a limited purpose computer vision problem by applying knowledge about the problem domain.

Because iPhones can recognize Sudokus quite easily.

Plain English Explanation of Big O Notation

http://www.cforcoding.com/2009/07/plain-english-explanation-of-big-o.html

I've met too many developers who don't grok big O

Feature Column from the AMS

http://www.ams.org/featurecolumn/archive/svd.html

An intuitive explanation of the geometric meaning behind SVD.

Good explanation of the SVD

Geometric interpretation of SVD.

2009-07-04 - 当面C#と.NETな記録

http://d.hatena.ne.jp/siokoshou/20090704#p1

"Amazing" code for finding the position of the rightmost set bit

Hmm, I don't understand it at all...
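For readers in the same position, here is a plainer way to get the same answer. It is not the trick from the linked post, only a sketch of the underlying idea: `x & -x` isolates the lowest set bit, and its bit length gives the position.

```python
def lowest_set_bit_index(x):
    """Zero-based position of the rightmost 1 bit of a positive integer."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    lowest = x & -x                 # keeps only the rightmost set bit
    return lowest.bit_length() - 1  # a single bit at position n has bit_length n + 1
```

For example, `0b101000` has its rightmost set bit at position 3.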

crazy!

The Status of the P Versus NP Problem | September 2009 | Communications of the ACM

http://cacm.acm.org/magazines/2009/9/38904-the-status-of-the-p-versus-np-problem/fulltext

Algorithmatic

http://algorithmatic.com/

A repository for algorithms, and an environment for collaborative development.

Looks like an interesting Stack Overflow for algorithms. I'll keep watching this one... it doesn't have the Kelly ratio or even the Sharpe ratio yet.

linkiblog | How to Build a Popularity Algorithm You can be Proud of

http://blog.linkibol.com/post/How-to-Build-a-Popularity-Algorithm-You-can-be-Proud-of.aspx

a bestiary of algorithmic trading strategies « Locklin on science

http://scottlocklin.wordpress.com/2009/08/17/a-bestiary-of-algorithmic-trading-strategies/

Quants come in three basic varieties. 1. Structurers: people who price complex financial instruments. 2. Risk managers: people who manage portfolio risk. 3. Quant traders: people who use statistics to make money by buying and selling. Most quants are structurers. Of course, there is often bleed over between these varieties, but it’s a useful taxonomy for looking for work. I’ve done a little of all three at this point (very little, honestly), and have always liked quant trading problems more than the other two varieties. It’s the most ambitious, and the most likely to net you a career outside of a large organization (go me: Army of one!). It’s also the most mysterious, since successful quant traders don’t like to talk about what they do. Structurers and risk managers have to talk about what they do, almost by definition. Quant traders gain little from talking about their special sauce.

***** very good and deep articles on finance topics by "Locklin on science"

vocab of "job specs" in trading

Eli Bendersky’s website » Blog Archive » Co-routines as an alternative to state machines

http://eli.thegreenplace.net/2009/08/29/co-routines-as-an-alternative-to-state-machines/

Observation: Co-routines are to state machines what recursion is to stacks. When you have to traverse some sort of a nested data structure (say, a binary tree), one approach is to create a stack that remembers where in the tree you are. Another, much more elegant approach, is to write the function recursively. A recursive function employs the machine stack used to implicitly implement function calls - you get the benefits of the stack without paying the cost of reduced readability.

Netflix prize tribute: Recommendation algorithm in Python | This Number Crunching Life

http://blog.smellthedata.com/2009/06/netflix-prize-tribute-recommendation.html

Quick implementation of the Netflix recommendation algorithm (probabilistic matrix factorization) in Python.

probabilistic matrix factorisation
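A small pure-Python sketch of the idea, not the linked post's code. It assumes the common formulation where fitting the model amounts to minimising squared rating error plus an L2 penalty on the latent vectors, and it uses plain stochastic gradient steps. The synthetic-data helper mirrors the test setup this entry describes; all names and hyperparameters are made up for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def synthetic_ratings(n_users, n_items, dim, n_ratings, noise, rng):
    """Make up latent vectors, emit noisy ratings, then discard the vectors."""
    users = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_users)]
    items = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
    ratings = []
    for _ in range(n_ratings):
        u, i = rng.randrange(n_users), rng.randrange(n_items)
        ratings.append((u, i, dot(users[u], items[i]) + rng.gauss(0, noise)))
    return ratings


def rmse(ratings, U, V):
    """Root mean squared error of the model's predictions on the ratings."""
    return math.sqrt(sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings) / len(ratings))


def fit(ratings, n_users, n_items, dim, rng, epochs=200, lr=0.02, reg=0.05):
    """Learn user vectors U and item vectors V by stochastic gradient steps."""
    U = [[0.1 * rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_users)]
    V = [[0.1 * rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(U[u], V[i])
            for k in range(dim):
                uk, vk = U[u][k], V[i][k]
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V
```

A predicted rating is simply `dot(U[u], V[i])`; comparing `rmse` before and after fitting shows whether the latent vectors picked up the structure in the synthetic ratings.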

I test my code using synthetic data, where I first make up latent vectors for users and items, then I generate some training set ratings by multiplying some latent user vectors by latent item vectors then adding some noise. I then discard the latent vectors and just give the model the synthetic ratings.

Nihilogic : Canvas Visualizations of Sorting Algorithms

http://www.nihilogic.dk/labs/sorting_visualization/

Leo's Chronicle: Computer science textbooks you really should know

http://leoclock.blogspot.com/2009/09/blog-post_21.html

Computer science textbooks

I want to read them all

Dictionary of Algorithms and Data Structures

http://www.itl.nist.gov/div897/sqg/dads/terms.html

Definitions of algorithms, data structures, and classical Computer Science problems. Some entries have links to implementations and more information.

percobaan » Face detection in javascript + canvas

http://blog.kpicturebooth.com/?p=8

Face detection using JavaScript

A script for doing face detection with JavaScript.

An Introduction to Algorithms

http://fussy.web.fc2.com/algo/index.htm

I'll get to it eventually

Puzzle: Fast Bit Counting « Reflections

http://gurmeetsingh.wordpress.com/2008/08/05/fast-bit-counting-routines/
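The one-liner quoted in this entry looks like the last line of the classic HAKMEM-style 32-bit population count. A Python transcription of that routine, as I understand it, is sketched below; the C octal constants become `0o...` literals, and the input is masked to 32 bits because Python integers are unbounded.

```python
def popcount32(n):
    """Count set bits in a 32-bit value using octal-digit sums."""
    n &= 0xFFFFFFFF
    # Each 3-bit group 4a + 2b + c becomes a + b + c.
    tmp = n - ((n >> 1) & 0o33333333333) - ((n >> 2) & 0o11111111111)
    # Add neighbouring groups into 6-bit fields, then sum the fields:
    # since 64 is congruent to 1 modulo 63, taking % 63 adds them up.
    return ((tmp + (tmp >> 3)) & 0o30707070707) % 63
```

In practice `bin(n).count("1")` gives the same result in Python; the routine is interesting for how it avoids a loop.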

return ((tmp + (tmp >> 3)) & 030707070707) % 63;

delicious blog » How SPEAR Identifies Domain Experts within Delicious

http://blog.delicious.com/blog/2009/08/how-spear-identifies-domain-experts-within-delicious.html

analyzing user behavior to find experts

SPEAR (Spamming-resistant Expertise Analysis and Ranking) is a new technique to measure the expertise of users by analyzing their public activities on platforms like Delicious.

"A major problem of the Internet today is that finding high quality information is not easy nor fast. The steady increase of spam and junk content on the Web further complicates this challenge. Another related issue is that finding knowledgeable and trustworthy users on social platforms like Delicious is much more difficult than it should be. Wouldn’t it be nice if Delicious recommended “good” users with similar interests? Or wouldn’t it be helpful if you could get a selection of great websites on jewelry or mortgage without being overwhelmed by spam? To tackle this problem, we created the SPEAR algorithm. SPEAR (Spamming-resistant Expertise Analysis and Ranking) is a new technique to measure the expertise of users by analyzing their public activities on platforms like Delicious. A great benefit of SPEAR is that it returns two very useful sets of results: first, a list of users ranked by their expertise; and second, a list of websites ranked by their quality."

good, but missing essential parts for recommendations for educational system.

Michael Nielsen » The Google Technology Stack

http://michaelnielsen.org/blog/lecture-course-the-google-technology-stack/

Interesting set of links and posts describing the technologies Google builds its software on, and how they work together.

The Google Technology Stack … or as I would put it: An Introduction to MapReduce, Data Mining and PageRank

A great in-depth treatment of the engine that powers Google

Part of what makes Google such an amazing engine of innovation is their internal technology stack: a set of powerful proprietary technologies that makes it easy for Google developers to generate and process enormous quantities of data. According to a senior Microsoft developer who moved to Google, Googlers work and think at a higher level of abstraction than do developers at many other companies, including Microsoft: “Google uses Bayesian filtering the way Microsoft uses the if statement” (Credit: Joel Spolsky). This series of posts describes some of the technologies that make this high level of abstraction possible.

Damn Cool Algorithms: Spatial indexing with Quadtrees and Hilbert Curves - Nick's Blog

http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves

How to find the location of a particular point in a Hilbert curve. (via delicious popular)

Summary of all the MIT Introduction to Algorithms lectures - good coders code, great reuse

http://www.catonmat.net/blog/summary-of-mit-introduction-to-algorithms/

"As you all may know, I watched and posted my lecture notes of the whole MIT Introduction to Algorithms course. In this post I want to summarize all the topics that were covered in the lectures and point out some of the most interesting things in them."

A Speculative Post on the Idea of Algorithmic Authority « Clay Shirky

http://www.shirky.com/weblog/2009/11/a-speculative-post-on-the-idea-of-algorithmic-authority/

one of the things up for grabs in the current news environment is the nature of authority. In particular, I noted that people trust new classes of aggregators and filters, whether Google or Twitter or Wikipedia (in its ‘breaking news’ mode.). Algorithmic authority is the decision to regard as authoritative an unmanaged process of extracting value from diverse, untrustworthy sources, without any human standing beside the result saying “Trust this because you trust me.”

Algorithmic authority is the decision to regard as authoritative an unmanaged process of extracting value from diverse, untrustworthy sources, without any human standing beside the result saying “Trust this because you trust me.” This model of authority differs from personal or institutional authority, and has, I think, three critical characteristics.

We were talking about authority and trust the other day in class after Angela's presentation on medical diagnoses - here's a new post from Clay Shirky on the topic - worth reading.

Asked to respond to the evolution of the media, Clay Shirky explains that the major transformation in the information environment concerns the nature of authority. In just a few years, through new filtering and aggregation tools, our authorities have changed. He goes on to define the new algorithmic authority by three characteristics: it draws on multiple sources and combines them to rank them; because the results are good enough, people trust it; and finally, people realize that many others trust these results, which helps them adopt these new authorities (such as Wikipedia).

Finally: Finger Trees! : Good Math, Bad Math

http://scienceblogs.com/goodmath/2009/05/finally_finger_trees.php

What finger trees do is give me a way of representing a list that has both the convenience of the traditional cons list, and the search efficiency of the array based method. The basic idea of the finger tree is amazingly simple. It's a balanced tree where you store all of the data in the leaves. The internal nodes are just a structure on which you can hang annotations, which you can use for optimizing search operations on the tree.
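The "data in the leaves, annotations in the internal nodes" idea can be sketched with the simplest annotation, subtree size, which turns the tree into an indexable sequence. As a commenter quoted in this entry points out, an annotated balanced tree like this is not yet a real finger tree; the sketch only illustrates how annotations guide a search. All names are made up for this note.

```python
class Leaf:
    """Holds one element; its annotation (size) is 1."""
    def __init__(self, value):
        self.value = value
        self.size = 1


class Node:
    """Internal node: stores no data, only the combined annotation."""
    def __init__(self, left, right):
        self.left, self.right = left, right
        self.size = left.size + right.size  # the monoid here is (int, +, 0)


def build(values):
    """Build a balanced tree over a non-empty sequence."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    if len(values) == 1:
        return Leaf(values[0])
    mid = len(values) // 2
    return Node(build(values[:mid]), build(values[mid:]))


def lookup(tree, index):
    """Find the element at position index by following size annotations."""
    if not 0 <= index < tree.size:
        raise IndexError(index)
    while isinstance(tree, Node):
        if index < tree.left.size:
            tree = tree.left
        else:
            index -= tree.left.size
            tree = tree.right
    return tree.value
```

Swapping the size annotation for another combinable value, for example the maximum priority in a subtree, changes what the same descent finds, which is the customisation the quoted passages describe.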

"The basic idea of the finger tree is amazingly simple. It's a balanced tree where you store all of the data in the leaves. The internal nodes are just a structure on which you can hang annotations, which you can use for optimizing search operations on the tree. What makes the finger tree so elegant is the way that some very smart people have generalized the idea of annotations to make finger trees into a single, easily customizable structure that's useful for so many different purposes: you customize the annotations that you're going to store in the internal nodes according to the main use of your tree." However, a commenter on the article says, "Ørjan Johanse is right. You described a monoid-annotated-binary-tree, which is not enough to be a finger tree."

List of Algorithms

http://www.scriptol.com/programming/list-algorithms.php

A Favorite Data Structure « Rotten Cotton

http://www.onebadseed.com/blog/?p=80

Ullman Set: position[members[i]] = i
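The invariant in the note above, `position[members[i]] = i`, is the heart of the sparse set attributed to Briggs and Torczon. A sketch follows, assuming elements are integers in `range(universe_size)`; the class and method names are made up, while `members` and `position` follow the note.

```python
class SparseSet:
    """Set of small integers with O(1) add, remove, membership and clear."""

    def __init__(self, universe_size):
        self.members = [0] * universe_size   # dense list of current elements
        self.position = [0] * universe_size  # position[x] = index of x in members
        self.count = 0

    def __contains__(self, x):
        p = self.position[x]
        return p < self.count and self.members[p] == x

    def __len__(self):
        return self.count

    def add(self, x):
        if x not in self:
            self.members[self.count] = x
            self.position[x] = self.count
            self.count += 1

    def remove(self, x):
        if x in self:
            p = self.position[x]
            last = self.members[self.count - 1]
            self.members[p] = last       # move the last element into the hole
            self.position[last] = p
            self.count -= 1

    def clear(self):
        self.count = 0  # stale entries no longer pass the membership check
```

Because membership is validated through both arrays, stale values are harmless, which is why the original trick works even on uninitialised memory; Python initialises the lists anyway.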

Ullman set, an excellent tutorial

Monoids and Finger Trees

http://apfelmus.nfshost.com/monoid-fingertree.html

"A very powerful application of monoids are 2-3 finger trees, first described by Ralf Hinze and Ross Patterson. Basically, they allow you to write fast implementations for pretty much every abstract data type mentioned in Okasaki's book on purely functional data structures. For example, you can do sequences, priority queues, search trees and priority search queues. Moreover, any fancy and custom data structures like interval trees or something for stock trading are likely to be implementable in this framework as well. How can one tree be useful for so many different data structures? The answer: monoids! Namely, the finger tree works with elements that are related to a monoid, and all the different data structures mentioned above arise by different choices for this monoid. Let me explain how this monoid magic works."

Algorithm Tutorials

http://www.topcoder.com/tc?d1=tutorials&d2=alg_index&module=Static

Graph

Ruby Algorithms: Sorting, Trie & Heaps - igvita.com

http://www.igvita.com/2009/03/26/ruby-algorithms-sorting-trie-heaps/

Collection of some useful Ruby data structures all coded up and ready for use.

Algorithm Library Design: Lecture Notes

http://www.mpi-inf.mpg.de/~kettner/courses/lib_design_03/notes/index.html

Library design is language design. [Stroustrup] Course Goal: To learn how to implement software libraries, such as STL, CGAL, LEDA, ..., that have a focus on algorithms and data structures. To learn advanced programming techniques in C++, such as templates, generic programming, object-oriented design, design patterns, and large-scale C++ software design.

Consensus Protocols: Two-Phase Commit at Paper Trail

http://hnr.dnsalias.net/wordpress/?p=90

Nice article on 2PC

octo.py: quick and easy MapReduce for Python

http://ebiquity.umbc.edu/blogger/2009/01/02/octopy-quick-and-easy-mapreduce-for-python/

octo.py: quick and easy MapReduce for Python

Showcases an example of using the MapReduce system octo.py

ウノウラボ Unoh Labs: How do you handle hierarchical structures in an RDB?

http://labs.unoh.net/2009/06/rdb.html

How to store a tree structure in an RDB

Hierarchical structures

Anti-Grain Geometry - Interpolation with Bezier Curves

http://www.antigrain.com/research/bezier_interpolation/index.html

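An algorithm for turning straight lines into Bezier curves. For reference.

Code that interpolates straight line segments into curves.

Bezier smoothing algorithm

Let’s Try to Find All 200 Parameters in Google Algorithm | Search Engine Journal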

http://www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/

Very good article on SEO

I am sure Googlers should be enjoying this: hardly can they say a word, there follows a wealth of guesses and speculations. This time Matt Cutts is said to have ...

Google Searches for Staffing Answers - WSJ.com

http://online.wsj.com/article/SB124269038041932531.html

People leave because they are underused!

Google began crunching data from employee reviews and promotion and pay histories in a formula Google says can identify which employees are most likely to quit.

Idea for business alliance

Current and former Googlers said the company is losing talent because some employees feel they can't make the same impact as the company matures.

Concerned a brain drain could hurt its long-term ability to compete, Google Inc. is tackling the problem with its typical tool: an algorithm. The Internet search giant recently began crunching data from employee reviews and promotion and pay histories in a mathematical formula Google says can identify which of its 20,000 employees are most likely to quit.

Algorithm to identify people who might leave the firm; "underused" employees

Calculate exp() and log() Without Multiplications

http://www.quinapalus.com/efunc.html

IEEE Spectrum: The Million Dollar Programming Prize

http://www.spectrum.ieee.org/may09/8788

year-old Netflix Prize competition, offers a grand prize of US $1 million for an algorithm that’s 10 percent more accurate than the one Netflix uses to predict customers’ movie preferences.

Netflix's bounty for improving its movie-recommendation software is almost in the bag. Here is one team's account

Bell Labs explains their strategy for solving Netflix's collaborative filtering problem.

Jeff Erickson's Algorithms Course Materials

http://compgeom.cs.uiuc.edu/~jeffe/teaching/algorithms/

mixi Engineers’ Blog » Let's use memory-efficient hashes in Perl and Ruby

http://alpha.mixi.co.jp/blog/?p=791

Gamma error in picture scaling

http://www.4p8.com/eric.brasseur/gamma.html

"Photographs that have been scaled with these software have been degradated. The degradation is often faint but probably most pictures contain at least an array where the degradation is clearly visible. I suppose this happens since the first versions of these software, maybe 20 years ago."

Found via reddit. Should consider this the next time I edit a bunch of photos

Exclusive: How Google’s Algorithm Rules the Web | Magazine

http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1

Want to know how Google is about to change your life? Stop by the Ouagadougou conference room on a Thursday morning. It is here, at the Mountain View, California, headquarters of the world’s most powerful Internet company, that a room filled with three dozen engineers, product managers, and executives figure out how to make their search engine even smarter.

Exclusive: How Google’s Algorithm Rules the Web

Excellent in-depth look at how Google's constantly-improving algorithms make it superior.

Some Stuff - Screaming Duck Software

http://www.screamingduck.com/Article.php?ArticleID=46&Show=ABCE

A good idea for image compression based on genetic algorithms.

Exclusive: How Google’s Algorithm Rules the Web | Magazine

http://www.wired.com/magazine/2010/02/ff_google_algorithm/

Want to know how Google is about to change your life? Stop by the Ouagadougou conference room on a Thursday morning. It is here, at the Mountain View, California, headquarters of the world’s most powerful Internet company, that a room filled with three dozen engineers, product managers, and executives figure out how to make their search engine even smarter. This year, Google will introduce 550 or so improvements to its fabled algorithm, and each will be determined at a gathering just like this one. The decisions made at the weekly Search Quality Launch Meeting will wind up affecting the results you get when you use Google’s search engine to look for anything — “Samsung SF-755p printer,” “Ed Hardy MySpace layouts,” or maybe even “capital Burkina Faso,” which just happens to share its name with this conference room. Udi Manber, Google’s head of search since 2006, leads the proceedings. One by one, potential modifications are introduced, along with the results of months of testing in vari

Philosophical (?) article about Google's algorithms

I went to the Key-Value Store study session - blog.katsuma.tv

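http://blog.katsuma.tv/2009/02/key_value_store_study.html

"# LuxIO
# An ordinary B+-tree
# Feature 1
  * mapped index
  * the entire index part is mmapped
    o targets systems whose index part is smaller than physical memory
# Feature 2
  * long values
  * up to 4G
  * values larger than the node size (page size) are handled without extra overhead
# Feature 3
  * efficient append
  * a LinkedList data structure with no padding
# Suited to SSDs?
# Uses
  * when both keys and values are small and fast access is needed
  * restricted to databases no larger than physical memory
  * when you want to handle large values
  * when you want to keep appending to large values
# Poorly suited workloads
  * delete-heavy workloads
  * linking many small pieces of data
    o the seek overhead is too large
  * read/write-intensive applications
# Probably will not be distributed
# May add a hash
# Want to get rid of the read lock
  * priority is on reads"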

On Key-Value data design. Notes on the characteristics of several systems.

Data Compression Explained

http://mattmahoney.net/dc/dce.html

12 Reasons To Be Learning Graph Theory

http://andresosinski.com.ar/blog_view_entry/?id=1

RT @Kellblog: 12 Reasons To Be Learning (or at least paying attention to) Graph Theory http://bit.ly/a1F9hY #linkeddata #rdf #eav #gt

mixi Engineers’ Blog » Super-easy full-text search in 3 lines

http://alpha.mixi.co.jp/blog/?p=1112

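Speaking of tag search and full-text search, Tokyo Dystopia already provides similar functionality. You might think TD is no longer needed now that TC supports tag search and full-text search, but that is not the case. As an inverted-index library, TD has a far more efficient and scalable design, and its implementation is kept simple so that it can easily be customized for business needs. TC's inverted index, on the other hand, is inferior to TD in performance and scalability, but its distinguishing feature is that it is extremely easy to adopt. If you already manage your data in a table DB, you can strengthen your search capability within a minute just by writing a setindex-something statement.

graph-theory-algorithms-book - Project Hosting on Google Code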

http://code.google.com/p/graph-theory-algorithms-book

MIT’s Introduction to Algorithms, Lecture 16: Greedy Algorithms - good coders code, great reuse

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-eleven/

This is the eleventh post in an article series about MIT's lecture course Introduction to Algorithms. In this post I ...

15 Real-World Applications of Genetic Algorithms

http://brainz.org/15-real-world-applications-genetic-algorithms/

Some of the most useful applications of genetic algorithms in the real world.

[Artificial Intelligence] I built artificial life with a physics engine and had it learn - Nico Nico Douga (ββ)

http://www.nicovideo.jp/watch/sm6392515

I wish I had done something like this when I was a student...

A Turing Machine Overview

http://aturingmachine.com/

"I wanted to build a machine that would be immediately recognizable as a Turing machine to someone familiar with Turing's work."

A video of a working Turing machine that Turing himself would probably have recognized despite the use of modern electronics (the only thing missing is a truly infinite tape). I'm impressed, but it also just drives home why there wasn't a "personal Turing machine revolution" after he proposed the device!

Building a real Turing machine. It uses 35mm film leader and writes 0/1 with a dry erase marker. Very clever.

An ACTUAL Turing Machine

Papers on PageRank you should read | Science for SEO

http://www.scienceforseo.com/ranking-algorithms/papers-on-pagerank-you-should-read/

PageRank is a standard and much discussed topic in SEO and while it is relevant, the methods and techniques discussed are often not. There is a lot of

SEO optimization articles

assertTrue( ): One of the toughest job-interview questions ever

http://asserttrue.blogspot.com/2009/05/one-of-toughest-job-interview-questions.html

I mentioned in a previous post that I once interviewed for a job at a well-known search company. One of the five people who interviewed me asked a question that resulted in an hour-long discussion: "Explain how you would develop a frequency-sorted list of the ten thousand most-used words in the English language." I'm not sure why anyone would ask that kind of question in the course of an interview for a technical writing job (it's more of a software-design kind of question), but it led to a lively discussion, and I still think it's one of the best technical-interview questions I've ever heard. Ask yourself: How would you answer that question?
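One way to approach the interview question above is to count words in a hash table and then sort by count. The following is only a minimal sketch, not the blog author's answer: the function name `top_words`, the tokenizing regular expression, and the tie-breaking rule are all assumptions made for illustration.

```python
import re
from collections import Counter


def top_words(text, n=10_000):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter is a dict subclass, so each update is a hash-table insert or lookup.
    counts = Counter(words)
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Counting is linear in the number of words, and the sort only touches the distinct words, which is usually a far smaller set than the corpus itself.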

The author talks about a question he got at a job interview, and goes on to provide a reasonable recap/discussion about hash tables. This is generally the kind of answer I look for when I ask similar questions. 9/10 candidates I talk with can't actually discuss a hash function, and don't know how to create one.

Behind the scenes of "How does Hokkaido bounce when you drop it?" - てっく煮ブログ

http://d.hatena.ne.jp/nitoyon/20090422/hokkaido_uragawa

Amazing.

Behind the bouncing.

"I tried to tune the parameters so that every prefecture has something worth seeing." What a spirit of service.

Dynamic Programming Practice Problems

http://people.csail.mit.edu/bdean/6.046/dp/

a collection of practice dynamic programming problems and their solutions.

This site contains a collection of practice dynamic programming problems and their solutions. The problems listed below are also available in a pdf handout. To view the solution to one of the problems below, click on its title. To view the solutions, you'll need a machine which can view Macromedia Flash animations and which has audio output. If you want, you can also view a quick review from recitation on how to solve the integer knapsack problem (with multiple copies of items allowed) using dynamic programming.
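The integer knapsack problem with multiple copies of items allowed, mentioned above, can be sketched as a one-dimensional dynamic program. This is an illustrative sketch rather than the site's own solution; the function name `integer_knapsack` and the `(weight, value)` item representation are assumptions.

```python
def integer_knapsack(capacity, items):
    """Maximum total value that fits in `capacity`, reusing items freely.

    items: list of (weight, value) pairs with positive integer weights.
    """
    # best[c] is the best value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for c in range(1, capacity + 1):
        for weight, value in items:
            if weight <= c:
                # Take one copy of this item on top of the best solution for the remainder.
                best[c] = max(best[c], best[c - weight] + value)
    return best[capacity]
```

The table has one entry per capacity, and each entry considers every item once, so the running time is proportional to capacity times the number of items.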

Nice problem examples.

MIT’s Introduction to Algorithms, Lecture 3: Divide and Conquer - good coders code, great reuse

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-two/

This is the second post in an article series about MIT's lecture course Introduction to Algorithms. I changed my mind ...

A Sudoku Solver in Java implementing Knuth’s Dancing Links Algorithm

http://www.ocf.berkeley.edu/~jchu/publicportal/sudoku/sudoku.paper.html

Dr. Donald Knuth’s Dancing Links Algorithm solves an Exact Cover situation. The Exact Cover problem can be extended to a variety of applications that need to fill constraints. Sudoku is one such special case of the Exact Cover problem.
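To make the Exact Cover problem concrete, here is a small Python sketch of Knuth's Algorithm X, the backtracking search that Dancing Links speeds up. It is not the linked Java solver and does not use dancing links: dictionaries of sets stand in for the linked structure. All function names are assumptions, and each subset must be given as a list without repeated elements.

```python
def solve_exact_cover(universe, subsets):
    """Yield lists of subset names that cover every element of universe exactly once."""
    # columns[e] holds the names of the subsets that contain element e.
    columns = {e: set() for e in universe}
    for name, elems in subsets.items():
        for e in elems:
            columns[e].add(name)
    yield from _search(columns, subsets, [])


def _search(columns, subsets, partial):
    if not columns:  # every element is covered: partial is a solution
        yield list(partial)
        return
    # Branch on the element with the fewest candidate subsets.
    col = min(columns, key=lambda e: len(columns[e]))
    for name in list(columns[col]):
        partial.append(name)
        removed = _select(columns, subsets, name)
        yield from _search(columns, subsets, partial)
        _deselect(columns, subsets, name, removed)
        partial.pop()


def _select(columns, subsets, name):
    """Cover the elements of `name` and drop every subset that conflicts with it."""
    removed = []
    for e in subsets[name]:
        for other in columns[e]:
            for e2 in subsets[other]:
                if e2 != e:
                    columns[e2].remove(other)
        removed.append(columns.pop(e))
    return removed


def _deselect(columns, subsets, name, removed):
    """Undo _select in exactly the reverse order."""
    for e in reversed(subsets[name]):
        columns[e] = removed.pop()
        for other in columns[e]:
            for e2 in subsets[other]:
                if e2 != e:
                    columns[e2].add(other)
```

A Sudoku solver would encode each cell, row, column, and box constraint as an element of the universe and each candidate digit placement as a subset.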

See also the references, esp. Knuth's original paper.

Artisan System - A PHP5 Object Oriented Framework

http://artisansystem.com/blog/entry/36

how phrases work in search indexes

The ideal data mining textbook for programmers: 『集合知プログラミング』 (Programming Collective Intelligence) - Studying Library and Information Science

http://d.hatena.ne.jp/kunimiya/20081116/p1

I have no knowledge of statistics at all, so I am going to start studying it now.

- Studying Library and Information Science

Why you should never use rand() | game development | Ian Bullard

http://mjolnirstudios.com/IanBullard/files/79ffbca75a75720f066d491e9ea935a0-10.php

MIT’s Introduction to Algorithms, Lecture 15: Dynamic Programming - good coders code, great reuse

http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-ten/

This is the tenth post in an article series about MIT's lecture course Introduction to Algorithms. In this post I ...

The Swinger « Music Machinery

http://musicmachinery.com/2010/05/21/the-swinger/

Swinger uses the new Dirac time-stretching capabilities of Echo Nest remix.

fun software hack that takes any song and makes it swing...with examples

Stretches the first half of each beat and shrinks the latter. Via http://twitter.com/nondisbeliever

HOLY SHIT! The Swinger is a bit of Python code that takes any song and makes it swing. It does this by taking each beat and time-stretching the first half of each beat while time-shrinking the second half. It has quite a magical effect.

2009-04-09 - きしだのはてな

http://d.hatena.ne.jp/nowokay/20090409#1239268405

The meta material looks useful.
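For readers following the Church-numeral discussion in this entry, here is a minimal Python sketch of the successor function in its standard form, succ = λn.λf.λx.f(n f x). The helper `to_int` and the names `zero` and `three` are assumptions added for illustration and are not taken from the linked post.

```python
# Church numerals: the numeral n applies a function f to an argument x exactly n times.
zero = lambda f: lambda x: x

# succ = λn.λf.λx.f(n f x): apply f one more time than n does.
succ = lambda n: lambda f: lambda x: f(n(f)(x))


def to_int(n):
    """Convert a Church numeral to a Python int by counting applications of f."""
    return n(lambda k: k + 1)(0)


three = succ(succ(succ(zero)))
```

The outermost bound variable has to be `n`, since the body refers to `n`; binding it as `x` would leave `n` free.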

I think there is a typo in the lambda expression for the successor function. I believe succ = λx.λf.λx.f(n f x) should be succ = λn.λf.λx.f(n f x), but I'm not sure...

Chris Harrison - Pseudo-3D Video Conferencing with a Generic Webcam

http://www.chrisharrison.net/projects/3dvideo/

3D conferencing with a webcam

square root

http://www.itl.nist.gov/div897/sqg/dads/HTML/squareRoot.html

Calculation by hand

How Spellcheckers Work | PC Plus

http://www.pcplus.co.uk/node/3062/

As you can see, the process of checking spellings and suggesting corrections is not an exact science, but there's no denying that it has made our lives a little easier and our publications a little less unpredictable.

http://news.ycombinator.com/item?id=745537

jLayout — JavaScript Layout Algorithms - bramstein.com

http://www.bramstein.com/projects/jlayout/

Usage

The jLayout JavaScript library provides layout algorithms for laying out components. A component is an abstraction; it can be implemented in many ways, for example as items in an HTML5 Canvas drawing or as HTML elements. The jLayout library allows you to focus on drawing the individual components instead of on how to arrange them on your screen. The library currently provides four layout algorithms: border, which lays out components in five different regions; grid, which lays out components in a user defined grid; flex-grid, which offers a grid with flexible column and row sizes; and flow, which flows components in a user defined direction. Using the grid and flex-grid algorithms you can also create horizontal and vertical layouts. A jQuery plugin to lay out (X)HTML elements is also available.

COS 493, Spring 2002: Schedule and Readings

http://www.cs.princeton.edu/courses/archive/spring02/cs493/schedule.html

Algorithms for Massive Data Sets

Accurately computing running variance

http://www.johndcook.com/standard_deviation.html

The most direct way of computing sample variance or standard deviation can have severe numerical problems. [...] There is a way to compute variance that is more accurate and is guaranteed to always give positive results. Furthermore, the method computes a running variance. That is, the method computes the variance as the x's arrive one at a time. The data do not need to be saved for a second pass.

"This better way of computing variance goes back to a 1962 paper by B. P. Welford and is presented in Donald Knuth's Art of Computer Programming, Vol 2, page 232, 3rd edition. Although this solution has been known for decades, not enough people know about it. Most people are probably unaware that computing sample variance can be difficult until the first time they compute a standard deviation and get an exception for taking the square root of a negative number. It is not obvious that the method is correct even in exact arithmetic. It's even less obvious that the method has superior numerical properties, but it does."
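The running-variance method credited to Welford above can be sketched in a few lines of Python. This is a minimal illustration under assumed names (`RunningStats`, `push`, `m2`), not the code from the linked page; it returns 0.0 for the variance when fewer than two values have been seen, which is a convention chosen here.

```python
import math


class RunningStats:
    """Running mean and sample variance using Welford's update, one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean          # deviation from the old mean
        self.mean += delta / self.n
        # Multiply by the deviation from the new mean; this term is never negative.
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance (n - 1 in the denominator).
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def std_dev(self):
        return math.sqrt(self.variance())
```

Because `m2` only ever accumulates non-negative terms, the square root in `std_dev` is not taken of a negative number, and no values need to be stored for a second pass.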

A simple way to compute running sample variance (standard deviation).

Computing mean, variance and standard deviation on a stream of data.

The Most Important Algorithms (Survey)

http://www.risc.jku.at/people/ckoutsch/stuff/e_algorithms.html

SNA Projects Blog : Beating Binary Search

http://sna-projects.com/blog/2010/06/beating-binary-search/

Quick, what is the fastest way to search a sorted array? Binary search, right? Wrong. There is actually a method called interpolation search, in which, rather than pessimistically looking in the middle of the array, you use a model of the key distribution to predict the location of the key and look there.
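As a rough illustration of the idea described above, here is a textbook interpolation search in Python. It assumes the simplest possible model of the key distribution, namely integer keys spread roughly uniformly between the smallest and largest element; the linked post may use a different model, and the function name is an assumption.

```python
def interpolation_search(arr, key):
    """Return an index of key in the sorted list arr of integers, or -1 if absent."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi and arr[lo] <= key <= arr[hi]:
        if arr[lo] == arr[hi]:
            # All remaining keys are equal; avoid dividing by zero below.
            return lo if arr[lo] == key else -1
        # Predict the position by linear interpolation between the two endpoints.
        pos = lo + (key - arr[lo]) * (hi - lo) // (arr[hi] - arr[lo])
        if arr[pos] == key:
            return pos
        if arr[pos] < key:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1
```

On uniformly distributed keys the expected number of probes grows like log log n, but on badly skewed data it can degrade to a linear scan, which binary search never does.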

Interpolation search, with the algorithm

Understanding and Applying Operational Transformation - Code Commit

http://www.codecommit.com/blog/java/understanding-and-applying-operational-transformation?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+codecommit+%28Code+Commit%29

@djspiewak wrote a very detailed intro to operational transformation. Very useful for building, say, a collab editor

Almost exactly a year ago, Google made one of the most remarkable press releases in the Web 2.0 era. Of course, by “press release”, I actually mean keynote at their own conference, and by “remarkable” I mean potentially-transformative and groundbreaking. I am referring of course to the announcement of Google Wave, a real-time collaboration tool which has been in open beta for the last several months.

Good article explaining how the Operational Transform from Google Wave can be implemented, and the various cases that have to be handled when server and client both have edits pending.

The algorithm behind "Wave"

Plain english explanation of Big O - Stack Overflow

http://stackoverflow.com/questions/487258/plain-english-explanation-of-big-o/487278#answer-487278

One of the best layperson's explanations of algorithm complexity that I've seen.

Traditional computers can solve problems in polynomial time. Certain things are used in the world because of this. Public Key Cryptography is a prime example. It is computationally hard to find two prime factors of a very large number. If it wasn't, we couldn't use the public key systems we use.

Stack Overflow post about Big O notation

Cowboy Programming » Programming Poker AI

http://cowboyprogramming.com/2007/01/04/programming-poker-ai/

I recently programmed the AI for the World Series of Poker, developed by Left Field Productions and published by Activision. I started out thinking it would be an easy task. But it proved a lot more complex than I initially thought. This article for the budding poker AI programmer provides a foundation for a simple implementation of No-Limit Texas Holdem Poker AI, covering the basics of hand strength evaluation and betting. By following the recipe set out here, you will quickly become able to implement a reasonably strong poker AI, and have a solid foundation on which to build. I assume you are familiar with the basic terminology of poker.

Google Confirms “Mayday” Update Impacts Long Tail Traffic

http://searchengineland.com/google-confirms-mayday-update-impacts-long-tail-traffic-43054

This change seems to have primarily impacted very large sites with “item” pages that don’t have many individual links into them, might be several clicks from the home page, and may not have substantial unique and value-added content on them. For instance, ecommerce sites often have this structure. The individual product pages are unlikely to attract external links and the majority of the content may be imported from a manufacturer database

A 10-MINUTE DESCRIPTION OF HOW JUDY ARRAYS WORK AND WHY THEY ARE SO FAST

http://judy.sourceforge.net/doc/10minutes.htm

As the inventor of the Judy algorithm I've been asked repeatedly, "What makes Judy so fast?" The answer is not simple, but finally I can share all of the details.

A complex (to implement) but efficient scalable data-structure that obtains very high performance by minimising the number of cache-line fills required.

A Coder's Musings: Curve fitting with Pyevolve

http://acodersmusings.blogspot.com/2009/07/curve-fitting-with-pyevolve.html

Genetic algorithms with Python

genetic algorithm lib use