Feature Hashing for Large Scale Multitask Learning
About
Empirical evidence suggests that hashing is an effective strategy for dimensionality reduction and practical nonparametric estimation. In this paper we provide exponential tail bounds for feature hashing and show that the interaction between random subspaces is negligible with high probability. We demonstrate the feasibility of this approach with experimental results for a new use case -- multitask learning with hundreds of thousands of tasks.
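The hashing trick described above maps a high-dimensional sparse feature space into a fixed-size vector via a hash function, with a second sign hash keeping inner products unbiased in expectation; for multitask learning, task-specific features are hashed into the same shared space. A minimal sketch of this idea (the hash functions and token format here are illustrative assumptions, not the paper's exact construction):

```python
import hashlib

def _hash(token, seed):
    # Deterministic string hash (assumption: md5-based stand-in for the
    # paper's hash family; any uniform hash function works).
    digest = hashlib.md5(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "little")

def hash_features(tokens, dim):
    """Map a sparse {token: value} dict into a dense vector of size `dim`
    using feature hashing with a signed hash (the xi(j) sign keeps the
    hashed inner product an unbiased estimate of the original one)."""
    vec = [0.0] * dim
    for tok, val in tokens.items():
        idx = _hash(tok, 0) % dim                        # bucket index h(j)
        sign = 1.0 if _hash(tok, 1) % 2 == 0 else -1.0   # sign hash xi(j)
        vec[idx] += sign * val
    return vec

def hash_features_multitask(tokens, task_id, dim):
    # Multitask variant: prefixing each feature with a task id gives every
    # task its own random subspace of the shared hashed space; the paper's
    # tail bounds show cross-task interference is negligible w.h.p.
    return hash_features({f"{task_id}|{t}": v for t, v in tokens.items()}, dim)
```

Because hashing is deterministic, the same input always lands in the same buckets, so no dictionary of feature indices needs to be stored.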
Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola • 2009
Related benchmarks
| Task | Dataset | Result | Rank |
|---|---|---|---|
| Intra-creator embedding similarity | Video Dataset (Same Day) | Intra-creator Similarity: 66 | 2 |
| Intra-creator embedding similarity | Video Dataset (Same & Next Day) | Intra-creator Similarity: 66 | 2 |
| Intra-creator embedding similarity | Video Dataset (Overall) | Intra-creator Embedding Similarity: 62 | 2 |