We consider a problem of high importance for recommender systems: learning a convex combination of ranking algorithms by online machine learning. First, we propose a stochastic optimization algorithm that uses finite differences. Our new algorithm achieves near-optimal empirical performance when combining two base rankers, and it scales well as the number of models grows. In experiments with five real-world recommendation data sets, we show that the combination offers significant improvement over previously known stochastic optimization techniques. The proposed algorithm is the first effective stochastic optimization method for combining ranked recommendation lists by online machine learning. Second, we propose an exponentially weighted algorithm based on a grid over the space of combination weights. We show that this algorithm has a near-optimal worst-case performance bound. The bound provides the first theoretical guarantee for non-convex bandits using a limited number of evaluations under very general conditions.
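To make the finite-difference idea concrete, the following is a minimal sketch (not the paper's exact method) of a simultaneous-perturbation step on the simplex of combination weights. The names `evaluate`, `spsa_step`, and `project_to_simplex`, and the choice of a ranking score such as NDCG as the objective, are illustrative assumptions; only two evaluations of the combined ranker are needed per step, regardless of the number of base rankers.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def spsa_step(w, evaluate, c=0.05, a=0.1, rng=np.random):
    """One finite-difference step on the combination weights.

    evaluate(w) -> scalar ranking score (higher is better), e.g. the
    NDCG of the list obtained by mixing the base rankers' scores
    with weights w. This is an illustrative sketch, not the
    algorithm proposed in the paper.
    """
    delta = rng.choice([-1.0, 1.0], size=w.shape)   # random perturbation
    w_plus = project_to_simplex(w + c * delta)
    w_minus = project_to_simplex(w - c * delta)
    # Two evaluations yield a gradient estimate in all coordinates at once.
    ghat = (evaluate(w_plus) - evaluate(w_minus)) / (2.0 * c) * delta
    return project_to_simplex(w + a * ghat)         # ascent on the score
```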
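The second algorithm can be illustrated, under similar caveats, by an exponentially weighted (EXP3-style) forecaster over a finite grid of weight vectors: each grid point is one candidate convex combination, and only the score of the combination actually played is observed. The grid resolution `k`, the learning rate `eta`, and the helper names below are assumptions for the sketch.

```python
import itertools
import numpy as np

def simplex_grid(d, k):
    """All weight vectors of resolution 1/k on the d-dimensional simplex."""
    return [np.array(c) / k
            for c in itertools.product(range(k + 1), repeat=d)
            if sum(c) == k]

def exp3_on_grid(grid, reward, T, eta=0.1, rng=np.random):
    """Exponentially weighted play over grid points with bandit feedback.

    reward(w) -> observed score in [0, 1] for playing weights w; only
    the chosen point's reward is revealed each round. Illustrative
    sketch, not the paper's exact algorithm.
    """
    n = len(grid)
    log_weights = np.zeros(n)
    for _ in range(T):
        p = np.exp(log_weights - log_weights.max())
        p /= p.sum()
        i = rng.choice(n, p=p)              # sample a grid point
        r = reward(grid[i])
        log_weights[i] += eta * r / p[i]    # importance-weighted update
    return grid[int(np.argmax(log_weights))]
```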