A recently introduced, effective feature extraction technique for computational paralinguistics is that of Bag-of-Audio-Words (BoAW), where we cluster the frame-level training vectors, and represent each speech utterance based on the cluster of its frames. Over the past few years, several improvements have been proposed for the original BoAW approach, but none of them has examined the impact of the stochastic nature of the clustering step. In this study we demonstrate experimentally that the random factor present in the BoAW clustering step is indeed propagated into the next classification step, eventually leading to suboptimal classification performance. As a solution, we propose to train an ensemble of classifiers; that is, we repeat the BoAW codebook selection step several times, train separate classifier models for these BoAW representation versions and combine their predictions. Our results, obtained for three different paralinguistic datasets, demonstrate that this ensemble technique makes the whole paralinguistic classification process more robust, and it leads to improvements in the classification performance. We tested this technique on three different paralinguistic datasets, and achieved the highest Unweighted Average Recall score reported so far on the iHEARu-EAT corpus.