Recommender systems (RS) have gained momentum with the increasing digitalization of daily life, and companies accordingly seek to attract customers in this environment. One way to attract customers through online advertising is to use the click-through rates (CTR) of ads to build efficient recommender systems. Frequently used methods for such systems are collaborative filtering (CF), content-based filtering (CBF), and traditional reinforcement learning approaches. The objective of this paper is to determine, via reinforcement learning (RL), the best online ad to show customers among multiple advertisements. Treating the problem as a multi-armed bandit, we modeled each ad's clicks with a Bernoulli distribution based on the observed CTRs. The best ad was selected by the Bernoulli bandit under three settings: A/B/n testing, epsilon-greedy, and Upper Confidence Bound (UCB). The results show the contribution of exploration (with UCB and epsilon-greedy) to the performance of the methods. UCB found the most preferable ad, with a CTR of about 27.01%, followed by the epsilon-greedy strategy with a CTR of about 25%. All methods identified the same ad alternative as the best.
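To make the UCB setting concrete, the following is a minimal sketch of a UCB1 agent choosing among ads whose clicks are simulated as Bernoulli draws. The true CTR values and the function name `ucb1_bernoulli` are illustrative assumptions, not taken from the paper's data.

```python
import math
import random

def ucb1_bernoulli(true_ctrs, n_rounds, seed=0):
    """Run UCB1 on simulated Bernoulli-click ads; return per-ad
    show counts and empirical CTR estimates."""
    rng = random.Random(seed)
    k = len(true_ctrs)
    counts = [0] * k   # times each ad was shown
    clicks = [0] * k   # clicks observed per ad
    for t in range(1, n_rounds + 1):
        if t <= k:
            # show each ad once so every count is nonzero
            arm = t - 1
        else:
            # pick the ad maximizing empirical CTR plus exploration bonus
            arm = max(range(k), key=lambda a: clicks[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        counts[arm] += 1
        clicks[arm] += reward
    estimates = [clicks[a] / counts[a] for a in range(k)]
    return counts, estimates

counts, estimates = ucb1_bernoulli([0.10, 0.27, 0.18], 10000, seed=42)
```

With enough rounds, the ad with the highest true CTR receives most of the impressions, and its empirical CTR estimate converges toward the true value, mirroring how exploration improves over a purely greedy policy.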