Motivated by the demand for high-speed matrix computations and the training of deep neural networks, the TensorCore was introduced in NVIDIA GPUs to further accelerate matrix-matrix multiplication. It supports very fast half-precision general matrix-matrix multiplications (GEMMs), roughly 8x faster than single-precision CUDA core GEMMs. So far, the use of TensorCore GPUs for matrix operations other than matrix-matrix multiplication remains underdeveloped. In this paper, we propose efficient BLAS3 operations that exploit the TensorCore. Experimental results show that the proposed algorithms outperform the corresponding cuBLAS routines and a naive TensorCore implementation, with speedups of up to 4.7x.