I received my Master's degree in Artificial Intelligence from Amirkabir University of Science and Technology in Tehran, Iran.
Before that, I graduated in Computer Engineering (Software) from Iran University of Science and Technology.
Currently, I am a PhD student in computer science at the University of Central Florida's Center for Research in Computer Vision (CRCV),
under the supervision of Dr. Mubarak Shah.
My main research areas are computer vision and machine learning, especially saliency detection and semantic segmentation.
The scene labeling task is to segment an image into meaningful
regions and categorize them into the classes of objects
that make up the image. Commonly used methods typically
compute local features for each segment and label the segments
using classifiers. Afterwards, the labeling is smoothed in order
to make sure that neighboring regions receive similar labels.
However, these methods ignore expressive connections
between labels and non-local dependencies among regions.
In this paper, we propose to use a sparse estimate of the precision
matrix (also called the concentration matrix), which is
the inverse of the covariance matrix of the data, obtained by graphical
lasso, to find interactions between labels and regions. To
do this, we formulate the problem as an energy minimization
over a graph whose structure is captured by applying
a sparsity constraint on the elements of the precision matrix.
This graph encodes only the significant interactions
and avoids a fully connected graph, which is typically
used to reflect long-distance associations. We use local
and global information to achieve better labeling. We assess
our approach on three datasets and obtain promising results.
[CVPR 2016 Paper]
[Presentation file]
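For reference, the graphical lasso estimate mentioned in the abstract above is usually written as a penalized log-likelihood problem. The symbols are standard notation assumed here, not taken from the paper: S is the empirical covariance of the data, \Theta is the precision matrix, and \lambda is the weight of the sparsity penalty.

```latex
\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \log\det\Theta \;-\; \operatorname{tr}(S\Theta) \;-\; \lambda \lVert \Theta \rVert_1
```

For Gaussian data, an entry \Theta_{ij} = 0 means that variables i and j are conditionally independent given the others, so the nonzero pattern of \hat{\Theta} can be read as the edge set of a sparse graph.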
Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes
Visual saliency is the ability of a vision system
to promptly select the most relevant data in the scene and
reduce the amount of visual data that needs to be processed.
Thus, its applications to complex tasks such as object detection,
object recognition, and video compression have attracted
interest in computer vision studies. In this paper, we introduce
a novel unsupervised method for detecting visual saliency
in videos of natural scenes. For this, we divide a video
into non-overlapping cuboids and create a matrix whose
columns correspond to intensity values of these cuboids.
Simultaneously, we segment the video using a hierarchical
segmentation method and obtain super-voxels. A dictionary
learned from the feature data matrix of the video is subsequently
used to represent the video as coefficients of atoms.
Then, these coefficients are decomposed into salient and non-salient
parts. We propose to use group lasso regularization
to find the sparse representation of a video, which benefits
from the grouping information provided by the super-voxels and
the features extracted from the cuboids. We find salient regions
by decomposing the feature matrix of a video into low-rank
and sparse matrices using the robust principal component
analysis (RPCA) matrix recovery method. The applicability of our
method is tested on four video datasets of natural scenes.
Our experiments provide promising results in terms of predicting
eye movement using standard evaluation methods.
In addition, we show that our video saliency can be used to improve
the performance of human action recognition on a standard dataset.
[Related Publication]
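The low-rank plus sparse decomposition mentioned in the abstract above can be sketched with a generic robust PCA solver. This is a minimal illustration and not the implementation from the paper: it uses a standard ADMM scheme for principal component pursuit. The function names, the default values of `lam` and `mu`, and the per-column saliency score are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def robust_pca(M, lam=None, mu=None, n_iter=500, tol=1e-7):
    """Split M into low-rank L and sparse S with M ~ L + S (ADMM sketch).

    lam and mu default to commonly used heuristics; both are assumptions
    of this sketch, not values reported in the paper.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum() + 1e-12) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # scaled dual variable
    norm_M = np.linalg.norm(M)
    for _ in range(n_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S

def column_saliency(S):
    """Hypothetical score: l2 norm of each column of the sparse part."""
    return np.linalg.norm(S, axis=0)
```

If each column of `M` holds the features of one cuboid, as in the description above, `column_saliency(S)` gives one score per cuboid. Columns that the low-rank background model explains poorly receive higher scores.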
Covariance of Motion and Appearance Features for Human Action and Gesture Recognition
In this work, we introduce a novel descriptor for general-purpose video analysis. In our approach, we
compute kinematic features from optical flow, and first- and second-order derivatives of intensities, to represent motion and appearance
respectively. These features are then used to construct covariance matrices, which capture joint statistics of both low-level motion
and appearance features extracted from a video. Using an over-complete dictionary of the covariance-based descriptors built from
labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem. Within this, we pose the
sparse decomposition of a covariance matrix, which also conforms to the space of positive semi-definite matrices, as a determinant
maximization problem. Also, since covariance matrices lie on non-linear Riemannian manifolds, we compare our former approach with
a sparse linear approximation alternative that is suitable for equivalent vector spaces of covariance matrices. This is done by searching
for the best projection of the query data on a dictionary using an Orthogonal Matching Pursuit algorithm.
We show the applicability of our video descriptor in two different application domains, namely low-level event recognition in
unconstrained scenarios and gesture recognition using one-shot learning. Our experiments provide promising insights in large-scale video analysis.
UCF is part of the SRI-Sarnoff team, and this work was part of the ALADDIN project.
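The pipeline in the abstract above (covariance descriptor, mapping to a vector space, sparse coding with Orthogonal Matching Pursuit) can be sketched in a few lines of NumPy. This is a hedged illustration and not the authors' code. The log-Euclidean mapping is one common way to vectorize covariance matrices and is assumed here. The function names and the regularizer `eps` are hypothetical.

```python
import numpy as np

def covariance_descriptor(features, eps=1e-6):
    """Covariance of per-sample feature vectors.

    features: (n_samples, d) array with d >= 2, e.g. motion and appearance
    features per pixel. eps * I is added so the result is positive definite.
    """
    F = np.asarray(features, dtype=float)
    C = np.cov(F, rowvar=False)
    return C + eps * np.eye(C.shape[0])

def log_euclidean_vector(C):
    """Map an SPD matrix to a vector via the matrix logarithm (assumed mapping)."""
    w, U = np.linalg.eigh(C)
    log_C = (U * np.log(w)) @ U.T
    iu = np.triu_indices(C.shape[0])
    # Off-diagonal entries get sqrt(2) so the vector norm equals the Frobenius norm.
    weights = np.where(iu[0] == iu[1], 1.0, np.sqrt(2.0))
    return weights * log_C[iu]

def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit.

    D: (m, k) dictionary with unit-norm columns; y: (m,) query vector.
    Returns a length-k coefficient vector with at most n_nonzero nonzeros.
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[np.array(support, dtype=int)] = coef
    return x
```

In a recognition setting, the columns of `D` would be vectorized descriptors of labeled training videos. A query could then be assigned to the class whose atoms carry most of the coefficient weight, although that decision rule is an assumption of this sketch.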