Nasim Souly

I received my Master's degree in Artificial Intelligence from Amirkabir University of Science and Technology in Tehran, Iran. Before that, I graduated in Computer Engineering (Software) from Iran University of Science and Technology. Currently, I am pursuing a PhD in computer science at the University of Central Florida Center for Research in Computer Vision (CRCV) under the supervision of Dr. Mubarak Shah. My main research areas are computer vision and machine learning, especially saliency detection and semantic segmentation.

Email: nsouly@eecs.ucf.edu

[My Résumé]


Projects:

 

Scene Labeling Using Sparse Precision Matrix

Abstract: The scene labeling task is to segment an image into meaningful regions and categorize them into the classes of objects that compose the image. Commonly used methods typically find local features for each segment and label them using classifiers. Afterwards, the labeling is smoothed to make sure that neighboring regions receive similar labels. However, these methods ignore expressive connections between labels and non-local dependencies among regions. In this paper, we propose to use a sparse estimate of the precision matrix (also called the concentration matrix), which is the inverse of the covariance matrix of the data, obtained by graphical lasso, to find interactions between labels and regions. To do this, we formulate the problem as an energy minimization over a graph whose structure is captured by applying a sparsity constraint on the elements of the precision matrix. This graph encodes only significant interactions and avoids a fully connected graph, which is typically used to reflect long-distance associations. We use local and global information to achieve better labeling. We assess our approach on three datasets and obtain promising results. [CVPR 2016 Paper] [Video Presentation] [Presentation file]
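The link between a sparse precision matrix and a sparse graph can be illustrated with a small sketch. This is not the graphical lasso estimator itself and not the paper's implementation; it only shows, on a hypothetical 5-node chain, that zeros in the precision matrix correspond to missing edges even though the covariance matrix is fully dense. The function name `graph_edges_from_precision` and the tolerance `tol` are assumptions made for this example.

```python
import numpy as np

def graph_edges_from_precision(precision, tol=1e-8):
    """Return edges (i, j), i < j, whose partial correlation is non-negligible."""
    d = np.sqrt(np.diag(precision))
    # Partial correlation between i and j given all other variables.
    partial_corr = -precision / np.outer(d, d)
    n = precision.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(partial_corr[i, j]) > tol]

# Hypothetical chain 0-1-2-3-4: a tridiagonal, positive definite precision matrix.
theta = 2.0 * np.eye(5) - np.eye(5, k=1) - np.eye(5, k=-1)

# The covariance is the inverse of the precision; here every entry is non-zero.
sigma = np.linalg.inv(theta)

# Inverting the covariance again recovers the sparse chain structure.
edges = graph_edges_from_precision(np.linalg.inv(sigma))
print(edges)
```

In practice the covariance is estimated from noisy data, so its plain inverse is not sparse; an L1-penalized estimator such as graphical lasso is what pushes the small entries to exactly zero.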

 

Visual Saliency Detection Using Group Lasso Regularization in Videos of Natural Scenes

Abstract: Visual saliency is the ability of a vision system to promptly select the most relevant data in the scene and reduce the amount of visual data that needs to be processed. Thus, its applications for complex tasks such as object detection, object recognition and video compression have attained interest in computer vision studies. In this paper, we introduce a novel unsupervised method for detecting visual saliency in videos of natural scenes. For this, we divide a video into non-overlapping cuboids and create a matrix whose columns correspond to intensity values of these cuboids. Simultaneously, we segment the video using a hierarchical segmentation method and obtain super-voxels. A dictionary learned from the feature data matrix of the video is subsequently used to represent the video as coefficients of atoms. Then, these coefficients are decomposed into salient and non-salient parts. We propose to use group lasso regularization to find the sparse representation of a video, which benefits from grouping information provided by super-voxels and extracted features from the cuboids. We find salient regions by decomposing the feature matrix of a video into low-rank and sparse matrices using the robust principal component analysis matrix recovery method. The applicability of our method is tested on four video datasets of natural scenes. Our experiments provide promising results in terms of predicting eye movement using standard evaluation methods. In addition, we show that our video saliency can be used to improve the performance of human action recognition on a standard dataset. [Project Page] [Related Publication]
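As a rough illustration of how group lasso regularization uses grouping information, the sketch below implements the proximal operator of the group lasso penalty (block soft-thresholding), a common building block of proximal-gradient solvers. It is a minimal sketch and not the solver used in the paper. The function name `group_soft_threshold` is hypothetical, and treating each index group as the coefficients belonging to one super-voxel is an assumption made for the example.

```python
import numpy as np

def group_soft_threshold(v, groups, lam):
    """Proximal operator of lam * sum_g ||v_g||_2 (block soft-thresholding).

    v      : 1-D coefficient vector
    groups : list of index lists, one per group (assumed non-overlapping)
    lam    : regularization weight
    """
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(v[idx])
        # A group is either shrunk as a whole or set to zero as a whole.
        if norm > lam:
            out[idx] = (1.0 - lam / norm) * v[idx]
    return out

# Two hypothetical groups: one with strong coefficients, one with weak ones.
coeffs = np.array([3.0, 4.0, 0.1, 0.1])
shrunk = group_soft_threshold(coeffs, groups=[[0, 1], [2, 3]], lam=1.0)
print(shrunk)
```

The first group has norm 5 and is scaled by 0.8, while the second group has norm below the threshold and is removed entirely, which is the group-level sparsity the penalty is designed to produce.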


Covariance of Motion and Appearance Features for Human Action and Gesture Recognition

Abstract: In this work, we introduce a novel descriptor for general-purpose video analysis. In our approach, we compute kinematic features from optical flow and first- and second-order derivatives of intensities to represent motion and appearance, respectively. These features are then used to construct covariance matrices, which capture joint statistics of both low-level motion and appearance features extracted from a video. Using an over-complete dictionary of the covariance-based descriptors built from labeled training samples, we formulate low-level event recognition as a sparse linear approximation problem. Within this, we pose the sparse decomposition of a covariance matrix, which also conforms to the space of positive semidefinite matrices, as a determinant maximization problem. Also, since covariance matrices lie on non-linear Riemannian manifolds, we compare our former approach with a sparse linear approximation alternative that is suitable for equivalent vector spaces of covariance matrices. This is done by searching for the best projection of the query data on a dictionary using an Orthogonal Matching Pursuit algorithm. We show the applicability of our video descriptor in two different application domains, namely low-level event recognition in unconstrained scenarios and gesture recognition using one-shot learning. Our experiments provide promising insights in large-scale video analysis. [Arxiv]

Contribution: UCF is part of the SRI-Sarnoff team, and this work was part of the ALADDIN project.
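The sparse linear approximation step mentioned in the abstract above can be sketched with a basic Orthogonal Matching Pursuit loop. This is a generic textbook version, not the authors' code. The function name `omp`, the toy dictionary, and the query vector are all assumptions chosen so the example is small and deterministic; in the described setting the dictionary columns would be vectorized covariance descriptors of training videos.

```python
import numpy as np

def omp(dictionary, y, n_nonzero):
    """Greedy sparse approximation of y using unit-norm dictionary columns."""
    residual = y.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(dictionary.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(dictionary[:, support], y, rcond=None)
        residual = y - dictionary[:, support] @ coef
    x = np.zeros(dictionary.shape[1])
    x[support] = coef
    return x

# Toy over-complete dictionary: 4 identity atoms plus 4 normalized Hadamard atoms.
hadamard = 0.5 * np.array([[1, 1, 1, 1],
                           [1, -1, 1, -1],
                           [1, 1, -1, -1],
                           [1, -1, -1, 1]], dtype=float)
D = np.hstack([np.eye(4), hadamard])

# Hypothetical query built from atoms 0 and 2.
query = 2.0 * D[:, 0] - 1.5 * D[:, 2]
x_hat = omp(D, query, n_nonzero=2)
print(np.nonzero(x_hat)[0])
```

For classification, the non-zero coefficients indicate which labeled training samples best explain the query, so a label can be chosen from the atoms that carry most of the reconstruction.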