
Introduction

Human Skin Detection in Color Images

Thesis Overview

See full thesis

The purpose of the thesis is to present a review of the human skin detection datasets and approaches of the state of the art, and then perform a comparative in-depth analysis of the most relevant methods on different databases.

Detected skin pixels: the skin regions appear completely red because the ground-truth mask has been overlaid onto the original image using red pixels. Depending on the dataset, the regions contouring the mouth, ears, eyes, and other facial features may or may not be considered skin pixels, and they are often a subject of discussion in the academic community.
The original image: a three-quarter shot featuring a pale-skinned girl with curly brown hair and a plain orange background.

Skin detection is the process of discriminating skin and non-skin pixels. It is quite a challenging process because of the large color diversity that objects and human skin can assume, and scene properties such as lighting and background.

Applications

  • Facial Analysis
  • Gesture Analysis
  • Biomedical
  • Advertisement
    Infer audience demographics
  • Content Filter
  • Video Surveillance
  • Privacy Protection
Encrypt people's identities in smart cities

Ramirez et al. 2014 [1]

Skin detection is often an important step to analyze faces.
In Digital Out-of-Home advertising, skin detection can be used to infer properties of the audience.

Low et al. 2020 [2]

Biomedical applications include the early detection of skin cancers, such as melanoma.

Do et al. 2014 [3]

Clay can be very similar in color to some skin tones.
Wood can assume colors similar to some skin tones.
Lighting can modify the image properties considerably. We, as humans, barely perceive the difference because our eyes are used to these kinds of color transitions, but computers have trouble.

Limitations

  • Materials with skin-like colors
  • Wide range of skin tones
  • Illumination
  • Cameras color science

Methodological Approach

In this thesis, the significance and limitations of skin detection have been addressed. A review of the public datasets available in the domain and an analysis of state-of-the-art approaches have been presented, including a newly proposed taxonomy. Three state-of-the-art methods have been thoroughly examined, implemented, and validated against the original papers, when possible. An evaluation of the chosen approaches in different settings has been presented, alongside a discussion of the metrics used in the domain. Finally, the results have been thoroughly discussed through data and figures.

Taxonomy

Skin detection is a binary classification problem: the pixels of an image must be divided between skin and non-skin classes.

One of several ways to categorize methods is to group them according to how the pixel classification is done.

Diagram: a skin detector can be built either with a rule-based classifier or with a machine learning classifier.

Thresholding approaches use plain rules to classify each pixel as either skin or non-skin. An example is the following.
A given (Y, Cb, Cr) pixel is a skin pixel if 133 ≤ Cr ≤ 173 and 77 ≤ Cb ≤ 127.
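For instance, a minimal sketch of such a fixed rule, assuming an OpenCV BGR input (only the thresholds come from the rule above; everything else is illustrative):

```python
import cv2
import numpy as np

def skin_mask_fixed_rule(img_bgr: np.ndarray) -> np.ndarray:
    """Fixed YCbCr rule: skin if 133 <= Cr <= 173 and 77 <= Cb <= 127."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)  # OpenCV channel order: Y, Cr, Cb
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)

# Usage: boolean mask with True on pixels classified as skin
# mask = skin_mask_fixed_rule(cv2.imread("photo.jpg"))
```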

Investigated Methods

Thresholding, Statistical, and Deep Learning are the chosen approaches:
the first to test whether simple rules can achieve strong results; the latter two to compare how differently the models behave and generalize, and whether the ability of a CNN to extract semantic features gives it an advantage.

Dynamic Thresholding

Algorithm Overview

  1. Convert the input image from RGB to YCbCr
  2. Compute Crmax and Cbmin
  3. Pixel-wise computation of the correlation rules parameters
  4. Pixel-wise correlation rules check
Figure: visualization of how some parameters are computed from the trapezia, for context to the following paragraphs.

Brancati et al. 2017 [4]

Skin pixel clusters assume a trapezoidal shape in the YCb and YCr color subspaces. Moreover, the shape and size of the trapezium vary according to many factors, such as the illumination conditions: under high illumination, the base of the trapezium becomes larger.

Besides, the chrominance components of a skin pixel P with coordinates (PY, PCb, PCr) in the YCbCr space exhibit the following behavior: the further the point (PY, PCr) is from the longer base of the trapezium in the YCr subspace, the further the point (PY, PCb) is from the longer base of the trapezium in the YCb subspace, and vice versa.

The aforementioned observations form the basis of the method: it defines image-specific trapezia in the YCb and YCr color subspaces and then verifies that the correlation rules between the two subspaces reflect the inversely proportional behavior of the chrominance components.
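Purely as a structural sketch of the four steps listed above (the trapezium parameters and the actual correlation rules of Brancati et al. are considerably more involved, so the per-pixel check below is a crude placeholder, not the published formulas):

```python
import cv2
import numpy as np

def dynamic_threshold_skin_mask(img_bgr: np.ndarray) -> np.ndarray:
    """Structural sketch of the dynamic thresholding pipeline.
    The pixel-wise rule is a placeholder, NOT the correlation rules
    of Brancati et al. 2017."""
    # 1. Convert the input image (BGR in OpenCV) to YCbCr
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]

    # 2. Image-specific extrema used to size the trapezia
    cr_max, cb_min = cr.max(), cb.min()

    # 3.-4. Placeholder pixel-wise check: require Cr close to the image-specific
    # maximum and Cb close to the image-specific minimum, loosely mimicking the
    # inverse behavior of the chrominance components described above.
    close_to_cr_max = (cr_max - cr) < 40
    close_to_cb_min = (cb - cb_min) < 50
    return close_to_cr_max & close_to_cb_min
```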

Statistical

A three-quarter shot featuring a pale-skinned girl with curly brown hair and a plain orange background.
The resulting 3D histogram from the image featuring the girl with the orange background: each pixel is taken from the original image and stored at the coordinates [R,G,B] of the histogram. The visualization of the resulting three-dimensional histogram shows some accumulation points, which may indicate interesting features to extract from the image. For example, the plain orange background can easily be identified, as it accounts for a lot of pixels with low variance.

Example of an image's 3D Histogram

Train

  1. Initialize the skin and non-skin 3D histograms
  2. Pick an (image, mask) pair from the training set
  3. Loop over every RGB pixel of the image
  4. By checking its mask, the pixel is either skin or non-skin: add +1 to the corresponding histogram count at coordinates [r,g,b]
  5. Return to step 2 while images remain

Predict

  1. Define classifying threshold Θ
  2. Loop every RGB pixel from input image
  3. Calculate RGB probability of being skin
  4. If skin probability > Θ, it is classified as skin

The data is modeled with two 3D histograms representing the probabilities of the skin and non-skin classes, and classification is performed by computing the probability P that each rgb pixel belongs to the skin class:

P(skin | rgb) = s[rgb] / (s[rgb] + n[rgb])

where s[rgb] is the pixel count contained in bin rgb of the skin histogram and n[rgb] is the equivalent count from the non-skin histogram.
A particular rgb value is labeled skin if:

P(skin | rgb) > Θ

where 0 ≤ Θ ≤ 1 is a threshold value that can be adjusted to trade-off between true positives and false positives.
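A minimal numpy sketch of the train and predict procedures above, assuming boolean ground-truth masks and a 32-bins-per-channel quantization (the bin count and other implementation details in the thesis may differ):

```python
import numpy as np

BINS = 32                  # assumed quantization: 32 bins per RGB channel
BIN_WIDTH = 256 // BINS

def train_histograms(images, masks):
    """Accumulate skin / non-skin pixel counts into two 3D RGB histograms.
    `images` are uint8 HxWx3 arrays, `masks` are boolean skin masks."""
    skin = np.zeros((BINS, BINS, BINS), dtype=np.float64)
    non_skin = np.zeros_like(skin)
    for img, mask in zip(images, masks):
        idx = (img // BIN_WIDTH).reshape(-1, 3)       # bin index of every pixel
        flat = mask.reshape(-1)
        np.add.at(skin, tuple(idx[flat].T), 1)        # skin counts
        np.add.at(non_skin, tuple(idx[~flat].T), 1)   # non-skin counts
    return skin, non_skin

def predict(img, skin, non_skin, theta=0.5):
    """P(skin | rgb) = s[rgb] / (s[rgb] + n[rgb]); label skin if P > theta."""
    idx = (img // BIN_WIDTH).reshape(-1, 3)
    s = skin[idx[:, 0], idx[:, 1], idx[:, 2]]
    n = non_skin[idx[:, 0], idx[:, 1], idx[:, 2]]
    with np.errstate(invalid="ignore"):
        p = np.where(s + n > 0, s / (s + n), 0.0)     # unseen colors default to non-skin
    return (p > theta).reshape(img.shape[:2])
```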

Workflow

  1. Pre-process input image: resize to (512×512) px, with padding (see the sketch after this list)
  2. Extract features in the contracting pathway via convolutions and down-sampling: the spatial information is lost while advanced features are learnt
  3. Try to retrieve spatial information through up-sampling in the expansive pathway and direct concatenations of dense blocks coming from the contracting pathway
  4. Provide a final classification map
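As referenced in step 1, a minimal sketch of an aspect-preserving resize followed by zero padding to 512×512 (the exact interpolation and padding mode used by Skinny are assumptions here):

```python
import cv2
import numpy as np

def preprocess(img: np.ndarray, target: int = 512) -> np.ndarray:
    """Resize the longer side to `target` pixels, then zero-pad to a square."""
    h, w = img.shape[:2]
    scale = target / max(h, w)
    resized = cv2.resize(img, (round(w * scale), round(h * scale)))
    canvas = np.zeros((target, target, 3), dtype=img.dtype)
    canvas[:resized.shape[0], :resized.shape[1]] = resized  # top-left padding (assumption)
    return canvas
```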
Figure: the U-shaped architecture of Skinny, with a contracting path that extracts increasingly complex features and an expanding path that retrieves the spatial information lost during feature extraction.

Tarasiewicz et al. 2020 [5]

The Skinny network consists of a modified U-Net incorporating dense blocks and inception modules to benefit from a wider spatial context.

The network is called "U-Net" because of its shape: there is a contracting path, which tries to extract increasingly complex features as it goes deeper, and an expanding path, which tries to retrieve the spatial information lost during feature extraction.

An additional deep level is appended to the original U-Net model, to better capture large-scale contextual features in the deepest part of the network. The features extracted in the contracting path propagate to the corresponding expansive levels through the dense blocks.

The original U-Net convolutional layers are replaced with the inception modules: before each max-pooling layer, in the contracting path, and after concatenating features, in the expanding path. Thanks to these architectural choices, Skinny benefits from a wider pixel context.

A face shot of a man. The face covers almost all of the image, hence it may need bigger convolution sizes to extract complex features.
A half body shot of a person. Depending on the dataset, an image like this may be common, hence convolution sizes may be already good for it.
An image featuring two people fully. The skin pixel regions are small, hence convolution sizes can be reduced.

The salient content size varies between images. The inception module combines multiple kernels of different sizes to adapt to the content.

Figure: visualization of the layers in a dense block, for context to the following paragraph. A lot of information is lost going deeper in a CNN, to the point that gradients risk vanishing before reaching the other side. Dense blocks enhance feature reuse by simplifying the connectivity pattern between the network paths.

Dense block layers are connected in a way that each one receives feature maps from all preceding layers and passes its feature maps to all subsequent layers.
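A simplified PyTorch sketch of the two building blocks, with kernel sizes and channel counts chosen only for illustration (not the exact Skinny configuration):

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions whose outputs are concatenated,
    so the layer processes the content at several spatial scales."""
    def __init__(self, in_ch: int, branch_ch: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.cat([branch(x) for branch in self.branches], dim=1)

class DenseBlock(nn.Module):
    """Each layer receives the feature maps of all preceding layers
    (concatenated along the channel axis) and passes its own output forward."""
    def __init__(self, in_ch: int, growth: int = 16, n_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_ch + i * growth, growth, kernel_size=3, padding=1)
            for i in range(n_layers)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for conv in self.layers:
            features.append(torch.relu(conv(torch.cat(features, dim=1))))
        return torch.cat(features, dim=1)

# Example: a 512x512 RGB image batch through both blocks
x = torch.randn(1, 3, 512, 512)
out = DenseBlock(in_ch=24)(InceptionModule(in_ch=3, branch_ch=8)(x))
```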

Datasets

Image databases are essential for developing skin detectors. Over the years, new databases keep getting published, but there are still some limitations on their reliability:

  • Unbalanced classes
    May cause some metrics to give overoptimistic estimations [6]
  • Number of images
  • Image quality
  • Ground truth quality
  • Lack of additional data
    This kind of data may be extremely useful in some applications:
    • Lighting conditions
    • Background complexity
    • Number of subjects
    • Featured skin tones
    • Indoor or outdoor scenery

ECU, HGR, and Schmugge are the datasets chosen for this work, as they achieve a good overall score considering popularity, diversity, size, and the previously mentioned issues.

Here are the common datasets used in Skin Detection.

Only public datasets featuring images and including ground truths are considered

TDSD [7] is the acronym of Test Database for Skin Detection, which is a database featuring 555 full-body skin images. Its ground truths are segmentation masks. It is also referred to as IBTD.

ECU [8] is a dataset created at the Edith Cowan University and represents the largest analyzed dataset, consisting of 3998 pictures. It has been categorized as a full-body dataset, but most of its content is half-body shots. It can also be referred to as Face and Skin Detection Database (FSD).

Schmugge [9] is a facial dataset that includes 845 images taken from different databases. It provides several labeled attributes for each image and ternary ground truths.

Pratheepan [10] is composed of 78 pictures randomly sampled from the web, precisely annotated. It stores the pictures containing a single subject with simple backgrounds and images containing multiple subjects with complex backgrounds in different folders.

VPU [11] as for Video Processing & Understanding Lab, consists of 285 images taken from five different public datasets for human activity recognition. The size of the pictures is constant between the images of the same origin. The dataset provides native train and test splits. It can also be referred to as VDM.

SFA [12] is the acronym of Skin of FERET and AR Database and consists of 1118 semi-passport pictures with a very plain background, plus skin and non-skin samples (ignored in this work). Its ground truths are segmentation masks.

HGR [13] is a Hand Gesture Recognition database that organizes 1558 hand gesture images in three sub-datasets. Two sub-datasets include fixed-size, very high-resolution images together with downscaled alternatives (the latter used in this work).

abd [14] is a database composed of 1400 fixed-size abdominal pictures accurately selected to represent different ethnic groups and body mass indices. It has native train and test splits.

Skin tone descriptions are quoted from the original papers or, where available, taken from the provided labels.
| Name | Year | Images | Shot Type | Skin Tones |
| --- | --- | --- | --- | --- |
| abd-skin | 2019 | 1400 | abdomen | african, indian, hispanic, caucasian, asian |
| HGR | 2014 | 1558 | hand | - |
| SFA | 2013 | 1118 | face | asian, caucasian, african |
| VPU | 2013 | 285 | full body | - |
| Pratheepan | 2012 | 78 | full body | - |
| Schmugge | 2007 | 845 | face | labels: light, medium, dark |
| ECU | 2005 | 3998 | full body | whitish, brownish, yellowish, and darkish |
| TDSD | 2004 | 555 | full body | different ethnic groups |


Results

In single evaluations, methods are eventually trained on the training set of a dataset (in the case of trainable methods), and predictions are then performed on its test set. For example, with ECU as the dataset, the skin detector is trained on the training set of ECU and then tested on the test set of ECU.

In cross evaluations, only trainable approaches are analyzed. Detectors are trained on the training set of one dataset, and predictions are then performed on all the images of every other dataset. For example, with ECU as the training dataset and HGR as the testing dataset, the skin detector is trained on the training set of ECU and then tested on the whole HGR dataset. The expression HGR on ECU describes the evaluation in which HGR is used as the training set and ECU as the test set.

Initially, the metrics are measured for all the instances, then the average and population standard deviation for each metric are computed.

Single Dataset

Dprs = sqrt{(1-PR)^2 + (1-RE)^2 + (1-SP)^2}

where PR is Precision, RE is Recall, and SP is Specificity, with (1, 1, 1) representing the ideal ground truth.
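A minimal sketch of how the reported metrics can be computed from boolean prediction and ground-truth masks (function and variable names are illustrative; no guard against empty classes, for brevity):

```python
import numpy as np

def skin_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """F1, IoU, and Dprs from boolean prediction / ground-truth masks."""
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    # Euclidean distance from the ideal point (PR, RE, SP) = (1, 1, 1)
    dprs = np.sqrt((1 - precision) ** 2 + (1 - recall) ** 2 + (1 - specificity) ** 2)
    return {"F1": f1, "IoU": iou, "Dprs": dprs}
```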

| Metric | Method | ECU | HGR | SCHMUGGE |
| --- | --- | --- | --- | --- |
| F1 | U-Net | 0.9133 ± 0.08 | 0.9848 ± 0.02 | 0.6121 ± 0.45 |
| F1 | Statistical | 0.6980 ± 0.22 | 0.9000 ± 0.15 | 0.5098 ± 0.39 |
| F1 | Thresholding | 0.6356 ± 0.24 | 0.7362 ± 0.27 | 0.4280 ± 0.34 |
| IoU | U-Net | 0.8489 ± 0.12 | 0.9705 ± 0.03 | 0.5850 ± 0.44 |
| IoU | Statistical | 0.5751 ± 0.23 | 0.8434 ± 0.19 | 0.4303 ± 0.34 |
| IoU | Thresholding | 0.5088 ± 0.25 | 0.6467 ± 0.30 | 0.3323 ± 0.28 |
| Dprs | U-Net | 0.1333 ± 0.12 | 0.0251 ± 0.03 | 0.5520 ± 0.64 |
| Dprs | Statistical | 0.4226 ± 0.27 | 0.1524 ± 0.19 | 0.7120 ± 0.54 |
| Dprs | Thresholding | 0.5340 ± 0.32 | 0.3936 ± 0.36 | 0.8148 ± 0.48 |
  • Schmugge appears to be the hardest dataset to classify, also presenting high standard deviations that can be attributed to its diverse content, featuring different subjects, backgrounds, and lighting.

  • HGR seems to be the easiest dataset to classify, which can be due to the relatively low diversity of subjects and backgrounds. In fact, the learning approaches achieve very high scores.

  • In the ECU dataset, the results of Statistical and Thresholding are relatively close, while U-Net outperforms them by far.

  • U-Net beats its competitors in all the measurements, while Statistical always comes second.

NOT REPRESENTATIVE OF THE OVERALL PERFORMANCE!
(for the skin detectors' performance, read the tables)

Instead, the purpose of these examples is to highlight the strengths and limitations of each skin detector through comparison.

Significant Outcomes

U-Net predictions may have a different shape from the other images because of the network's preprocessing.

Skin detection results. (a) input image; (b) ground truth; (c) U-Net; (d) Statistical; (e) Thresholding

  • All approaches struggle on the first image, as the lighting is really tricky. Even the U-Net produces a very poor classification, with a tremendous number of False Positives. Thresholding is the most restrictive on False Positives in this instance.

  • Color-based methods struggle on images without skin pixels that contain materials with skin-like colors, with Statistical producing a really high number of False Positives.

Cross Dataset

The F1 − IoU difference is also taken into consideration, to get a better idea of the number of True Positives compared to False Positives and False Negatives.
| Metric | Method | ECU → HGR | ECU → SCHMUGGE | HGR → ECU | HGR → SCHMUGGE | SCHMUGGE → ECU | SCHMUGGE → HGR |
| --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | U-Net | 0.9308 ± 0.11 | 0.4625 ± 0.41 | 0.7252 ± 0.20 | 0.2918 ± 0.31 | 0.6133 ± 0.21 | 0.8106 ± 0.19 |
| F1 | Statistical | 0.5577 ± 0.29 | 0.3319 ± 0.28 | 0.4279 ± 0.19 | 0.4000 ± 0.32 | 0.4638 ± 0.23 | 0.5060 ± 0.25 |
| IoU | U-Net | 0.8851 ± 0.15 | 0.3986 ± 0.37 | 0.6038 ± 0.22 | 0.2168 ± 0.25 | 0.4754 ± 0.22 | 0.7191 ± 0.23 |
| IoU | Statistical | 0.4393 ± 0.27 | 0.2346 ± 0.21 | 0.2929 ± 0.17 | 0.2981 ± 0.24 | 0.3318 ± 0.20 | 0.3752 ± 0.22 |
| Dprs | U-Net | 0.1098 ± 0.15 | 0.7570 ± 0.56 | 0.3913 ± 0.26 | 0.9695 ± 0.44 | 0.5537 ± 0.27 | 0.2846 ± 0.27 |
| Dprs | Statistical | 0.5701 ± 0.29 | 1.0477 ± 0.35 | 0.8830 ± 0.23 | 1.0219 ± 0.42 | 0.7542 ± 0.30 | 0.6523 ± 0.27 |
| F1 − IoU | U-Net | 0.0457 | 0.0639 | 0.1214 | 0.0750 | 0.1379 | 0.0915 |
| F1 − IoU | Statistical | 0.1184 | 0.0973 | 0.1350 | 0.1019 | 0.1320 | 0.1308 |
  • Using HGR as the training set and predicting over Schmugge, Statistical outperforms U-Net, especially in the F1 score. However, while Statistical generally performs better than U-Net here, it also produces a lot of False Positives, as the F1 − IoU and Dprs metrics indicate. The latter is particularly bad for both methods, indicating a large distance between the ideal ground truth and the predictions.

  • Training on Schmugge and predicting on ECU shows U-Net with a slightly worse F1 − IoU, suggesting the presence of False Positives and False Negatives.

  • U-Net exceeds an F1 score of 0.80 when Schmugge is the training set and HGR the prediction set, despite the training set not being huge.

  • Apart from a few exceptions, U-Net still dominates.

Significant Outcomes


Skin detection results. (a) input image; (b) ground truth; (c) U-Net; (d) Statistical

  • It can be noticed how Statistical tends to over-classify skin pixels in some cases, confirming the above intuitions about the statistical method producing a lot of False Positives.

  • The third row (HGR on Schmugge) is part of the dataset combination in which Statistical outperforms U-Net. Statistical reports a lot of False Positives, but also a lot of True Positives, which U-Net struggles to identify. This kind of situation is why Dprs is better for U-Net even though its F1 and IoU are worse: the Dprs formula also rewards True Negatives (via Specificity), contrary to the other two metrics.

  • The last row (HGR on Schmugge) is also part of the same dataset combination and describes a similar situation: U-Net fails to label several skin pixels, especially in very lit regions, while Statistical overdoes it. This image represents the high complexity and diversity of the Schmugge content.

Single Skin Tone

| Metric | Method | DARK | MEDIUM | LIGHT |
| --- | --- | --- | --- | --- |
| F1 | U-Net | 0.9529 ± 0.00 | 0.9260 ± 0.15 | 0.9387 ± 0.12 |
| F1 | Statistical | 0.8123 ± 0.02 | 0.7634 ± 0.19 | 0.8001 ± 0.15 |
| F1 | Thresholding | 0.2620 ± 0.14 | 0.6316 ± 0.20 | 0.6705 ± 0.14 |
| IoU | U-Net | 0.9100 ± 0.01 | 0.8883 ± 0.18 | 0.9006 ± 0.14 |
| IoU | Statistical | 0.6844 ± 0.03 | 0.6432 ± 0.17 | 0.6870 ± 0.16 |
| IoU | Thresholding | 0.1587 ± 0.10 | 0.4889 ± 0.19 | 0.5190 ± 0.14 |
| Dprs | U-Net | 0.0720 ± 0.01 | 0.1078 ± 0.21 | 0.0926 ± 0.15 |
| Dprs | Statistical | 0.3406 ± 0.05 | 0.3452 ± 0.23 | 0.3054 ± 0.20 |
| Dprs | Thresholding | 0.8548 ± 0.12 | 0.5155 ± 0.24 | 0.4787 ± 0.17 |
  • DARK presents an almost-zero standard deviation for the learning approaches, indicating that the diversity of its images might not be very high.

  • The learning approaches have the most difficulty classifying the medium skin tones, which may be caused by the difficult scenarios featured in the sub-dataset, such as clay terrains, which have a skin-like color.

  • Thresholding struggles to classify dark skin tones, which may indicate that the skin clustering rules are leaving out the darker skin pixels.

  • U-Net beats its competitors in all the measurements, while Statistical always comes second.

Significant Outcomes


Skin detection results. (a) input image; (b) ground truth; (c) U-Net; (d) Statistical; (e) Thresholding

  • The first two rows depict darker skin tones. In both examples, it is possible to notice a pattern in the classification of each approach: U-Net produces almost ground-truth-like predictions; Statistical tends to over-classify skin pixels, but has an excellent number of True Positives; Thresholding seems to fail at classifying the darkest skin tones, but sometimes still manages to mark the inner regions of the face, which is often enough to describe the face shape.

  • The third row represents a tricky background with a clay terrain and medium skin tones. U-Net produces a very good prediction, while the other approaches include many False Positives. Statistical reports a tremendous number of False Positives, while Thresholding is deceived by the clay terrain and ruins its otherwise excellent classification.

  • In the last row, U-Net and Thresholding have very good predictions, with the former incorporating more False Positives, and the latter including more False Negatives. The statistical approach reports once again a huge number of False Positives.

Cross Skin Tone

| Metric | Method | DARK → MEDIUM | DARK → LIGHT | MEDIUM → DARK | MEDIUM → LIGHT | LIGHT → DARK | LIGHT → MEDIUM |
| --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | U-Net | 0.7300 ± 0.25 | 0.7262 ± 0.26 | 0.8447 ± 0.13 | 0.8904 ± 0.14 | 0.7660 ± 0.17 | 0.9229 ± 0.11 |
| F1 | Statistical | 0.7928 ± 0.11 | 0.7577 ± 0.12 | 0.5628 ± 0.14 | 0.7032 ± 0.14 | 0.5293 ± 0.20 | 0.7853 ± 0.11 |
| IoU | U-Net | 0.6279 ± 0.27 | 0.6276 ± 0.28 | 0.7486 ± 0.15 | 0.8214 ± 0.16 | 0.6496 ± 0.21 | 0.8705 ± 0.13 |
| IoU | Statistical | 0.6668 ± 0.11 | 0.6229 ± 0.13 | 0.4042 ± 0.13 | 0.5571 ± 0.14 | 0.3852 ± 0.19 | 0.6574 ± 0.12 |
| Dprs | U-Net | 0.3805 ± 0.33 | 0.3934 ± 0.34 | 0.2326 ± 0.17 | 0.1692 ± 0.18 | 0.3402 ± 0.21 | 0.1192 ± 0.16 |
| Dprs | Statistical | 0.3481 ± 0.16 | 0.4679 ± 0.18 | 0.6802 ± 0.20 | 0.5376 ± 0.23 | 0.6361 ± 0.22 | 0.3199 ± 0.16 |
| F1 − IoU | U-Net | 0.1021 | 0.0986 | 0.0961 | 0.0690 | 0.1164 | 0.0524 |
| F1 − IoU | Statistical | 0.1260 | 0.1348 | 0.1586 | 0.1461 | 0.1441 | 0.1279 |
  • Using DARK as the training set and predicting over LIGHT, Statistical has a better F1 but a worse IoU: Statistical picks up more True Positives than U-Net.

  • In the MEDIUM on DARK case, the Dprs score of Statistical is worse than in the LIGHT on DARK case, even though its F1 and IoU are better. Specificity is driving the prediction away from the ideal ground truth, suggesting very few True Negatives.

  • Statistical outperforms U-Net a couple of times when the darker skin tones are used as the training set: this may indicate that Statistical performs better with a smaller training set, as the DARK sub-dataset was the smallest one and therefore had to be augmented with light transformations. U-Net also shows more unstable results, as its population standard deviation is higher.

  • Despite LIGHT being the biggest sub-dataset, U-Net achieves its best average score by training on MEDIUM, as the LIGHT on DARK case is far worse than LIGHT on MEDIUM. Representing the midpoint between the colors of darker and lighter skin tones, MEDIUM data allows U-Net to produce very good predictions even with a small training size.

  • As usual, U-Net outperforms Statistical in most situations.

Significant Outcomes


Skin detection results. (a) input image; (b) ground truth; (c) U-Net; (d) Statistical

  • The first two rows are from models trained on DARK. The results of U-Net are poor, with a lot of False Positives and False Negatives. Statistical depicts a lot of False Positives too, but at least captures the skin pixels. The small size of the dataset makes it hard for the CNN model to classify correctly.

  • The third row features a MEDIUM on DARK case, where the hypothesis of Statistical having very few True Negatives, driving the Dprs measure high, seems confirmed. U-Net on the other hand performs a quite good classification, marking almost correctly most of the skin regions.

  • The last row represents a LIGHT on DARK case on the same original picture of the previous row. In this case, Statistical does a much better job, especially at predicting non-skin pixels, which may indicate that the light sub-dataset contains more images featuring sky and water labeled as non-skin pixels.

Inference Times

Inference times were measured on an i7-4770K CPU for each algorithm on the same set of images, with multiple observations performed.

Deep Learning: the improved time assumes one prediction has already been performed before starting the observations (warm-up).

Statistical: the improved time builds the prediction image by looping over a sequence object instead of over every pixel.

Inference times are independent of image content.

| Method | Inference time (seconds) | Improved inference time (seconds) |
| --- | --- | --- |
| Deep Learning | 0.826581 ± 0.043 | 0.242685 ± 0.016 |
| Statistical | 0.457534 ± 0.002 | 0.371515 ± 0.002 |
| Thresholding | 0.007717 ± 0.000 | 0.007717 ± 0.000 |
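A minimal sketch of how the per-image inference time could be measured, assuming a generic predict(image) function for each detector (the warm-up flag mirrors the deep learning note above):

```python
import time

def measure_inference(predict, images, warmup: bool = True) -> float:
    """Average per-image inference time in seconds.
    `predict` is a hypothetical predict(image) -> mask function."""
    if warmup:
        predict(images[0])   # e.g. let a deep model load weights / build its graph
    start = time.perf_counter()
    for img in images:
        predict(img)
    return (time.perf_counter() - start) / len(images)
```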

Conclusion


  • Semantic feature extraction gave the CNN an edge
  • The rule-based method proved to be really fast but struggled on darker skin tones
  • The statistical method was prone to False Positives
  • Involving multiple metrics debunked over-optimistic results
  • Data quality is important for performance

Future Work

  • Improve public data quality
  • Vision transformers
  • U-Nets on mobile devices [15]

Bibliography

  1. Ramirez, G. A., Fuentes, O., Crites Jr, S. L., Jimenez, M., & Ordonez, J. (2014). Color analysis of facial skin: Detection of emotional state. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 468-473).
  2. Low, C. C., Ong, L. Y., Koo, V. C., & Leow, M. C. (2020). Multi-audience tracking with RGB-D camera on digital signage. Heliyon, 6(9), e05107.
  3. Do, T. T., Zhou, Y., Zheng, H., Cheung, N. M., & Koh, D. (2014, August). Early melanoma diagnosis with mobile imaging. In 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (pp. 6752-6757). IEEE.
  4. Brancati, N., De Pietro, G., Frucci, M., & Gallo, L. (2017). Human skin detection through correlation rules between the YCb and YCr subspaces based on dynamic color clustering. Computer Vision and Image Understanding, 155, 33-42.
  5. Tarasiewicz, T., Nalepa, J., & Kawulok, M. (2020, October). Skinny: A lightweight U-Net for skin detection and segmentation. In 2020 IEEE International Conference on Image Processing (ICIP) (pp. 2386-2390). IEEE.
  6. Chicco, D., & Jurman, G. (2020). The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics, 21(1), 1-13.
  7. Zhu, Q., Wu, C. T., Cheng, K. T., & Wu, Y. L. (2004, October). An adaptive skin model and its application to objectionable image filtering. In Proceedings of the 12th Annual ACM International Conference on Multimedia (pp. 56-63).
  8. Phung, S. L., Bouzerdoum, A., & Chai, D. (2005). Skin segmentation using color pixel classification: analysis and comparison. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(1), 148-154.
  9. Schmugge, S. J., Jayaram, S., Shin, M. C., & Tsap, L. V. (2007). Objective evaluation of approaches of skin detection using ROC analysis. Computer Vision and Image Understanding, 108(1-2), 41-51.
  10. Tan, W. R., Chan, C. S., Yogarajah, P., & Condell, J. (2011). A fusion approach for efficient human skin detection. IEEE Transactions on Industrial Informatics, 8(1), 138-147.
  11. Sanmiguel, J. C., & Suja, S. (2013). Skin detection by dual maximization of detectors agreement for video monitoring. Pattern Recognition Letters, 34(16), 2102-2109.
  12. Casati, J. P. B., Moraes, D. R., & Rodrigues, E. L. L. (2013, June). SFA: A human skin image database based on FERET and AR facial images. In IX Workshop de Visão Computacional, Rio de Janeiro.
  13. Kawulok, M., Kawulok, J., Nalepa, J., & Smolka, B. (2014). Self-adaptive algorithm for segmenting skin regions. EURASIP Journal on Advances in Signal Processing, 2014(1), 1-22.
  14. Topiwala, A., Al-Zogbi, L., Fleiter, T., & Krieger, A. (2019, October). Adaptation and evaluation of deep learning techniques for skin segmentation on novel abdominal dataset. In 2019 IEEE 19th International Conference on Bioinformatics and Bioengineering (BIBE) (pp. 752-759). IEEE.
  15. Ignatov, A., Byeoung-Su, K., Timofte, R., & Pouget, A. (2021). Fast camera image denoising on mobile GPUs with deep learning, Mobile AI 2021 challenge: Report. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2515-2524).