S. Paisitkriangkrai, J. Sherrah, P. Janney, and A. van den Hengel, Effective semantic pixel labelling with convolutional networks and Conditional Random Fields, Proc. CVPR Workshops, 2015.

N. Audebert, B. Le Saux, and S. Lefèvre, Beyond RGB: Very High Resolution Urban Remote Sensing With Multimodal Deep Networks, ISPRS J. Photogram. and Remote Sensing, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01636145

G. Facciolo, C. de Franchis, and E. Meinhardt-Llopis, Automatic 3D reconstruction from multi-date satellite images, Proc. ICCV Workshops, 2015.

B. Le Saux, N. Yokoya, R. Hänsch, M. Brown, and G. Hager, 2019 IEEE GRSS Data Fusion Contest: Large-Scale Semantic 3D Reconstruction, IEEE Geosci. and Remote Sensing Mag., 2019.
URL : https://hal.archives-ouvertes.fr/hal-02443778

S. Srivastava, M. Volpi, and D. Tuia, Joint height estimation and semantic labeling of monocular aerial images with CNNs, Proc. IGARSS, 2017.

M. Carvalho, B. Le Saux, P. Trouvé-Peloux, A. Almansa, and F. Champagnat, On regression losses for deep depth estimation, Proc. ICIP, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01925321

A. Kendall, V. Badrinarayanan, and R. Cipolla, Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding, 2015.

Y. Xu, B. Du, L. Zhang, D. Cerra, M. Pato et al., Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS Data Fusion Contest, IEEE J. Sel. Topics in Applied Earth Obs. and Remote Sensing, vol.12, issue.6, 2019.

M. Cramer, The DGPF-test on digital airborne camera evaluation: overview and test design, 2010.

Y. Xu, B. Du, and L. Zhang, Multi-source remote sensing data classification via fully convolutional networks and post-classification processing, Proc. IGARSS, 2018.

G. J. Brostow, J. Fauqueur, and R. Cipolla, Semantic object classes in video: A high-definition ground truth database, Patt. Rec. Lett., 2009.

J. Long, E. Shelhamer, and T. Darrell, Fully Convolutional Networks for Semantic Segmentation, Proc. CVPR, 2015.

J. Benediktsson, P. H. Swain, and O. Ersoy, Neural network approaches versus statistical methods in classification of multisource remote sensing data, Proc. IGARSS, 1989.

D. Marmanis, K. Schindler, J. D. Wegner, S. Galliani, M. Datcu et al., Classification with an edge: Improving semantic image segmentation with boundary detection, ISPRS J. Photogram. and Remote Sensing, 2016.

D. Eigen and R. Fergus, Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture, Proc. ICCV, 2015.

I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, and N. Navab, Deeper depth prediction with fully convolutional residual networks, Proc. 3DV, 2016.

L. Mou and X. X. Zhu, IM2HEIGHT: Height estimation from single monocular imagery via fully residual convolutional-deconvolutional network, 2018.

P. Ghamisi and N. Yokoya, IMG2DSM: Height simulation from single imagery using conditional generative adversarial net, IEEE Geosci. and Remote Sensing Lett., 2018.

H. A. Amirkolaee and H. Arefi, Height estimation from single aerial images using a deep convolutional encoder-decoder network, ISPRS J. Photogram. and Remote Sensing, 2019.

R. Caruana, Multitask learning, Machine Learning, 1997.

A. Kendall, Y. Gal, and R. Cipolla, Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, Proc. CVPR, 2018.

Z. Chen, V. Badrinarayanan, C. Lee, and A. Rabinovich, GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks, Proc. ICML, 2018.

O. Sener and V. Koltun, Multi-task learning as multi-objective optimization, Proc. NeurIPS, 2018.

M. Jaggi, Revisiting Frank-Wolfe: Projection-free sparse convex optimization, Proc. ICML, 2013.

A. Kendall and Y. Gal, What uncertainties do we need in Bayesian deep learning for computer vision?, 2017.

M. Gerke, Use of the Stair Vision Library within the ISPRS 2D semantic labeling benchmark (Vaihingen), 2015.

A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang et al., Automatic differentiation in PyTorch, Proc. NeurIPS Workshops, 2017.

D. Kingma and J. Ba, Adam: A method for stochastic optimization, Proc. ICLR, 2014.

D. Cerra, Combining deep and shallow neural networks with ad hoc detectors for the classification of complex multi-modal urban scenes, Proc. IGARSS, 2018.

M. Cordts et al., The Cityscapes dataset for semantic urban scene understanding, Proc. CVPR, 2016.

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, Indoor segmentation and support inference from RGBD images, Proc. ECCV, 2012.