If you administer IT infrastructure, you have probably come across Centreon, an open-source solution for monitoring your network equipment, your servers and the services running on them.
Centreon is a very flexible solution. It relies, for example, on the SNMP protocol to read the state of network devices, but also on probes (or plugins) that can monitor any given service in a fully modular way. The web is full of plugin collections for all sorts of things.
You can of course develop your own Nagios/Centreon probes, and Centreon publishes documentation for anyone who wants to write their own.
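For readers who have never written one, here is a minimal sketch of the contract such a probe follows under the usual Nagios plugin conventions: a single status line, optional performance data after a `|`, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The function name `check_value` and its arguments are hypothetical and only illustrate the format; the Centreon and Nagios plugin documentation remain the authoritative references.

```bash
#!/bin/bash
# Hypothetical minimal probe: compares a numeric value with warning/critical
# thresholds and reports it in the Nagios plugin format.
# Exit codes follow the Nagios convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_value () {
    local label="$1" value="$2" warn="$3" crit="$4"
    local re='^[0-9]+$'
    # Reject non-integer input with UNKNOWN
    if ! [[ $value =~ $re && $warn =~ $re && $crit =~ $re ]]; then
        echo "UNKNOWN - non-integer input"
        return 3
    fi
    # Performance data: label=value;warning;critical
    local perf="${label}=${value};${warn};${crit}"
    if (( value >= crit )); then
        echo "CRITICAL - ${label} is ${value} | ${perf}"
        return 2
    elif (( value >= warn )); then
        echo "WARNING - ${label} is ${value} | ${perf}"
        return 1
    fi
    echo "OK - ${label} is ${value} | ${perf}"
    return 0
}
```

For instance, `check_value folderSize 270 250 300` prints `WARNING - folderSize is 270 | folderSize=270;250;300` and returns 1, which Centreon turns into a WARNING state.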
To monitor Apple servers and services running Mac OS X, I came across a collection of tools developed by Jedda Wignall, who also publishes quite a few tutorials on this topic on his blog. So I started using these tools.
Monitoring the size of a folder
One of the tools in Jedda Wignall's collection is a script that monitors the size of a folder against a warning threshold and a critical threshold. It is handy, for example, for keeping an eye on folders that tend to grow steadily over time.
Here is the original script:
```bash
#!/bin/bash

# Check Folder Size
# by Dan Barrett
# http://yesdevnull.net

# v1.1 - 28 October 2013
# Added OS X 10.9 Support and fixes a bug where folders with spaces in their name would fail with du.

# v1.0 - 9 August 2013
# Initial release.

# Checks to see how large the folder is and warns or crits if over a specified size.
# Defaults to MB

# Arguments:
# -f Path to folder
# -b Block size (i.e. data returned in MB, KB or GB - enter as m, k or g)
# -w Warning threshold for storage used
# -c Critical threshold for storage used

# Example:
# ./check_folder_size.sh -f /Library/Application\ Support/ -w 2048 -c 4096

# Supports:
# Untested but I'm sure it works fine on OS X 10.6 and 10.7
# * OS X 10.8.x
# * OS X 10.9

folderPath=""
blockSize="m"
blockSizeFriendly="MB"
warnThresh=""
critThresh=""

# Get the flags!
while getopts "f:b:w:c:" opt
do
    case $opt in
        f ) folderPath=$OPTARG;;
        b ) blockSize=$OPTARG;;
        w ) warnThresh=$OPTARG;;
        c ) critThresh=$OPTARG;;
    esac
done

if [ "$folderPath" == "" ]
then
    printf "ERROR - You must provide a file path with -f!\n"
    exit 2
fi

if [ "$warnThresh" == "" ]
then
    printf "ERROR - You must provide a warning threshold with -w!\n"
    exit 2
fi

if [ "$critThresh" == "" ]
then
    printf "ERROR - You must provide a critical threshold with -c!\n"
    exit 2
fi

if [ "$blockSize" == "k" ]
then
    blockSizeFriendly="KB"
fi

if [ "$blockSize" == "g" ]
then
    blockSizeFriendly="GB"
fi

folderSize=`du -s$blockSize "$folderPath" | grep -E -o "[0-9]+"`

if [ "$folderSize" -ge "$critThresh" ]
then
    printf "CRITICAL - folder is $folderSize $blockSizeFriendly in size | folderSize=$folderSize;$warnThresh;$critThresh;\n"
    exit 2
elif [ "$folderSize" -ge "$warnThresh" ]
then
    printf "WARNING - folder is $folderSize $blockSizeFriendly in size | folderSize=$folderSize;$warnThresh;$critThresh;\n"
    exit 1
fi

printf "OK - folder is $folderSize $blockSizeFriendly in size | folderSize=$folderSize;$warnThresh;$critThresh;\n"
exit 0
```
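One detail worth knowing about the line `du -s$blockSize "$folderPath" | grep -E -o "[0-9]+"`: `du` prints the size, a tab, then the path, and `grep -o` returns every run of digits on that line. A path that itself contains digits (for example a hypothetical `/Volumes/Data2`) therefore yields two values instead of one, and the numeric comparison that follows breaks. Keeping only the first field avoids this. The sketch below reproduces the effect on a simulated `du` line rather than on a real folder.

```bash
#!/bin/bash
# Simulated output of "du -sm /Volumes/Data2": size, a tab, then the path
duLine=$'1234\t/Volumes/Data2'

# grep -o prints every run of digits: here "1234" AND "2" (from "Data2")
viaGrep=$(printf '%s\n' "$duLine" | grep -E -o "[0-9]+")

# awk keeps only the first whitespace-separated field, i.e. the size
viaAwk=$(printf '%s\n' "$duLine" | awk '{print $1}')

printf 'grep: %s\n' "$(echo $viaGrep)"   # grep: 1234 2
printf 'awk: %s\n' "$viaAwk"             # awk: 1234
```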
For example, to monitor the /users folder, with a warning when it exceeds 250 GB and a critical state when it exceeds 300 GB, you would use the following command:

```bash
./check_folder_size.sh -f /users -b g -w 250 -c 300
```
The problem with check_folder_size.sh on very large folders
The problem with this probe is that it takes a while to return a result when the folder is very large and contains many items. In some cases it can easily take more than 30 seconds to get a response, which can cause trouble on the Centreon server side. You can of course increase, in the Centreon settings, the "maximum execution time of an event handler command" (event_handler_timeout) and the "service check timeout" (service_check_timeout), but beyond 30 seconds or 1 minute this becomes counterproductive and can make you miss other problems.
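Before raising these timeouts, it helps to measure how long the size computation actually takes on the folder in question. Here is a minimal sketch using bash's built-in `SECONDS` counter; `measure_du` is a hypothetical helper name, and its resolution is whole seconds only.

```bash
#!/bin/bash
# Hypothetical helper: prints the size in KB of a folder and how many
# whole seconds "du" needed to compute it.
measure_du () {
    local folder="$1" start size
    start=$SECONDS                                   # bash increments SECONDS every second
    size=$(du -sk "$folder" | awk '{print $1}')      # first field = size in KB
    echo "size_kb=${size} elapsed_s=$((SECONDS - start))"
}
```

Run it as the same user as the probe (root here) on the monitored folder, for example `measure_du /users`, and compare `elapsed_s` with the service_check_timeout configured in Centreon.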
So, after getting some advice, I set about reworking the probe so that it stores the previous state of each monitored folder (with a timestamp) and accepts a response-time parameter (in seconds).
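Concretely, in the reworked probe the buffer file (/var/check_folder_size/bufferFile.txt) holds one line per monitored folder, in the form `key;sizeInKB;timestamp`, where the key is an MD5 hash of the folder path (computed with the macOS `md5` command) and the timestamp is in seconds since the epoch. The sketch below shows one way to update or append such a record; the function name `update_record` is hypothetical, and it uses `awk` rather than the `sed` substitution used by the actual probe.

```bash
#!/bin/bash
# Hypothetical helper: replaces the "key;sizeK;timestamp" line whose first
# field equals $2 in buffer file $1, or appends it if the key is absent.
update_record () {
    local file="$1" key="$2" sizeK="$3" stamp="$4"
    touch "$file"
    awk -F';' -v k="$key" -v line="${key};${sizeK};${stamp}" '
        $1 == k { if (!done) print line; done = 1; next }   # replace (and dedupe) the existing record
        { print }                                           # keep other folders untouched
        END { if (!done) print line }                       # new folder: append a record
    ' "$file" > "${file}.new" && mv "${file}.new" "$file"
}
```

Writing to a `.new` file and then renaming it means a reader never sees a half-written buffer file.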
When this -t option is used, the probe returns the last known state of the monitored folder if it has not finished computing its size within the requested time.
Meanwhile, the probe keeps computing in the background in order to update the buffer file in which it stores the folder's last known state.
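The principle can be sketched independently of `du`: run the slow command in the background, writing its result to a temporary file; poll that file once per second until the time limit; if the result has arrived, report it, otherwise report the cached value and let the background job finish on its own. The function name `run_with_fallback` and its arguments are hypothetical, and unlike the actual probe this sketch does not update the buffer file.

```bash
#!/bin/bash
# Hypothetical sketch of the "answer in time or fall back" pattern.
# $1: time limit in seconds, $2: cached (last known) value,
# remaining args: the slow command, expected to print a single value.
run_with_fallback () {
    local limit="$1" cached="$2" result waited=0
    shift 2
    result=$(mktemp)
    # Background job: write to a .tmp file, then rename it, so the poller never
    # reads a partial value. Its output is detached from ours, so callers using
    # $(...) do not wait for it.
    ( "$@" > "$result.tmp" && mv "$result.tmp" "$result" ) > /dev/null 2>&1 &
    while true; do
        if [[ -s "$result" ]]; then
            echo "fresh $(cat "$result")"
            rm -f "$result"
            return 0
        fi
        (( waited >= limit )) && break
        sleep 1
        waited=$((waited + 1))
    done
    # Time limit reached: report the last known value. The background job
    # still completes later; a real probe would use its result to refresh the cache.
    echo "cached ${cached}"
    return 0
}
```

For instance, `run_with_fallback 3 old sh -c 'echo 42'` prints `fresh 42`, while `run_with_fallback 1 old sh -c 'sleep 3; echo 99'` prints `cached old`.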
I also took the opportunity to make use of detailed information (LONGOUTPUT) in the probe's output. This detailed output states whether the probe used the last value recorded in the buffer file or whether the folder size was computed within the requested time.
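In the Nagios plugin output format (Nagios 3 and later), the first line is the short status output, optionally followed by performance data, and any following lines are treated as long output. A minimal sketch of how a probe can emit both; `print_result` is a hypothetical name, and the blank separator line mirrors what the reworked probe prints before its details.

```bash
#!/bin/bash
# Hypothetical helper: first argument = status line with perfdata (short output),
# remaining arguments = detail lines (long output, LONGOUTPUT).
print_result () {
    local status_line="$1"
    shift
    echo "$status_line"
    [[ $# -gt 0 ]] && echo ""     # blank separator before the details
    local detail
    for detail in "$@"; do
        echo "$detail"
    done
}
```

For example: `print_result "OK - folder is 12 GB in size | folderSize=12;250;300" "So the output is the previous value."`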
The -t option I added is optional: you can keep using the probe exactly as it worked before this rework.
Here is the probe's code after these changes:
```bash
#!/bin/bash

# Check Folder Size - Nagios Probe for OSX
# Original by Dan Barrett - http://yesdevnull.net
# Modded by Yvan GODARD - godardyvan@gmail.com - http://www.yvangodard.me

# v1.2 - 31 October 2015
# Add an option to store the check output in a buffer file for very large folders.
# Complete refactoring

# v1.1 - 28 October 2013
# Added OS X 10.9 Support and fixes a bug where folders with spaces in their name would fail with du.

# v1.0 - 9 August 2013
# Initial release.

# Options
version="check_folder_size v1.2 - 2015 - by Yvan Godard http://www.yvangodard.me & Dan Barrett http://yesdevnull.net"
scriptDir=$(dirname "${0}")
scriptName=$(basename "${0}")
scriptNameWithoutExt=$(echo "${scriptName}" | cut -f1 -d '.')
help="no"
folderPath=""
blockSize="m"
blockSizeFriendly="MB"
warnThresh=""
critThresh=""
withTimeLimit=0
timeLimit=""
thisTime=0
actualSizeK=""
previousSizeK=""
previousDate=""
previousLineBufferFile=""
newLineBufferFile=""
optsCount=0
bufferFolder="/var/${scriptNameWithoutExt}"
bufferFile="${bufferFolder%/}/bufferFile.txt"
messageContent=$(mktemp /tmp/${scriptNameWithoutExt}_messageContent.XXXXX)
duTempScript=$(mktemp /tmp/${scriptNameWithoutExt}_duTempScript.XXXXX)

help () {
    echo ""
    echo "${version}"
    echo ""
    echo "This tool is a Nagios probe for Mac OS X System."
    echo "It's designed to check how large a folder is and to warn or crit if it's over a specified size."
    echo ""
    echo "Disclaimer:"
    echo "This tool is provided without any support or guarantee."
    echo ""
    echo "Synopsis:"
    echo "./${scriptName} [-h] | -f <folder> -w <warning> -c <critical>"
    echo "               [-b <block size>] [-t <time limit>]"
    echo ""
    echo "Example:"
    echo "./${scriptName} -f /Library/Application\ Support/ -w 2048 -c 4096 -t 45 -b g"
    echo ""
    echo "To print this help:"
    echo "   -h :                Prints this help then exit"
    echo ""
    echo "Mandatory arguments:"
    echo "   -f <folder>:        Complete path to the folder you want to check"
    echo "   -w <warning>:       Warning threshold for storage used"
    echo "   -c <critical>:      Critical threshold for storage used"
    echo ""
    echo "Optional options:"
    echo "   -b <block size>:    Block size (i.e. data returned in MB, KB or GB, enter as m, k or g)"
    echo "                       default: '-b m' (i.e. MB)"
    echo "   -t <time limit>:    Delay (in seconds) within which the probe must print an output."
    echo "                       If the script has not finished calculating the size of the folder within that time"
    echo "                       (on a very large folder, for example), it displays the last known state for this folder."
    echo ""
}

function endThisScript () {
    # ${1}: exit code
    # ${2}: "removeDuTemp" to delete temporary files and release the lock,
    #       anything else to leave them to the background helper script
    # ${3}: status line printed before the detailed output (LONGOUTPUT)
    [[ -n "${3}" ]] && echo "${3}"
    [[ -s "${messageContent}" ]] && echo "" && cat "${messageContent}"
    [[ -e "${messageContent}" ]] && rm -f "${messageContent}"
    if [[ "${2}" == "removeDuTemp" ]]; then
        [[ -e "${duTemp}" ]] && rm -f "${duTemp}"
        [[ -e "${duTempScript}" ]] && rm -f "${duTempScript}"
        [[ -e "${lockFile}" ]] && rm -f "${lockFile}"
    fi
    exit ${1}
}

function sizeToK () {
    # Test if function have 2 parameters
    [[ $# -ne 2 ]] && endThisScript 2 "dontRemoveDuTemp" "FATAL ERROR - Function 'sizeToK' used without mandatory parameters!"
    if [[ "${1}" == "k" ]]; then
        echo ${2}
    elif [[ "${1}" == "m" ]]; then
        echo $((${2}*1024))
    elif [[ "${1}" == "g" ]]; then
        echo $((${2}*1024*1024))
    fi
}

function sizeToM () {
    # Test if function have 2 parameters
    [[ $# -ne 2 ]] && endThisScript 2 "dontRemoveDuTemp" "FATAL ERROR - Function 'sizeToM' used without mandatory parameters!"
    if [[ "${1}" == "k" ]]; then
        echo $((${2}/1024))
    elif [[ "${1}" == "m" ]]; then
        echo ${2}
    elif [[ "${1}" == "g" ]]; then
        echo $((${2}*1024))
    fi
}

function sizeToG () {
    # Test if function have 2 parameters
    [[ $# -ne 2 ]] && endThisScript 2 "dontRemoveDuTemp" "FATAL ERROR - Function 'sizeToG' used without mandatory parameters!"
    if [[ "${1}" == "k" ]]; then
        echo $((${2}/1048576))
    elif [[ "${1}" == "m" ]]; then
        echo $((${2}/1024))
    elif [[ "${1}" == "g" ]]; then
        echo ${2}
    fi
}

function processingOutputTest () {
    # ${1} is passed on to endThisScript ("removeDuTemp" or "dontRemoveDuTemp")
    if [[ "${folderSize}" -ge "${critThresh}" ]]; then
        endThisScript 2 "${1}" "CRITICAL - folder is ${folderSize} ${blockSizeFriendly} in size | folderSize=${folderSize};${warnThresh};${critThresh}"
    elif [[ "${folderSize}" -ge "${warnThresh}" ]]; then
        endThisScript 1 "${1}" "WARNING - folder is ${folderSize} ${blockSizeFriendly} in size | folderSize=${folderSize};${warnThresh};${critThresh}"
    else
        endThisScript 0 "${1}" "OK - folder is ${folderSize} ${blockSizeFriendly} in size | folderSize=${folderSize};${warnThresh};${critThresh}"
    fi
}

function testInteger () {
    # Prints 1 if ${1} is an integer, 0 otherwise
    test "${1}" -eq 0 2>/dev/null
    if [[ $? -eq 2 ]]; then
        echo 0
    else
        echo 1
    fi
}

# Get the flags!
while getopts "ht:f:b:w:c:" opt
do
    case $opt in
        h) help="yes" ;;
        t) timeLimit=${OPTARG}
           withTimeLimit=1 ;;
        f) folderPath=${OPTARG}
           optsCount=$((optsCount+1)) ;;
        b) blockSize=$(echo "${OPTARG}" | sed 'y/KMG/kmg/') ;;
        w) warnThresh=${OPTARG}
           optsCount=$((optsCount+1)) ;;
        c) critThresh=${OPTARG}
           optsCount=$((optsCount+1)) ;;
    esac
done

# Print help then exit
[[ ${help} = "yes" ]] && help && endThisScript 0 "removeDuTemp"

# Test mandatory options
[[ "${folderPath}" == "" ]] && echo "> You must provide a folder path with -f!" >> "${messageContent}"
[[ "${warnThresh}" == "" ]] && echo "> You must provide a warning threshold with -w!" >> "${messageContent}"
[[ "${critThresh}" == "" ]] && echo "> You must provide a critical threshold with -c!" >> "${messageContent}"
[[ ${optsCount} -ne 3 ]] && help && endThisScript 2 "removeDuTemp" "ERROR - Not all mandatory options were provided."

# Test root access
[[ $(whoami) != 'root' ]] && endThisScript 2 "removeDuTemp" "FATAL ERROR - This tool needs root access. Use 'sudo'."

# Test format blockSize
if [[ "${blockSize}" == "m" ]]; then
    blockSizeFriendly="MB"
elif [[ "${blockSize}" == "k" ]]; then
    blockSizeFriendly="KB"
elif [[ "${blockSize}" == "g" ]]; then
    blockSizeFriendly="GB"
else
    echo "You have entered '-b ${blockSize}' but this parameter can only be k (KB), m (MB) or g (GB)." >> "${messageContent}"
    endThisScript 2 "removeDuTemp" "ERROR - blocksize parameter can only be k, m or g."
fi

# Test warnThresh and critThresh are integers
[[ $(testInteger "${warnThresh}") -ne 1 ]] && endThisScript 2 "removeDuTemp" "ERROR - Option -w must be an integer."
[[ $(testInteger "${critThresh}") -ne 1 ]] && endThisScript 2 "removeDuTemp" "ERROR - Option -c must be an integer."

# Test timeLimit is an integer
[[ ${withTimeLimit} -eq 1 ]] && [[ $(testInteger "${timeLimit}") -ne 1 ]] && endThisScript 2 "removeDuTemp" "ERROR - Option -t must be an integer."

# Create hash to identify our test
hashTestWithOptions=$(echo "$(dirname "${folderPath%/}")/$(basename "${folderPath%/}")" | md5)
duTemp="/tmp/${scriptNameWithoutExt}_${hashTestWithOptions}_duTemp"

# Add lockfile to avoid multiple instances
lockFile="/tmp/${scriptNameWithoutExt}_${hashTestWithOptions}.lock"
if ! lockfile -r 0 "${lockFile}" > /dev/null 2>&1; then
    echo "$(date)" > "${messageContent}"
    echo "You tried to launch multiple instances of this tool to monitor the same directory '${folderPath%/}'." >> "${messageContent}"
    echo "This is not possible. The other instance was launched on: $(date -j -f "%s" "$(stat -f "%m" "${lockFile}")" +"%Y/%m/%d %T")." >> "${messageContent}"
    # Only remove our own temp script: the lock and duTemp belong to the other instance
    rm -f "${duTempScript}"
    endThisScript 2 "dontRemoveDuTemp" "FATAL ERROR - Impossible to run multiple instances of this tool simultaneously for the same path '${folderPath%/}'"
fi

if [[ ${withTimeLimit} -eq 1 ]]; then
    # Test access to write on buffer folder & buffer file
    if [[ ! -d "${bufferFolder}" ]]; then
        mkdir -p "${bufferFolder}" || endThisScript 2 "removeDuTemp" "FATAL ERROR - Impossible to create the buffer path '${bufferFolder}'"
    fi
    if [[ ! -f "${bufferFile}" ]]; then
        touch "${bufferFile}" || endThisScript 2 "removeDuTemp" "FATAL ERROR - Impossible to create the buffer file '${bufferFile}'"
    fi

    # Test if a previous output is stored in buffer file
    if grep -q "^${hashTestWithOptions}" "${bufferFile}"; then
        # Reading previous values
        previousLineBufferFile=$(grep "^${hashTestWithOptions}" "${bufferFile}" | head -n 1)

        # Launching test in background (du errors discarded so they do not hold the probe's output open)
        [[ -e "${duTemp}" ]] && rm -f "${duTemp}"
        nohup du -s${blockSize} "${folderPath%/}" 2>/dev/null | awk '{print $1}' > "${duTemp}" 2>&1 &

        # Poll the background job once per second, keeping a 5-second margin before the time limit
        while [[ ${thisTime} -lt $((timeLimit-5)) ]]
        do
            # Test if background job is done
            if [[ -s "${duTemp}" ]]; then
                # Convert current size to KB with 'sizeToK' function
                actualSizeK=$(sizeToK ${blockSize} $(cat "${duTemp}"))
                # Writing output to buffer file
                newLineBufferFile="${hashTestWithOptions};${actualSizeK};$(date +%s)"
                sed 's/'"${previousLineBufferFile}"'/'"${newLineBufferFile}"'/g' "${bufferFile}" > "${bufferFile}.new" \
                    && mv "${bufferFile}.new" "${bufferFile}"
                folderSize=$(cat "${duTemp}")
                echo "$(date)" > "${messageContent}"
                echo "Current size of folder '${folderPath%/}' has been saved in buffer file '${bufferFile}'." >> "${messageContent}"
                processingOutputTest "removeDuTemp"
            fi
            sleep 1
            thisTime=$((thisTime+1))
        done

        # Reading previous values
        previousSizeK=$(echo "${previousLineBufferFile}" | cut -d ';' -f 2)
        previousDate=$(echo "${previousLineBufferFile}" | cut -d ';' -f 3)
        previousDateExplicit=$(date -r ${previousDate})
        echo "$(date)" > "${messageContent}"
        echo "The script has not finished calculating the size of the folder within the delay of ${timeLimit} seconds." >> "${messageContent}"
        echo "So the output is the previous value, dated: ${previousDateExplicit}." >> "${messageContent}"

        # Processing previous size to blocksize
        if [[ "${blockSize}" == "k" ]]; then
            folderSize=${previousSizeK}
        elif [[ "${blockSize}" == "m" ]]; then
            folderSize=$(sizeToM k ${previousSizeK})
        elif [[ "${blockSize}" == "g" ]]; then
            folderSize=$(sizeToG k ${previousSizeK})
        fi

        # Creating temp script: recomputes the size in KB, updates the buffer file,
        # then removes the temporary files and releases the lock
        cat > "${duTempScript}" <<EOF
#!/bin/bash
du -sk "${folderPath%/}" 2>/dev/null | awk '{print \$1}' > "${duTemp}.k"
newLineBufferFile="${hashTestWithOptions};\$(cat "${duTemp}.k");\$(date +%s)"
sed "s/${previousLineBufferFile}/\${newLineBufferFile}/g" "${bufferFile}" > "${bufferFile}.new" && mv "${bufferFile}.new" "${bufferFile}"
rm -f "${duTemp}" "${duTemp}.k" "${duTempScript}" "${lockFile}"
exit 0
EOF
        # Chmod script to be executed
        chmod +x "${duTempScript}"
        # Run this script in background, detached from the probe's output
        (/bin/bash "${duTempScript}" > /dev/null 2>&1 &)
        processingOutputTest "dontRemoveDuTemp"
    else
        # First time running test for this folder > writing output to buffer file
        folderSize=$(du -s${blockSize} "${folderPath%/}" | awk '{print $1}')
        # Convert current size to KB with 'sizeToK' function
        actualSizeK=$(sizeToK ${blockSize} ${folderSize})
        newLineBufferFile="${hashTestWithOptions};${actualSizeK};$(date +%s)"
        echo "${newLineBufferFile}" >> "${bufferFile}"
        echo "$(date)" > "${messageContent}"
        echo "This is the first time this test is running with option '-t'." >> "${messageContent}"
        echo "There is no previous value yet to fall back on if the script does not finish calculating the size of the folder in time." >> "${messageContent}"
        echo "But from now on we will have one!" >> "${messageContent}"
        echo "" >> "${messageContent}"
        echo "Current size of folder '${folderPath%/}' has been saved in buffer file '${bufferFile}'." >> "${messageContent}"
        processingOutputTest "removeDuTemp"
    fi
elif [[ ${withTimeLimit} -eq 0 ]]; then
    folderSize=$(du -s${blockSize} "${folderPath%/}" | awk '{print $1}')
    processingOutputTest "removeDuTemp"
fi

endThisScript 0 "removeDuTemp"
```
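Note that the conversion helpers `sizeToM` and `sizeToG` rely on bash integer arithmetic, which truncates: a size stored in KB in the buffer file is rounded down when converted, so with `-b g` the reported fallback value and the threshold comparison use whole gigabytes. The snippet below copies the probe's `sizeToG` logic (without its parameter check) to illustrate this.

```bash
#!/bin/bash
# Copy of the probe's sizeToG logic, without the parameter check:
# converts a size expressed in k, m or g into GB using integer division.
sizeToG () {
    if [[ "${1}" == "k" ]]; then
        echo $((${2}/1048576))
    elif [[ "${1}" == "m" ]]; then
        echo $((${2}/1024))
    elif [[ "${1}" == "g" ]]; then
        echo ${2}
    fi
}

# 2,000,000 KB is about 1.9 GB, but integer division reports 1
sizeToG k 2000000
```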
I will come back later to other tools I have modified for using Centreon in an Apple Mac OS X environment, as well as to the security measures to put in place when using these probes.