Some problems in OpenMP's parallel for

Overview

I recently started preparing for the ASC competition.
While working on my second demo, pi, a program that estimates π with a Monte Carlo algorithm running on multiple threads, I ran into a problem.

Question-Solution

1. Initial program

// pi.cpp
#include <iostream>
#include <fstream>
#include <omp.h>

using namespace std;

fstream fin("/dev/urandom", ios::in|ios::binary);

u_int32_t randomNum() {
    u_int32_t ret;
    fin.read((char*)&ret, sizeof(u_int32_t));
    return ret;
}

int main() {
    int64_t inCircleHits = 0;
    int64_t totalHits = 1000000;
#pragma omp parallel for num_threads(4) reduction(+:inCircleHits)
    for (int i=1 ; i<=totalHits ; i++) {
        double x=1.0*randomNum()/__UINT32_MAX__;
        double y=1.0*randomNum()/__UINT32_MAX__;
        if ((x*x+y*y)<=1.0) {
            inCircleHits++;
        }
    }
    clog << 4.0*inCircleHits/totalHits << endl;
    fin.close();
    return 0;
}

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: 3.9969 (always 3.9+)

GCC-g++ + built-in OpenMP library on macOS: 10: Bus Error

GCC-g++ + built-in OpenMP library on Linux: Segmentation fault (core dumped)

Analysis: ??? (after browsing online for a couple of hours…)
Discovery: in the tutorials and examples I found online, the loop variable is always initialized to 0.

2. Second Trial

This time, I initialized the loop variable i to 0:

for (int i=0 ; i<totalHits ; i++)

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: 3.9925 (always 3.9+)

GCC-g++ + built-in OpenMP library on macOS: 3.9912 (always 3.9+)

GCC-g++ + built-in OpenMP library on Linux: Segmentation fault (core dumped)

Analysis: Errr… at least we fixed an internal exception.
But why are the answers incorrect? And why does it still fail on Linux?
Hypothesis: maybe std::fstream is to blame. C++ stream objects may not be thread-safe when several threads use the same object at once.

3. Third Trial

Try replacing std::fstream with our old friend FILE *.
I also changed the whole program from C++ to C.

// pi.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>     /* int64_t */
#include <sys/types.h>  /* u_int32_t */
#include <omp.h>

/* One FILE shared by all threads. */
FILE *randomFile;

/* Read 4 random bytes from /dev/urandom. */
u_int32_t randomNum() {
    u_int32_t ret;
    fread((char*)(&ret), sizeof(u_int32_t), 1, randomFile);
    return ret;
}

int main() {
    randomFile = fopen("/dev/urandom", "rb");
    int64_t inCircleHits = 0;
    int64_t totalHits = 1000000;
    /* The reduction clause needs an operator: "+:" sums the per-thread counts. */
#pragma omp parallel for num_threads(4) reduction(+:inCircleHits)
    for (int i=0 ; i<totalHits ; i++) {
        double x=1.0*randomNum()/__UINT32_MAX__;
        double y=1.0*randomNum()/__UINT32_MAX__;
        if ((x*x+y*y)<=1.0) {
            inCircleHits++;
        }
    }
    printf("%f", 4.0*inCircleHits/totalHits);
    fclose(randomFile);
    fflush(stdout);
    return 0;
}

Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: 3.135 (correct)

GCC-g++ + built-in OpenMP library on macOS: 3.141 (correct)

GCC-g++ + built-in OpenMP library on Linux: 3.132 (correct)

Yeah. As we can see, it now runs correctly on all three.

4. Variable control

Let’s see what happens if we initialize the loop variable i to 1 again:

for (int i=1 ; i<=totalHits ; i++)

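Running environment and results:

LLVM-clang++ + external OpenMP library on macOS: (correct)

GCC-g++ + built-in OpenMP library on macOS: (correct)

GCC-g++ + built-in OpenMP library on Linux: (correct)

So the loop variable isn’t the decisive factor; std::fstream is! The bus error going away in the second trial was most likely a coincidence: when several threads use one stream without synchronization, the behavior is undefined and can change from build to build or run to run.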

Conclusion

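Never share a C++ object between threads unless you have made sure it is safe in a multithreaded context. Reading one std::fstream from several threads without synchronization is a data race, whereas POSIX requires stdio functions such as fread to lock the FILE they operate on, which is why the C version behaves.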