Calculation results become strange under asynchronous execution with memcpyPeerAsync() #14

5enxia · 2022-03-02T05:11:46Z

def dot(cls, local_A, x, out):
    # Copy the vector data from device cls.end to all devices, one stream per device.
    # Argument order: memcpyPeerAsync(dst, dstDevice, src, srcDevice, nbytes, stream)
    for i in range(cls.begin, cls.end+1):
        index = i-cls.begin
        cp.cuda.runtime.memcpyPeerAsync(cls.x[index].data.ptr, i, x.data.ptr, cls.end, cls.nbytes, cls.streams[index].ptr)
    # Local matrix-vector product on each device
    for i in range(cls.begin, cls.end+1):
        index = i-cls.begin
        Device(i).use()
        # Wait until this device's copy of x has arrived
        cls.streams[index].synchronize()
        cls.y[index] = cls.A[index].dot(cls.x[index])
    # Gather the calculated elements from all devices onto device cls.end
    for i in range(cls.begin, cls.end+1):
        Device(i).synchronize()
        index = i-cls.begin
        cp.cuda.runtime.memcpyPeerAsync(cls.out[index*cls.local_local_N].data.ptr, cls.end, cls.y[index].data.ptr, i, cls.local_local_nbytes, cls.streams[index].ptr)
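
One possible cause, not confirmed anywhere in this issue, is that the final gather copies are only enqueued and nothing waits for them before the result is read. The sketch below shows one way to keep the three steps ordered: queue the copy of x, the local product, and the gather on the same per-device stream, then synchronize every stream once at the end. The helper name sharded_dot and all of its parameters are hypothetical and are not part of the code above. It assumes dense CuPy arrays, that A_parts[k], x_parts[k], y_parts[k] and streams[k] were created on devices[k], and that x and out live on the device root.

```python
def sharded_dot(A_parts, x_parts, y_parts, x, out, devices, streams, root):
    """Hypothetical helper: out = A @ x with A split by rows across devices."""
    # Imported lazily so this file can be loaded on a machine without CuPy.
    import cupy as cp

    memcpy_peer_async = cp.cuda.runtime.memcpyPeerAsync

    # Make sure x is fully written on the root device before other streams read it.
    with cp.cuda.Device(root):
        cp.cuda.get_current_stream().synchronize()

    offset = 0  # element offset of this device's rows inside out
    for k, dev in enumerate(devices):
        stream = streams[k]
        with cp.cuda.Device(dev), stream:
            # 1. Send x to this device, queued on this device's stream.
            memcpy_peer_async(x_parts[k].data.ptr, dev,
                              x.data.ptr, root,
                              x.nbytes, stream.ptr)
            # 2. Local product. It runs on the same stream (made current by
            #    the with-block), so it is ordered after the copy above.
            y_parts[k][...] = A_parts[k].dot(x_parts[k])
            # 3. Gather this block into out on the root device, again on the
            #    same stream, so it is ordered after the product.
            dst_ptr = out.data.ptr + offset * out.itemsize
            memcpy_peer_async(dst_ptr, root,
                              y_parts[k].data.ptr, dev,
                              y_parts[k].nbytes, stream.ptr)
        offset += y_parts[k].size

    # Wait for every gather copy before the caller touches out.
    for stream in streams:
        stream.synchronize()
    return out
```

Because the host never blocks inside the loop, work for all devices is enqueued before any of it has to finish, and the only host-side waits are the one on the root device at the start and the final loop over the streams. Whether this removes the strange results depends on details the issue does not show, such as how cls.streams and cls.out are created.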
