
7 USD

Use SIMD correctly in C++

C & C++

6 out of 6

project not completed

All the code is written; only two methods need fixing: the one that computes with SSE and the one that computes with AVX.

There is a plain-CPU implementation of the algorithm that works correctly.

The algorithm itself is as follows:

The implementation can be seen in the attached project.

The error is also already known: the following two Stack Overflow comments analyze the problem and even give a solution.

1) "*(pInputSignal + i - j) is incorrect in case of SSE, because it's not an i-j offset away from current value, it's (i-j) * 4. The thing is, as I remember it, the idea of using pointer that way is incorrect unless intrinsics had changed since then - in my time one had to "load" values into an instance of __m128 in this case, as H(J) and X(I-J) are in unaligned location (and sequence breaks)."

2) "Since you care about individual floats and their order, probably best to use const float*, with _mm_loadu_ps instead of just dereferencing (which is like _mm_load_ps). That way you can easily do unaligned loads that get the floats you want into the vector element positions you want, and the pointer math works the same as for scalar. You just have to take into account that load(ptr) actually gets you a vector of elements from ptr+0..3."
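The pointer scaling described in the first comment can be checked in isolation. The sketch below is only an illustration: the function name `floatsSkippedByVectorStep` and the local buffer are hypothetical and are not taken from the attached project. It shows that adding `k` to a `const __m128*` moves `4 * k` floats forward, whereas adding `k` to a `const float*` moves `k` floats.

```cpp
#include <xmmintrin.h>  // SSE: __m128
#include <cstddef>

// One __m128 holds exactly four packed floats.
static_assert(sizeof(__m128) == 4 * sizeof(float), "__m128 packs 4 floats");

// Hypothetical helper: how many floats does "vector pointer + k" skip?
// Valid for 0 <= k <= 16 with the 64-float buffer used here.
std::ptrdiff_t floatsSkippedByVectorStep(std::ptrdiff_t k) {
    alignas(16) static const float buffer[64] = {};
    const __m128* vecPtr = reinterpret_cast<const __m128*>(buffer);
    // Pointer arithmetic is scaled by sizeof(__m128), not sizeof(float).
    const float* moved = reinterpret_cast<const float*>(vecPtr + k);
    return moved - buffer;  // distance measured in floats
}
```

This is why the second comment suggests keeping a `const float*` and loading with `_mm_loadu_ps`: the index math then stays the same as in the scalar code.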

The errors need to be fixed in the methods computeConvolutionOutputSSE and computeConvolutionOutputAVX (they are identical, so once one is written correctly, the other is done the same way).
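A minimal sketch of the fix suggested in the comments, for the SSE case only. It is not the code from the attached project. It assumes the algorithm is a direct-form convolution y[i] = sum over j of h[j] * x[i - j], with terms where i - j < 0 skipped; that formula is an assumption inferred from the H(J) and X(I-J) wording in the first comment. The names `convolveAt`, `convolveScalar`, and `convolveSse` are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_set1_ps, _mm_mul_ps, ...
#include <cstddef>
#include <vector>

// Scalar value of one output sample (assumed formula, see lead-in).
float convolveAt(const std::vector<float>& x, const std::vector<float>& h,
                 std::size_t i) {
    float acc = 0.0f;
    for (std::size_t j = 0; j < h.size() && j <= i; ++j)
        acc += h[j] * x[i - j];
    return acc;
}

// Plain scalar reference.
std::vector<float> convolveScalar(const std::vector<float>& x,
                                  const std::vector<float>& h) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = convolveAt(x, h, i);
    return y;
}

// SSE version: computes four consecutive outputs y[i..i+3] per iteration.
std::vector<float> convolveSse(const std::vector<float>& x,
                               const std::vector<float>& h) {
    const std::size_t n = x.size();
    const std::size_t m = h.size();
    std::vector<float> y(n, 0.0f);
    std::size_t i = 0;

    // Head: outputs whose window would reach before x[0] stay scalar.
    for (; i < n && i + 1 < m; ++i) y[i] = convolveAt(x, h, i);

    // Body: here i >= m - 1, so i - j >= 0 for every tap j.
    for (; i + 4 <= n; i += 4) {
        __m128 acc = _mm_setzero_ps();
        for (std::size_t j = 0; j < m; ++j) {
            // Unaligned load of x[i-j .. i-j+3]; float* math, as in scalar code.
            const __m128 xv = _mm_loadu_ps(x.data() + (i - j));
            const __m128 hv = _mm_set1_ps(h[j]);  // broadcast one tap
            acc = _mm_add_ps(acc, _mm_mul_ps(hv, xv));
        }
        _mm_storeu_ps(y.data() + i, acc);
    }

    // Tail: fewer than four outputs remain.
    for (; i < n; ++i) y[i] = convolveAt(x, h, i);
    return y;
}
```

Each vector lane accumulates its taps in the same order as the scalar loop, which should help when the outputs are later compared for equality, although compiler settings such as fused multiply-add contraction can still introduce small differences. Under the same assumptions, an AVX variant would follow the same pattern with `__m256`, `_mm256_loadu_ps`, and eight outputs per iteration.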

After the code runs, the final check that compares the three output files (from CPU, SSE, and AVX) should show that they are all equal (this check method is already written).

Client's review of cooperation with freelancer

The freelancer does not intend to send the result files, so there is a suspicion that he wants to get paid dishonestly and will not make any fixes after the payment is made. Don't waste your time on someone who doesn't want to follow the rules of the service.

Mike J. | Safe

Proposals 1

1 proposal concealed

Mike J.

Slavyansk 28 1