F# performance in scientific computing

Program Tip

programtip 2020. 10. 28. 20:31

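I am curious how F# performance compares to C++ performance. I asked a similar question with regard to Java, and the impression I got was that Java is not suitable for heavy number crunching.

I have read that F# is supposed to be more scalable and more performant, but how does its real-world performance compare to C++? Specific questions about the current implementation are: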

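How well does it do floating-point?
Does it allow vector instructions?
How friendly is it towards optimizing compilers?
How big a memory footprint does it have? Does it allow fine-grained control over memory locality?
Does it have capacity for distributed memory processors, for example Cray?
What features does it have that may be of interest to computational science where heavy number processing is involved?
Are there actual scientific computing implementations that use it?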

Thanks

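F# does floating-point computation as fast as the .NET CLR will allow it. It is not much different from C# or other .NET languages.
F# does not allow vector instructions by itself, but if your CLR has an API for these, F# should have no problem using it. See, for instance, Mono.
As far as I know, there is only one F# compiler for the moment, so maybe the question should be "how good is the F# compiler when it comes to optimization?". The answer is, in any case, "potentially as good as the C# compiler, probably a little bit worse at the moment". Note that F# differs from, e.g., C# in its support for inlining at compile time, which potentially allows for more efficient code that relies on generics.
Memory footprints of F# programs are similar to those of other .NET languages. The amount of control you have over allocation and garbage collection is the same as in other .NET languages.
I don't know about the support for distributed memory.
F# has very nice primitives for dealing with flat data structures, e.g. arrays and lists. Look, for instance, at the contents of the Array module: map, map2, mapi, iter, fold, zip... Arrays are popular in scientific computing because of their inherently good memory locality properties.
For scientific computation packages using F#, you may want to look at what Jon Harrop is doing.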
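In F#, the Array module primitives mentioned above let you write a dot product as `Array.fold (+) 0.0 (Array.map2 ( * ) xs ys)`. As a rough illustration of that combinator style, here is a Python sketch in which the built-in `map` over two iterables and `functools.reduce` stand in for map2 and fold (an analogy chosen for this sketch, not F# code). Python lists hold boxed floats, so this shows the programming style only, not the memory-locality benefit of .NET arrays.

```python
from functools import reduce
from operator import add

# F# equivalent for comparison: Array.fold (+) 0.0 (Array.map2 ( * ) xs ys)

def dot(xs, ys):
    """Dot product in map2 + fold style: pairwise products, then a left fold."""
    if len(xs) != len(ys):
        # Array.map2 likewise rejects arrays of different lengths
        raise ValueError("vectors must have the same length")
    products = map(lambda x, y: x * y, xs, ys)   # plays the role of Array.map2
    return reduce(add, products, 0.0)            # plays the role of Array.fold

def scale(k, xs):
    """Plays the role of Array.map: multiply every element by k."""
    return [k * x for x in xs]
```

For example, `dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` evaluates to `32.0`.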

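I am curious how F# performance compares to C++ performance.

Varies wildly depending upon the application. If you are making extensive use of sophisticated data structures in a multithreaded program, then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays, then C++ might be 2-3× faster.

Case study: ray tracer. My benchmark here uses a tree for hierarchical culling and numerical ray-sphere intersection code to generate an output image. This benchmark is several years old, and the C++ code has been improved upon dozens of times over the years and read by hundreds of thousands of people. Don Syme at Microsoft managed to write an F# implementation that is slightly faster than the fastest C++ code when compiled with MSVC and parallelized using OpenMP.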
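The benchmark's own source is not reproduced in this thread. For readers unfamiliar with the numerical kernel involved, here is a minimal Python sketch of a standard ray-sphere intersection test. It assumes a unit-length ray direction and is a textbook formulation, not the benchmark's actual C++ or F# code.

```python
import math

def ray_sphere(origin, direction, center, radius):
    """Distance t > 0 to the nearest hit of origin + t*direction on a sphere, or None.

    Assumes `direction` has unit length, so the quadratic in t has leading coefficient 1.
    """
    v = [o - p for o, p in zip(origin, center)]        # vector from sphere center to ray origin
    b = sum(vi * di for vi, di in zip(v, direction))   # half the linear coefficient
    c = sum(vi * vi for vi in v) - radius * radius
    disc = b * b - c                                   # a quarter of the discriminant
    if disc < 0.0:
        return None                                    # the ray misses the sphere
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):                   # nearer intersection first
        if t > 0.0:
            return t
    return None                                        # the sphere lies behind the origin
```

A ray from the origin along +z hits a unit sphere centred at (0, 0, 5) at t = 4.0; the hierarchical culling tree in the benchmark exists to avoid running this test against every sphere.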

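I have read that F# is supposed to be more scalable and more performant, but how does its real-world performance compare to C++?

Developing code is much easier and faster with F# than with C++, and this applies to optimization as well as maintenance. Consequently, when you start optimizing a program, the same amount of effort will yield much larger performance gains if you use F# instead of C++. However, F# is a higher-level language and, consequently, places a lower ceiling on performance. So if you have infinite time to spend optimizing, you should, in theory, always be able to produce faster code in C++.

This is exactly the same benefit that C++ had over Fortran and Fortran had over hand-written assembler, of course.

Case study: QR decomposition. This is a basic numerical method from linear algebra provided by libraries like LAPACK. The reference LAPACK implementation is 2,077 lines of Fortran. I wrote an F# implementation in under 80 lines of code that achieves the same level of performance. But the reference implementation is not fast: vendor-tuned implementations like Intel's Math Kernel Library (MKL) are often 10× faster. Remarkably, I managed to optimize my F# code well beyond the performance of Intel's implementation running on Intel hardware whilst keeping my code under 150 lines and fully generic (it can handle single and double precision, and complex and even symbolic matrices!). For tall, thin matrices my F# code is up to 3× faster than the Intel MKL.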
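The F# and LAPACK implementations discussed above are not shown here. As a reference point for what the basic algorithm involves, here is a minimal, unblocked Householder QR in pure Python. It is a textbook sketch written for clarity, without the blocking, pivoting or parallelism that tuned libraries rely on, and it is not the author's F# code.

```python
import math

def householder_qr(a):
    """Householder QR of an m x n matrix (m >= n) given as a list of rows.

    Returns (Q, R) with Q an m x m orthogonal matrix and R m x n upper triangular,
    so that A = Q R. Unblocked textbook version for illustration only.
    """
    m, n = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]                          # becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # accumulates Q
    for k in range(min(m - 1, n)):
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(xi * xi for xi in x))
        if norm_x == 0.0:
            continue                        # column already zero below the diagonal
        alpha = -norm_x if x[0] >= 0.0 else norm_x   # sign choice avoids cancellation
        v = x[:]
        v[0] -= alpha                       # v = x - alpha * e1, so |v[0]| >= norm_x > 0
        vv = sum(vi * vi for vi in v)
        # R <- H R with H = I - 2 v v^T / (v^T v), touching only rows k..m-1
        for j in range(n):
            s = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                r[i][j] -= s * v[i - k]
        # Q <- Q H (H is symmetric), touching only columns k..m-1
        for i in range(m):
            s = 2.0 * sum(q[i][col] * v[col - k] for col in range(k, m)) / vv
            for col in range(k, m):
                q[i][col] -= s * v[col - k]
    return q, r
```

Each reflection zeroes one column below the diagonal; a tall, thin input (m much larger than n) needs only n reflections, each touching long columns.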

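Note that the moral of this case study is not that you should expect your F# to be faster than vendor-tuned libraries but, rather, that even experts like Intel's will miss productive high-level optimizations if they use only lower-level languages. I suspect Intel's numerical optimization experts failed to exploit parallelism fully because their tools make it extremely cumbersome, whereas F# makes it effortless.

How well does it do floating-point?

Performance is similar to ANSI C, but some functionality (e.g. rounding modes) is not available from .NET.

Does it allow vector instructions?

No.

How friendly is it towards optimizing compilers?

This question does not make sense: F# is a proprietary .NET language from Microsoft with a single compiler.

How big is the memory footprint?

An empty application uses 1.3Mb here.

Does it allow fine-grained control over memory locality?

Better than most memory-safe languages but not as good as C. For example, you can unbox arbitrary data structures in F# by representing them as "structs".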
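F# structs themselves are not shown here. As a loose analogy for boxed versus unboxed storage, Python's `array.array("d", ...)` keeps raw C doubles inline in one contiguous buffer, while a list holds pointers to separately allocated float objects. The helper `approx_bytes` is a hypothetical, approximate measurement; exact sizes are CPython-specific.

```python
import sys
from array import array

def unboxed(values):
    """Store floats inline as raw C doubles in one contiguous buffer."""
    return array("d", values)

def boxed(values):
    """Store references to separately allocated Python float objects."""
    return [float(v) for v in values]

def approx_bytes(container):
    """Rough footprint: the container plus each distinct object it references."""
    total = sys.getsizeof(container)
    if isinstance(container, array):
        return total                      # elements live inside the array's own buffer
    distinct = {id(x): x for x in container}
    return total + sum(sys.getsizeof(x) for x in distinct.values())
```

For 1,000 distinct floats, the boxed list costs several times more memory than the unboxed array, and reading it means chasing a pointer per element, which is the locality cost that unboxing avoids.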

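Does it have capacity for distributed memory processors, for example Cray?

Depends what you mean by "capacity". If you can run .NET on that Cray, then you could use message passing in F# (just as in any other language), but F# is intended primarily for desktop multicore x86 machines.

What features does it have that may be of interest to computational science where heavy number processing is involved?

Memory safety means you do not get segmentation faults and access violations. The support for parallelism in .NET 4 is good. The ability to execute code on the fly via an F# interactive session in Visual Studio 2010 is extremely useful for interactive technical computing.

Are there actual scientific computing implementations that use it?

Our commercial products for scientific computing in F# already have hundreds of users.

However, your line of questioning indicates that you think of scientific computing as high-performance computing (e.g. Cray) and not interactive technical computing (e.g. MATLAB, Mathematica). F# is intended for the latter.

In addition to what others said, there is one important point about F#, and that is parallelism. The performance of ordinary F# code is determined by the CLR, although you may be able to use LAPACK from F#, or you may be able to make native calls using C++/CLI as part of your project.

However, well-designed functional programs tend to be much easier to parallelize, which means that you can easily gain performance by using multi-core CPUs, which are definitely available to you if you're doing scientific computing. Here are a couple of relevant links:

F# and Task-Parallel library (blog by Jurgen van Gael, who does machine learning work)
Another interesting answer at SO regarding parallelism
An example of using Parallel LINQ from F#
Chapter 14 of my book discusses parallelism (source code is available)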
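To illustrate why purity makes parallelization easy, here is a minimal Python sketch using `concurrent.futures`: a hypothetical pure function `simulate` is mapped over its inputs concurrently, and because it touches no shared mutable state, the result is identical to the sequential map. Caveat: in CPython, threads do not speed up CPU-bound pure-Python code because of the GIL; `use_processes=True` switches to a process pool, which requires `func` to be picklable (defined at module top level) and, on some platforms, a `if __name__ == "__main__":` guard.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def simulate(seed):
    """Hypothetical pure work item: the result depends only on the argument."""
    x = float(seed)
    for _ in range(10_000):
        x = math.cos(x)
    return x

def parallel_map(func, items, workers=4, use_processes=False):
    """Apply func to every item concurrently, returning results in input order."""
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(func, items))
```

Since `simulate` has no side effects, `parallel_map(simulate, range(8))` equals `[simulate(i) for i in range(8)]` regardless of how the work is scheduled.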

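Regarding distributed computing, you can use any distributed computing framework that's available for the .NET platform. There is an MPI.NET project, which works well with F#, but you may also be able to use DryadLINQ, which is an MSR project.

Some articles: F# MPI tools for .NET, Concurrency with MPI.NET
DryadLINQ project homepage

As with all language/performance comparisons, your mileage depends greatly on how well you can code.

F# is a derivative of OCaml. I was surprised to find out that OCaml is used a lot in the financial world, where number-crunching performance is very important. I was further surprised to find out that OCaml is one of the faster languages, with performance on par with the fastest C and C++ compilers.

F# is built on the CLR. In the CLR, code is expressed in a form of bytecode called the Common Intermediate Language (CIL). As such, it benefits from the optimizing capabilities of the JIT and has performance comparable to C# (but not necessarily C++), if the code is written well.

CIL code can be compiled to native code in a separate step prior to runtime by using the Native Image Generator (NGEN). This speeds up all later runs of the software, as the CIL-to-native compilation is no longer necessary.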

One thing to consider is that functional languages like F# benefit from a more declarative style of programming. In a sense, you are over-specifying the solution in imperative languages such as C++, and this limits the compiler's ability to optimize. A more declarative programming style can theoretically give the compiler additional opportunities for algorithmic optimization.

It depends on what kind of scientific computing you are doing.

If you are doing traditional heavy computing, e.g. linear algebra or various kinds of optimization, then you should not put your code in the .NET framework; at the very least, F# is not well suited to it. Because this is at the algorithm level, most of these algorithms must be coded in an imperative language to get good performance in running time and memory usage. Others mentioned parallelism; I must say it is probably useless when you are doing low-level work such as parallelizing an SVD implementation, because when you know how to parallelize an SVD, you simply won't use a high-level language: Fortran, C or a modified C (e.g. Cilk) are your friends.

However, a lot of scientific computing today is not of this kind; it consists of higher-level applications, e.g. statistical computing and data mining. In these tasks, aside from some linear algebra or optimization, there is also a lot of data flow, I/O, preprocessing, graphics, etc. For these tasks F# is really powerful, thanks to its succinctness, functional style, safety, ease of parallelization, etc.

As others have mentioned, .NET supports Platform Invoke well; in fact, quite a few projects inside Microsoft use .NET and P/Invoke together to improve performance at the bottleneck.

I don't think that you'll find a lot of reliable information, unfortunately. F# is still a very new language, so even if it were ideally suited for performance heavy workloads there still wouldn't be that many people with significant experience to report on. Furthermore, performance is very hard to accurately gauge and microbenchmarks are hard to generalize. Even within C++, you can see dramatic differences between compilers - are you wondering whether F# is competitive with any C++ compiler, or with the hypothetical "best possible" C++ executable?

As to specific benchmarks against C++, here are some possibly relevant links: O'Caml vs. F#: QR decomposition; F# vs Unmanaged C++ for parallel numerics. Note that as an author of F#-related material and as the vendor of F# tools, the writer has a vested interest in F#'s success, so take these claims with a grain of salt.

I think it's safe to say that there will be some applications where F# is competitive on execution time and likely some others where it isn't. F# will probably require more memory in most cases. Of course the ultimate performance will also be highly dependent on the skill of the programmer - I think F# will almost certainly be a more productive language to program in for a moderately competent programmer. Furthermore, I think that at the moment, the CLR on Windows performs better than Mono on most OSes for most tasks, which may also affect your decisions. Of course, since F# is probably easier to parallelize than C++, it will also depend on the type of hardware you're planning to run on.

Ultimately, I think that the only way to really answer this question is to write F# and C++ code representative of the type of calculations that you want to perform and compare them.
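As a minimal sketch of that methodology within a single language, the helpers below first check that candidate implementations agree on the same input and then report the best-of-N time per call using Python's `timeit`. The names `best_time` and `compare` are hypothetical; comparing F# against C++ would mean applying the same discipline (identical inputs, verified outputs, repeated runs, take the minimum) with each language's own tooling.

```python
import timeit

def best_time(func, *args, repeat=5, number=10):
    """Best-of-`repeat` seconds per call; the minimum is least affected by background noise."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number

def compare(candidates, *args, repeat=5, number=10):
    """Verify that all implementations agree on the same input, then time each one."""
    if not candidates:
        return {}
    outputs = [fn(*args) for fn in candidates.values()]
    if any(out != outputs[0] for out in outputs[1:]):
        # Exact equality is used here; floating-point results would need a tolerance.
        raise ValueError("implementations disagree, so their timings are not comparable")
    return {name: best_time(fn, *args, repeat=repeat, number=number)
            for name, fn in candidates.items()}
```

The verification step matters: a benchmark in which two programs compute slightly different things says nothing about the languages.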

Here are two examples I can share:

Matrix multiplication: I have a blog post comparing different matrix multiplication implementations.
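The blog post itself is not reproduced here. As a generic illustration of the kind of implementation differences such comparisons explore, here are two Python versions of naive matrix multiplication that differ only in loop order: i-j-k walks down a column of `b` in its inner loop, while i-k-j walks along rows of `b` and `c`. In compiled languages with contiguous arrays, the i-k-j order is typically much more cache-friendly; in pure Python the difference is muted because lists hold boxed floats.

```python
def matmul_ijk(a, b):
    """C = A B with the textbook i-j-k loop order (inner loop strides down a column of B)."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += a[i][k] * b[k][j]
            c[i][j] = s
    return c

def matmul_ikj(a, b):
    """Same product with i-k-j order: the inner loop walks along row k of B and row i of C."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(m):
            a_ik = a[i][k]
            row_b = b[k]
            for j in range(p):
                row_c[j] += a_ik * row_b[j]
    return c
```

Both compute the same product, but the order of the floating-point additions differs, so for general inputs the results can differ in the last bits.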
LBFGS

I have a large-scale logistic regression solver using LBFGS optimization, which is coded in C++. The implementation is well tuned. I modified some of the code to compile as C++/CLI, i.e. I compiled the code into .NET. The .NET version is 3 to 5 times slower than the natively compiled one on different datasets. If you code LBFGS in F#, the performance cannot be better than C++/CLI or C# (but would be very close).
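The solver itself is not shown in the thread. For readers unfamiliar with the method, here is a compact pure-Python sketch of textbook L-BFGS, in which the two-loop recursion computes the quasi-Newton search direction from the last `m` curvature pairs. Assumptions of this sketch: a simple backtracking line search enforcing only the Armijo condition (production solvers usually enforce the Wolfe conditions), and a hypothetical quadratic test function rather than a logistic regression loss.

```python
import math

def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def two_loop(grad, s_hist, y_hist):
    """L-BFGS two-loop recursion: returns the search direction -H_k * grad.

    s_hist[i] = x_{i+1} - x_i and y_hist[i] = g_{i+1} - g_i, stored oldest first.
    """
    q = list(grad)
    history = []                                   # (rho, alpha) pairs, newest first
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / _dot(y, s)
        alpha = rho * _dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha))
    # Initial inverse-Hessian guess H0 = gamma * I, scaled by the newest pair
    gamma = (_dot(s_hist[-1], y_hist[-1]) / _dot(y_hist[-1], y_hist[-1])
             if s_hist else 1.0)
    r = [gamma * qi for qi in q]
    for s, y, (rho, alpha) in zip(s_hist, y_hist, reversed(history)):  # oldest first
        beta = rho * _dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]

def lbfgs(f, grad_f, x0, m=5, tol=1e-8, max_iter=200):
    """Minimize f starting from x0, keeping at most m curvature pairs."""
    x = list(x0)
    g = grad_f(x)
    s_hist, y_hist = [], []
    for _ in range(max_iter):
        if math.sqrt(_dot(g, g)) < tol:
            break
        d = two_loop(g, s_hist, y_hist)
        # Backtracking line search: halve t until the Armijo condition holds
        t, fx, slope = 1.0, f(x), _dot(g, d)
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope and t > 1e-16:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if _dot(s, y) > 1e-12:                     # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:                    # limited memory: drop the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x
```

For example, minimizing f(x) = (x0 - 1)^2 + 10 (x1 + 2)^2 from [0, 0] should converge to approximately [1, -2]. Almost all of the time goes into the vector dot products and updates, which is exactly the kind of tight numeric loop where native code tends to beat .NET.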

I have another post on Why F# is the language for data mining. Although it is not closely related to the performance issue you are concerned with here, it is quite related to scientific computing in F#.

If I say "ask again in 2-3 years" I think that will answer your question completely :-)

First, don't expect F# to be any different than C# perf-wise, unless you are doing some convoluted recursions on purpose and I'd guess you are not since you asked about numerics.

Floating-point-wise, it is bound to be better than Java, since the CLR doesn't aim at cross-platform uniformity, meaning that the JIT will go to 80 bits whenever it can. On the other hand, you have no control over that beyond watching the number of variables to make sure there are enough FP registers.

Vector-wise, if you scream loud enough, maybe something will happen in 2-3 years, since Direct3D is entering .NET as a general API anyway, and C# code written with XNA runs on the Xbox, which is as close to the bare metal as you can get with the CLR. That still means you'd need to write some intermediary code on your own.

So don't expect CUDA or even ability to just link NVIDIA libs and get going. You'd have much more luck trying that approach with Haskell if for some reason you really, really need a "functional" language since Haskell was designed to be linking-friendly out of pure necessity.

Mono.Simd has been mentioned already and while it should be back-portable to CLR it might be quite some work to actually do it.

There's quite a lot of code in a social.msdn posting on using SSE3 in .NET, with C++/CLI and C#: some array blitting, injecting SSE3 code for performance, etc.

There was some talk about running CECIL on compiled C# to extract parts into HLSL, compile them into shaders and link glue code to schedule them (CUDA does the equivalent anyway), but I don't think anything runnable has come out of that.

A thing that might be worth more to you if you want to try something soon is PhysX.Net on CodePlex. Don't expect it to just unpack and do the magic. However, it currently has an active author, the code is both plain C++ and C++/CLI, and you can probably get some help from the author if you want to go into details and maybe use a similar approach for CUDA. For full-speed CUDA you'll still need to compile your own kernels and then just interface to .NET, so the easier that part goes, the happier you are going to be.

There is a CUDA.NET lib which is supposed to be free, but the page gives just an e-mail address, so expect some strings attached; and while the author writes a blog, he's not particularly talkative about what's inside the lib.

Oh, and if you have the budget, you might give Psi Lambda a look (KappaCUDAnet is the .NET part). Apparently they are going to jack up the prices in November (if it's not a sales trick :-)

Last I knew, most scientific computing was still done in FORTRAN. It's still faster than anything else for linear algebra problems - not Java, not C, not C++, not C#, not F#. LINPACK is nicely optimized.

But the remark about "your mileage may vary" is true of all benchmarks. Blanket statements (except mine) are rarely true.

Firstly, C is significantly faster than C++. So if you need that much speed, you should write the library etc. in C.

With regard to F#, most benchmarks use Mono, which is up to 2× slower than the MS CLR, due partially to its use of the Boehm GC (they have a new GC and an LLVM back end, but these are still immature and don't support generics etc.).

.NET languages themselves are compiled to an IR (the CIL), which compiles to native code as efficiently as C++. There is one problem set that most GC languages suffer in, and that is large amounts of mutable writes (this includes C++ on .NET, as mentioned above). There is a certain set of scientific problems that requires this; when needed, these should probably use a native library or use the Flyweight pattern to reuse objects from a pool (which reduces writes). The reason is that there is a write barrier in the .NET CLR: when updating a reference field (including a box), it sets a bit in a table marking that region of memory as modified. If your code consists of lots of such writes, it will suffer.
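A minimal Python sketch of the reuse idea: a hypothetical `BufferPool` hands out preallocated buffers and takes them back, so a hot loop can reuse the same objects instead of allocating new ones. Strictly speaking this is an object pool rather than the Flyweight pattern named above, and Python has no CLR-style write barrier, so the sketch shows only the reuse pattern, not the GC effect.

```python
class BufferPool:
    """Hypothetical pool of fixed-size float buffers that are reused instead of reallocated."""

    def __init__(self, size, count):
        self._size = size
        self._free = [[0.0] * size for _ in range(count)]   # preallocate up front

    def acquire(self):
        """Take a buffer from the pool, allocating a new one only if the pool is empty."""
        if self._free:
            return self._free.pop()
        return [0.0] * self._size

    def release(self, buf):
        """Zero a buffer in place and return it to the pool for later reuse."""
        if len(buf) != self._size:
            raise ValueError("buffer does not belong to this pool")
        for i in range(self._size):
            buf[i] = 0.0
        self._free.append(buf)
```

A caller acquires a buffer, fills it, and releases it when done; the next `acquire` returns the same zeroed object rather than a fresh allocation.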

That said, a .NET app in C# using lots of static code, structs and ref/out on the structs can produce C-like performance, but it is very difficult to code like this or to maintain the code (as with C).

Where F# shines, however, is parallelism over immutable data, which goes hand in hand with more read-based problems. It's worth noting that most benchmarks are much heavier in mutable writes than real-life applications.

With regard to floating point, you should use an alternative library (i.e. the .NET one) rather than the OCaml ones, because they are slow. C/C++ allows faster modes for lower precision, which OCaml doesn't by default.

Lastly, I'd argue that high-level languages like C# and F#, plus proper profiling, will give you better performance than C and C++ for the same developer time. If you change a bottleneck into a P/Invoke call to a C library, you also end up with C-like performance in the critical areas. That said, if you have an unlimited budget and care more about speed than maintenance, then C is the way to go (not C++).

Source URL: https://stackoverflow.com/questions/2752229/f-performance-in-scientific-computing
