smacl wrote: ↑Mon Mar 25, 2019 6:08 pm
Hi Dennis,
Great work, those speeds are impressive. It would be interesting to take the same data into a range of different software and compare performance. It is also worth considering that some software, e.g. Potree and anything else building out-of-core data structures, compromises initial import speed to improve performance scalability across huge data sets.
....
Your observation about native formats may not hold in every instance, but it is the reason I suggest application software developers use the vendor SDK to improve I/O loading speeds, among other features.
What the above tells me is that point cloud I/O is considerably more CPU bound than disk bound when translating from a point cloud transfer format, yet opening a program's native format typically is not.
Threadripper
-
- V.I.P Member
- Posts: 958
- Joined: Sun Nov 01, 2009 11:18 pm
- 14
- Full Name: Dennis Hirota
- Company Details: Sam O Hirota Inc
- Company Position Title: President
- Country: USA
- Linkedin Profile: Yes
- Location: Hawaii, USA
- Has thanked: 87 times
- Been thanked: 379 times
Re: Threadripper
I imported the same file (516M pts, 17 GB LAS) into ReCap Pro to see how Autodesk processes it. It looked multi-threaded at times, but appeared to be the usual single-threaded processing. The import, indexing, launch, and display (a few seconds) took 94 minutes without registration, since the scans were already registered using Riegl's GNSS RTK. A non-productive 90-minute wait compared to the Sequoia import and visualization on the same 4K display.
I spoke to the last remaining person that I know at Autodesk to see when we might see an improvement in I/O speed. The response was that Autodesk is moving to the cloud with their partner applications, so do not expect to see local processing with what we are discussing in this thread any time soon.
It looks like one may have to become a software developer to load and process information faster locally, or get Shane to do it for us.
-
- V.I.P Member
- Posts: 537
- Joined: Mon Jun 16, 2014 1:45 pm
- 9
- Full Name: James Worrell
- Company Details: Bennett and Francis
- Company Position Title: Director
- Country: Australia
- Linkedin Profile: Yes
- Location: Brisbane, Queensland, Australia
- Has thanked: 14 times
- Been thanked: 87 times
- Contact:
Re: Threadripper
If I keep talking about it - it might become a reality ;-p
We need to move to multi-node processing. Things like tiling, meshing, machine learning/AI workloads, pre-processing, noise filtering, initial conversion/ingest, publishing conversion/compression, running constraints .. these are all workloads that would benefit from multi-node engines, each given discrete units of work.
ContextCapture Centre Edition, Sequoia - these systems are using queueing and engines running on multiple boxes. There is your inspiration.
RAID works - redundant array of INEXPENSIVE disks .. we really want the same for point cloud processing. A basic box with a reasonable GPU, some solid state drives, and a decent whack of RAM (mostly for caching clouds) - plus a smart queue manager - and the throughput from otherwise idle boxes could be enormous.
Your office of 20 workstations suddenly gets a whole lot more productive if it works as a single engine. Naturally you would want priority queuing - so ingest might sit lower in the queue than, say, running constraints, with meshing higher up, etc.
As for the engine - Microsoft Project Orleans would be interesting. 10 Gbit/sec networking would help - or at least dual links.
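The queue-and-engines pattern above can be sketched in a few lines. This is a toy illustration only: `process_tile` and the tile IDs are stand-ins, not any vendor's API, and a real deployment would put the queue on a shared service so engines on other boxes could pull from it.

```python
from multiprocessing import Pool

def process_tile(tile_id):
    """Stand-in for one discrete unit of work (meshing, filtering, ...).

    In a multi-node setup this would run inside an engine on another
    box, pulling tile IDs from a shared job queue.
    """
    return tile_id, sum(range(tile_id * 1000))  # simulated heavy work

if __name__ == "__main__":
    tiles = list(range(1, 21))          # 20 tiles of a larger cloud
    with Pool(processes=4) as pool:     # 4 local "engines"
        results = dict(pool.map(process_tile, tiles))
    print(len(results))                 # all 20 tiles processed
```

The key property is that each tile is an independent unit, so the same code scales from four local processes to many machines once the queue lives outside one box.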
- smacl
- Global Moderator
- Posts: 1409
- Joined: Tue Jan 25, 2011 5:12 pm
- 13
- Full Name: Shane MacLaughlin
- Company Details: Atlas Computers Ltd
- Company Position Title: Managing Director
- Country: Ireland
- Linkedin Profile: Yes
- Location: Ireland
- Has thanked: 627 times
- Been thanked: 657 times
- Contact:
Re: Threadripper
jamesworrell wrote: ↑Tue Mar 26, 2019 6:38 am
If I keep talking about it - it might become a reality ;-p
It certainly makes a lot of sense and will no doubt happen, but for us poor code monkeys it means rewriting a lot of complex code in many cases. Some processes, such as photo stitching and rendering video, naturally break down into separate computational units because they work on locally independent data. Other processes, such as building spatial indices (e.g. TIN meshes and octrees), are harder to solve with a 'divide and conquer' mechanism because the data is heavily interdependent and likely to be modified by multiple processes at the same time. In that scenario, synchronization and communication overhead can actually make the multi-threaded solution slower than the single-threaded one if it is not very carefully implemented and tested.
Local cloud computation clusters are definitely the near future, but we still have a bit to go yet. For most people at this point in time, I recommend multiple mid-range workstations rather than fewer very expensive ones, as processing time is only an issue if it is tying up the operator. If the operator can remain productive on another workstation, this isn't a problem. Personally, I find remote control over the LAN a great benefit here and am typically using three PCs at a time. I like what Sequoia are doing in terms of bundling a couple of compute licenses in with a main license and may nab this idea for SCC, as it doesn't seem entirely reasonable to lock up an expensive license for importing/exporting/publishing and similar grunt background work.
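The interdependent-data problem above can be shown with a toy example (nothing to do with SCC's internals): threads that all update one lock-guarded structure pay a synchronization cost on every touch, while threads working on independent chunks merge their results once at the end.

```python
import threading

def contended_sum(values, n_threads=4):
    """Every thread fights over one lock-guarded total (interdependent data)."""
    total, lock = [0], threading.Lock()
    def work(chunk):
        for v in chunk:
            with lock:             # synchronization on every single update
                total[0] += v
    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads: t.start()
    for t in threads: t.join()
    return total[0]

def independent_sum(values, n_threads=4):
    """Each thread sums its own chunk; one cheap merge at the end."""
    partial = [0] * n_threads
    def work(i, chunk):
        partial[i] = sum(chunk)    # no shared state while working
    chunks = [values[i::n_threads] for i in range(n_threads)]
    threads = [threading.Thread(target=work, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads: t.start()
    for t in threads: t.join()
    return sum(partial)
```

Both return the same answer, but the first version performs one lock acquisition per point; that synchronization traffic is exactly what can make a careless multi-threaded spatial-index build slower than the single-threaded one.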
-
- V.I.P Member
- Posts: 1237
- Joined: Mon Jan 04, 2010 7:51 pm
- 14
- Full Name: Jed Frechette
- Company Details: Lidar Guys
- Company Position Title: CEO and Lidar Supervisor
- Country: USA
- Linkedin Profile: Yes
- Location: Albuquerque, NM
- Has thanked: 62 times
- Been thanked: 220 times
- Contact:
Re: Threadripper
jamesworrell wrote: ↑Tue Mar 26, 2019 6:38 am
If I keep talking about it - it might become a reality ;-p
We need to move to multi-node processing.
Stop talking and start doing. We're in the early stages of getting this set up internally. Everything you need to do it is available right now. All you need is a job queue manager and processing software that can do something useful in batch mode.
I suppose each application could develop its own job queue system, but that seems very inefficient compared to using a general-purpose job manager to orchestrate batch jobs. Job managers like Deadline and OpenCue already have all the tools needed to prioritize jobs based on certain criteria, make sure they go to compute nodes with appropriate hardware, retry them when they fail, etc.
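The prioritization piece is simple in principle. A minimal sketch in the spirit of Deadline/OpenCue (their real APIs differ; `JobQueue` and the job names here are made up for illustration):

```python
import heapq
import itertools

class JobQueue:
    """Toy priority job queue: lower number = runs sooner."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # keeps FIFO order within a priority

    def submit(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), name))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

q = JobQueue()
q.submit("ingest scan batch", priority=50)   # grunt work, low priority
q.submit("run constraints", priority=10)
q.submit("mesh tiles", priority=20)
print(q.next_job())  # "run constraints"
```

A real job manager layers node matching, retries, and dependencies on top, but the dispatch order is still just this: highest-priority ready job first.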
The other component is having processing software where you can say:
Code: Select all
program --do --something
Although I doubt they made the design decision with distributed processing in mind, one of the things I noticed while testing Scene 2019 is that the multithreaded preprocessing they've introduced seems to be implemented as a bunch of subprocess calls to individual worker programs. If they exposed a scripting interface that allowed the user to do the same thing, it would work really well with this type of distributed compute model.
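That subprocess-per-worker pattern is easy to drive yourself once a tool exposes a batch command. A rough sketch, using a Python one-liner as a stand-in for the worker program (the scan names and the worker command are invented for illustration, not Scene's actual interface):

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def preprocess(scan_name):
    # Stand-in worker; a real pipeline would invoke the vendor's
    # batch-mode program here with the scan file as an argument.
    cmd = [sys.executable, "-c", f"print('processed {scan_name}')"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return out.stdout.strip()

scans = [f"scan_{i:03d}" for i in range(4)]
# Threads suffice here: each one just waits on its subprocess, and the
# worker processes themselves run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(preprocess, scans))
print(results[0])  # "processed scan_000"
```

Swap the local executor for a job manager's submit call and the same worker commands distribute across a farm unchanged.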
Jed
- smacl
- Global Moderator
- Posts: 1409
- Joined: Tue Jan 25, 2011 5:12 pm
- 13
- Full Name: Shane MacLaughlin
- Company Details: Atlas Computers Ltd
- Company Position Title: Managing Director
- Country: Ireland
- Linkedin Profile: Yes
- Location: Ireland
- Has thanked: 627 times
- Been thanked: 657 times
- Contact:
Re: Threadripper
jedfrechette wrote: ↑Tue Mar 26, 2019 2:44 pm
The other component is having processing software where you can say:
Code: Select all
program --do --something
Additionally, you could check whether your program has an automation interface and drive it through your preferred scripting language, or simply use an automation tool such as AutoIt, see https://www.autoitscript.com/site/autoit/ The only problem with chaining programs together on the command line is that each one tends to start and finish by reading your point cloud from disk. In some cases this can be a very big overhead, in which case scripting makes more sense, as your point cloud stays in memory in its native format.
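The overhead Shane mentions is easy to picture with a toy two-stage pipeline (the `denoise`/`decimate` stages and JSON files are purely illustrative stand-ins for real point cloud tools):

```python
import json
import os
import tempfile

def denoise(points):            # toy stage 1
    return [p for p in points if abs(p) < 100]

def decimate(points):           # toy stage 2
    return points[::2]

def chained_via_disk(points):
    """CLI-style chaining: every stage boundary is a disk write + read."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        with open(path, "w") as f:
            json.dump(denoise(points), f)     # stage 1 writes its output ...
        with open(path) as f:
            return decimate(json.load(f))     # ... stage 2 re-reads it all
    finally:
        os.remove(path)

def scripted_in_memory(points):
    """Scripted pipeline: the cloud never leaves memory between stages."""
    return decimate(denoise(points))

cloud = list(range(-200, 200))
assert chained_via_disk(cloud) == scripted_in_memory(cloud)
```

Same result either way; with a 17 GB cloud, the extra serialize/parse round trip at every stage boundary is what makes the chained version so much slower.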
- Jason Warren
- Administrator
- Posts: 4224
- Joined: Thu Aug 16, 2007 9:21 am
- 16
- Full Name: Jason Warren
- Company Details: Laser Scanning Forum Ltd
- Company Position Title: Co-Founder
- Country: UK
- Skype Name: jason_warren
- Linkedin Profile: No
- Location: Retford, UK
- Has thanked: 443 times
- Been thanked: 246 times
- Contact:
Re: Threadripper
jedfrechette wrote: ↑Tue Mar 26, 2019 2:44 pm Although I doubt they made the design decision with distributed processing in mind, one of the things I noticed while testing Scene 2019 is that that the multithreaded preprocessing they've introduced seems to be implemented as a bunch of subprocess calls to individual worker programs. If they exposed a scripting interface that allowed the user to do the same thing it would work really well with this type of distributed compute model.
A `Backburner` for Scene 2019 would be cool...
Jason Warren
Co_Founder
Dedicated to 3D Laser Scanning
LaserScanningForum
-
- V.I.P Member
- Posts: 1037
- Joined: Tue Mar 29, 2011 7:39 pm
- 13
- Full Name: Scott Page
- Company Details: Scott Page Design- Architectural service
- Company Position Title: Owner
- Country: USA
- Linkedin Profile: No
- Location: Berkeley, CA USA
- Has thanked: 206 times
- Been thanked: 78 times
- Contact:
Re: Threadripper
Jason Warren wrote: ↑Tue Mar 26, 2019 5:49 pm
jedfrechette wrote: ↑Tue Mar 26, 2019 2:44 pm Although I doubt they made the design decision with distributed processing in mind, one of the things I noticed while testing Scene 2019 is that the multithreaded preprocessing they've introduced seems to be implemented as a bunch of subprocess calls to individual worker programs. If they exposed a scripting interface that allowed the user to do the same thing it would work really well with this type of distributed compute model.
A `Backburner` for Scene 2019 would be cool...
I just watched FARO's recent webinar, which I'd missed. Part of it covered core counts and speed. I did learn some new things from the 60-minute webinar, so I recommend it, technical glitches and all. ~Scott
Leveraging the SCENE evolution (webinar) -sign-in required
https://insights.faro.com/construction- ... gnId=3109
-
- V.I.P Member
- Posts: 537
- Joined: Mon Jun 16, 2014 1:45 pm
- 9
- Full Name: James Worrell
- Company Details: Bennett and Francis
- Company Position Title: Director
- Country: Australia
- Linkedin Profile: Yes
- Location: Brisbane, Queensland, Australia
- Has thanked: 14 times
- Been thanked: 87 times
- Contact:
Re: Threadripper
My quals are actually in information technology, although I work for a surveying firm - go figure .. I have coded for 30+ years. Meh, whilst it would be really interesting, I'm not in the startup game! Certainly a fun time to work on the code side of point clouds - meshes, AI, cloud compute, open graphics (e.g. 3D Tiles), decent wide-area networking (5G), photogrammetry, big data (e.g. consider the full stream of data from Toyota or other laser-equipped cars).
https://www.spatialsource.com.au/latest ... -bandwagon
Vendors should take a leaf out of Microsoft's book and make everything API-first. Windows Server is a great example, with basically everything surfaced in PowerShell.
- lastools
- V.I.P Member
- Posts: 144
- Joined: Tue Mar 16, 2010 3:06 am
- 14
- Full Name: Martin Isenburg
- Company Details: rapidlasso - fast tools to catch reality
- Company Position Title: creators of LAStools and LASzip
- Country: Germany
- Skype Name: isenburg
- Linkedin Profile: Yes
- Been thanked: 1 time
- Contact:
Re: Threadripper
Hello from @LAStools,
Yep. LASzip decompression on a single core is typically CPU bound (unless the data comes straight across the Internet). But LASzip was designed with parallel decompression in mind, so using multiple cores to decompress a single LAZ file is theoretically possible. It's just a matter of writing (and then maintaining) the code. Each "chunk" of 50,000 points is compressed independently of all other chunks, so 4, 8, 16, 48, or 128 chunks could be decompressed simultaneously if you have fast file read access and many cores. However, someone (you?) would need to implement a multi-threaded decompressor. All the code is open source, so go right ahead ... (-:
Regards,
Martin @rapidlasso
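The independent-chunk layout Martin describes is what makes parallel decompression possible. A rough sketch of the idea, with zlib standing in for the LASzip codec and text-serialized point lists standing in for real point records (none of this is LASzip's actual API):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_POINTS = 50_000  # the LAZ chunk size Martin mentions

def compress_chunks(points, chunk_size=CHUNK_POINTS):
    """Compress each chunk independently, as the LAZ format does."""
    raw = [points[i:i + chunk_size]
           for i in range(0, len(points), chunk_size)]
    return [zlib.compress(str(c).encode("ascii")) for c in raw]

def decompress_all(chunks, workers=8):
    """Because chunks share no state, they can be inflated in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.decompress, chunks))
```

A real multi-threaded LAZ reader would additionally use the chunk table in the file to seek each worker directly to its chunk's byte offset, so even the reads can proceed independently.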