20 photos you won’t believe are in Egypt!

The Amazing Egypt 🙂


20. Siwa – سيوة
19. Dahab – دهب
18. Al-Gara cave, Al-Farafra – كهف الجارة, الفرافرة
17. Marsa Alam, Red Sea – مرسى علم, البحر الأحمر
16. Saint Catherine mountain, Sinai – جبل سانت كاترين, سيناء
15. Aswan – أسوان
14. Sinai – سيناء
13. Colored Canyon, Nuweiba – الوادي الملون, نويبع
12. Fjord Bay, Taba – خليج فيورد, طابا
11. Mount Sinai – جبل طور سيناء
10. Mount Sinai – جبل طور سيناء
9. Nubia – النوبة
8. Saint Catherine – سانت كاترين
7. Qarun lake, Fayoum – بحيرة قارون, الفيوم
6. Saint Catherine mountain – جبل سانت كاترين
5. Saint Catherine – سانت كاترين
4. Siwa – سيوة
3. White Desert, Farafra – الصحراء البيضاء, الفرافرة
2. White Desert, Farafra – الصحراء البيضاء, الفرافرة
1. Philae temple, Aswan – معبد فيلة, أسوان


Scale-out vs Scale-up storage

Scale-out storage is a network-attached storage (NAS) architecture in which the total amount of disk space can be expanded as needed, even if some of the new drives reside in other storage arrays. If and when a given array reaches its storage limit, another array can be added to expand system capacity. Scale-out storage differs conceptually from the older scale-up approach.

In a scale-out storage system, new hardware can be added and configured as the need arises. The main advantage of the scale-out approach is cost containment, along with more efficient use of hardware resources.

Before scale-out storage became popular, enterprises often purchased storage arrays much larger than needed, to ensure that plenty of disk space would be available for future expansion. If that expansion never occurred, or turned out to be smaller than expected, much of the originally purchased disk space went to waste. With the scale-out architecture, the initial investment can be more modest; if the storage requirement expands beyond expectations, new arrays can be added as needed, without limit.

In theory, scale-out storage appeals because the data center can start small and add capacity and performance as needed. But do these theoretical advantages apply to the use cases in which storage is most commonly deployed: databases and virtualization?

The problem is that scale-out storage systems are more expensive to build, implement and maintain. There are many use cases for scale-out storage; it is best suited for situations where meeting a high capacity demand takes precedence over a performance demand.

In scale-up architectures, all the performance and capacity potential of the storage system is provided in a single controller unit, typically up front. Current scale-out architectures provide performance and capacity as storage nodes (servers with internal capacity) are added to the infrastructure. Each architecture has its ideal use case depending on performance and capacity demands. As stated above, the appeal of a scale-out storage system is that performance and capacity can be added incrementally as needed.

Should we scale performance and capacity at the same time?

Performance and capacity operate on different vectors and are not necessarily linked together.

In a scale-up architecture, all the performance is delivered with the unit up front, while capacity is added to the system as needed. While performance can’t necessarily be scaled, it is delivered in its entirety up front and is essentially a fixed cost with no surprises.

One side effect of scale-out storage is that the nodes typically need to be homogeneous. Each node needs to have a similar processor chipset and must use SSDs of the exact same size. A scale-up system can intermix SSDs of different sizes, and even different types, as new flash technology becomes available.

Is Scale-up performance really a big issue?

While the lack of performance scaling in scale-up systems is often cited by scale-out advocates, the reality is that the overwhelming majority of applications can’t push current scale-up systems to their limits. Additionally, some scale-up systems support a periodic controller unit upgrade, so as processing technology continues to advance, the head can be upgraded to offer more performance to the existing storage shelves. As a result, there actually is some performance scaling capability in scale-up systems.

Some scale-up vendors also have the ability to add a scale-out design to their architecture if the need ever becomes relevant. It is hard to imagine processing technology falling behind storage I/O performance, but if it were to happen, this is the ideal way to scale: scale up completely first, then start scaling out once performance exceeds the capabilities of the current processors.

Is Scale-out cheap or at least cheaper than Scale-up?

In storage there are two hard costs to be concerned with. The first is the initial purchase cost. In theory, this should favor a scale out storage system since it can start small. But again, current scale out designs need to have an initial cluster created or they need to deliver high availability in each node. Counting on the cluster for HA requires the purchase of potentially more performance and capacity than the customer needs because more nodes are needed initially. Building HA into each node requires added expense per node, probably equivalent to the scale up storage system.

A case could be made that a storage node can be delivered less expensively than a scale-up controller unit. That would require choosing the first option: nodes delivered with no built-in HA, relying on a cluster quorum to provide it. But again, buying multiple nodes up front eliminates that advantage, and it leads to node sprawl because nodes have to be added to address performance issues, not capacity issues.

At a minimum, the initial cost difference between the scale-up and scale-out implementation types may be a wash. When implementation time, or time to data, is factored into the equation, scale-up systems have a clear advantage: it simply takes longer to install more pieces and get those pieces working together.

The second cost, incremental cost, is an area where scale-out storage should have an advantage. But again, the limits of current scale-out designs tell a different story. The only way a scale-out all-flash system would have a cost advantage is if the need for expansion is driven by performance rather than capacity. As mentioned earlier, however, the overwhelming majority of flash vendors and customers report that their workloads can’t exceed the performance of a single box, so any scenario that would justify a scale-out deployment will probably not occur in most data centers.
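
To make that argument concrete, here is a purely illustrative Python sketch of the two cost curves. Every number in it (controller, shelf and node prices, capacities, the three-node minimum) is a made-up placeholder rather than a vendor figure; only the shape of the comparison matters: an up-front controller plus relatively cheap shelves versus a minimum cluster of all-in-one nodes.

    import math

    # Purely illustrative cost model -- all prices and capacities below are
    # hypothetical placeholders, not vendor figures.

    def scale_up_cost(tb_needed, controller=50_000, shelf=15_000, tb_per_shelf=50):
        """One controller bought up front; shelves added only as capacity grows."""
        return controller + math.ceil(tb_needed / tb_per_shelf) * shelf

    def scale_out_cost(tb_needed, node=30_000, tb_per_node=25, min_nodes=3):
        """Each node bundles compute and capacity; a minimum cluster is needed
        up front for HA/quorum even if that capacity isn't required yet."""
        return max(min_nodes, math.ceil(tb_needed / tb_per_node)) * node

    for tb in (20, 100, 500):
        print(f"{tb:4d} TB   scale-up: {scale_up_cost(tb):>9,}   scale-out: {scale_out_cost(tb):>9,}")

With these placeholders, the minimum cluster makes scale-out more expensive at small capacities, and bundling compute with every capacity increment keeps it more expensive as it grows, which is essentially the argument above; different placeholder prices would of course move the crossover.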

Conclusion: 

A theoretical advantage of scale-out is how simple it is to expand: “like adding Lego blocks” is the common analogy. However, current scale-out systems don’t actually “snap” together; they are a series of individual servers with clustering software that must be carefully networked together for maximum performance and availability. This combination makes the initial implementation more complex, and it makes ongoing upgrades something that needs to be carefully planned.

Scale-up architectures are actually relatively simple. All the capabilities, at least from a performance perspective, are delivered up front; there is nothing to “click” in. Capacity can be added incrementally, either by inserting drives into an existing shelf or by adding shelves to the existing storage controller. While adding shelves also requires planning, the capacity per shelf is high, and as long as the scale-up array can do non-disruptive upgrades, no downtime should result.

Scale-up storage, while having the disadvantage of buying all the performance capability up front, has the dual advantage of more incremental capacity expansion and a less complex back-end infrastructure. And leveraging data-in-place storage controller upgrades can easily eliminate the lack of performance scalability.

source: Storage Switzerland, EMC, Dell, and 1010data

 

Bringing Unix Philosophy to Big Data

The Unix philosophy fundamentally changed the way we think about computing systems: instead of a sealed monolith, the system became a collection of small, easily understood programs that could be quickly connected in novel and ad hoc ways. Today, big data looks much like the operating systems landscape in the pre-Unix 1960s: complicated frameworks surrounded by a priesthood that must manage and protect a fragile system.

In one of the best big data talks, Bryan Cantrill describes and demonstrates Manta, a new object store featuring in situ compute that brings the Unix philosophy to big data, allowing tools like grep, awk and sed to be used in map-reduce fashion on arbitrary amounts of data. He also covers the design challenges in building Manta, a system built largely in node.js.

The Duel: Timo Boll vs. KUKA Robot

When robot maker Kuka announced that it would be pitting its Agilus robot against table tennis star Timo Boll last month, we expected a fair fight. Conditioned professional human athlete against a cold, merciless, bright orange mechanical arm on a small wooden field, both wielding the same armament: a miniature bat. Boll was once ranked world number one, but Kuka claimed its robot was the quickest in the world. The Agilus was named for its lightning-fast movements, and would presumably be able to rapidly spin into position and return Boll’s balls from anywhere on the table.

Those hoping for a titanic struggle between human and robot will need to wait: Kuka posted the promised video today to muted reactions. The match appears rigged. Boll drops shots to the robotic arm as he hurtles carelessly around the arena, and puts return shots in easy reach of his foe. Soon the table tennis pro is down 6-0. But Boll has a Hollywood-style epiphany — perhaps realizing he’s playing against a programmable arm — and strikes back to take the game with a powerful smash that puts the ball over the top of his opponent.

Meanwhile, the camera crew is more focused on providing Michael Bay-esque slow-motion shots of the action, cutting in and out of rallies in progress to preserve the narrative. A making-of video explains how the crew were able to get the shots — by standing next to the table and sliding a giant camera in front of Boll’s face — but steers clear of showing unedited footage of the match in progress. A match like this could’ve been an intriguing window into future questions of sportsmanship and competitive entertainment; as it is, it’s nothing more than a glorified commercial.

Fundamental Laws of Parallel Computing .. a theoretical post, but important to understand!!

Amdahl’s and Gustafson’s laws, as well as the equivalence of the two, have influenced the research and practice of parallel computing during the past decades. How Amdahl’s law can be applied to multi-core chips, and what implications it has for architecture and programming model research .. this is what I am trying to explain in this article !!

Amdahl’s Law: Amdahl’s law is probably the best known and most widely cited law defining the limits of how much parallel speedup can theoretically be achieved for a given application. It was formulated by Gene Amdahl in 1967 [1] as a supporting argument for continuing to scale the performance of a single core rather than aiming for massively parallel systems.

In its traditional form it provides an upper bound for the potential speedup of a given application, as a function of the size of the sequential fraction of that application. The mathematical form of the law is

Speedup = 1 / ((1 − P) + P / N)

where P is the fraction of the parallelized portion of the application and N is the number of available cores. It is immediately clear that, according to Amdahl’s law, any application with a sequential fraction will have an upper bound to how fast it can run, independent of the amount of cores that are available for its execution.

When N approaches infinity, this upper bound becomes Speedup = 1 / (1 − P). The speedup curves for various levels of parallelism are shown in the figure below.
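
As a quick illustration (my own sketch, not from the original article; the function name is mine), the following Python snippet evaluates the formula for a few parallel fractions and core counts, and prints the 1/(1 − P) ceiling for each:

    def amdahl_speedup(p, n):
        """Amdahl's law: p = parallel fraction, n = number of cores."""
        return 1.0 / ((1.0 - p) + p / n)

    for p in (0.50, 0.90, 0.95, 0.99):
        cap = 1.0 / (1.0 - p)  # upper bound as n approaches infinity
        print(f"P={p:.2f}  N=16: {amdahl_speedup(p, 16):6.2f}  "
              f"N=256: {amdahl_speedup(p, 256):6.2f}  limit: {cap:6.2f}")

Even at 99% parallel code, 256 cores deliver a speedup of roughly 72, far below the core count, which is exactly the point of the law.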

The implications of Amdahl’s law were profound. Essentially it predicted that the focus should be on making single cores run faster—something that was within reach for the past four decades—instead of the costlier approach of parallelizing existing software, which would in any case have limited scalability, in accordance with Amdahl’s law, as long as some part of the software remained sequential. This law also triggered groundbreaking research that resulted in innovations such as out-of-order execution, speculative execution, pipelining, dynamic voltage and frequency scaling and, more recently, embedded DRAM. All these techniques are primarily geared towards making single-threaded applications run faster and, consequently, pushing back the hard limit set by Amdahl’s law.

As multi-core chips became mainstream, Amdahl’s law had the clear merit of putting sequential applications—or the sequential portions of otherwise parallelized applications—into focus. A large amount of research deals with auto-parallelizing existing applications; the research into asymmetric multi-core architectures is also driven by the need to cater for both highly parallelized applications and applications with large sequential portions.

However, Amdahl’s law has a shortcoming: it assumes that the sequential fraction of an application stays constant, no matter how many cores can be used for that application. This is clearly not always the case: more cores may mean more data parallelism, which dilutes the significance of the sequential portion; at the same time, an abundance of cores may enable speculative, run-ahead execution of the sequential part, resulting in a speedup without actually turning the sequential code into parallel code. The first observation later led to Gustafson’s law, while the second led to a new way of using many-core chips.

Amdahl’s Law for Many-core Chips: 

The applicability of Amdahl’s law to many-core chips was first explored, on a theoretical level, in Ref. [2]. It considered three scenarios and analyzed the speedup that can be obtained, under the same assumptions as in the original form of Amdahl’s law, for three different types of multi-core chip architectures. The three scenarios were:

• Symmetric multi-core: all cores have equal capabilities
• Asymmetric multi-core: the chip is organized as one powerful core along with several, simpler cores, all sharing the same ISA
• Dynamic multi-core: the chip may act as one single core with increased performance or as a symmetric multi-core processor; so far this is a theoretical case only, as no such chip has been designed

In order to evaluate the three scenarios, a simple model of the hardware is needed. Ref. [2] assumes that a chip has a certain amount of resources, expressed through an abstract quantity called the base processing unit (BCU). The simplest core that can be built requires at least one BCU, and speedup is reported relative to execution speed on a core built with one BCU; multiple BCUs can be grouped together—statically at chip design time, or dynamically at execution time as in the dynamic case—in order to build cores with improved capabilities. In Ref. [2], the performance of a core built from n BCUs was approximated using a theoretical function perf(n), expressed as the square root of n. This is clearly just an approximation to express the diminishing returns from adding more transistors (BCUs in our terminology) to a given core design.
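
To make the model concrete, here is a small Python sketch of the symmetric case (my own illustrative code following the perf(n) = sqrt(n) assumption above, not taken from Ref. [2]): a budget of n BCUs is split into n/r cores of r BCUs each, and both the sequential and the parallel parts run on cores of performance perf(r).

    import math

    def perf(r):
        # diminishing returns: a core built from r BCUs is sqrt(r) times faster
        return math.sqrt(r)

    def speedup_symmetric(f, n, r):
        """f: parallel fraction, n: BCU budget, r: BCUs per core (n/r cores)."""
        return 1.0 / ((1.0 - f) / perf(r) + f * r / (perf(r) * n))

    # 64-BCU chip: many simple cores versus a few bigger ones, for f = 0.95
    for r in (1, 4, 16, 64):
        print(f"r={r:2d}  cores={64 // r:2d}  speedup={speedup_symmetric(0.95, 64, r):5.2f}")

Note that even in the symmetric case this simple model shows an optimum core size: with f = 0.95 and 64 BCUs, the best result here comes from 16 cores of 4 BCUs each.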

In the symmetric multi-core processor case, the results were equivalent to the original scenario outlined by Amdahl’s law—essentially, using homogeneous architectures, for the canonical type of applications, the speedup will be limited by the sequential fraction.

The more interesting results were obtained for the other two cases. In the single-ISA asymmetric multi-core processor case, the number of simple cores is reduced in order to introduce one more powerful core alongside a larger set of simpler cores.

Consider the speedup curves for programs with different degrees of parallelism on a chip with 64 BCUs, organized into different asymmetric configurations. There are two conclusions that may be drawn from this scenario:

• The speedup is higher than what Amdahl’s classical law predicts: the availability of the more complex (faster) core makes it possible to run the sequential part of the program faster.
• There is a sweet spot for each level of parallelism, beyond which performance declines; for example, for the 95% parallel type of application this sweet spot is reached with 48 equal cores plus one core running at four times higher speed.

The implication is that asymmetric chip designs can help mitigate the impact of Amdahl’s law; the challenge, however, is that different applications may have their sweet spots at different configurations. This is what makes the third case, the fully dynamic scenario, really interesting. In this scenario the hardware can either act as a large pool of simple cores—when the parallel part is executed—or as a single fast core whose performance scales as a function of the number of simple cores it replaces.
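
Continuing the illustrative sketch from above (again my own code, under the same perf(r) = sqrt(r) assumption), the asymmetric and dynamic cases can be written as:

    import math

    def perf(r):
        return math.sqrt(r)

    def speedup_asymmetric(f, n, r):
        # One big core built from r BCUs plus (n - r) single-BCU cores;
        # the big core runs the sequential part, everything runs the parallel part.
        return 1.0 / ((1.0 - f) / perf(r) + f / (perf(r) + n - r))

    def speedup_dynamic(f, n):
        # Hypothetical chip that fuses all n BCUs into one fast core for the
        # sequential part, then splits back into n simple cores for the parallel part.
        return 1.0 / ((1.0 - f) / perf(n) + f / n)

    f, n = 0.95, 64
    for r in (1, 4, 16, 64):
        print(f"asymmetric, r={r:2d}: {speedup_asymmetric(f, n, r):5.2f}")
    print(f"dynamic:           {speedup_dynamic(f, n):5.2f}")

With f = 0.95, the asymmetric sweet spot in this sketch lands at r = 16, that is 48 simple cores plus one core four times faster, matching the example above, and the dynamic configuration beats every fixed one.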

Such a dynamic chip could, in principle, overcome the limitations set by Amdahl’s law. However, we don’t yet know how to build one, so at first sight this application of Amdahl’s law seems to be a mere theoretical exercise. There are, however, two techniques that could in theory make N cores behave as one single powerful core. The first, albeit with a limited scope of applicability, is dynamic voltage and frequency scaling applied to one core while the others are switched off (or put in a very low power mode); the second, with theoretically better scalability, is speculative, run-ahead execution.

Run-ahead speculative execution aims at speeding up the execution of single-threaded applications by speculatively executing the most promising branches of execution in advance on separate cores; if the speculation is successful, the result is earlier completion of the sequential portions and thus a speedup of execution. The grand challenge of speculative execution is the accuracy of prediction: too many mis-predictions decrease the ratio of useful work per unit of power consumed, making the approach less appealing while delivering only a limited speedup. On the other hand, it does scale with the number of cores, as more cores increase the chance of speculating on the right branch of execution. The key to efficient speculation is to dramatically strengthen the semantic link between the execution environment and the software it is executing.

Amdahl’s law, along with Gustafson’s law (which will be introduced in another post), is still at the foundation of how we reason about computer architecture and parallel programming in general, and about the architecture and programming of many-core chips in particular. At its core, it defines the key problem that needs to be addressed in order to leverage the increased computational power of many-core chips: the portions of the program that are designed to execute within one single thread.

References:

  1. Amdahl G (1967) Validity of the Single Processor Approach to Achieving Large-Scale Computing Capabilities. AFIPS Conference Proceedings 30:483-485
  2. Hill M D, Marty M R (2008) Amdahl’s Law in the Multi-core Era. IEEE Computer 41(7):33-38
  3. Vajda A (2011) Programming Many-Core Chips. Springer, ISBN-13: 978-1441997388

Never stop reading this .. Linux is obsolete – Andrew S. Tanenbaum and Linus Torvalds Debate

From the kernel architecture perspective there are two main categories of operating systems, with a number of variations in between. Micro-kernel operating systems are characterized by running most of their services in user mode as user processes and keeping only the very basic scheduling and hardware management mechanisms in the protected kernel space. Monolithic kernels, on the other hand, are characterized by incorporating most of the operating system services into the kernel space, sharing the same memory space.

The fundamental principle micro-kernel designs aim to follow is the separation of mechanism and policy. In a micro-kernel, the kernel’s role is to provide the minimal support for executing processes, inter-process communication and hardware management. All the other services—and indeed, policies—are then implemented as servers in user space, communicating with applications and with other components through inter-process communication mechanisms, usually messages. Micro-kernel based operating systems are usually characterized by strong modularity, a low kernel footprint and increased security and robustness, as a consequence of the strong isolation of components and the execution of most services in user space. On the other hand, micro-kernels traditionally incur a higher overhead for access to operating system services, as there are more context switches between user-space and kernel-space modes.

Monolithic kernels excel primarily at speed, as all the services of the operating system execute in kernel mode and hence can share memory and perform direct function calls, without the need for inter-process communication mechanisms such as message passing. As most of the OS functions are packed together, monolithic kernels tend to be bigger, more complex and hence more difficult to test and maintain, also requiring a careful, holistic approach to the overall design. There have been several long debates around these two approaches, most famously between Linus Torvalds, the inventor and gatekeeper of Linux (on the monolithic side), and Andrew Tanenbaum, a respected professor and author (advocate of micro-kernel architectures), dating back to 1992 with a revived exchange in 2006. The essence of the debate revolves around the maintainability, security, efficiency and complexity of operating systems, with valid arguments brought forward by both camps. The argument for micro-kernels is primarily based on an emphasis on reliability and security, supported by as little data sharing as possible and strict decomposition and isolation of operating system components. The counter-argument brought forward by Torvalds builds on the fact that algorithm design for distributed, share-nothing systems is inherently more complex, and hence micro-kernels, with their emphasis on isolation, would suffer on the maintainability and performance front.
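
To make the overhead argument tangible, here is a toy Python sketch (obviously nothing like real kernel code; all names and numbers are my own) that contrasts calling a “service” directly in the same address space with reaching it via message passing to a separate process:

    import multiprocessing as mp
    import time

    # "Monolithic" style: the service is a plain function call in the same address space.
    def read_block(block_no):
        return b"x" * 512  # pretend we read 512 bytes

    # "Micro-kernel" style: the service runs in a separate process, reached via messages.
    def file_server(requests, replies):
        for block_no in iter(requests.get, None):
            replies.put(b"x" * 512)

    if __name__ == "__main__":
        requests, replies = mp.Queue(), mp.Queue()
        server = mp.Process(target=file_server, args=(requests, replies))
        server.start()

        t0 = time.perf_counter()
        for i in range(1000):
            read_block(i)
        t1 = time.perf_counter()
        for i in range(1000):
            requests.put(i)
            replies.get()
        t2 = time.perf_counter()

        print(f"direct calls:    {t1 - t0:.4f} s")
        print(f"message passing: {t2 - t1:.4f} s")

        requests.put(None)   # shut the server down
        server.join()

The absolute numbers are meaningless; the point is only that crossing a process boundary costs far more than a plain function call, which is the performance side of the argument, while micro-kernel designs trade that cost for isolation and robustness.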

If you are a Linux enthusiast but have never heard of this debate, then I think you have missed something really interesting. The basis of this debate was the allegations made by Andrew S. Tanenbaum (an American computer scientist and professor of computer science at the Vrije Universiteit Amsterdam in the Netherlands, best known as the author of MINIX, a free Unix-like operating system for teaching purposes) about Linux’s portability and kernel architecture in general. The debate started in 1992 on the Usenet discussion group comp.os.minix. It was a heated debate, joined by Linus Torvalds (the creator of Linux) himself and many other hackers and developers.
Here are some excerpts of that discussion (from Google Groups):

Tanenbaum :
I was in the U.S. for a couple of weeks, so I haven’t commented much on
LINUX (not that I would have said much had I been around), but for what it is worth, I have a couple of comments now.

As most of you know, for me MINIX is a hobby, something that I do in the evening when I get bored writing books and there are no major wars, revolutions, or senate hearings being televised live on CNN.  My real job is a professor and researcher in the area of operating systems.

Linus :
You use this as an excuse for the limitations of minix? Sorry, but you loose: I’ve got more excuses than you have, and Linux still beats the pants of minix in almost all areas.  Not to mention the fact that most of the good code for PC minix seems to have been written by Bruce Evans.

Re 1: you doing minix as a hobby – look at who makes money off minix, and who gives Linux out for free. Then talk about hobbies. Make minix freely available, and one of my biggest gripes with it will disappear. Linux has very much been a hobby (but a serious one: the best type) for me: I get no money for it, and it’s not even part of any of my studies in the university. I’ve done it all on my own time, and on my own machine.

Re 2: your job is being a professor and researcher: That’s one hell of a good excuse for some of the brain-damages of minix. I can only hope (and assume) that Amoeba doesn’t suck like minix does.

 

Tanenbaum :

I think it is a gross error to design an OS for any specific architecture, since that is not going to be around all that long.

Linus :

“Portability is for people who cannot write new programs”
-me, right now (with tongue in cheek)

This war of words was re-ignited in 2006, after Tanenbaum wrote a cover story for Computer magazine titled “Can We Make Operating Systems Reliable and Secure?”. Here is what Wikipedia reports about it:

This subject was revisited in 2006 after Tanenbaum wrote a cover story for Computer magazine titled “Can We Make Operating Systems Reliable and Secure?”.[3] While Tanenbaum himself has mentioned that he did not write the article to renew the debate on kernel design,[4] the juxtaposition of the article and an archived copy of the 1992 debate on the technology site Slashdot caused the subject to be rekindled.[5] Torvalds posted a rebuttal of Tanenbaum’s arguments via an online discussion forum,[6] and several technology news sites began reporting the issue.[7] This prompted Jonathan Shapiro to respond that most of the field-proven reliable and secure computer systems use a more microkernel-like approach.

The original threads, archived on Google Groups and Slashdot, make this whole debate an interesting read.

Cannot move file to trash, do you want to delete immediately?

When nautilus trashes something, it doesn’t want to have to move it across partitions. This is because moving between partitions takes a lot longer, and because, if you then remove the partition, the trash has no place to restore the files to.

This isn’t a problem on systems which don’t have a separate home partition, because then nautilus isn’t sending the files to a different partition by putting them in ~/.local/share/Trash.

Anything that is on the same partition as your home directory is sent to ~/.local/share/Trash. This covers the entire root partition on setups which only have one partition.

On any other partition, nautilus will make a .Trash-1000 folder (1000 being your user ID) at the root of that partition, then send all trashed files into it. This works rather well on external drives that you have full read/write access to, though it won’t work if you don’t have write permission to the root of the drive.
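
As a rough illustration of that rule, here is a small Python sketch (my own simplification of the behaviour described above, not GNOME’s actual code, and it ignores the shared /.Trash case from the full freedesktop spec):

    import os

    def trash_dir_for(path):
        """Roughly where a freedesktop-style file manager would trash `path`."""
        home = os.path.expanduser("~")
        if os.stat(path).st_dev == os.stat(home).st_dev:
            return os.path.join(home, ".local/share/Trash")   # same partition as $HOME
        # otherwise: .Trash-<uid> at the root of the file's own partition
        mount = os.path.abspath(path)
        while not os.path.ismount(mount):
            mount = os.path.dirname(mount)
        return os.path.join(mount, f".Trash-{os.getuid()}")

    # e.g. something in your home directory vs. something on a separately mounted /tmp
    for p in (os.path.expanduser("~"), "/tmp"):
        print(p, "->", trash_dir_for(p))

On a single-partition install both paths resolve to ~/.local/share/Trash; with a separate / and /home, anything outside your home partition falls back to a .Trash-<uid> folder at the mount root, which is exactly where the failure described next comes from.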

Because your / partition isn’t the same as your /home partition, and a .Trash-1000 folder with write permission doesn’t exist at the root of your system, nautilus will fail to trash files. Thus the Delete key won’t work and a trash action won’t be available in the menus.

You could try running nautilus as root and deleting one file so that the /.Trash-1000 folder is created correctly, then running sudo chmod -R 777 /.Trash-1000 to give yourself permission to use a trash on the / filesystem. I cannot confirm that this will work, but it is worth a try; it should work fine.