Robert Graham offers an interesting perspective on performance and scalability.
Performance is the time taken to get a job done. For a webserver, it can be measured as requests per second: the more requests per second the server handles, the less time each request takes.
Scalability is the ability to handle many jobs without degrading that measured performance. For example, suppose the measured performance is one second per job (request). If the system can handle 1000 jobs at one second per job, but adding more jobs increases the time taken per job (for whatever reason), then the system is scalable up to 1000 jobs. Because the measured performance degrades beyond 1000 jobs, we can say the system is not scalable beyond 1000 jobs at a time (assuming that performance cannot be compromised).
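The 1000-job threshold above can be sketched as a toy latency model. The numbers (a capacity of 1000 jobs and a base latency of one second) are taken from the example; the linear degradation beyond capacity is a simplifying assumption, not a claim about any real system.

```python
def latency_per_job(concurrent_jobs, capacity=1000, base_latency=1.0):
    """Hypothetical latency model for the example above.

    Up to `capacity` concurrent jobs, each job still takes
    `base_latency` seconds; beyond that, the average time per job
    degrades (here, linearly with the overload factor).
    """
    if concurrent_jobs <= capacity:
        return base_latency
    return base_latency * (concurrent_jobs / capacity)

# Performance holds at the desired level up to the capacity...
print(latency_per_job(500))    # 1.0 second per job
print(latency_per_job(1000))   # 1.0 second per job
# ...and degrades past it, which is where scalability ends.
print(latency_per_job(2000))   # 2.0 seconds per job
```

Under this model, "scalable up to 1000 jobs" just means: the largest load at which the measured per-job time still meets the one-second target.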
The average time taken to get a job done may go up for many reasons when more jobs are added. For example, the time a job spends waiting in the queue before being processed is one of the main factors affecting the response time and throughput of the system. What can be done to improve scalability? We can add more cores, more threads, and so on.
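A minimal deterministic sketch can show both effects at once: queueing delay driving up the average time per job, and extra workers (standing in for "more cores/more threads") bringing it back down. All numbers here are illustrative.

```python
def average_time_in_system(num_jobs, workers, service_time=1.0):
    """Toy model: all jobs arrive at once and are served FIFO by a
    pool of identical workers, each taking `service_time` per job.

    Jobs beyond the first `workers` must wait in the queue, so the
    average time-in-system (queueing + service) grows with load.
    """
    finish_times = []
    for j in range(num_jobs):
        batch = j // workers  # how many full batches run before this job
        finish_times.append((batch + 1) * service_time)
    return sum(finish_times) / num_jobs

print(average_time_in_system(4, workers=4))  # 1.0 — no job waits
print(average_time_in_system(8, workers=4))  # 1.5 — half the jobs queued
print(average_time_in_system(8, workers=8))  # 1.0 — more workers restore it
```

The middle call shows queueing delay degrading the average; the last shows that adding workers restores the original per-job time for the same load.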
For example, Apache runs one thread per connection, so keeping 10000 connections active with the webserver may not be feasible: the system may not allow Apache to create that many threads. For Apache to handle 10000 simultaneous connections, it would have to revamp its architecture.
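The usual revamp is an event-driven design: instead of parking one blocked thread per connection, a single thread asks the OS which sockets are ready and services only those. A minimal sketch of one such event-loop round, using Python's standard `selectors` module (the echo behavior and buffer size are illustrative choices):

```python
import selectors
import socket

def run_echo_round(sel):
    """Service one batch of ready sockets registered with `sel`.

    One thread handles every connection: sel.select() blocks until
    some sockets are readable, so 10,000 idle connections cost file
    descriptors and buffers, not 10,000 thread stacks.
    Returns the number of sockets handled in this round.
    """
    handled = 0
    for key, _mask in sel.select(timeout=0.5):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)    # echo the bytes back
        else:
            sel.unregister(conn)  # peer closed the connection
            conn.close()
        handled += 1
    return handled

# Usage sketch with a local socket pair standing in for a client:
sel = selectors.DefaultSelector()
client, server_side = socket.socketpair()
server_side.setblocking(False)
sel.register(server_side, selectors.EVENT_READ)

client.sendall(b"ping")
run_echo_round(sel)           # services the one ready socket
print(client.recv(4096))      # b'ping'
```

A real server would also register a listening socket and accept new connections inside the loop; the point here is only that readiness notification replaces thread-per-connection blocking.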
Generally, when people talk about scalability, their intention is to keep performance fixed at a desired level while still being able to handle more load.